An overview of ARM Memory Tagging Extension (MTE)

An overview of the ARM Memory Tagging Extension (MTE), covering its design, implementation, the features supported by each version, and the new vMTE proposal.

Introduction

Semantics of MTE instructions

  • irg Xd, Xn: copies Xn into Xd and inserts a random 4-bit Address Tag into Xd.
  • stg Xd, [Xn] (Store Allocation Tag): sets the Allocation Tag of the 16-byte granule [Xn, Xn + 16) to the Address Tag of Xd.
  • addg Xd, Xn, #<immA>, #<immB>: computes Xd = Xn + immA, with the Address Tag adjusted by immB. A matching subg instruction subtracts instead.
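
The semantics above can be sketched as a small software model. This is only an illustration in portable C, not real MTE code: it assumes the Address Tag sits in pointer bits 59:56, keeps one 4-bit Allocation Tag per 16-byte granule, and ignores details such as excluded tags. All `model_*` names and `MODEL_BYTES` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Software model only (assumption): Address Tag in pointer bits [59:56],
 * one 4-bit Allocation Tag per 16-byte granule. */
#define TAG_SHIFT   56
#define TAG_MASK    (0xFULL << TAG_SHIFT)
#define GRANULE     16
#define MODEL_BYTES 256 /* hypothetical size of the modelled memory */

static uint8_t alloc_tags[MODEL_BYTES / GRANULE]; /* one tag per granule */

static unsigned addr_tag(uint64_t p) { return (unsigned)((p >> TAG_SHIFT) & 0xF); }
static uint64_t untagged(uint64_t p) { return p & ~TAG_MASK; }
static uint64_t with_tag(uint64_t p, unsigned tag) {
    return untagged(p) | ((uint64_t)(tag & 0xF) << TAG_SHIFT);
}

/* irg Xd, Xn: copy Xn and insert a random 4-bit Address Tag. */
static uint64_t model_irg(uint64_t xn) { return with_tag(xn, (unsigned)rand()); }

/* stg Xd, [Xn]: set the Allocation Tag of the granule at Xn
 * to the Address Tag of Xd. The address must be granule-aligned. */
static void model_stg(uint64_t xd, uint64_t xn) {
    uint64_t addr = untagged(xn);
    assert(addr % GRANULE == 0 && addr < MODEL_BYTES);
    alloc_tags[addr / GRANULE] = (uint8_t)addr_tag(xd);
}

/* addg Xd, Xn, #immA, #immB: add immA to the address and immB to the
 * tag (the tag wraps modulo 16 in this model). */
static uint64_t model_addg(uint64_t xn, uint64_t imm_a, unsigned imm_b) {
    return with_tag(untagged(xn) + imm_a, addr_tag(xn) + imm_b);
}

/* Tag check: returns 1 when the Address Tag matches the Allocation Tag. */
static int model_check(uint64_t p) {
    uint64_t addr = untagged(p);
    assert(addr < MODEL_BYTES);
    return alloc_tags[addr / GRANULE] == addr_tag(p);
}
```

In this model, `model_stg(p, p)` after `p = model_irg(32)` makes `model_check(p)` succeed, while the same address carrying any other tag fails the check.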

Tag Checking Modes

  • Synchronous (Precise): The processor stops at the exact instruction that causes the tag violation. This is better for debugging but costs more performance.
  • Asynchronous (Deferred): The dependency between the tag check and the memory operation is relaxed. The processor sets a flag, and the OS reports the violation later. This is faster and preferred for production systems.
  • Asymmetric: Available since ARMv9.2. Stores are checked asynchronously and loads synchronously.
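#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/prctl.h>

int main(void) {
  // Enable the tagged address ABI and request synchronous tag checking
  // for the calling thread.
  int rc = prctl(PR_SET_TAGGED_ADDR_CTRL,
                 PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_SYNC,
                 0, 0, 0);
  if (rc < 0) {
    perror("prctl");
    return 1;
  }

  // Heap accesses are tag-checked only if the allocator returns tagged
  // memory; otherwise this runs as an ordinary program.
  char *ptr = malloc(32);
  if (ptr == NULL) {
    perror("malloc");
    return 1;
  }
  strcpy(ptr, "Hello, ARM MTE!");
  printf("%s\n", ptr);
  free(ptr);

  return 0;
}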

ARM Virtual MTE (vMTE) [1]

While MTE enables memory safety by tagging memory and pointers and comparing the tags on memory accesses, its tag storage is not optimised. MTE memory tags are tied directly to physical memory, so every physical location is potentially taggable and needs a reserved tag slot. Tag storage must therefore be allocated upfront, even if an application does not use MTE at all or uses it only selectively, for example by tagging heap allocations but not the stack or globals.
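
The size of that upfront reservation follows from the tag layout: one 4-bit tag per 16-byte granule is 4 bits per 128 bits, or 1/32 of the tagged memory. A minimal sketch of the arithmetic (the function name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

#define TAG_BITS      4   /* one 4-bit Allocation Tag ...   */
#define GRANULE_BYTES 16  /* ... per 16-byte granule        */

/* Bytes of tag storage needed to cover memory_bytes of taggable memory. */
static uint64_t tag_storage_bytes(uint64_t memory_bytes) {
    assert(memory_bytes % GRANULE_BYTES == 0);
    uint64_t granules = memory_bytes / GRANULE_BYTES;
    return granules * TAG_BITS / 8;
}
```

For example, `tag_storage_bytes(8ULL << 30)` gives 256 MiB for 8 GiB of taggable memory, regardless of how much of that memory is ever tagged.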

vMTE aims to remove this upfront allocation overhead by decoupling memory tags from physical memory and tying them to the virtual address space instead. In other words, vMTE tag storage is virtualised and managed by the OS, just like regular virtual memory: tag storage is allocated on demand at page granularity and swapped in and out by the OS as needed. This is significantly more memory-efficient and scalable, which makes memory tagging practical for a wider range of applications.
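
The on-demand idea can be illustrated with a toy model in which tag storage for a virtual page is allocated only when that page is first tagged. This is not the vMTE design itself, whose details are in the referenced blog post; the 4 KiB page size, the `vtag_*` names, and `MODEL_PAGES` are assumptions made for the sketch, and swapping is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_BYTES         4096 /* assumed page size */
#define GRANULE_BYTES      16
#define TAG_BYTES_PER_PAGE (PAGE_BYTES / GRANULE_BYTES / 2) /* two 4-bit tags per byte */
#define MODEL_PAGES        64   /* hypothetical size of the virtual address space */

typedef struct {
    uint8_t *page_tags[MODEL_PAGES]; /* NULL until a page is first tagged */
    size_t allocated_pages;
} vtag_space;

/* Set the tag of the granule containing vaddr, allocating the page's
 * tag storage on first use. Returns 0 on success, -1 on failure. */
static int vtag_set(vtag_space *s, uint64_t vaddr, unsigned tag) {
    uint64_t page = vaddr / PAGE_BYTES;
    if (page >= MODEL_PAGES)
        return -1;
    if (s->page_tags[page] == NULL) {
        s->page_tags[page] = calloc(TAG_BYTES_PER_PAGE, 1);
        if (s->page_tags[page] == NULL)
            return -1;
        s->allocated_pages++;
    }
    size_t granule = (size_t)(vaddr % PAGE_BYTES) / GRANULE_BYTES;
    unsigned shift = (unsigned)(granule % 2) * 4;
    uint8_t *slot = &s->page_tags[page][granule / 2];
    *slot = (uint8_t)((*slot & ~(0xFu << shift)) | ((tag & 0xFu) << shift));
    return 0;
}

/* Read a tag; pages without tag storage read as tag 0 in this model. */
static unsigned vtag_get(const vtag_space *s, uint64_t vaddr) {
    uint64_t page = vaddr / PAGE_BYTES;
    if (page >= MODEL_PAGES || s->page_tags[page] == NULL)
        return 0;
    size_t granule = (size_t)(vaddr % PAGE_BYTES) / GRANULE_BYTES;
    unsigned shift = (unsigned)(granule % 2) * 4;
    return (s->page_tags[page][granule / 2] >> shift) & 0xFu;
}

/* Tag storage currently in use, in bytes. */
static size_t vtag_storage_bytes(const vtag_space *s) {
    return s->allocated_pages * TAG_BYTES_PER_PAGE;
}

/* Release all tag storage held by the model. */
static void vtag_release(vtag_space *s) {
    for (size_t i = 0; i < MODEL_PAGES; i++) {
        free(s->page_tags[i]);
        s->page_tags[i] = NULL;
    }
    s->allocated_pages = 0;
}
```

Starting from a zero-initialised `vtag_space`, tagging any number of granules within one page costs 128 bytes of tag storage in this model, and pages that are never tagged cost nothing.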

User-level applications do not need to know whether the kernel and hardware implement MTE or vMTE, since all the details of tag management are handled transparently without breaking the MTE programming model.

References

  1. ARM. (n.d.). Future Architecture Technologies: PoE2 and vMTE. ARM Community Blog. https://developer.arm.com/community/arm-community-blogs/b/architectures-and-processors-blog/posts/future-architecture-technologies-poe2-and-vmte
This post is licensed under CC BY 4.0 by the author.