The Linux kernel merged Restartable Sequences (RSEQ) several years ago, and the GNU C Library already uses RSEQ to speed up user-space operations on per-CPU data. RSEQ will be improved further in Linux 6.3, which will be released next year.
RSEQ lets user space increment per-CPU counters, modify per-CPU spin locks, read and write per-CPU ring buffers, and so on without resorting to atomic operations (operations that cannot be interrupted partway through by the thread scheduler). Avoiding them can significantly improve performance, and RSEQ delivers excellent benchmark results as a consequence.
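To make the per-CPU counter idea concrete, here is a minimal sketch in C. It is not an RSEQ critical section: it picks a slot with sched_getcpu() and uses a relaxed atomic add as a portable stand-in, because the thread may migrate between reading the CPU number and updating the slot. A real restartable sequence would replace that atomic add with a plain store that the kernel restarts on preemption or migration, which requires architecture-specific assembly not shown here. The names MAX_SLOTS, percpu_slot, percpu_counter_inc and percpu_counter_sum are hypothetical and chosen only for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_SLOTS 1024 /* hypothetical upper bound on CPU numbers */

/* One counter per CPU, aligned to 64 bytes to limit false sharing. */
struct percpu_slot {
    _Alignas(64) atomic_long value;
};

static struct percpu_slot slots[MAX_SLOTS];

/* Increment the slot of the CPU we are (probably) running on.
 * ASSUMPTION: the atomic add stands in for an rseq critical section. */
static void percpu_counter_inc(void)
{
    int cpu = sched_getcpu();
    if (cpu < 0)
        cpu = 0; /* fall back to slot 0 if the CPU number is unavailable */
    size_t idx = (size_t)cpu % MAX_SLOTS;
    assert(idx < MAX_SLOTS);
    atomic_fetch_add_explicit(&slots[idx].value, 1, memory_order_relaxed);
}

/* Readers add up every slot to obtain the global count. */
static long percpu_counter_sum(void)
{
    long total = 0;
    for (size_t i = 0; i < MAX_SLOTS; i++)
        total += atomic_load_explicit(&slots[i].value, memory_order_relaxed);
    return total;
}
```

Because each CPU mostly touches its own cache line, increments rarely contend; the cost that RSEQ removes is the atomic read-modify-write itself.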
Mathieu Desnoyers, who leads much of the work on RSEQ, has recently been working on extending the Restartable Sequences ABI and exposing the NUMA node ID, mm_cid and mm_numa_cid fields.
In the patch introduction, Desnoyers said:
"The NUMA node ID allows for a faster getcpu(2) in libc. The per-memory-map concurrency id (mm_cid) allows ideal scaling (down or up) of user-space per-cpu data structures. The concurrency IDs allocated within a memory map are tracked by the scheduler, which takes into account the number of concurrently running threads, their CPU affinity, the cpusets applied to those threads, and the number of logical cores, among other parameters. The NUMA-aware concurrency id (mm_numa_cid) is similar to mm_cid, except that it tracks the NUMA node ID associated with each cid. On NUMA systems, when user space observes a NUMA-aware concurrency ID to be associated with a NUMA node, it is guaranteed never to change NUMA node unless a kernel-level NUMA configuration change occurs. This is useful for NUMA-aware per-cpu data structures running in environments where a process, or a set of processes belonging to a cpuset, is pinned to a set of cores belonging to a subset of the system's NUMA nodes."
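The scaling argument for mm_cid can be illustrated with a small sizing sketch in C. It assumes, as a reading of the description above rather than a statement of the kernel ABI, that concurrency IDs stay below the smaller of the thread count and the number of CPUs the process is allowed to run on. The helper names allowed_cpu_count and cid_slot_count are hypothetical; the sketch does not read mm_cid from the rseq area.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Number of CPUs the calling thread may run on, or 1 if the query fails. */
static size_t allowed_cpu_count(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return 1;
    int n = CPU_COUNT(&set);
    return n > 0 ? (size_t)n : 1;
}

/* Slots needed when indexing per-CPU data by concurrency ID.
 * ASSUMPTION: IDs are bounded by min(threads, allowed CPUs). */
static size_t cid_slot_count(size_t nr_threads, size_t nr_allowed_cpus)
{
    assert(nr_threads > 0 && nr_allowed_cpus > 0);
    return nr_threads < nr_allowed_cpus ? nr_threads : nr_allowed_cpus;
}
```

Under that assumption, a process with 4 threads on a 128-CPU machine would need 4 slots instead of 128, and a 256-thread process restricted to 8 CPUs would need 8; calling cid_slot_count(nr_threads, allowed_cpu_count()) gives the figure for the current process.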