What RDMA Does
Traditionally, when data is transferred between systems:
It’s copied from user space to kernel space.
Passed through the network stack.
Received in the kernel and copied again to user space.
With RDMA, the NIC on one machine reads or writes application memory on another machine directly. The data path bypasses the kernel on both sides, which reduces CPU usage and latency.
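To make the traditional path concrete, here is a minimal sketch in C. It assumes a POSIX system and uses a local socketpair() as a stand-in for a real network connection, so it shows the two system calls and the two kernel copies but not the network stack itself. The function name kernel_round_trip is made up for this illustration.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Pushes `len` bytes through a local socket pair and reads them back.
 * Intended for small messages that fit in the socket buffer.
 * Returns the number of bytes received, or -1 on error. */
ssize_t kernel_round_trip(const void *src, void *dst, size_t len)
{
    int fds[2];

    assert(src != NULL && dst != NULL);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return -1;

    /* Copy 1: send() is a system call; the kernel copies src
     * from user space into a kernel socket buffer. */
    ssize_t sent = send(fds[0], src, len, 0);

    /* Copy 2: recv() is another system call; the kernel copies
     * the data from its buffer back out to dst in user space. */
    ssize_t got = (sent == (ssize_t)len) ? recv(fds[1], dst, len, 0) : -1;

    close(fds[0]);
    close(fds[1]);
    return got;
}
```

Calling kernel_round_trip("hello", out, 6) with a 16-byte out buffer should return 6 and leave "hello" in out. Every byte crossed the user/kernel boundary twice, which is the overhead RDMA is designed to avoid.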
How RDMA Works
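One-sided operations (RDMA read and write) do not interrupt the CPU on the receiving side.
No context switches or system calls on the data path; the application posts work requests to the NIC from user space.
Zero-copy: data moves between application buffers and the wire with no intermediate copies.
Memory regions are pre-registered with the NIC and pinned so they cannot be paged out.
The RDMA-capable NIC reads from and writes to that registered memory directly using DMA.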
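The registration step above can be illustrated with a toy model in plain C. This is not the Verbs API and it needs no RDMA hardware. The struct toy_mr and the function toy_rdma_write are hypothetical names invented here. The sketch only models the check an RDMA NIC is assumed to make before touching registered memory: the remote key must match and the target range must fall inside the region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for a registered memory region (hypothetical; not struct ibv_mr). */
struct toy_mr {
    uint8_t *base;   /* start of the registered buffer */
    size_t   length; /* size of the registered buffer in bytes */
    uint32_t rkey;   /* key a remote peer must present to access it */
};

/* Models an incoming one-sided write.
 * Returns 0 if the write is accepted, -1 if it is rejected. */
int toy_rdma_write(struct toy_mr *mr, uint32_t rkey, size_t offset,
                   const void *data, size_t len)
{
    assert(mr != NULL && data != NULL);

    /* Reject requests that present the wrong key. */
    if (rkey != mr->rkey)
        return -1;

    /* Reject requests that would fall outside the registered region. */
    if (offset > mr->length || len > mr->length - offset)
        return -1;

    /* On real hardware the NIC performs this step as a DMA transfer,
     * without running any code on the target CPU. */
    memcpy(mr->base + offset, data, len);
    return 0;
}
```

With an 8-byte region registered under key 0x1234, a 3-byte write at offset 2 succeeds, while a write with a different key or one that runs past the end of the region is rejected. In a real system, registration happens once up front, which is why the data path can then skip the kernel.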
RDMA Protocols
InfiniBand
High-performance computing (HPC) standard.
Low-latency, high-bandwidth.
RoCE (RDMA over Converged Ethernet)
Runs RDMA on standard Ethernet.
Typically needs lossless Ethernet, configured with Data Center Bridging (DCB).
iWARP
RDMA over standard TCP/IP stack.
Works over ordinary routed IP networks, but usually with somewhat higher latency than InfiniBand or RoCE.
RDMA Advantages
Benefit | Description
Ultra-low latency | No kernel involvement or context switches.
Zero copy | No intermediate memory buffers or CPU copying.
High throughput | NIC handles data transfer directly.
Low CPU utilization | Frees CPU for application-level processing.
RDMA vs Traditional Networking
Feature | Traditional Networking | RDMA
CPU Involvement | High | Low
Memory Copy | Multiple copies | Zero copy
System Calls | Yes | No (once set up)
Latency | Higher | Ultra-low
Performance | Good | Excellent
Hardware Requirements
To use RDMA, you typically need:
RDMA-capable NICs (e.g., from NVIDIA/Mellanox, Intel, or Broadcom)
Lossless network support (especially for RoCE)
RDMA drivers and libraries: rdma-core, libibverbs, libmlx5
For programming: the Verbs API, RDMA CM, or a higher-level layer such as Libfabric. Systems such as DAOS and NVMe-oF build on these.
Common Use Cases
High-speed datacenter communication
Distributed databases (e.g., Oracle RAC)
HPC clusters
Storage systems (e.g., NVMe over Fabrics)
Machine learning training clusters
Real-time data replication
RDMA in Code (Simplified Concept)
Here is a simplified, pseudocode view of a traditional send versus an RDMA write:
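// Traditional: a system call; the kernel copies the buffer into socket memory
send(socket, buffer, length, 0);
// RDMA (simplified): posts a one-sided write to the queue pair; the NIC moves the data, no CPU copy.
// The real librdmacm rdma_post_write() also takes the registered memory region and the remote key (rkey).
rdma_post_write(qp, local_addr, remote_addr, length);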
Brian Wilson (GT1) 7-7-25