Abstract:
Methods, apparatus, and software for optimizing network data flows within constrained systems. The methods enable data to be transferred between PCIe cards in multi-socket server platforms, each platform including a local socket having an InfiniBand (IB) HCA and a remote socket. Data to be transmitted outbound from a platform is transferred from a PCIe card to the platform's IB HCA via a proxied datapath. Data received at a platform may employ a direct PCIe peer-to-peer (P2P) transfer if the destined PCIe card is installed in the local socket or a proxied datapath if the destined PCIe card is installed in a remote socket. Outbound transfers from a PCIe card in a local socket to the platform's IB HCA may selectively use either a proxied datapath for larger data transfers or a direct P2P datapath for smaller data transfers. The software is configured to support each of local-local, remote-local, local-remote, and remote-remote data transfers in a manner that is transparent to the software applications generating and receiving the data.
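The datapath choice described in this abstract can be illustrated with a small sketch. It is only an illustration of the stated rules, not the patented implementation: the names Socket, Path, inbound_path and outbound_path are hypothetical, and the 64 KiB cutoff is an assumed value, since the abstract says only "larger" and "smaller" transfers.

```python
from enum import Enum


class Socket(Enum):
    LOCAL = "local"    # socket that hosts the IB HCA
    REMOTE = "remote"  # any other socket on the platform


class Path(Enum):
    DIRECT_P2P = "direct_p2p"
    PROXIED = "proxied"


# Assumed cutoff between "smaller" and "larger" transfers; the abstract
# does not give a number.
P2P_MAX_BYTES = 64 * 1024


def inbound_path(dest_socket: Socket) -> Path:
    """Path for data received at the IB HCA and destined for a PCIe card."""
    if dest_socket is Socket.LOCAL:
        return Path.DIRECT_P2P
    return Path.PROXIED


def outbound_path(src_socket: Socket, size: int,
                  threshold: int = P2P_MAX_BYTES) -> Path:
    """Path for data leaving a PCIe card for the platform's IB HCA."""
    # Only small transfers from a card in the local socket may go direct;
    # everything else is proxied.
    if src_socket is Socket.LOCAL and size <= threshold:
        return Path.DIRECT_P2P
    return Path.PROXIED
```

For example, outbound_path(Socket.LOCAL, 4096) selects the direct P2P path, while the same call with Socket.REMOTE selects the proxied path. Treating the cutoff as inclusive is also an assumption.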
Abstract:
Embodiments of the invention describe systems, apparatuses and methods that enable sharing Remote Direct Memory Access (RDMA) device hardware between a host and a peripheral device including a CPU and memory complex (alternatively referred to herein as a processor add-in card). Embodiments of the invention utilize interconnect hardware such as Peripheral Component Interconnect Express (PCIe) hardware for peer-to-peer data transfers between processor add-in cards and RDMA devices. A host system may include modules or logic to map memory and registers to and/or from the RDMA device, thereby enabling I/O to be performed directly to and from user-mode applications on the processor add-in card, concurrently with host system I/O operations.
Abstract:
Methods and apparatus to provide peer-to-peer interrupt signaling between devices coupled via one or more interconnects are described. In one embodiment, a NIC (Network Interface Card such as a Remote Direct Memory Access (RDMA) capable NIC) transfers data directly into or out of the memory of a peer device that is coupled to the NIC via one or more interconnects, bypassing a host computing/processing unit and/or main system memory. Other embodiments are also disclosed.
Abstract:
An apparatus and method for efficient input/output processing without the use of interrupts is described. The apparatus includes a plurality of descriptors where each descriptor includes a completion indicator and data associated with an input/output request. The plurality of descriptors includes a head descriptor and a tail descriptor. The apparatus further includes a plurality of address holders associated with an input/output processor, and each of the plurality of address holders is uniquely affiliated with one of the plurality of descriptors. The apparatus further includes a polling mechanism for evaluating the completion indicator of the head descriptor and a completion processor for interfacing with the head descriptor. Finally, the apparatus includes connectors between the tail descriptor and address holder and between the input/output processor and the head descriptor.
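The polling scheme in this abstract can be sketched as a queue of descriptors in which only the head descriptor's completion indicator is examined. This is a simplified illustration under stated assumptions: Descriptor and DescriptorQueue are hypothetical names, the address holders are not modeled, and the I/O processor is simulated by setting the done flag by hand.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Callable, Deque


@dataclass
class Descriptor:
    data: Any            # data associated with the I/O request
    done: bool = False   # completion indicator, set by the I/O processor


class DescriptorQueue:
    def __init__(self) -> None:
        self._queue: Deque[Descriptor] = deque()

    def submit(self, data: Any) -> Descriptor:
        """Append a new request; it becomes the tail descriptor."""
        desc = Descriptor(data)
        self._queue.append(desc)
        return desc

    def poll(self, on_complete: Callable[[Descriptor], None]) -> int:
        """Check the head descriptor's completion indicator.

        Retires completed descriptors in order, stopping at the first
        one that is still pending. Returns how many were retired.
        """
        retired = 0
        while self._queue and self._queue[0].done:
            on_complete(self._queue.popleft())
            retired += 1
        return retired
```

A caller would invoke poll() from its own loop instead of waiting for an interrupt. Because only the head is inspected, a descriptor that completes out of order is not retired until every descriptor ahead of it has completed.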
Abstract:
A method and system are provided for transferring data in a networked system between a local memory in a local system and a remote memory in a remote system. An RDMA request is received and a first buffer region is associated with a first transfer operation. The system determines whether a size of the first buffer region exceeds a maximum transfer size of the networked system. Portions of a second buffer region may be associated with the first transfer operation based on the determination of the size of the first buffer region. The system subsequently performs the first transfer operation.
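One possible reading of this abstract is that buffer regions are packed into transfer operations bounded by the maximum transfer size, with part of a second region filling out a transfer that the first region leaves short. The sketch below follows that reading. The function name plan_transfers, the (address, length) tuple layout, and the greedy packing policy are all assumptions made for illustration and are not taken from the patent.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # (start address, length in bytes)


def plan_transfers(regions: List[Segment],
                   max_transfer: int) -> List[List[Segment]]:
    """Pack buffer regions into transfers of at most max_transfer bytes."""
    if max_transfer <= 0:
        raise ValueError("max_transfer must be positive")

    transfers: List[List[Segment]] = []
    current: List[Segment] = []
    room = max_transfer  # bytes still available in the current transfer

    for addr, length in regions:
        while length > 0:
            take = min(length, room)
            current.append((addr, take))
            addr += take
            length -= take
            room -= take
            if room == 0:
                # Current transfer is full; start a new one.
                transfers.append(current)
                current = []
                room = max_transfer

    if current:
        transfers.append(current)
    return transfers
```

With regions of 3000 and 2000 bytes and a 4096-byte limit, the first transfer carries the whole first region plus the first 1096 bytes of the second, and a second transfer carries the remaining 904 bytes.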
Abstract:
Methods and systems for flow control over channel-based switched fabric connections between a first side and a second side. At least one posted receive buffer is stored in a receive buffer queue at the first side. A number of credits is incremented based on the at least one posted receive buffer. The second side is notified of the number of credits. A number of send credits is incremented at the second side based on the number of credits. A message is sent from the second side to the first side if the number of send credits is larger than or equal to two or the number of send credits is equal to one and a second number of credits is larger than or equal to one. The second number of credits is based on at least one second posted receive buffer at the second side. Therefore, communication of messages between the first side and the second side is prevented from deadlocking.
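The send rule in this abstract can be sketched from the sender's point of view as follows. The class Endpoint and its attribute names are hypothetical. Returning the pending credits from send(), as if they were piggybacked on the outgoing message, is an assumption: the abstract states only that the other side is notified of the credits.

```python
class Endpoint:
    def __init__(self) -> None:
        # Receive buffers the peer has advertised to us.
        self.send_credits = 0
        # Our own posted receive buffers not yet advertised to the peer
        # (the "second number of credits" in the abstract).
        self.pending_credits = 0

    def post_receive(self, count: int = 1) -> None:
        """Post receive buffers locally; each one is a credit to advertise."""
        self.pending_credits += count

    def receive_credits(self, count: int) -> None:
        """The peer notified us of newly posted receive buffers."""
        self.send_credits += count

    def can_send(self) -> bool:
        # The last send credit is spent only when the message can also
        # return credits to the peer.
        return self.send_credits >= 2 or (
            self.send_credits == 1 and self.pending_credits >= 1
        )

    def send(self) -> int:
        """Consume one send credit; return the credits carried to the peer."""
        if not self.can_send():
            raise RuntimeError("no send credit available")
        self.send_credits -= 1
        granted, self.pending_credits = self.pending_credits, 0
        return granted
```

The point of the rule is that a side holding a single send credit does not spend it on a message that gives nothing back, so at least one side can always advertise new receive buffers and the exchange does not stall.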