Abstract:
Methods, apparatus, and systems for facilitating explicit flow control for RDMA transfers using implicit memory registration. To set up an RDMA data transfer, a source RNIC sends a request to allocate a destination buffer at a destination RNIC using implicit memory registration. Under implicit memory registration, the page or pages to be registered are not explicitly identified by the source RNIC, and may correspond to pages that are paged out to virtual memory. As a result, registration of such pages results in page faults, leading to a page fault delay before registration and pinning of the pages is completed. In response to detection of a page fault, the destination RNIC returns an acknowledgment indicating that a page fault delay is occurring. In response to receiving the acknowledgment, the source RNIC temporarily stops sending packets, and does not retransmit packets for which ACKs are not received prior to retransmission timeout expiration.
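The source-side behavior described above can be pictured with a small state model. This is only a sketch under stated assumptions, not the claimed implementation: the class `SourceSender`, the `AckKind` values, and in particular the `BUFFER_READY` signal that resumes sending are hypothetical, since the abstract does not say how the source learns that registration has completed.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class AckKind(Enum):
    DATA_ACK = auto()          # normal acknowledgment of a received packet
    PAGE_FAULT_DELAY = auto()  # destination reports a page fault delay
    BUFFER_READY = auto()      # hypothetical: registration and pinning finished


@dataclass
class SourceSender:
    """Toy model of the source RNIC's send logic (not a real RNIC interface)."""
    next_seq: int = 0
    paused: bool = False
    unacked: set = field(default_factory=set)

    def try_send(self):
        """Send the next packet, or return None while sending is paused."""
        if self.paused:
            return None
        seq = self.next_seq
        self.next_seq += 1
        self.unacked.add(seq)
        return seq

    def on_ack(self, kind, seq=None):
        """Update state from an acknowledgment sent by the destination."""
        if kind is AckKind.PAGE_FAULT_DELAY:
            self.paused = True        # temporarily stop sending
        elif kind is AckKind.BUFFER_READY:
            self.paused = False       # assumed resume condition
        elif kind is AckKind.DATA_ACK:
            self.unacked.discard(seq)

    def on_retransmit_timeout(self):
        """Return packets to retransmit; none while a page fault delay is pending."""
        if self.paused:
            return []
        return sorted(self.unacked)
```

The point of the model is that a retransmission timeout that fires during the page fault delay produces no retransmissions, so the source does not flood a destination that cannot yet place the data.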
Abstract:
Methods, apparatus, and software for optimizing network data flows within constrained systems. The methods enable data to be transferred between PCIe cards in multi-socket server platforms, each platform including a local socket having an InfiniBand (IB) HCA and a remote socket. Data to be transmitted outbound from a platform is transferred from a PCIe card to the platform's IB HCA via a proxied datapath. Data received at a platform may employ a direct PCIe peer-to-peer (P2P) transfer if the destination PCIe card is installed in the local socket, or a proxied datapath if the destination PCIe card is installed in a remote socket. Outbound transfers from a PCIe card in a local socket to the platform's IB HCA may selectively use either a proxied datapath for larger data transfers or a direct P2P datapath for smaller data transfers. The software is configured to support each of local-local, remote-local, local-remote, and remote-remote data transfers in a manner that is transparent to the software applications generating and receiving the data.
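The datapath rules in this abstract can be summarized as a small decision function. The following is a minimal sketch, assuming a size cutoff `P2P_MAX_BYTES` whose value is purely illustrative; the abstract only says "larger" and "smaller" transfers and gives no threshold. All names here are hypothetical.

```python
from enum import Enum


class Socket(Enum):
    LOCAL = "local"    # the socket that hosts the IB HCA
    REMOTE = "remote"  # the other socket in the platform


class Path(Enum):
    DIRECT_P2P = "direct PCIe peer-to-peer"
    PROXIED = "proxied"


# Hypothetical cutoff between "smaller" and "larger" transfers.
P2P_MAX_BYTES = 64 * 1024


def inbound_path(dest_socket):
    """Data received at the platform: direct P2P only to a local-socket card."""
    return Path.DIRECT_P2P if dest_socket is Socket.LOCAL else Path.PROXIED


def outbound_path(src_socket, nbytes, p2p_max_bytes=P2P_MAX_BYTES):
    """Data sent to the IB HCA: small local-socket transfers may go direct."""
    if src_socket is Socket.LOCAL and nbytes <= p2p_max_bytes:
        return Path.DIRECT_P2P
    return Path.PROXIED
```

For example, `outbound_path(Socket.REMOTE, 512)` selects the proxied path regardless of size, because in this reading only cards in the local socket are eligible for the direct P2P option.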
Abstract:
Embodiments of the invention describe systems, apparatuses, and methods that enable sharing Remote Direct Memory Access (RDMA) device hardware between a host and a peripheral device including a CPU and memory complex (alternatively referred to herein as a processor add-in card). Embodiments of the invention utilize interconnect hardware such as Peripheral Component Interconnect Express (PCIe) hardware for peer-to-peer data transfers between processor add-in cards and RDMA devices. A host system may include modules or logic to map memory and registers to and/or from the RDMA device, thereby enabling I/O to be performed directly to and from user-mode applications on the processor add-in card, concurrently with host system I/O operations.
Abstract:
Methods for performing power management of InfiniBand (IB) switches and apparatus and software configured to implement the methods. Power management datagrams (MADs) are used to inform IB switches that host servers connected to the IB switch's ports are to transition to a reduced-power or offline state or have returned to a normal operating state. A subnet management agent (SMA) on the IB switch receives the power MADs from the host servers and tracks each server's operating state. In response to power-down MADs, the SMA coordinates power reduction of the switch's ports and other switch circuitry. For switches with multi-port IB interfaces, a multi-port interface is caused to enter a reduced-power state when all of its ports are connected to host servers that are idle or offline. Additionally, when all of a switch's ports are connected to idle or offline servers, the SMA may put the switch's core switch logic into a reduced-power state. Power MADs are also used to inform upstream IB switches when a switch is to transition to a reduced-power state or has returned to a normal operating state.
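The SMA's bookkeeping described above amounts to tracking one flag per port and deriving interface-level and switch-level decisions from it. The sketch below illustrates that reading only; `SwitchPowerModel` and its methods are hypothetical names, and real power MADs carry more state than the single boolean assumed here.

```python
from dataclasses import dataclass, field


@dataclass
class SwitchPowerModel:
    """Toy model of per-port host state tracked by the SMA (illustrative only)."""
    # Maps each multi-port interface name to the port numbers it contains.
    interfaces: dict
    # Maps port number -> True when the attached host is idle or offline.
    host_down: dict = field(default_factory=dict)

    def on_power_mad(self, port, powering_down):
        """Record a power MAD: True for power-down, False for return to normal."""
        self.host_down[port] = powering_down

    def interface_reduced(self, name):
        """An interface may be powered down only if all of its ports are idle."""
        return all(self.host_down.get(p, False) for p in self.interfaces[name])

    def core_reduced(self):
        """Core switch logic may be powered down only if every port is idle."""
        return all(self.interface_reduced(n) for n in self.interfaces)
```

A port with no recorded MAD is treated as active, so the model never reduces power on the basis of missing information.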
Abstract:
In an embodiment, a method is provided. The method includes determining that a message has been placed in a send buffer, and transferring the message to an application on a second virtual machine. The transfer bypasses use of an operating system to process the message by placing the message directly in an application memory space from which the application can retrieve it.
Abstract:
A cluster that integrates operating-system-independent power management with operating-system-directed power management includes a group of hosts connected together by a cluster interconnection fabric. A cluster administrator is connected to the group of hosts via the fabric, and the cluster administrator includes a cluster power manager. A group of input/output units is connected to the group of hosts and the cluster interconnection fabric. Each of the hosts includes a controller element, an operating system power manager, and an input/output controller device driver stack. The cluster administrator transmits a request to the controller element of one of the hosts via the fabric, receives a reply therefrom, and transmits a command. The controller element transmits the command to the operating system power manager and the input/output controller device driver stack of its host and transmits a command completion acknowledgment to the cluster power manager. The technique allows a cluster administrator to power-manage fabric-attached hosts and input/output controllers regardless of which host currently owns the controller.
Abstract:
Methods and apparatus to provide peer-to-peer interrupt signaling between devices coupled via one or more interconnects are described. In one embodiment, a NIC (Network Interface Card such as a Remote Direct Memory Access (RDMA) capable NIC) transfers data directly into or out of the memory of a peer device that is coupled to the NIC via one or more interconnects, bypassing a host computing/processing unit and/or main system memory. Other embodiments are also disclosed.