EXPLICIT FLOW CONTROL FOR IMPLICIT MEMORY REGISTRATION
    1.
    Patent application
    EXPLICIT FLOW CONTROL FOR IMPLICIT MEMORY REGISTRATION (status: in force)

    Publication No.: US20140164545A1

    Publication date: 2014-06-12

    Application No.: US13711122

    Filing date: 2012-12-11

    IPC classification: G06F13/28

    CPC classification: G06F13/28

    Abstract: Methods, apparatus and systems for facilitating explicit flow control for RDMA transfers using implicit memory registration. To set up an RDMA data transfer, a source RNIC sends a request to allocate a destination buffer at a destination RNIC using implicit memory registration. Under implicit memory registration, the page or pages to be registered are not explicitly identified by the source RNIC, and may correspond to pages that are paged out to virtual memory. As a result, registration of such pages results in page faults, leading to a page fault delay before registration and pinning of the pages are completed. In response to detection of a page fault, the destination RNIC returns an acknowledgment indicating that a page fault delay is occurring. In response to receiving the acknowledgment, the source RNIC temporarily stops sending packets, and does not retransmit packets for which ACKs are not received prior to retransmission timeout expiration.
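    The source-side behavior described in this abstract can be sketched as a small state machine. This is an illustration only, not the patented implementation: the class and method names, the millisecond clock, the `backoff_ms` field carried by the page-fault acknowledgment, and the choice to restart retransmission timers when the pause ends are all assumptions that the abstract does not state.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class AckKind(Enum):
    DATA = auto()              # ordinary ACK for a delivered packet
    PAGE_FAULT_DELAY = auto()  # destination reports a page fault delay


@dataclass
class SourceSender:
    """Hypothetical model of the source RNIC's send-side bookkeeping."""
    rto_ms: int = 200                            # retransmission timeout
    paused_until_ms: int = 0                     # no sends before this time
    unacked: dict = field(default_factory=dict)  # seq -> timer start (ms)

    def is_paused(self, now_ms: int) -> bool:
        return now_ms < self.paused_until_ms

    def try_send(self, seq: int, now_ms: int) -> bool:
        """Send a packet unless a page fault delay is in effect."""
        if self.is_paused(now_ms):
            return False
        self.unacked[seq] = now_ms
        return True

    def on_ack(self, kind: AckKind, now_ms: int, seq=None, backoff_ms: int = 0):
        if kind is AckKind.PAGE_FAULT_DELAY:
            # Assumption: the ACK carries a back-off hint in milliseconds.
            self.paused_until_ms = max(self.paused_until_ms, now_ms + backoff_ms)
            # Assumption: timers restart when the pause ends, so packets
            # already in flight are not retransmitted because of the delay.
            for s in self.unacked:
                self.unacked[s] = self.paused_until_ms
        elif seq is not None:
            self.unacked.pop(seq, None)

    def due_retransmits(self, now_ms: int) -> list:
        """Sequence numbers whose timeout expired; none while paused."""
        if self.is_paused(now_ms):
            return []
        return sorted(s for s, t in self.unacked.items()
                      if now_ms - t >= self.rto_ms)
```

    While the pause is in effect, `try_send` refuses new packets and `due_retransmits` returns an empty list, which mirrors the two behaviors named in the abstract: stop sending, and do not retransmit on timeout.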

    Method to optimize network data flows within a constrained system
    2.
    Granted patent
    Method to optimize network data flows within a constrained system (status: in force)

    Publication No.: US09558148B2

    Publication date: 2017-01-31

    Application No.: US14266241

    Filing date: 2014-04-30

    Abstract: Methods, apparatus, and software for optimizing network data flows within constrained systems. The methods enable data to be transferred between PCIe cards in multi-socket server platforms, each platform including a local socket having an InfiniBand (IB) HCA and a remote socket. Data to be transmitted outbound from a platform is transferred from a PCIe card to the platform's IB HCA via a proxied datapath. Data received at a platform may employ a direct PCIe peer-to-peer (P2P) transfer if the destination PCIe card is installed in the local socket, or a proxied datapath if the destination PCIe card is installed in a remote socket. Outbound transfers from a PCIe card in a local socket to the platform's IB HCA may selectively be transferred using either a proxied datapath for larger data transfers or a direct P2P datapath for smaller data transfers. The software is configured to support each of local-local, remote-local, local-remote, and remote-remote data transfers in a manner that is transparent to the software applications generating and receiving the data.
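    The datapath choices listed in this abstract can be summarized as two small decision functions. This is a rough sketch under stated assumptions: the names are hypothetical, and the abstract distinguishes only "larger" from "smaller" transfers, so the 256 KiB threshold is a placeholder rather than a value from the patent.

```python
from enum import Enum


class Socket(Enum):
    LOCAL = "local"    # socket that hosts the InfiniBand HCA
    REMOTE = "remote"  # any other socket on the platform


class Datapath(Enum):
    DIRECT_P2P = "direct PCIe peer-to-peer"
    PROXIED = "proxied"


# Hypothetical cut-over point; the abstract gives no figure.
P2P_MAX_BYTES = 256 * 1024


def outbound_path(card_socket: Socket, size_bytes: int,
                  p2p_max_bytes: int = P2P_MAX_BYTES) -> Datapath:
    """Pick the datapath for a PCIe card -> IB HCA transfer."""
    if card_socket is Socket.LOCAL and size_bytes <= p2p_max_bytes:
        # Smaller transfers from a local-socket card may go direct.
        return Datapath.DIRECT_P2P
    # Larger transfers, and transfers from a remote socket, are proxied.
    return Datapath.PROXIED


def inbound_path(card_socket: Socket) -> Datapath:
    """Pick the datapath for an IB HCA -> PCIe card transfer."""
    if card_socket is Socket.LOCAL:
        return Datapath.DIRECT_P2P
    return Datapath.PROXIED
```

    For example, `outbound_path(Socket.LOCAL, 4096)` selects the direct P2P path, while the same call with `Socket.REMOTE` selects the proxied path regardless of size.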

    METHOD TO OPTIMIZE NETWORK DATA FLOWS WITHIN A CONSTRAINED SYSTEM
    3.
    Patent application
    METHOD TO OPTIMIZE NETWORK DATA FLOWS WITHIN A CONSTRAINED SYSTEM (status: in force)

    Publication No.: US20150317280A1

    Publication date: 2015-11-05

    Application No.: US14266241

    Filing date: 2014-04-30

    Abstract: Methods, apparatus, and software for optimizing network data flows within constrained systems. The methods enable data to be transferred between PCIe cards in multi-socket server platforms, each platform including a local socket having an InfiniBand (IB) HCA and a remote socket. Data to be transmitted outbound from a platform is transferred from a PCIe card to the platform's IB HCA via a proxied datapath. Data received at a platform may employ a direct PCIe peer-to-peer (P2P) transfer if the destination PCIe card is installed in the local socket, or a proxied datapath if the destination PCIe card is installed in a remote socket. Outbound transfers from a PCIe card in a local socket to the platform's IB HCA may selectively be transferred using either a proxied datapath for larger data transfers or a direct P2P datapath for smaller data transfers. The software is configured to support each of local-local, remote-local, local-remote, and remote-remote data transfers in a manner that is transparent to the software applications generating and receiving the data.

    Direct I/O access for system co-processors
    4.
    Granted patent
    Direct I/O access for system co-processors (status: in force)

    Publication No.: US08914556B2

    Publication date: 2014-12-16

    Application No.: US13997601

    Filing date: 2011-09-30

    IPC classification: G06F13/28 G06F13/14 G06F13/16

    CPC classification: G06F13/16 G06F13/14 G06F13/28

    Abstract: Embodiments of the invention describe systems, apparatuses and methods that enable sharing Remote Direct Memory Access (RDMA) device hardware between a host and a peripheral device including a CPU and memory complex (alternatively referred to herein as a processor add-in card). Embodiments of the invention utilize interconnect hardware such as Peripheral Component Interconnect Express (PCIe) hardware for peer-to-peer data transfers between processor add-in cards and RDMA devices. A host system may include modules or logic to map memory and registers to and/or from the RDMA device, thereby enabling I/O to be performed directly to and from user-mode applications on the processor add-in card, concurrently with host system I/O operations.

    DIRECT I/O ACCESS FOR SYSTEM CO-PROCESSORS
    5.
    Patent application
    DIRECT I/O ACCESS FOR SYSTEM CO-PROCESSORS (status: in force)

    Publication No.: US20130275631A1

    Publication date: 2013-10-17

    Application No.: US13997601

    Filing date: 2011-09-30

    IPC classification: G06F13/16

    CPC classification: G06F13/16 G06F13/14 G06F13/28

    Abstract: Embodiments of the invention describe systems, apparatuses and methods that enable sharing Remote Direct Memory Access (RDMA) device hardware between a host and a peripheral device including a CPU and memory complex (alternatively referred to herein as a processor add-in card). Embodiments of the invention utilize interconnect hardware such as Peripheral Component Interconnect Express (PCIe) hardware for peer-to-peer data transfers between processor add-in cards and RDMA devices. A host system may include modules or logic to map memory and registers to and/or from the RDMA device, thereby enabling I/O to be performed directly to and from user-mode applications on the processor add-in card, concurrently with host system I/O operations.

    POWER MANAGEMENT OF INFINIBAND SWITCHES
    6.
    Patent application
    POWER MANAGEMENT OF INFINIBAND SWITCHES (status: in force)

    Publication No.: US20150338909A1

    Publication date: 2015-11-26

    Application No.: US14283619

    Filing date: 2014-05-21

    IPC classification: G06F1/32

    Abstract: Methods for performing power management of InfiniBand (IB) switches, and apparatus and software configured to implement the methods. Power management datagrams (MADs) are used to inform IB switches that host servers connected to the IB switch's ports are to transition to a reduced-power or offline state or have returned to a normal operating state. A subnet management agent (SMA) on the IB switch receives the power MADs from the host servers and tracks each server's operating state. In response to power-down MADs, the SMA coordinates power reduction of the switch's ports and other switch circuitry. For switches with multi-port IB interfaces, a multi-port interface is caused to enter a reduced-power state when all of its ports are connected to host servers that are idle or offline. Additionally, when all of a switch's ports are connected to idle or offline servers, the SMA may put the switch's core switch logic into a reduced-power state. Power MADs are also used to inform upstream IB switches when a switch is to transition to a reduced-power state or has returned to a normal operating state.
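    The state tracking attributed to the SMA in this abstract can be modeled in a few lines. This is a minimal sketch with hypothetical names; it assumes that a port with no power MAD on record is treated as active, and it covers only the two aggregation rules the abstract states (per multi-port interface, and for the core switch logic), not the MAD wire format or the upstream notification.

```python
from dataclasses import dataclass, field
from enum import Enum


class HostState(Enum):
    ACTIVE = "active"
    IDLE = "idle"        # host in a reduced-power state
    OFFLINE = "offline"


@dataclass
class SwitchPowerTracker:
    """Hypothetical model of the SMA's per-port bookkeeping."""
    # Multi-port interface name -> port numbers it serves.
    interfaces: dict
    # Port number -> last state reported by the attached host.
    host_state: dict = field(default_factory=dict)

    def on_power_mad(self, port: int, state: HostState) -> None:
        """Record the state carried by a power MAD from the host on `port`."""
        self.host_state[port] = state

    def _port_quiet(self, port: int) -> bool:
        # Assumption: ports with no report yet count as active.
        state = self.host_state.get(port, HostState.ACTIVE)
        return state is not HostState.ACTIVE

    def interface_low_power(self, name: str) -> bool:
        """True when every port on the interface faces an idle/offline host."""
        return all(self._port_quiet(p) for p in self.interfaces[name])

    def core_low_power(self) -> bool:
        """True when every port on the switch faces an idle/offline host."""
        return all(self.interface_low_power(n) for n in self.interfaces)
```

    A single active host keeps its own interface and the core logic at full power, while other interfaces whose hosts are all idle or offline can still drop to the reduced-power state.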

    Cluster power management technique
    9.
    Granted patent
    Cluster power management technique (status: in force)

    Publication No.: US06587950B1

    Publication date: 2003-07-01

    Application No.: US09461729

    Filing date: 1999-12-16

    IPC classification: G06F1/32

    CPC classification: G06F1/3203

    Abstract: A cluster operating in accordance with a technique integrating operating-system-independent power management with operating-system-directed power management includes a group of hosts connected together by a cluster interconnection fabric. A cluster administrator is connected to the group of hosts via the fabric, and the cluster administrator includes a cluster power manager. A group of input/output units are connected to the group of hosts and the cluster interconnection fabric. Each of the hosts includes a controller element and an operating system power manager and input/output controller device driver stack. The cluster administrator transmits a request to the controller element of one of the hosts via the fabric, receives a reply therefrom, and transmits a command. The controller element transmits the command to the operating system power manager and the input/output controller device driver stack of its host and transmits a command completion acknowledgment to the cluster power manager. The technique allows a cluster administrator to power manage fabric-attached hosts and input/output controllers regardless of which host currently owns the controller.
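    The request, reply, command, and acknowledgment sequence in this abstract can be traced with a toy model. This is only an illustrative sketch: the class names, the message strings ("power-state-change", "ready", "complete"), and the in-process method calls standing in for fabric messages are all assumptions, since the abstract does not describe message contents.

```python
class ControllerElement:
    """Hypothetical stand-in for the controller element on one host."""

    def __init__(self, host_name: str):
        self.host_name = host_name
        self.delivered = []  # (recipient, command) pairs, in delivery order

    def handle_request(self, request: str) -> str:
        # Reply to the cluster administrator's initial request.
        return "ready"

    def handle_command(self, command: str) -> str:
        # Forward the command to both host-side components named in the
        # abstract, then acknowledge completion to the cluster power manager.
        self.delivered.append(("os_power_manager", command))
        self.delivered.append(("io_controller_driver_stack", command))
        return "complete"


class ClusterPowerManager:
    """Hypothetical stand-in for the cluster administrator's power manager."""

    def power_manage(self, controller: ControllerElement, command: str) -> bool:
        reply = controller.handle_request("power-state-change")
        if reply != "ready":
            return False  # no command is sent without a positive reply
        ack = controller.handle_command(command)
        return ack == "complete"
```

    Calling `ClusterPowerManager().power_manage(ControllerElement("host-a"), "sleep")` walks through all four steps and leaves the forwarded command recorded for both the operating system power manager and the driver stack.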

    PEER-TO-PEER INTERRUPT SIGNALING BETWEEN DEVICES COUPLED VIA INTERCONNECTS
    10.
    Patent application
    PEER-TO-PEER INTERRUPT SIGNALING BETWEEN DEVICES COUPLED VIA INTERCONNECTS (status: in force)

    Publication No.: US20140250202A1

    Publication date: 2014-09-04

    Application No.: US13997250

    Filing date: 2012-05-29

    IPC classification: H04L29/08

    Abstract: Methods and apparatus to provide peer-to-peer interrupt signaling between devices coupled via one or more interconnects are described. In one embodiment, a NIC (Network Interface Card, such as a Remote Direct Memory Access (RDMA) capable NIC) transfers data directly into or out of the memory of a peer device that is coupled to the NIC via one or more interconnects, bypassing a host computing/processing unit and/or main system memory. Other embodiments are also disclosed.
