Executing a gather operation on a parallel computer
    71.
    Type: granted patent
    Status: in force

    Publication No.: US08140826B2

    Publication date: 2012-03-20

    Application No.: US11754740

    Filing date: 2007-05-29

    IPC classification: G06F15/00 G06F15/76 G06F11/00

    CPC classification: G06F15/17318

    Abstract: Methods, apparatus, and computer program products are disclosed for executing a gather operation on a parallel computer. Embodiments include configuring, by the logical root, a result buffer of the logical root, the result buffer having positions, each position corresponding to a ranked node in the operational group and storing the contribution data gathered from that node. Embodiments also include, repeatedly for each position in the result buffer: determining, by each compute node of the operational group, whether the current position corresponds to the rank of that compute node; if it does, contributing, by that compute node, the node's contribution data; if it does not, contributing, by that compute node, a value of zero; and storing, by the logical root in the current position of the result buffer, the result of a bitwise OR of the contributions of all compute nodes of the operational group for that position, the result being received through the global combining network.
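
    The gather loop in this abstract can be simulated in a few lines. The following is a minimal sketch, not the patented implementation: `gather_via_or` is a hypothetical name, contributions are assumed to be non-negative integers, and a plain `reduce` stands in for the global combining network.

```python
from functools import reduce
from operator import or_

def gather_via_or(contributions):
    """contributions[r] is the non-negative integer held by the node of rank r."""
    n = len(contributions)
    result = [0] * n  # result buffer at the logical root, one position per rank
    for position in range(n):
        # Every node contributes: its own data if its rank matches, otherwise zero.
        offered = [data if rank == position else 0
                   for rank, data in enumerate(contributions)]
        # Stand-in for the bitwise OR performed by the global combining network.
        result[position] = reduce(or_, offered, 0)
    return result

print(gather_via_or([0b1010, 0b0111, 0b1100]))  # [10, 7, 12]
```

    Because x OR 0 = x, the OR of one real contribution with zeros from every other node reproduces that contribution unchanged.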

    Performing A Deterministic Reduction Operation In A Parallel Computer
    72.
    Type: patent application
    Status: lapsed

    Publication No.: US20110296139A1

    Publication date: 2011-12-01

    Application No.: US12790037

    Filing date: 2010-05-28

    IPC classification: G06F9/30 G06F9/02 G06F15/76

    CPC classification: G06F15/76 G06F15/17318

    Abstract: Performing a deterministic reduction operation in a parallel computer that includes compute nodes, each of which includes computer processors and a CAU (Collectives Acceleration Unit) that couples the processors to one another for data communications, including: organizing processors and a CAU into a branched tree topology in which the CAU is the root and the processors are children; receiving, by the root CAU from each of the processors in any order, dummy contribution data, where each processor is restricted from sending any other data to the root CAU until it receives an acknowledgement of receipt from the root CAU; sending, by the root CAU to the processors in a predefined order, acknowledgements of receipt of the dummy contribution data; receiving, by the root CAU from the processors in the predefined order, the processors' contribution data for the reduction operation; and reducing, by the root CAU, the processors' contribution data.
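
    The acknowledgement handshake above forces real contributions to arrive in a fixed order, which matters for non-associative operations such as floating-point addition. The sketch below is an illustrative simulation only; `deterministic_reduce` and its parameters are hypothetical names, and the shuffle merely models the arbitrary arrival order of the dummy data.

```python
import random

def deterministic_reduce(contributions, order, combine):
    """contributions: processor id -> value; order: predefined, non-empty list of ids."""
    # Phase 1: dummy contributions reach the root CAU in arbitrary order.
    arrivals = list(contributions)
    random.shuffle(arrivals)
    awaiting_ack = set(arrivals)
    # Phase 2: the root acknowledges in the predefined order. A processor may
    # send real data only after its acknowledgement, so data arrives in that order.
    received = []
    for pid in order:
        if pid not in awaiting_ack:
            raise RuntimeError(f"no dummy contribution from processor {pid}")
        awaiting_ack.remove(pid)
        received.append(contributions[pid])
    # Phase 3: reduce in arrival order, which now equals the predefined order.
    result = received[0]
    for value in received[1:]:
        result = combine(result, value)
    return result

values = {0: 1e16, 1: 1.0, 2: -1e16}
add = lambda a, b: a + b
print(deterministic_reduce(values, [0, 1, 2], add))  # 0.0
print(deterministic_reduce(values, [0, 2, 1], add))  # 1.0
```

    The two calls give different sums because floating-point addition is not associative; fixing the order makes the result reproducible from run to run.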

    Performing A Deterministic Reduction Operation In A Parallel Computer
    73.
    Type: patent application
    Status: in force

    Publication No.: US20110296137A1

    Publication date: 2011-12-01

    Application No.: US12789986

    Filing date: 2010-05-28

    IPC classification: G06F15/76 G06F15/80 G06F9/02

    CPC classification: G06F15/17318

    Abstract: A parallel computer includes compute nodes having computer processors and a CAU (Collectives Acceleration Unit) that couples the processors to one another for data communications. In embodiments of the present invention, a deterministic reduction operation includes: organizing processors of the parallel computer and a CAU into a branched tree topology, where the CAU is the root of the branched tree topology and the processors are children of the root CAU; establishing a receive buffer that includes receive elements, each associated with a processor and configured to store that processor's contribution data; receiving, in any order from the processors, each processor's contribution data; tracking receipt of each processor's contribution data; and reducing the contribution data in a predefined order, only after contribution data has been received from all processors in the branched tree topology.
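
    The buffer-and-track variant described here can be sketched as follows. This is a simplified model under stated assumptions: `buffered_reduce` is a hypothetical name, processors are numbered 0 to `num_processors - 1`, and "ascending processor id" is chosen as the predefined order purely for illustration.

```python
def buffered_reduce(arrivals, num_processors, combine):
    """arrivals: iterable of (processor_id, value) pairs, in any order."""
    receive_buffer = [None] * num_processors  # one receive element per processor
    received = [False] * num_processors       # tracks receipt per processor
    for pid, value in arrivals:
        receive_buffer[pid] = value
        received[pid] = True
    if not all(received):
        raise RuntimeError("reduction must wait for every processor")
    # Reduce in a fixed order (ascending processor id), independent of arrival order.
    result = receive_buffer[0]
    for value in receive_buffer[1:]:
        result = combine(result, value)
    return result

add = lambda a, b: a + b
print(buffered_reduce([(2, -1e16), (0, 1e16), (1, 1.0)], 3, add))
print(buffered_reduce([(1, 1.0), (2, -1e16), (0, 1e16)], 3, add))
```

    Both calls print the same value even though the contributions arrive in different orders, because the reduction always walks the receive buffer in the same sequence.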

    Effecting Hardware Acceleration Of Broadcast Operations In A Parallel Computer
    74.
    Type: patent application
    Status: in force

    Publication No.: US20110289177A1

    Publication date: 2011-11-24

    Application No.: US12782791

    Filing date: 2010-05-19

    IPC classification: G06F15/173

    CPC classification: G06F15/17318

    Abstract: Effecting hardware acceleration of broadcast operations in a parallel computer whose compute nodes are organized for collective operations via a network, each compute node having a receive buffer, including: establishing a topology for the network; selecting a schedule for a broadcast operation; depositing, by a root node of the topology, broadcast data in a target node's receive buffer, including performing a DMA operation with a well-known memory location for the target node's receive buffer; depositing, by the root node in a memory region designated for storing the broadcast data length, the length of the broadcast data, including performing a DMA operation with a well-known memory location of that memory region; and triggering, by the root node, the target node to perform a next DMA operation, including depositing, in a memory region designated for receiving injection instructions for the target node, an instruction to inject the broadcast data into the receive buffer of a subsequent target node.
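
    The chained deposit-and-trigger pattern can be modelled roughly as below. This is only a sketch: the `Node` class, the `broadcast` function, and the linear `schedule` list are assumptions made for illustration, plain attribute writes stand in for DMA operations, and each sender is assumed to play for its own target the role the abstract assigns to the root.

```python
class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.receive_buffer = b""   # well-known location for broadcast data
        self.data_length = 0        # well-known location for the data length
        self.injection_queue = []   # region that receives injection instructions

def broadcast(nodes, schedule, data):
    """nodes: id -> Node; schedule: node ids in forwarding order, root first."""
    next_hop = dict(zip(schedule, schedule[1:]))  # sender id -> subsequent target id
    current = nodes[schedule[0]]
    current.receive_buffer, current.data_length = data, len(data)
    current.injection_queue.append(next_hop.get(current.node_id))
    while current.injection_queue:
        target_id = current.injection_queue.pop(0)
        if target_id is None:       # end of the schedule
            break
        target = nodes[target_id]
        # Stand-ins for the two DMA writes: the data, then its length.
        target.receive_buffer = current.receive_buffer[:current.data_length]
        target.data_length = current.data_length
        # Trigger the target: tell it where to inject the data next.
        target.injection_queue.append(next_hop.get(target_id))
        current = target

nodes = {i: Node(i) for i in range(4)}
broadcast(nodes, [0, 2, 1, 3], b"payload")
print([nodes[i].receive_buffer for i in range(4)])
```

    After the call, every node in the schedule holds the payload and its length, and each forwarding step happened only because an instruction had been deposited for the forwarding node.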

    Tracking network contention
    75.
    Type: granted patent
    Status: lapsed

    Publication No.: US08055879B2

    Publication date: 2011-11-08

    Application No.: US11955474

    Filing date: 2007-12-13

    IPC classification: H04L12/28 G06F15/16 G06F15/00

    CPC classification: H04L43/0882 H04L43/022

    Abstract: Methods, apparatus, and products are disclosed for tracking network contention on links among compute nodes of an operational group in a point-to-point data communications network of a parallel computer. In embodiments of the present invention, each compute node is connected to an adjacent compute node in the point-to-point data communications network through a link. Tracking network contention according to embodiments of the present invention includes: maintaining, by a network contention module on each compute node in the operational group, a local contention counter for each compute node, each local contention counter representing network contention on links among the compute nodes that originates from that compute node; and maintaining a global contention counter representing the network contention currently on all links among the compute nodes in the operational group.
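
    One way to picture the two kinds of counter is the sketch below. It assumes, purely for illustration, that contention is measured as the number of links currently occupied by in-flight transfers; the abstract does not fix a unit, and `ContentionTracker` and its methods are hypothetical names.

```python
class ContentionTracker:
    def __init__(self, node_ids):
        self.local = {n: 0 for n in node_ids}  # contention originating from each node
        self.global_count = 0                  # contention on all links in the group

    def start_transfer(self, origin, links_used):
        """Record a transfer from `origin` that occupies `links_used` links."""
        self.local[origin] += links_used
        self.global_count += links_used

    def finish_transfer(self, origin, links_used):
        """Release the links once the transfer completes."""
        self.local[origin] -= links_used
        self.global_count -= links_used

tracker = ContentionTracker(range(4))
tracker.start_transfer(origin=0, links_used=3)
tracker.start_transfer(origin=2, links_used=1)
tracker.finish_transfer(origin=0, links_used=3)
print(tracker.local, tracker.global_count)
```

    In this model the global counter always equals the sum of the local counters, which gives a simple consistency check.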

    Providing policy-based operating system services in a hypervisor on a computing system
    76.
    Type: granted patent
    Status: in force

    Publication No.: US08032899B2

    Publication date: 2011-10-04

    Application No.: US11553077

    Filing date: 2006-10-26

    IPC classification: G06F15/163

    CPC classification: G06F9/5055 G06F9/5077

    Abstract: Methods, apparatus, and products are disclosed for providing policy-based operating system services in a hypervisor on a computing system. The computing system includes at least one compute node. The compute node includes an operating system and a hypervisor. The operating system includes a kernel. The hypervisor comprises a kernel proxy and a plurality of operating system services of a service type. Providing policy-based operating system services in a hypervisor on a computing system includes establishing, on the compute node, a kernel policy specifying one of the operating system services of the service type for use by the kernel proxy, and accessing, by the kernel proxy, the specified operating system service. The computing system may also be implemented as a distributed computing system that includes one or more operating system service nodes. One or more of the operating system services may be distributed among the operating system service nodes.
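
    The policy lookup at the heart of this abstract resembles a table-driven dispatch. The sketch below is an analogy in ordinary Python, not hypervisor code: `KernelProxy`, the "scheduler" service type, and the two toy services are all invented for illustration.

```python
class KernelProxy:
    def __init__(self, services, policy):
        self.services = services  # service type -> {service name: callable}
        self.policy = policy      # kernel policy: service type -> chosen service name

    def access(self, service_type, *args):
        """Look up the service named by the policy and invoke it."""
        name = self.policy[service_type]
        return self.services[service_type][name](*args)

# Two hypothetical services of the same service type.
services = {
    "scheduler": {
        "fifo": lambda tasks: list(tasks),
        "shortest_first": lambda tasks: sorted(tasks, key=len),
    }
}
proxy = KernelProxy(services, policy={"scheduler": "shortest_first"})
print(proxy.access("scheduler", ["compile", "ls", "backup"]))
```

    Changing the policy entry swaps the service without touching the caller, which is the property the abstract relies on.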

    Direct Injection of Data To Be Transferred In A Hybrid Computing Environment
    77.
    Type: patent application
    Status: lapsed

    Publication No.: US20110239003A1

    Publication date: 2011-09-29

    Application No.: US12748559

    Filing date: 2010-03-29

    Abstract: Direct injection of data to be transferred in a hybrid computing environment that includes a host computer and a plurality of accelerators, the host computer and the accelerators adapted to one another for data communications by a system level message passing module. Each accelerator includes a Power Processing Element (‘PPE’) and a plurality of Synergistic Processing Elements (‘SPEs’). Direct injection includes: reserving, by each SPE, a slot in a shared memory region accessible by the host computer; loading, by each SPE into local memory of the SPE, a portion of the data to be transferred to the host computer; executing, by each SPE in parallel, a data processing operation on the portion of the data loaded in that SPE's local memory; and writing, by each SPE, the processed data to the SPE's reserved slot in the shared memory region accessible by the host computer.
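
    The reserve, load, process, and write steps can be mimicked with threads standing in for SPEs. This is a loose analogy rather than Cell hardware code: `direct_inject` is a hypothetical name, a Python list plays the shared memory region, and each worker's slot index is fixed in advance as its reservation.

```python
from concurrent.futures import ThreadPoolExecutor

def direct_inject(data, num_spes, operation):
    shared_slots = [None] * num_spes   # shared region visible to the host
    chunk = -(-len(data) // num_spes)  # ceiling division: portion size per SPE

    def spe(index):
        # Load this SPE's portion into its "local memory".
        local = data[index * chunk:(index + 1) * chunk]
        # Process the portion and write the result to the reserved slot.
        shared_slots[index] = [operation(x) for x in local]

    with ThreadPoolExecutor(max_workers=num_spes) as pool:
        list(pool.map(spe, range(num_spes)))  # run all SPEs, wait for completion
    return shared_slots

print(direct_inject(list(range(10)), 4, lambda x: x * 2))
# [[0, 2, 4], [6, 8, 10], [12, 14, 16], [18]]
```

    Because every worker writes only to its own slot, no locking is needed and the host can read the slots in order once all workers finish.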

    Performing A Scatterv Operation On A Hierarchical Tree Network Optimized For Collective Operations
    78.
    Type: patent application
    Status: lapsed

    Publication No.: US20110238950A1

    Publication date: 2011-09-29

    Application No.: US12748594

    Filing date: 2010-03-29

    IPC classification: G06F15/76 G06F9/06

    CPC classification: G06F15/17318

    Abstract: Performing a scatterv operation on a hierarchical tree network optimized for collective operations, including: receiving, by the scatterv module installed on the node, from a nearest neighbor parent above the node, a chunk of data having at least a portion of data for the node; maintaining, by the scatterv module installed on the node, the portion of the data for the node; determining, by the scatterv module installed on the node, whether any portions of the data are for a particular nearest neighbor child below the node or for one or more other nodes below that child; and, if so, sending, by the scatterv module installed on the node, those portions of the data to that nearest neighbor child.
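
    The per-node forwarding rule can be written as a short recursive sketch. The representation is assumed for illustration only: the tree is a dict from node to its children, a chunk is a dict from destination node to that node's data, and `scatterv` and `descendants` are hypothetical helper names.

```python
def descendants(tree, node):
    """Return the set containing `node` and every node below it."""
    found = {node}
    for child in tree.get(node, []):
        found |= descendants(tree, child)
    return found

def scatterv(tree, node, chunk, delivered):
    """tree: node -> list of children; chunk: destination -> data."""
    if node in chunk:
        delivered[node] = chunk[node]  # keep this node's own portion
    for child in tree.get(node, []):
        subtree = descendants(tree, child)
        portion = {dest: d for dest, d in chunk.items() if dest in subtree}
        if portion:                    # send only if something is destined below
            scatterv(tree, child, portion, delivered)
    return delivered

tree = {0: [1, 2], 1: [3, 4]}
chunk = {0: "a", 1: "bb", 2: "dddd", 3: "c"}  # variable sizes; node 4 gets nothing
print(scatterv(tree, 0, chunk, {}))
```

    Each node forwards to a child only the portions addressed to that child's subtree, so node 4 is never contacted in this example.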

    Dispatching packets on a global combining network of a parallel computer
    79.
    Type: granted patent
    Status: lapsed

    Publication No.: US07984450B2

    Publication date: 2011-07-19

    Application No.: US11946136

    Filing date: 2007-11-28

    IPC classification: G06F13/00

    CPC classification: G06F13/387

    Abstract: Methods, apparatus, and products are disclosed for dispatching packets on a global combining network of a parallel computer comprising a plurality of nodes connected for data communications using the network, the network being capable of performing collective operations and point-to-point operations. Dispatching includes: receiving, by an origin system messaging module on an origin node from an origin application messaging module on the origin node, a storage identifier and an operation identifier, the storage identifier specifying storage containing an application message for transmission to a target node, and the operation identifier specifying a message passing operation; packetizing, by the origin system messaging module, the application message into network packets for transmission to the target node, each network packet specifying the operation identifier and an operation type for the message passing operation specified by the operation identifier; and transmitting, by the origin system messaging module, the network packets to the target node.
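
    The packetizing step, in which every packet carries the operation identifier and operation type, might look like the sketch below. The `Packet` fields, the `packetize` function, and the 4-byte payload size are assumptions chosen for readability, not values taken from the patent.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    operation_id: int
    operation_type: str  # for example "collective" or "point-to-point"
    payload: bytes

def packetize(message, operation_id, operation_type, payload_size=4):
    """Split `message` into packets that each name the operation they belong to."""
    return [Packet(operation_id, operation_type, message[i:i + payload_size])
            for i in range(0, len(message), payload_size)]

packets = packetize(b"hello world", operation_id=7, operation_type="point-to-point")
for packet in packets:
    print(packet)
```

    Tagging each packet lets the target dispatch it to the right handler even when collective and point-to-point traffic share the same network.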

    Direct memory access transfer completion notification
    80.
    Type: granted patent
    Status: in force

    Publication No.: US07890670B2

    Publication date: 2011-02-15

    Application No.: US11746348

    Filing date: 2007-05-09

    IPC classification: G06F13/28 G06F12/00

    CPC classification: G06F13/28

    Abstract: DMA transfer completion notification includes: inserting, by an origin DMA engine on an origin node, in an injection first-in-first-out (‘FIFO’) buffer, a data descriptor for an application message to be transferred to a target node on behalf of an application on the origin node; inserting, by the origin DMA engine, a completion notification descriptor in the injection FIFO buffer after the data descriptor for the message, the completion notification descriptor specifying a packet header for a completion notification packet; transferring, by the origin DMA engine to the target node, the message in dependence upon the data descriptor; sending, by the origin DMA engine, the completion notification packet to a local reception FIFO buffer using a local memory FIFO transfer operation; and notifying, by the origin DMA engine, the application that transfer of the message is complete in response to receiving the completion notification packet in the local reception FIFO buffer.
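
    The ordering guarantee behind this scheme comes from the FIFO: the completion descriptor is processed only after the data descriptor ahead of it. The sketch below models that with two deques; `dma_transfer`, the descriptor tuples, and the "transfer-complete" header string are illustrative inventions, and a `bytearray` stands in for the target node's memory.

```python
from collections import deque

def dma_transfer(message, target_memory):
    injection_fifo = deque()
    reception_fifo = deque()  # local reception FIFO on the origin node
    # The data descriptor goes in first, the completion descriptor right after it.
    injection_fifo.append(("data", message))
    injection_fifo.append(("completion", {"header": "transfer-complete"}))
    while injection_fifo:
        kind, body = injection_fifo.popleft()
        if kind == "data":
            target_memory.extend(body)             # transfer the message to the target
        else:
            reception_fifo.append(body["header"])  # local memory FIFO transfer
    # Notify the application once the completion packet shows up locally.
    return bool(reception_fifo) and reception_fifo.popleft() == "transfer-complete"

memory = bytearray()
print(dma_transfer(b"abc", memory), bytes(memory))
```

    Because descriptors leave the injection FIFO in insertion order, seeing the completion packet implies the data descriptor ahead of it has already been handled.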
