Mechanisms for efficient intra-die/intra-chip collective messaging
    1.
    Type: granted patent
    Status: in force

    Publication No.: US08904118B2

    Publication date: 2014-12-02

    Application No.: US12986528

    Filing date: 2011-01-07

    CPC classification: G06F12/0831 G06F15/167

    Abstract: A mechanism for efficient intra-die collective processing across nodelets with separate shared memory coherence domains is provided. An integrated circuit die may include a hardware collective unit implemented on the integrated circuit die. A plurality of cores on the integrated circuit die is grouped into a plurality of shared memory coherence domains. Each of the plurality of shared memory coherence domains is connected to the collective unit for performing collective operations between the plurality of shared memory coherence domains.

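The arrangement in the abstract above can be pictured with a small software model. This is only an illustrative sketch, not the patented hardware: the names `CollectiveUnit`, `contribute`, and `allreduce` are hypothetical, and it assumes each coherence domain first combines its own cores' values in shared memory and then hands a single partial result to the collective unit, which combines the partials and returns the result to every domain.

```python
from functools import reduce
import operator


class CollectiveUnit:
    """Hypothetical software stand-in for an on-die hardware collective unit."""

    def __init__(self, num_domains, op=operator.add):
        self.num_domains = num_domains
        self.op = op
        self.contributions = {}  # domain id -> partial result

    def contribute(self, domain_id, value):
        # Each coherence domain deposits exactly one partial result.
        self.contributions[domain_id] = value

    def ready(self):
        return len(self.contributions) == self.num_domains

    def result(self):
        if not self.ready():
            raise RuntimeError("not every coherence domain has contributed")
        ordered = (self.contributions[d] for d in sorted(self.contributions))
        return reduce(self.op, ordered)


def allreduce(domains, op=operator.add):
    """domains: one non-empty list of per-core values per coherence domain."""
    unit = CollectiveUnit(len(domains), op)
    for domain_id, core_values in enumerate(domains):
        # Step 1 (assumed): reduce inside the domain, where memory is coherent.
        partial = reduce(op, core_values)
        # Step 2: hand the partial result to the collective unit.
        unit.contribute(domain_id, partial)
    # Step 3: the unit combines the partials; every core receives the result.
    total = unit.result()
    return [[total] * len(core_values) for core_values in domains]


if __name__ == "__main__":
    print(allreduce([[1, 2], [3, 4], [5, 6, 7]]))
```

In this model the cross-domain traffic is one value per coherence domain rather than one per core, which is the kind of saving a dedicated collective unit would be expected to offer.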

    MECHANISM FOR OPTIMIZED INTRA-DIE INTER-NODELET MESSAGING COMMUNICATION
    3.
    Type: patent application
    Status: in force

    Publication No.: US20130326180A1

    Publication date: 2013-12-05

    Application No.: US13485074

    Filing date: 2012-05-31

    IPC classification: G06F12/14

    CPC classification: G06F9/544 G06F15/167

    Abstract: Point-to-point intra-nodelet messaging support for nodelets on a single chip that obey MPI semantics may be provided. In one aspect, a local buffering mechanism is employed that obeys standard communication protocols for the network communications between the nodelets integrated in a single chip. Sending messages from one nodelet to another nodelet on the same chip may be performed not via the network, but by exchanging messages in the point-to-point messaging buckets between the nodelets. The messaging buckets need not be part of the memory system of the nodelets. Specialized hardware controllers may be used for moving data between the nodelets and each messaging bucket, and ensuring correct operation of the network protocol.

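The routing decision described in the abstract above (same-chip messages go through a messaging bucket, everything else goes over the network) can be sketched in software. The sketch below is an assumption-laden model, not the patented controller: the class `Chip`, the per-pair bounded FIFO, the tag matching, and the `network_log` list standing in for the off-chip network are all hypothetical.

```python
from collections import deque


class Chip:
    """Hypothetical model of nodelets on one chip joined by messaging buckets."""

    def __init__(self, nodelet_ids, bucket_capacity=4):
        self.nodelets = set(nodelet_ids)
        self.capacity = bucket_capacity
        # One bucket per ordered (source, destination) pair of nodelets.
        self.buckets = {
            (s, d): deque()
            for s in nodelet_ids
            for d in nodelet_ids
            if s != d
        }
        self.network_log = []  # stand-in for the off-chip network

    def send(self, src, dst, tag, payload):
        """Return True if the message was accepted, False if the bucket is full."""
        if dst in self.nodelets:
            bucket = self.buckets[(src, dst)]
            if len(bucket) >= self.capacity:
                return False  # sender has to retry later
            bucket.append((tag, payload))
            return True
        # Destination is on another chip: fall back to the network.
        self.network_log.append((src, dst, tag, payload))
        return True

    def recv(self, dst, src, tag):
        """Return the oldest matching payload, or None if nothing matches."""
        bucket = self.buckets[(src, dst)]
        for i, (t, payload) in enumerate(bucket):
            if t == tag:
                del bucket[i]
                return payload
        return None


if __name__ == "__main__":
    chip = Chip([0, 1])
    chip.send(0, 1, tag=7, payload="first")
    chip.send(0, 1, tag=7, payload="second")
    print(chip.recv(1, 0, tag=7), chip.recv(1, 0, tag=7))
```

Scanning the bucket from the oldest entry keeps messages with the same source, destination, and tag in send order, which mirrors the non-overtaking rule of MPI point-to-point messaging.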

    Mechanism for optimized intra-die inter-nodelet messaging communication
    5.
    Type: granted patent
    Status: in force

    Publication No.: US08943516B2

    Publication date: 2015-01-27

    Application No.: US13485074

    Filing date: 2012-05-31

    CPC classification: G06F9/544 G06F15/167

    Abstract: Point-to-point intra-nodelet messaging support for nodelets on a single chip that obey MPI semantics may be provided. In one aspect, a local buffering mechanism is employed that obeys standard communication protocols for the network communications between the nodelets integrated in a single chip. Sending messages from one nodelet to another nodelet on the same chip may be performed not via the network, but by exchanging messages in the point-to-point messaging buckets between the nodelets. The messaging buckets need not be part of the memory system of the nodelets. Specialized hardware controllers may be used for moving data between the nodelets and each messaging bucket, and ensuring correct operation of the network protocol.


    MECHANISMS FOR EFFICIENT INTRA-DIE/INTRA-CHIP COLLECTIVE MESSAGING
    6.
    Type: patent application
    Status: under examination (published)

    Publication No.: US20130007378A1

    Publication date: 2013-01-03

    Application No.: US13611985

    Filing date: 2012-09-12

    IPC classification: G06F12/00

    CPC classification: G06F12/0831 G06F15/167

    Abstract: A mechanism for efficient intra-die collective processing across nodelets with separate shared memory coherence domains is provided. An integrated circuit die may include a hardware collective unit implemented on the integrated circuit die. A plurality of cores on the integrated circuit die is grouped into a plurality of shared memory coherence domains. Each of the plurality of shared memory coherence domains is connected to the collective unit for performing collective operations between the plurality of shared memory coherence domains.


    Mechanisms for efficient intra-die/intra-chip collective messaging
    7.
    Type: granted patent
    Status: in force

    Publication No.: US08990514B2

    Publication date: 2015-03-24

    Application No.: US13611985

    Filing date: 2012-09-12

    CPC classification: G06F12/0831 G06F15/167

    Abstract: A mechanism for efficient intra-die collective processing across nodelets with separate shared memory coherence domains is provided. An integrated circuit die may include a hardware collective unit implemented on the integrated circuit die. A plurality of cores on the integrated circuit die is grouped into a plurality of shared memory coherence domains. Each of the plurality of shared memory coherence domains is connected to the collective unit for performing collective operations between the plurality of shared memory coherence domains.


    MECHANISMS FOR EFFICIENT INTRA-DIE/INTRA-CHIP COLLECTIVE MESSAGING
    8.
    Type: patent application
    Status: in force

    Publication No.: US20120179879A1

    Publication date: 2012-07-12

    Application No.: US12986528

    Filing date: 2011-01-07

    IPC classification: G06F12/00

    CPC classification: G06F12/0831 G06F15/167

    Abstract: A mechanism for efficient intra-die collective processing across nodelets with separate shared memory coherence domains is provided. An integrated circuit die may include a hardware collective unit implemented on the integrated circuit die. A plurality of cores on the integrated circuit die is grouped into a plurality of shared memory coherence domains. Each of the plurality of shared memory coherence domains is connected to the collective unit for performing collective operations between the plurality of shared memory coherence domains.


    Using DMA for copying performance counter data to memory
    9.
    Type: granted patent
    Status: lapsed

    Publication No.: US08621167B2

    Publication date: 2013-12-31

    Application No.: US13446467

    Filing date: 2012-04-13

    IPC classification: G06F12/00

    Abstract: A device for copying performance counter data includes a hardware path that connects a direct memory access (DMA) unit to a plurality of hardware performance counters and a memory device. Software prepares an injection packet for the DMA unit to perform the copying, while the software can perform other tasks. In one aspect, the software that prepares the injection packet runs on a processing core other than the core that gathers the hardware performance counter data.

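The division of labour in the abstract above (software only prepares an injection packet, and the DMA unit performs the copy while software continues) can be mimicked in a few lines. This is a loose software analogy under stated assumptions, not the device itself: `InjectionPacket`, `DmaUnit`, the packet fields, and the use of a background thread in place of the DMA engine are all hypothetical.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class InjectionPacket:
    """Hypothetical descriptor: which counters to copy and where to put them."""
    first_counter: int
    count: int
    dest_offset: int
    done: threading.Event


class DmaUnit:
    """A background thread stands in for the DMA engine."""

    def __init__(self, counters, memory):
        self.counters = counters  # models the hardware performance counters
        self.memory = memory      # models the destination memory device
        self.fifo = queue.Queue() # models the injection FIFO
        threading.Thread(target=self._run, daemon=True).start()

    def inject(self, first_counter, count, dest_offset):
        # Software's only job: describe the copy and return immediately.
        pkt = InjectionPacket(first_counter, count, dest_offset, threading.Event())
        self.fifo.put(pkt)
        return pkt

    def _run(self):
        while True:
            pkt = self.fifo.get()
            src = self.counters[pkt.first_counter:pkt.first_counter + pkt.count]
            self.memory[pkt.dest_offset:pkt.dest_offset + pkt.count] = src
            pkt.done.set()  # completion signal for whoever cares


if __name__ == "__main__":
    counters = [10, 20, 30, 40]
    memory = [0] * 8
    dma = DmaUnit(counters, memory)
    pkt = dma.inject(first_counter=1, count=2, dest_offset=4)
    unrelated_work = sum(range(1000))  # software keeps running meanwhile
    pkt.done.wait(timeout=5)
    print(memory)
```

Because `inject` returns as soon as the packet is queued, the caller does not have to be the code that later consumes the counter values, in line with the abstract's remark that the packet may be prepared on a different core.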

    METHOD AND APPARATUS FOR A HIERARCHICAL SYNCHRONIZATION BARRIER IN A MULTI-NODE SYSTEM
    10.
    Type: patent application
    Status: under examination (published)

    Publication No.: US20120179896A1

    Publication date: 2012-07-12

    Application No.: US12987523

    Filing date: 2011-01-10

    IPC classification: G06F9/30

    Abstract: A hierarchical barrier synchronization of cores and nodes on a multiprocessor system, in one aspect, may include providing, by each of a plurality of threads on a chip, an input bit signal to a respective bit in a register, in response to reaching a barrier; determining whether all of the plurality of threads reached the barrier by electrically tying bits of the register together and “AND”ing the input bit signals; determining whether only on-chip synchronization is needed or whether inter-node synchronization is needed; in response to determining that all of the plurality of threads on the chip reached the barrier, notifying the plurality of threads on the chip, if it is determined that only on-chip synchronization is needed; and after all of the plurality of threads on the chip reached the barrier, communicating the synchronization signal to outside of the chip, if it is determined that inter-node synchronization is needed.

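The steps listed in the abstract above can be traced with a small bit-register model. It is a sketch under assumptions rather than the claimed circuit: `ChipBarrier`, `arrive`, `resolve`, and the `send_off_chip` callback are hypothetical names, and the wired AND of the register bits is modelled as a comparison against a full bit mask.

```python
class ChipBarrier:
    """Hypothetical model of the on-chip level of a hierarchical barrier."""

    def __init__(self, num_threads, inter_node=False):
        self.inter_node = inter_node             # is off-chip sync also needed?
        self.full_mask = (1 << num_threads) - 1  # one bit per thread
        self.register = 0

    def arrive(self, thread_id):
        # A thread that reaches the barrier sets its own bit in the register.
        self.register |= 1 << thread_id
        return self.all_arrived()

    def all_arrived(self):
        # Equivalent to ANDing every thread's bit together.
        return (self.register & self.full_mask) == self.full_mask

    def resolve(self, send_off_chip=lambda: None):
        """Decide what happens next; returns 'waiting', 'released' or 'forwarded'."""
        if not self.all_arrived():
            return "waiting"
        self.register = 0        # re-arm the barrier for its next use
        if self.inter_node:
            send_off_chip()      # pass the synchronization signal off chip
            return "forwarded"
        return "released"        # on-chip only: notify the local threads


if __name__ == "__main__":
    barrier = ChipBarrier(num_threads=3)
    for tid in range(3):
        barrier.arrive(tid)
    print(barrier.resolve())
```

What happens after a signal is forwarded, such as how the other nodes acknowledge it, lies outside the abstract and is deliberately left out of the model.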