Snoop filter for filtering snoop requests
    71.
    Granted patent
    Status: lapsed

    Publication No.: US08255638B2

    Publication date: 2012-08-28

    Application No.: US12113262

    Filing date: 2008-05-01

    IPC classification: G06F12/00 G06F13/00

    Abstract: A method and apparatus for supporting cache coherency in a multiprocessor computing environment having multiple processing units, each processing unit having one or more local cache memories associated and operatively connected therewith. The method comprises providing a snoop filter device associated with each processing unit, each snoop filter device having a plurality of dedicated input ports for receiving snoop requests from dedicated memory writing sources in the multiprocessor computing environment. Each snoop filter device includes a plurality of parallel operating port snoop filters in correspondence with the plurality of dedicated input ports, each port snoop filter implementing one or more parallel operating sub-filter elements that are adapted to concurrently filter snoop requests received from respective dedicated memory writing sources and forward a subset of those requests to its associated processing unit.

    Managing coherence via put/get windows
    72.
    Granted patent
    Status: lapsed

    Publication No.: US08122197B2

    Publication date: 2012-02-21

    Application No.: US12543890

    Filing date: 2009-08-19

    IPC classification: G06F12/00 G06F13/00 G06F13/28

    Abstract: A method and apparatus for managing coherence between two processors of a two-processor node of a multi-processor computer system. Generally, the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message-passing parallel computer, and to a hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activities required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.

    Increasing available FIFO space to prevent messaging queue deadlocks in a DMA environment
    73.
    Granted patent
    Status: in force

    Publication No.: US08112559B2

    Publication date: 2012-02-07

    Application No.: US12241634

    Filing date: 2008-09-30

    IPC classification: G06F13/28 G06F15/167

    CPC classification: G06F13/28

    Abstract: Embodiments of the invention may be used to manage message queues in a parallel computing environment to prevent message queue deadlock. A direct memory access (DMA) controller of a compute node may determine when a messaging queue is full. In response, the DMA may generate an interrupt. An interrupt handler may stop the DMA and swap all descriptors from the full messaging queue into a larger queue (or enlarge the original queue). The interrupt handler then restarts the DMA. Alternatively, the interrupt handler stops the DMA, allocates a memory block to hold queue data, and then moves descriptors from the full messaging queue into the allocated memory block. The interrupt handler then restarts the DMA. During a normal messaging advance cycle, a messaging manager attempts to inject the descriptors in the memory block into other messaging queues until the descriptors have all been processed.
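
The first alternative in this abstract (stop the DMA, move the descriptors into a larger queue, restart the DMA) can be modelled in a few lines of Python. This is only a sketch under stated assumptions: `MessageQueue`, `DmaController`, `handle_queue_full` and the `growth_factor` parameter are hypothetical names chosen for illustration, and a real handler would work on hardware FIFO registers rather than Python lists.

```python
class MessageQueue:
    """Fixed-capacity descriptor FIFO (hypothetical software model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.descriptors = []

    def is_full(self):
        return len(self.descriptors) >= self.capacity

    def push(self, descriptor):
        if self.is_full():
            raise OverflowError("messaging queue is full")
        self.descriptors.append(descriptor)


class DmaController:
    """Stand-in for the DMA engine; it only tracks whether it is running."""

    def __init__(self):
        self.running = True


def handle_queue_full(dma, queue, growth_factor=2):
    """Interrupt-handler sketch: stop DMA, move descriptors, restart DMA."""
    dma.running = False                               # stop the DMA
    larger = MessageQueue(queue.capacity * growth_factor)
    larger.descriptors = list(queue.descriptors)      # keep FIFO order
    queue.descriptors.clear()                         # old queue is now empty
    dma.running = True                                # restart the DMA
    return larger
```

A caller would replace its reference to the full queue with the returned one, for example `queue = handle_queue_full(dma, queue)`, after which further descriptors can be pushed.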

    Collective network for computer structures
    74.
    Granted patent
    Status: in force

    Publication No.: US08001280B2

    Publication date: 2011-08-16

    Application No.: US11572372

    Filing date: 2005-07-18

    IPC classification: G06F15/16

    Abstract: A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network and class structures. The global collective network may be configured to provide global barrier and interrupt functionality in an asynchronous or synchronized manner. When implemented in a massively parallel supercomputing structure, the global collective network is physically and logically partitionable according to the needs of a processing algorithm.

    ZONE ROUTING IN A TORUS NETWORK
    75.
    Patent application
    Status: lapsed

    Publication No.: US20110173343A1

    Publication date: 2011-07-14

    Application No.: US12684184

    Filing date: 2010-01-08

    IPC classification: G06F15/173

    CPC classification: G06F15/17381

    Abstract: A system for routing data in a network, comprising a network logic device at a sending node for determining a path between the sending node and a receiving node, wherein the network logic device sets one or more selection bits and one or more hint bits within a data packet, and a control register for storing one or more masks. The network logic device uses the one or more selection bits to select a mask from the control register, applies the selected mask to the hint bits to restrict routing of the data packet to one or more routing directions within the network, selects one of the restricted routing directions, and sends the data packet along a link in the selected routing direction toward the receiving node.

    NETWORK SUPPORT FOR SYSTEM INITIATED CHECKPOINTS
    76.
    Patent application
    Status: lapsed

    Publication No.: US20110173289A1

    Publication date: 2011-07-14

    Application No.: US12731796

    Filing date: 2010-03-25

    IPC classification: G06F15/173 G06F15/167

    CPC classification: G06F15/167 G06F11/141

    Abstract: A system, method and computer program product for supporting system-initiated checkpoints in parallel computing systems. The system and method generate selective control signals to perform checkpointing of system-related data in the presence of messaging activity associated with a user application running at the node. The checkpointing is initiated by the system such that checkpoint data of a plurality of network nodes may be obtained even in the presence of user applications running on highly parallel computers that include ongoing user messaging activity.

    METHOD AND APPARATUS FOR EFFICIENTLY TRACKING QUEUE ENTRIES RELATIVE TO A TIMESTAMP
    78.
    Patent application
    Status: lapsed

    Publication No.: US20090006672A1

    Publication date: 2009-01-01

    Application No.: US11768800

    Filing date: 2007-06-26

    IPC classification: G06F3/00 G06F1/04

    CPC classification: G06F12/0835 G06F12/0831

    Abstract: An apparatus and method for tracking coherence event signals transmitted in a multiprocessor system. The apparatus comprises a coherence logic unit, each unit having a plurality of queue structures with each queue structure associated with a respective sender of event signals transmitted in the system. A timing circuit associated with a queue structure controls enqueuing and dequeuing of received coherence event signals, and a counter tracks a number of coherence event signals remaining enqueued in the queue structure and dequeued since receipt of a timestamp signal. A counter mechanism generates an output signal indicating that all of the coherence event signals present in the queue structure at the time of receipt of the timestamp signal have been dequeued. In one embodiment, the timestamp signal is asserted at the start of a memory synchronization operation, and the output signal indicates that all coherence events present when the timestamp signal was asserted have completed. This signal can then be used as part of the completion condition for the memory synchronization operation.
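
One way to picture the counter described in this abstract is a software model that snapshots the queue occupancy when the timestamp arrives and counts down on each dequeue. The sketch below is an assumption-laden illustration, not the patented circuit: the class `TimestampedQueue` and its method names are hypothetical, and the model relies on strict FIFO order so that entries enqueued after the timestamp are always dequeued after the ones being tracked.

```python
from collections import deque


class TimestampedQueue:
    """FIFO that reports when all entries present at a timestamp have left."""

    def __init__(self):
        self._queue = deque()
        self._pending = 0  # entries present at the last timestamp, not yet dequeued

    def enqueue(self, event):
        self._queue.append(event)

    def dequeue(self):
        event = self._queue.popleft()
        if self._pending > 0:
            self._pending -= 1   # one tracked entry has left the queue
        return event

    def timestamp(self):
        """Model the timestamp signal: snapshot the current occupancy."""
        self._pending = len(self._queue)

    @property
    def drained(self):
        """Output signal: True once every entry present at the timestamp is gone."""
        return self._pending == 0

    def __len__(self):
        return len(self._queue)
```

A memory synchronization operation could, in this model, call `timestamp()` at its start and treat `drained` on every queue as one part of its completion condition.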

    DMA SHARED BYTE COUNTERS IN A PARALLEL COMPUTER
    79.
    Patent application
    Status: lapsed

    Publication No.: US20090006666A1

    Publication date: 2009-01-01

    Application No.: US11768781

    Filing date: 2007-06-26

    IPC classification: G06F13/28

    CPC classification: G06F13/28 Y02D10/14

    Abstract: A parallel computer system is constructed as a network of interconnected compute nodes. Each of the compute nodes includes at least one processor, a memory and a DMA engine. The DMA engine includes a processor interface for interfacing with the at least one processor, DMA logic, a memory interface for interfacing with the memory, a DMA network interface for interfacing with the network, injection and reception byte counters, injection and reception FIFO metadata, and status registers and control registers. The injection FIFOs maintain the injection FIFO metadata memory locations, including their current head and tail, and the reception FIFOs maintain the reception FIFO metadata memory locations, including their current head and tail. The injection byte counters and reception byte counters may be shared between messages.
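
The abstract states only that byte counters may be shared between messages. A minimal Python sketch of one plausible reading follows: each message adds its length to a shared counter, each transferred packet subtracts its payload size, and a value of zero means every message sharing the counter has completed. These semantics, and the names `SharedByteCounter`, `add_message`, `bytes_transferred` and `all_done`, are assumptions for illustration and are not taken from the source.

```python
class SharedByteCounter:
    """Byte counter shared by several messages (hypothetical model)."""

    def __init__(self):
        self.remaining = 0  # bytes still outstanding across all messages

    def add_message(self, nbytes):
        """Register a new message of nbytes that uses this counter."""
        self.remaining += nbytes

    def bytes_transferred(self, nbytes):
        """Account for nbytes moved by the DMA for any sharing message."""
        if nbytes > self.remaining:
            raise ValueError("more bytes transferred than outstanding")
        self.remaining -= nbytes

    def all_done(self):
        """True when every message sharing the counter has fully transferred."""
        return self.remaining == 0
```

Under this reading a processor needs to poll only one counter to learn that a whole group of messages has finished, at the cost of not knowing which individual message completed first.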
