Method and a system for using same set of registers to handle both single and double precision floating point instructions in an instruction stream
    1.
    Granted invention patent
    Method and a system for using same set of registers to handle both single and double precision floating point instructions in an instruction stream (In force)

    Publication No.: US07191316B2

    Publication date: 2007-03-13

    Application No.: US10353662

    Filing date: 2003-01-29

    IPC classes: G06F9/30 G06F9/40 G06F15/00

    Abstract: A system for handling a plurality of single precision floating point instructions and a plurality of double precision floating point instructions that both index a same set of registers is provided. The system comprises a decode unit arranged to decode, stall, and forward at least one of the plurality of single precision and at least one of the plurality of double precision floating point instructions in a fetch group. The decode unit includes a first counter arranged to increment for each of the plurality of single precision floating point instructions forwarded down a pipeline; a second counter arranged to increment for each of the plurality of double precision floating point instructions forwarded down the pipeline; a first mask register and a second mask register. The first mask register is updated by each of the single precision floating point instructions forwarded, and the second mask register is updated by each of the double precision floating point instructions forwarded.
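
    A minimal C sketch of the bookkeeping the abstract describes, assuming a 32-entry register file and simple destination-register indices (neither is taken from the patent): one counter and one mask register per precision, each updated as an instruction of that precision is forwarded down the pipeline.

        #include <stdint.h>
        #include <stdio.h>

        typedef enum { FP_SINGLE, FP_DOUBLE } fp_precision;

        typedef struct {
            uint32_t sp_count;  /* single-precision instructions forwarded */
            uint32_t dp_count;  /* double-precision instructions forwarded */
            uint32_t sp_mask;   /* one bit per register written by an SP instruction */
            uint32_t dp_mask;   /* one bit per register written by a DP instruction */
        } decode_tracking;

        /* Called when the decode unit forwards an instruction down the pipeline. */
        static void forward_fp_instruction(decode_tracking *t, fp_precision p,
                                           unsigned dest_reg /* 0..31 assumed */)
        {
            if (p == FP_SINGLE) {
                t->sp_count++;
                t->sp_mask |= 1u << dest_reg;
            } else {
                t->dp_count++;
                t->dp_mask |= 1u << dest_reg;
            }
        }

        int main(void)
        {
            decode_tracking t = {0};
            forward_fp_instruction(&t, FP_SINGLE, 3);
            forward_fp_instruction(&t, FP_DOUBLE, 3);  /* same register indexed by both */
            printf("sp=%u dp=%u sp_mask=%08x dp_mask=%08x\n",
                   (unsigned)t.sp_count, (unsigned)t.dp_count,
                   (unsigned)t.sp_mask, (unsigned)t.dp_mask);
            return 0;
        }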

    Method for handling condition code modifiers in an out-of-order multi-issue multi-stranded processor
    4.
    Granted invention patent
    Method for handling condition code modifiers in an out-of-order multi-issue multi-stranded processor (In force)

    Publication No.: US07065635B1

    Publication date: 2006-06-20

    Application No.: US10738576

    Filing date: 2003-12-17

    IPC classes: G06F9/38

    Abstract: A technique for handling a condition code modifying instruction in an out-of-order multi-stranded processor involves providing a condition code architectural register file for each strand, providing a condition code working register file, and assigning condition code architectural register file identification information (CARF_ID) and condition code working register file identification information (CWRF_ID) to the condition code modifying instruction. CARF_ID is used to index a location in a condition code rename table in which the CWRF_ID is stored. Thereafter, upon an exception-free execution of the condition code modifying instruction, a result of the execution is copied from the condition code working register file to the condition code architectural register file dependent on CARF_ID, CWRF_ID, register type information, and strand identification information.
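
    A rough software model, with assumed table sizes and identifiers, of the rename flow in the abstract: the rename table is indexed by CARF_ID and holds the CWRF_ID assigned at decode, and on exception-free completion the working-register value is copied into the per-strand architectural file. This is an illustration, not the patented circuit.

        #include <stdint.h>
        #include <stdio.h>

        #define NUM_STRANDS   4   /* assumed */
        #define CARF_ENTRIES  8   /* assumed architectural CC registers per strand */
        #define CWRF_ENTRIES 32   /* assumed shared working CC registers */

        static uint8_t carf[NUM_STRANDS][CARF_ENTRIES];       /* architectural CC values */
        static uint8_t cwrf[CWRF_ENTRIES];                     /* working CC values       */
        static uint8_t cc_rename[NUM_STRANDS][CARF_ENTRIES];   /* CARF_ID -> CWRF_ID      */

        /* Decode: record which working-register entry the instruction will write. */
        static void rename_cc(unsigned strand, unsigned carf_id, unsigned cwrf_id)
        {
            cc_rename[strand][carf_id] = (uint8_t)cwrf_id;
        }

        /* Execute: the result lands in the working register file. */
        static void execute_cc(unsigned cwrf_id, uint8_t cc_value)
        {
            cwrf[cwrf_id] = cc_value;
        }

        /* Exception-free completion: copy the working value to architectural state. */
        static void commit_cc(unsigned strand, unsigned carf_id)
        {
            carf[strand][carf_id] = cwrf[cc_rename[strand][carf_id]];
        }

        int main(void)
        {
            rename_cc(1, 0, 17);     /* strand 1, CARF_ID 0 renamed to CWRF_ID 17 */
            execute_cc(17, 0x0B);    /* condition codes produced by execution     */
            commit_cc(1, 0);
            printf("strand 1, CC reg 0 = %#x\n", (unsigned)carf[1][0]);
            return 0;
        }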

    Branch prediction structure with branch direction entries that share branch prediction qualifier entries
    5.
    Granted invention patent
    Branch prediction structure with branch direction entries that share branch prediction qualifier entries (In force)

    Publication No.: US07380110B1

    Publication date: 2008-05-27

    Application No.: US10660169

    Filing date: 2003-09-11

    IPC classes: G06F9/40 G06F9/44

    CPC classes: G06F9/3848

    Abstract: An efficient branch prediction structure is described that bifurcates a branch prediction structure into at least two portions, where information stored in the second portion is aliased amongst multiple entries of the first portion. In this way, overall storage (and layout area) can be reduced, and scaling with a branch prediction structure that includes (2N)K×1 branch direction entries and (N/2)K×1 branch prediction qualifier entries is less dramatic than with conventional techniques. An efficient branch prediction structure includes entries for branch direction indications and entries for branch prediction qualifier indications. The branch direction indication entries are more numerous than the branch prediction qualifier entries. An entry from the branch direction entries is selected based at least in part on a corresponding instruction instance identifier, and an entry from the branch prediction qualifier entries is selected based at least in part on the least significant bits of the instruction instance identifier.
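
    A C sketch of the bifurcated structure the abstract describes, under assumed table sizes and an assumed meaning for the qualifier bit: many one-bit direction entries share a smaller pool of qualifier entries, with the qualifier selected from the low bits of the instruction instance identifier so that several direction entries alias onto one qualifier.

        #include <stdbool.h>
        #include <stdint.h>
        #include <stdio.h>

        #define DIR_ENTRIES  2048u   /* assumed: branch direction bits */
        #define QUAL_ENTRIES  512u   /* assumed: shared qualifier bits */

        static bool direction[DIR_ENTRIES];   /* taken / not-taken                      */
        static bool qualifier[QUAL_ENTRIES];  /* one entry shared by four dir entries   */

        static bool predict_taken(uint32_t instance_id, bool *qual_out)
        {
            bool dir  = direction[instance_id % DIR_ENTRIES];
            *qual_out = qualifier[instance_id % QUAL_ENTRIES];  /* low bits only */
            return dir;
        }

        int main(void)
        {
            bool qual, taken;
            direction[100] = true;
            qualifier[100] = true;
            taken = predict_taken(100, &qual);
            printf("id 100: taken=%d qualifier=%d\n", taken, qual);
            /* ids 100 and 612 differ in the upper bits, so they use distinct
             * direction entries but alias onto the same qualifier entry. */
            taken = predict_taken(612, &qual);
            printf("id 612: taken=%d qualifier=%d\n", taken, qual);
            return 0;
        }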

    OFFLOADING OPERATIONS FOR MAINTAINING DATA COHERENCE ACROSS A PLURALITY OF NODES
    6.
    Invention application
    OFFLOADING OPERATIONS FOR MAINTAINING DATA COHERENCE ACROSS A PLURALITY OF NODES (Pending, published)

    Publication No.: US20080065835A1

    Publication date: 2008-03-13

    Application No.: US11530799

    Filing date: 2006-09-11

    IPC classes: G06F13/00

    Abstract: Offloading data coherence operations from a primary processing unit(s) executing instantiated code responsible for data coherence in a shared-cache cluster to a data coherence offload engine reduces resource consumption and allows for efficient sharing of data in accordance with the data coherence protocol. Some of the data coherence operations, such as consulting and maintaining a directory, generating messages, and writing a data unit can be performed by a data coherence offload engine. The data coherence offload engine indicates availability of the data unit in the memory to the appropriate instantiated code. Hence, the instantiated code (the corresponding primary processing unit) is no longer burdened with some of the work load of data coherence operations. Migration of tasks from a primary processing unit(s) to data coherence offload engines allows for efficient retrieval and writing of a requested data unit.
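
    A highly simplified sketch, with an invented directory layout, of the work the abstract moves off the primary processing unit: the offload engine consults a directory entry, generates any needed coherence message, writes the data unit into memory, and signals availability to the requesting instantiated code.

        #include <stdbool.h>
        #include <stdint.h>
        #include <stdio.h>

        typedef struct {
            int      owner;    /* node believed to hold the current copy */
            uint32_t sharers;  /* bitmask of nodes holding a shared copy */
        } dir_entry;

        typedef struct {
            dir_entry dir[256];        /* assumed directory, one entry per data unit */
            uint64_t  memory[256];     /* memory backing the data units              */
            bool      available[256];  /* flags polled by the instantiated code      */
        } offload_engine;

        /* Placeholder for the coherence message the engine would put on the wire. */
        static void send_message(int to_node, unsigned unit)
        {
            printf("coherence message to node %d for unit %u\n", to_node, unit);
        }

        /* The offload engine services a read request without the primary CPU. */
        static void handle_read(offload_engine *e, int requester, unsigned unit,
                                uint64_t fetched_value)
        {
            dir_entry *d = &e->dir[unit];
            if (d->owner != requester)
                send_message(d->owner, unit);    /* ask the owner to supply/downgrade */
            d->sharers |= 1u << requester;       /* record the new sharer             */
            e->memory[unit]    = fetched_value;  /* write the data unit               */
            e->available[unit] = true;           /* signal the instantiated code      */
        }

        int main(void)
        {
            static offload_engine eng;   /* zero-initialized */
            eng.dir[7].owner = 3;        /* node 3 currently owns data unit 7 */
            handle_read(&eng, 1, 7, 0xDEADBEEFu);
            printf("unit 7 available=%d value=%#llx\n",
                   eng.available[7], (unsigned long long)eng.memory[7]);
            return 0;
        }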

    METHOD AND SYSTEM FOR OFFLOADING COMPUTATION FLEXIBLY TO A COMMUNICATION ADAPTER
    7.
    Invention application
    METHOD AND SYSTEM FOR OFFLOADING COMPUTATION FLEXIBLY TO A COMMUNICATION ADAPTER (In force)

    Publication No.: US20130007181A1

    Publication date: 2013-01-03

    Application No.: US13173473

    Filing date: 2011-06-30

    IPC classes: G06F15/167

    CPC classes: G06F9/5027 G06F2209/509

    Abstract: A method for offloading computation flexibly to a communication adapter includes receiving a first message that includes a procedure image identifier associated with a procedure image of a host application, determining a procedure image and a communication adapter processor using the procedure image identifier, and forwarding the first message to the communication adapter processor configured to execute the procedure image. The method further includes executing, on the communication adapter processor independent of a host processor, the procedure image in communication adapter memory by acquiring a host memory latch for a memory block in host memory, reading the memory block in the host memory after acquiring the host memory latch, manipulating, by executing the procedure image, the memory block in the communication adapter memory to obtain a modified memory block, committing the modified memory block to the host memory, and releasing the host memory latch.
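
    A host-side C sketch of the sequence in the abstract, with a pthread mutex standing in for the host memory latch and a function pointer standing in for the procedure image (both are modeling choices, not the patented mechanism): acquire the latch, copy the block into adapter memory, run the procedure image on the copy, commit the result, release the latch.

        #include <pthread.h>
        #include <stdio.h>
        #include <string.h>

        #define BLOCK_SIZE 64

        static unsigned char host_block[BLOCK_SIZE];   /* memory block in host memory */
        static pthread_mutex_t host_latch = PTHREAD_MUTEX_INITIALIZER;

        typedef void (*procedure_image)(unsigned char *block, size_t len);

        /* Example procedure image: increment every byte of the block. */
        static void increment_bytes(unsigned char *block, size_t len)
        {
            for (size_t i = 0; i < len; i++)
                block[i]++;
        }

        /* Stands in for the adapter processor: latch, read, manipulate, commit. */
        static void adapter_execute(procedure_image proc)
        {
            unsigned char adapter_copy[BLOCK_SIZE];        /* communication adapter memory  */

            pthread_mutex_lock(&host_latch);               /* acquire the host memory latch */
            memcpy(adapter_copy, host_block, BLOCK_SIZE);  /* read the host memory block    */
            proc(adapter_copy, BLOCK_SIZE);                /* manipulate the adapter copy   */
            memcpy(host_block, adapter_copy, BLOCK_SIZE);  /* commit the modified block     */
            pthread_mutex_unlock(&host_latch);             /* release the latch             */
        }

        int main(void)
        {
            adapter_execute(increment_bytes);
            printf("host_block[0] after the procedure image: %u\n", host_block[0]);
            return 0;
        }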

    Scalable Interface for Connecting Multiple Computer Systems Which Performs Parallel MPI Header Matching
    8.
    Invention application
    Scalable Interface for Connecting Multiple Computer Systems Which Performs Parallel MPI Header Matching (In force)

    Publication No.: US20120243542A1

    Publication date: 2012-09-27

    Application No.: US13489496

    Filing date: 2012-06-06

    IPC classes: H04L12/56

    CPC classes: G06F15/17337

    Abstract: An interface device for a compute node in a computer cluster which performs Message Passing Interface (MPI) header matching using parallel matching units. The interface device comprises a memory that stores posted receive queues and unexpected queues. The posted receive queues store receive requests from a process executing on the compute node. The unexpected queues store headers of send requests (e.g., from other compute nodes) that do not have a matching receive request in the posted receive queues. The interface device also comprises a plurality of hardware pipelined matcher units. The matcher units perform header matching to determine if a header in the send request matches any headers in any of the plurality of posted receive queues. Matcher units perform the header matching in parallel. In other words, the plural matching units are configured to search the memory concurrently to perform header matching.
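
    A sequential C sketch of the matching the abstract performs with parallel hardware matcher units; the header fields and wildcard value are assumptions, not the device's actual header format. An incoming send header is compared against the posted receive queue, and a miss is appended to the unexpected queue.

        #include <stdio.h>

        #define ANY   (-1)   /* wildcard, in the spirit of MPI_ANY_SOURCE / MPI_ANY_TAG */
        #define MAX_Q 64

        typedef struct { int source; int tag; } header;

        static header posted[MAX_Q];      /* posted receive queue */
        static int    n_posted;
        static header unexpected[MAX_Q];  /* unexpected queue     */
        static int    n_unexpected;

        static void post_receive(int source, int tag)
        {
            posted[n_posted++] = (header){ source, tag };
        }

        /* Returns the index of the matching posted receive, or -1 after placing
         * the header on the unexpected queue. Hardware would run these compares
         * in parallel across matcher units; this loop is the sequential analogue. */
        static int match_send_header(header h)
        {
            for (int i = 0; i < n_posted; i++) {
                int src_ok = posted[i].source == ANY || posted[i].source == h.source;
                int tag_ok = posted[i].tag    == ANY || posted[i].tag    == h.tag;
                if (src_ok && tag_ok)
                    return i;
            }
            unexpected[n_unexpected++] = h;
            return -1;
        }

        int main(void)
        {
            int idx;
            post_receive(ANY, 7);                    /* receive with wildcard source */
            idx = match_send_header((header){3, 7});
            printf("src=3 tag=7 -> posted index %d\n", idx);
            idx = match_send_header((header){3, 9});
            printf("src=3 tag=9 -> posted index %d (unexpected queue depth %d)\n",
                   idx, n_unexpected);
            return 0;
        }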

    Caching data in a cluster computing system which avoids false-sharing conflicts
    9.
    Granted invention patent
    Caching data in a cluster computing system which avoids false-sharing conflicts (In force)

    Publication No.: US08095617B2

    Publication date: 2012-01-10

    Application No.: US12495635

    Filing date: 2009-06-30

    IPC classes: G06F15/16

    CPC classes: G06F12/0817 G06F12/0813

    Abstract: Managing operations in a first compute node of a multi-computer system. A remote write may be received to a first address of a remote compute node. A first data structure entry may be created in a data structure, which may include the first address and status information indicating that the remote write has been received. Upon determining that the local cache of the first compute node has been updated with the remote write, the remote write may be issued to the remote compute node. Accordingly, the first data structure entry may be released upon completion of the remote write.
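
    A sketch, with an assumed entry layout, of the tracking structure the abstract describes: each pending remote write gets an entry recording the remote address and a status, the write is issued once the local cache has been updated, and the entry is released when the remote write completes.

        #include <stdint.h>
        #include <stdio.h>

        typedef enum { ENTRY_FREE, WRITE_RECEIVED, WRITE_ISSUED } entry_status;

        typedef struct {
            uint64_t     remote_addr;   /* first address on the remote compute node */
            entry_status status;
        } write_entry;

        #define MAX_PENDING 16
        static write_entry pending[MAX_PENDING];

        /* A remote write has been received: create an entry recording it. */
        static int record_remote_write(uint64_t remote_addr)
        {
            for (int i = 0; i < MAX_PENDING; i++) {
                if (pending[i].status == ENTRY_FREE) {
                    pending[i] = (write_entry){ remote_addr, WRITE_RECEIVED };
                    return i;
                }
            }
            return -1;  /* table full */
        }

        /* The local cache now reflects the write, so issue it to the remote node. */
        static void issue_when_cache_updated(int idx)
        {
            pending[idx].status = WRITE_ISSUED;
            printf("issuing remote write to %#llx\n",
                   (unsigned long long)pending[idx].remote_addr);
        }

        /* The remote node reports completion: release the entry. */
        static void complete_remote_write(int idx)
        {
            pending[idx].status = ENTRY_FREE;
        }

        int main(void)
        {
            int idx = record_remote_write(0x4000);
            issue_when_cache_updated(idx);
            complete_remote_write(idx);
            printf("entry %d released\n", idx);
            return 0;
        }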

    Software Aware Throttle Based Flow Control
    10.
    Invention application
    Software Aware Throttle Based Flow Control (In force)

    Publication No.: US20100332676A1

    Publication date: 2010-12-30

    Application No.: US12495452

    Filing date: 2009-06-30

    IPC classes: G06F15/16

    Abstract: A system, comprising a compute node and coupled network adapter (NA), that supports improved data transfer request buffering and a more efficient method of determining the completion status of data transfer requests. Transfer requests received by the NA are stored in a first buffer and then transmitted on a network interface. When significant network delays are detected and the first buffer is full, the NA sets a flag to stop software from issuing transfer requests. Compliant software checks this flag before sending requests and does not issue further requests. A second NA buffer stores additional received transfer requests that were perhaps in-transit. When conditions improve, the flag is cleared and the first buffer is used again. Completion status is efficiently determined by grouping network transfer requests. The NA counts received requests and completed network requests for each group. Software determines if a group of requests is complete by reading a count value.
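
    A compact sketch, with invented names, of the two cooperating mechanisms in the abstract: compliant software checks a throttle flag before issuing a transfer request, and completion of a group of requests is determined by comparing a received count with a completed count.

        #include <stdbool.h>
        #include <stdio.h>

        #define MAX_GROUPS 8

        static volatile bool throttle_flag;     /* set by the NA when its first buffer fills   */
        static unsigned received[MAX_GROUPS];   /* requests the NA has accepted, per group     */
        static unsigned completed[MAX_GROUPS];  /* requests finished on the network, per group */

        /* Compliant software: only issue a request when the adapter is not throttling. */
        static bool try_issue_request(unsigned group)
        {
            if (throttle_flag)
                return false;       /* caller retries later */
            received[group]++;      /* the NA counts the accepted request */
            return true;
        }

        /* Called as the NA finishes transmitting a request belonging to the group. */
        static void request_completed(unsigned group)
        {
            completed[group]++;
        }

        /* Software learns whether a whole group is done by comparing the counts. */
        static bool group_complete(unsigned group)
        {
            return completed[group] == received[group];
        }

        int main(void)
        {
            try_issue_request(0);
            try_issue_request(0);
            request_completed(0);
            printf("group 0 complete? %d\n", group_complete(0));
            request_completed(0);
            printf("group 0 complete? %d\n", group_complete(0));
            return 0;
        }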
