Method for enabling direct prefetching of data during asynchronous memory move operation
    71.
    Invention grant (Expired)

    Publication No.: US07921275B2

    Publication date: 2011-04-05

    Application No.: US12024598

    Filing date: 2008-02-01

    IPC classification: G06F12/00

    Abstract: While an asynchronous memory move (AMM) operation is ongoing, a prefetch request for data from the source effective address or the destination effective address triggers cache injection by the AMM mover of relevant data from the stream of data being moved in the physical memory. The memory controller forwards the first prefetched line to the prefetch engine and L1 cache, the next cache lines in the sequence of data to the L2 cache, and a subsequent set of cache lines to the L3 cache. The memory controller then forwards the remaining data to the destination memory location. Quick access to prefetched data is enabled by buffering the stream of data in the upper caches rather than placing all the moved data within the memory. Also, the memory controller places moved data into only a subset of the available cache lines of the upper-level cache.

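    Illustration (not part of the patent text): a minimal C sketch of the tiered cache-injection placement the abstract describes, assuming hypothetical tier sizes and an illustrative injection_target() helper.

        #include <stdio.h>

        /* Assumed tier sizes for illustration only:
         * line 0          -> prefetch engine / L1
         * lines 1..4      -> L2
         * lines 5..20     -> L3
         * remaining lines -> destination memory location */
        typedef enum { TARGET_L1, TARGET_L2, TARGET_L3, TARGET_MEMORY } inject_target_t;

        static inject_target_t injection_target(unsigned line_index)
        {
            if (line_index == 0)  return TARGET_L1;
            if (line_index <= 4)  return TARGET_L2;
            if (line_index <= 20) return TARGET_L3;
            return TARGET_MEMORY;
        }

        int main(void)
        {
            static const char *names[] = { "L1", "L2", "L3", "memory" };
            /* Walk the cache lines of a hypothetical 32-line AMM stream and
             * print where the memory controller would inject each one. */
            for (unsigned i = 0; i < 32; i++)
                printf("line %2u -> %s\n", i, names[injection_target(i)]);
            return 0;
        }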

System for providing a cluster-wide system clock in a multi-tiered full-graph interconnect architecture
    72.
    Invention grant (In force)

    Publication No.: US07827428B2

    Publication date: 2010-11-02

    Application No.: US11848440

    Filing date: 2007-08-31

    IPC classification: G06F1/00 G06F1/04 G06F1/12

    Abstract: A system for providing a cluster-wide system clock in a multi-tiered full-graph (MTFG) interconnect architecture is provided. Heartbeat signals transmitted by each of the processor chips in the computing cluster are synchronized. Internal system clock signals are generated in each of the processor chips based on the synchronized heartbeat signals. As a result, the internal system clock signals of each of the processor chips are synchronized, since the heartbeat signals that are the basis for the internal system clock signals are synchronized. Mechanisms are provided for performing such synchronization using direct couplings of processor chips within the same processor book, in different processor books in the same supernode, and in different processor books in different supernodes of the MTFG interconnect architecture.

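    Illustration (not part of the patent text): a small C sketch of deriving per-chip internal clocks from a synchronized cluster-wide heartbeat; the chip count and clocks-per-heartbeat ratio are assumptions.

        #include <stdio.h>

        #define NUM_CHIPS 4
        #define CLOCKS_PER_HEARTBEAT 1000UL   /* assumed ratio, not from the patent */

        /* Each chip derives its internal system clock from the cluster-wide
         * heartbeat count; because the heartbeat is synchronized, the derived
         * clocks agree across chips. */
        static unsigned long internal_clock(unsigned long heartbeat_count)
        {
            return heartbeat_count * CLOCKS_PER_HEARTBEAT;
        }

        int main(void)
        {
            unsigned long heartbeat = 0;

            for (int tick = 0; tick < 3; tick++) {
                heartbeat++;   /* one synchronized heartbeat seen by every chip */
                for (int chip = 0; chip < NUM_CHIPS; chip++)
                    printf("chip %d: internal clock = %lu\n",
                           chip, internal_clock(heartbeat));
            }
            return 0;
        }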

Direct/indirect transmission of information using a multi-tiered full-graph interconnect architecture
    73.
    Invention grant (Expired)

    Publication No.: US07822889B2

    Publication date: 2010-10-26

    Application No.: US11845209

    Filing date: 2007-08-27

    IPC classification: G06F3/00

    CPC classification: G06F13/387

    Abstract: A mechanism is provided for transmitting data in a data network. A first processor of the data network receives data to be transmitted to a second processor within the data network. A determination is made as to whether the data has previously been routed through an indirect communication link from a source processor, an indirect communication link being a communication link that does not directly couple the source processor to the final destination processor that is to receive the data. A communication link over which to transmit the data from the first processor to the second processor is selected based on the result of this determination. Finally, the data is transmitted from the first processor to the second processor using the selected communication link.

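    Illustration (not part of the patent text): a C sketch of the selection rule in the abstract, namely that data which has already taken an indirect hop is next sent over a direct link; the flag and congestion check are illustrative assumptions.

        #include <stdbool.h>
        #include <stdio.h>

        typedef enum { ROUTE_DIRECT, ROUTE_INDIRECT } route_t;

        /* If the data has previously been routed through an indirect link, it
         * must now take a direct link toward its final destination; otherwise
         * either kind of link may be chosen (here an indirect link is used
         * only when the direct link is busy). */
        static route_t select_route(bool previously_indirect, bool direct_link_busy)
        {
            if (previously_indirect)
                return ROUTE_DIRECT;
            return direct_link_busy ? ROUTE_INDIRECT : ROUTE_DIRECT;
        }

        int main(void)
        {
            printf("fresh data, idle direct link: %d\n", select_route(false, false));
            printf("fresh data, busy direct link: %d\n", select_route(false, true));
            printf("already-indirect data:        %d\n", select_route(true, true));
            return 0;
        }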

Remote Asynchronous Data Mover
    74.
    Invention application (Expired)

    Publication No.: US20100268788A1

    Publication date: 2010-10-21

    Application No.: US12425093

    Filing date: 2009-04-16

    Abstract: A distributed data processing system executes multiple tasks within a parallel job, including a first local task on a local node and at least one task executing on a remote node, with a remote memory having real address (RA) locations mapped to one or more of the source effective address (EA) and destination EA of a data move operation initiated by a task executing on the local node. On initiation of the data move operation, remote asynchronous data move (RADM) logic identifies that the operation moves data to/from a first EA that is memory-mapped to an RA of the remote memory. The local processor/RADM logic initiates a RADM operation that moves a copy of the data directly from/to the first remote memory, completing the RADM operation using the network interface cards (NICs) of the source and destination processing nodes, which are determined by accessing a data center for the node IDs of the remote memory.

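    Illustration (not part of the patent text): a C sketch of the RADM decision, checking whether an effective address is memory-mapped to a remote node's real address and, if so, naming the source and destination NICs; the mapping table and address ranges are assumptions.

        #include <stdint.h>
        #include <stdio.h>

        /* Illustrative table mapping effective-address ranges to node IDs; in
         * the abstract this information comes from a data center lookup. */
        typedef struct {
            uint64_t ea_base;
            uint64_t ea_limit;
            int      node_id;   /* node whose real memory backs this EA range */
        } ea_mapping_t;

        static const ea_mapping_t mappings[] = {
            { 0x0000000000000000ULL, 0x00000000ffffffffULL, 0 },  /* local node  */
            { 0x0000000100000000ULL, 0x00000001ffffffffULL, 3 },  /* remote node */
        };

        static int node_for_ea(uint64_t ea)
        {
            for (unsigned i = 0; i < sizeof mappings / sizeof mappings[0]; i++)
                if (ea >= mappings[i].ea_base && ea <= mappings[i].ea_limit)
                    return mappings[i].node_id;
            return -1;
        }

        int main(void)
        {
            const int local_node = 0;
            uint64_t src_ea = 0x0000000100001000ULL;   /* backed by remote node 3 */
            uint64_t dst_ea = 0x0000000000002000ULL;   /* backed by local memory  */

            int src_node = node_for_ea(src_ea);
            int dst_node = node_for_ea(dst_ea);

            if (src_node != local_node || dst_node != local_node)
                printf("RADM operation: NIC of node %d -> NIC of node %d\n",
                       src_node, dst_node);
            else
                printf("purely local move, no RADM operation needed\n");
            return 0;
        }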

Data processing system, method and interconnect fabric supporting multiple planes of processing nodes
    75.
    Invention grant (In force)

    Publication No.: US07818388B2

    Publication date: 2010-10-19

    Application No.: US11245887

    Filing date: 2005-10-07

    IPC classification: G06F15/16

    CPC classification: G06F15/16

    Abstract: A data processing system includes a first plane including a first plurality of processing nodes, each including multiple processing units, and a second plane including a second plurality of processing nodes, each including multiple processing units. The data processing system also includes a plurality of point-to-point first tier links. Each of the first plurality and second plurality of processing nodes includes one or more first tier links among the plurality of first tier links, where the first tier link(s) within each processing node connect a pair of processing units in the same processing node for communication. The data processing system further includes a plurality of point-to-point second tier links. At least a first of the plurality of second tier links connects processing units in different ones of the first plurality of processing nodes, at least a second of the plurality of second tier links connects processing units in different ones of the second plurality of processing nodes, and at least a third of the plurality of second tier links connects a processing unit in the first plane to a processing unit in the second plane.

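    Illustration (not part of the patent text): a C sketch enumerating one link of each kind named in the abstract, namely a first-tier link inside a node, a second-tier link between nodes of one plane, and a second-tier link between planes; the coordinates are illustrative.

        #include <stdio.h>

        typedef struct { int plane, node, unit; } pu_t;      /* processing unit     */
        typedef struct { pu_t a, b; int tier; } link_t;      /* point-to-point link */

        int main(void)
        {
            link_t links[] = {
                /* first-tier: two units in the same node of plane 0 */
                { {0, 0, 0}, {0, 0, 1}, 1 },
                /* second-tier: units in different nodes of plane 0 */
                { {0, 0, 0}, {0, 1, 0}, 2 },
                /* second-tier: a unit in plane 0 coupled to a unit in plane 1 */
                { {0, 0, 1}, {1, 0, 0}, 2 },
            };

            for (unsigned i = 0; i < sizeof links / sizeof links[0]; i++)
                printf("tier-%d link: (plane %d, node %d, unit %d) <-> "
                       "(plane %d, node %d, unit %d)\n",
                       links[i].tier,
                       links[i].a.plane, links[i].a.node, links[i].a.unit,
                       links[i].b.plane, links[i].b.node, links[i].b.unit);
            return 0;
        }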

Providing reliability of communication between supernodes of a multi-tiered full-graph interconnect architecture
    76.
    Invention grant (Expired)

    Publication No.: US07793158B2

    Publication date: 2010-09-07

    Application No.: US11845212

    Filing date: 2007-08-27

    IPC classification: G06F11/00

    Abstract: A mechanism is provided for ensuring reliability of communication. A first processor determines a current state of links coupled to ports of the first processor of the data processing system. Each port of the first processor comprises a plurality of links to a corresponding port on a second processor of the data processing system. The current state of the links indicates a level of error associated with each link. The first processor determines, for each link, whether the level of error associated with the link exceeds a threshold. For each link whose level of error exceeds the threshold, the first processor tags the link with an error identifier in a switch associated with the ports of the first processor. The first processor then reduces the level of usage for transmitting data on ports associated with links tagged with the error identifier.

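    Illustration (not part of the patent text): a C sketch of the tagging step in the abstract, where links whose error level exceeds a threshold are marked and their share of traffic is reduced; the threshold and the reduction factor are assumptions.

        #include <stdbool.h>
        #include <stdio.h>

        #define NUM_LINKS       4
        #define ERROR_THRESHOLD 10   /* assumed threshold, not from the patent */

        typedef struct {
            int  error_level;   /* current error level reported for the link  */
            bool tagged;        /* error identifier set in the switch         */
            int  usage_weight;  /* relative share of traffic sent on the link */
        } link_state_t;

        int main(void)
        {
            link_state_t links[NUM_LINKS] = {
                { 2, false, 100 }, { 15, false, 100 },
                { 0, false, 100 }, { 11, false, 100 },
            };

            /* Tag every link whose error level exceeds the threshold and
             * reduce the level of usage for transmitting data on it. */
            for (int i = 0; i < NUM_LINKS; i++) {
                if (links[i].error_level > ERROR_THRESHOLD) {
                    links[i].tagged = true;
                    links[i].usage_weight /= 4;   /* illustrative reduction */
                }
                printf("link %d: errors=%d tagged=%d usage=%d\n", i,
                       links[i].error_level, links[i].tagged, links[i].usage_weight);
            }
            return 0;
        }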

Wake-and-Go Mechanism With Software Save of Thread State
    77.
    Invention application (In force)

    Publication No.: US20090199184A1

    Publication date: 2009-08-06

    Application No.: US12024797

    Filing date: 2008-02-01

    IPC classification: G06F9/46

    Abstract: A wake-and-go mechanism is provided for a data processing system. When a thread is waiting for an event, rather than performing a series of get-and-compare sequences, the thread updates a wake-and-go array with a target address associated with the event. Software may save the state of the thread. The thread is then put to sleep. When the wake-and-go array snoops a kill at a given target address, logic associated with the wake-and-go array may generate an exception, which may result in a switch to kernel mode, wherein the operating system performs some action before returning control to the originating process. In this case, the trap causes other software, such as the operating system or a background sleeper thread, to reload the thread from thread state storage and to continue processing of the active threads on the processor.

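    Illustration (not part of the patent text): a single-threaded C sketch of the wake-and-go array itself: a thread registers the target address it is waiting on after software saves its state, and a snooped kill at that address triggers the reload; the array size and entry layout are assumptions.

        #include <stdint.h>
        #include <stdio.h>

        #define ARRAY_SLOTS 4

        /* One wake-and-go array entry: the address a sleeping thread waits on,
         * plus the identity of the thread whose state software has saved. */
        typedef struct {
            uint64_t target_addr;
            int      thread_id;
            int      valid;
        } wake_entry_t;

        static wake_entry_t wake_array[ARRAY_SLOTS];

        static void register_and_sleep(int thread_id, uint64_t addr)
        {
            for (int i = 0; i < ARRAY_SLOTS; i++) {
                if (!wake_array[i].valid) {
                    wake_array[i] = (wake_entry_t){ addr, thread_id, 1 };
                    printf("thread %d: state saved, sleeping on 0x%llx\n",
                           thread_id, (unsigned long long)addr);
                    return;
                }
            }
        }

        /* Called when the array snoops a kill (an invalidating store) at addr:
         * a matching entry causes the saved thread to be reloaded and resumed. */
        static void snoop_kill(uint64_t addr)
        {
            for (int i = 0; i < ARRAY_SLOTS; i++) {
                if (wake_array[i].valid && wake_array[i].target_addr == addr) {
                    printf("kill at 0x%llx: reload thread %d from saved state\n",
                           (unsigned long long)addr, wake_array[i].thread_id);
                    wake_array[i].valid = 0;
                }
            }
        }

        int main(void)
        {
            register_and_sleep(7, 0x1000);
            snoop_kill(0x2000);   /* no match, thread 7 stays asleep       */
            snoop_kill(0x1000);   /* match, thread 7 is woken and reloaded */
            return 0;
        }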

Helper Thread for Pre-Fetching Data
    78.
    Invention application (Expired)

    Publication No.: US20090199170A1

    Publication date: 2009-08-06

    Application No.: US12024191

    Filing date: 2008-02-01

    IPC classification: G06F9/44

    CPC classification: G06F8/41 G06F9/383 G06F9/3851

    Abstract: A set of helper thread binaries is created to retrieve data used by a set of main thread binaries. If executing a portion of the set of helper thread binaries results in the retrieval of data needed by the set of main thread binaries, then that retrieved data is utilized by the set of main thread binaries.

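    Illustration (not part of the patent text): a C sketch of a helper thread that runs ahead and touches the data a main thread is about to consume; it assumes POSIX threads and the GCC/Clang __builtin_prefetch builtin, and would be compiled with -pthread.

        #include <pthread.h>
        #include <stdio.h>

        #define N (1 << 20)
        static double data[N];

        /* Helper-thread code: touch the data the main thread will need so the
         * cache lines are already resident when the main thread reaches them. */
        static void *helper_prefetch(void *arg)
        {
            (void)arg;
            for (int i = 0; i < N; i += 8)            /* roughly one touch per line */
                __builtin_prefetch(&data[i], 0, 3);   /* GCC/Clang prefetch builtin */
            return NULL;
        }

        int main(void)
        {
            pthread_t helper;
            double sum = 0.0;

            for (int i = 0; i < N; i++)
                data[i] = (double)i;

            pthread_create(&helper, NULL, helper_prefetch, NULL);

            /* Main-thread code: the computation that consumes the data the
             * helper thread is prefetching. */
            for (int i = 0; i < N; i++)
                sum += data[i];

            pthread_join(helper, NULL);
            printf("sum = %f\n", sum);
            return 0;
        }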

Heterogeneous Processing Elements
    79.
    Invention application (In force)

    Publication No.: US20090198971A1

    Publication date: 2009-08-06

    Application No.: US12024220

    Filing date: 2008-02-01

    IPC classification: G06F9/30

    CPC classification: G06F13/12

    Abstract: A heterogeneous processing element model is provided where I/O devices look and act like processors. In order to be treated like a processor, an I/O processing element, or other special-purpose processing element, must follow some rules and have some characteristics of a processor, such as address translation, security, interrupt handling, and exception processing. The heterogeneous processing element model puts special-purpose processing elements on the same playing field as processors from a programming perspective, an operating system perspective, and a power perspective. The operating system can get work to a security engine, for example, in the same way it does to a processor.

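    Illustration (not part of the patent text): a C sketch of the programming-model idea, where the operating system dispatches work through one processing-element interface whether the element is a general-purpose core or a special-purpose engine; the interface and element names are assumptions.

        #include <stdio.h>

        /* A common "processing element" interface: work is submitted the same
         * way regardless of the kind of element behind it. */
        typedef struct {
            const char *name;
            void (*run)(const char *work);
        } processing_element_t;

        static void cpu_run(const char *work)
        {
            printf("cpu core executes: %s\n", work);
        }

        static void security_engine_run(const char *work)
        {
            printf("security engine executes: %s\n", work);
        }

        int main(void)
        {
            processing_element_t elements[] = {
                { "cpu0",       cpu_run },
                { "sec-engine", security_engine_run },
            };

            /* The OS gets work to both elements through the same call. */
            for (unsigned i = 0; i < sizeof elements / sizeof elements[0]; i++)
                elements[i].run("encrypt buffer A");
            return 0;
        }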

COMPLETION OF ASYNCHRONOUS MEMORY MOVE IN THE PRESENCE OF A BARRIER OPERATION
    80.
    Invention application (Expired)

    Publication No.: US20090198963A1

    Publication date: 2009-08-06

    Application No.: US12024513

    Filing date: 2008-02-01

    IPC classification: G06F12/02 G06F9/30

    Abstract: A method is provided within a data processing system by which a processor executes an asynchronous memory move (AMM) store (ST) instruction to complete a corresponding AMM operation in parallel with an ongoing (not yet completed), previously issued barrier operation. The processor receives the AMM ST instruction after executing the barrier operation (or SYNC instruction) and before the completion of the barrier operation or SYNC on the system fabric. The processor continues executing the AMM ST instruction, which performs a move in virtual address space and then triggers the generation of the AMM operation. The AMM operation proceeds while the barrier operation continues, independent of the processor. The processor stops further execution of all other memory access requests, excluding AMM ST instructions, that are received after the barrier operation but before completion of the barrier operation.

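    Illustration (not part of the patent text): a C sketch of the ordering behavior in the abstract, where AMM ST instructions received after a still-pending barrier continue to execute while other memory accesses stall; the instruction stream is invented for the example.

        #include <stdbool.h>
        #include <stdio.h>

        typedef enum { OP_SYNC, OP_AMM_ST, OP_LOAD, OP_STORE } op_t;

        int main(void)
        {
            /* Illustrative instruction stream: the SYNC has been issued but has
             * not yet completed on the fabric when the later operations arrive. */
            op_t stream[] = { OP_SYNC, OP_AMM_ST, OP_LOAD, OP_AMM_ST, OP_STORE };
            bool barrier_pending = false;

            for (unsigned i = 0; i < sizeof stream / sizeof stream[0]; i++) {
                switch (stream[i]) {
                case OP_SYNC:
                    barrier_pending = true;
                    printf("SYNC issued, not yet complete\n");
                    break;
                case OP_AMM_ST:
                    /* AMM ST executes even while the barrier is still pending. */
                    printf("AMM ST executes%s\n",
                           barrier_pending ? " (barrier still pending)" : "");
                    break;
                default:
                    /* Other memory accesses stall until the barrier completes. */
                    printf(barrier_pending ? "memory access stalls behind barrier\n"
                                           : "memory access executes\n");
                    break;
                }
            }
            return 0;
        }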