Multithreaded processor architecture with operational latency hiding
    1.
    Granted Patent (In force)

    Publication No.: US08230423B2

    Publication Date: 2012-07-24

    Application No.: US11101601

    Filing Date: 2005-04-07

    IPC Classes: G06F9/46 G06F9/40 G06F7/38

    Abstract: A method and processor architecture for achieving a high level of concurrency and latency hiding in an “infinite-thread processor architecture” with a limited number of hardware threads is disclosed. A preferred embodiment defines “fork” and “join” instructions for spawning new context-switched threads. Context switching is used to hide the latency of both memory-access operations (i.e., loads and stores) and arithmetic/logical operations. When an operation executing in a thread incurs a latency having the potential to delay the instruction pipeline, the latency is hidden by performing a context switch to a different thread. When the result of the operation becomes available, a context switch back to that thread is performed to allow the thread to continue.

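The latency-hiding scheme the abstract describes can be sketched as a toy software scheduler (an illustrative Python model, not the patented hardware; the thread bodies and latencies are invented for the example). Each thread is a generator that yields the latency of a long-running operation; on every stall the scheduler context-switches to another ready thread and resumes the stalled one once its result is available.

```python
import heapq
from collections import deque

def run(threads):
    """Toy scheduler illustrating the abstract's idea: each thread is a
    generator that yields the latency (in cycles) of a long-running
    operation (e.g. a load). On a stall, control switches to another
    ready thread; the stalled thread is resumed once the result of its
    operation becomes available."""
    cycle, seq = 0, 0
    ready = deque(enumerate(threads))  # (thread_id, generator) ready to run
    waiting = []                       # min-heap of (wake_cycle, seq, id, gen)
    order = []                         # (issue_cycle, thread_id) trace
    while ready or waiting:
        # Wake threads whose operation results are now available.
        while waiting and waiting[0][0] <= cycle:
            _, _, tid, g = heapq.heappop(waiting)
            ready.append((tid, g))
        if not ready:                  # every thread is stalled: skip ahead
            cycle = waiting[0][0]
            continue
        tid, g = ready.popleft()
        try:
            latency = next(g)          # run until the next long operation
            order.append((cycle, tid))
            seq += 1
            heapq.heappush(waiting, (cycle + latency, seq, tid, g))
        except StopIteration:          # thread finished ("join")
            pass
        cycle += 1
    return cycle, order

def worker(n_loads, load_latency):
    for _ in range(n_loads):
        yield load_latency             # each load stalls for this many cycles

# Two threads, each issuing three 10-cycle loads: the switches overlap stalls.
total, order = run([worker(3, 10), worker(3, 10)])
```

With the switching, the two threads finish in 32 cycles, barely more than the roughly 31 cycles one thread takes alone, because each thread's loads complete during the other's stalls.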

    Multithreaded processor architecture with operational latency hiding
    2.
    Granted Patent (In force)

    Publication No.: US08972703B2

    Publication Date: 2015-03-03

    Application No.: US13180724

    Filing Date: 2011-07-12

    IPC Classes: G06F9/38 G06F11/20

    Abstract: A method and processor architecture for achieving a high level of concurrency and latency hiding in an “infinite-thread processor architecture” with a limited number of hardware threads is disclosed. A preferred embodiment defines “fork” and “join” instructions for spawning new context-switched threads. Context switching is used to hide the latency of both memory-access operations (i.e., loads and stores) and arithmetic/logical operations. When an operation executing in a thread incurs a latency having the potential to delay the instruction pipeline, the latency is hidden by performing a context switch to a different thread. When the result of the operation becomes available, a context switch back to that thread is performed to allow the thread to continue.


    Multithreaded processor architecture with operational latency hiding
    3.
    Patent Application (In force)

    Publication No.: US20140075159A1

    Publication Date: 2014-03-13

    Application No.: US13180724

    Filing Date: 2011-07-12

    IPC Classes: G06F9/38

    Abstract: A method and processor architecture for achieving a high level of concurrency and latency hiding in an “infinite-thread processor architecture” with a limited number of hardware threads is disclosed. A preferred embodiment defines “fork” and “join” instructions for spawning new context-switched threads. Context switching is used to hide the latency of both memory-access operations (i.e., loads and stores) and arithmetic/logical operations. When an operation executing in a thread incurs a latency having the potential to delay the instruction pipeline, the latency is hidden by performing a context switch to a different thread. When the result of the operation becomes available, a context switch back to that thread is performed to allow the thread to continue.


    Multithreaded processor architecture with implicit granularity adaptation
    4.
    Patent Application (Pending, published)

    Publication No.: US20060230409A1

    Publication Date: 2006-10-12

    Application No.: US11101608

    Filing Date: 2005-04-07

    IPC Classes: G06F9/46

    CPC Classes: G06F9/4843

    Abstract: A method and processor architecture for achieving a high level of concurrency and latency hiding in an “infinite-thread processor architecture” with a limited number of hardware threads is disclosed. A preferred embodiment defines “fork” and “join” instructions for spawning new threads, with a novel operational semantics. If a hardware thread is available to shepherd a forked thread, the fork and join instructions have thread creation and termination/synchronization semantics, respectively. If no hardware thread is available, however, the fork and join instructions assume subroutine call and return semantics, respectively. The link register of the processor is used to determine whether a given join instruction should be treated as a thread synchronization operation or as a return from a subroutine.

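The dual fork/join semantics can be sketched as a hypothetical software model (class name, `hw_threads` parameter, and sequential stand-in for concurrent execution are all invented for illustration; the sketch does not model the link-register mechanism itself):

```python
class Processor:
    """Hypothetical model of the fork/join semantics described in the
    abstract. `fork` claims a hardware context if one is free; otherwise
    it falls back to subroutine-call semantics, and the matching `join`
    behaves as a return rather than as a thread synchronization."""
    def __init__(self, hw_threads):
        self.free_contexts = hw_threads
        self.log = []                    # records which semantics each fork used

    def fork(self, func, *args):
        if self.free_contexts > 0:
            # Thread-creation semantics: claim a hardware context.
            self.free_contexts -= 1
            self.log.append(('thread', func.__name__))
            result = func(*args)         # stands in for concurrent execution
            self.free_contexts += 1      # the join releases the context
            return ('thread', result)
        # No context free: degrade to an ordinary subroutine call, so the
        # later join acts as a return. In the patent, the hardware tells
        # the two cases apart via the link register.
        self.log.append(('call', func.__name__))
        return ('call', func(*args))

def work(x):
    return x * x

def outer(p):
    # Nested fork while the outer fork still holds the only context.
    return p.fork(work, 4)

p = Processor(hw_threads=1)
outer_mode, (inner_mode, result) = p.fork(outer, p)
```

With a single hardware context, the outer fork spawns a thread while the nested fork transparently degrades to a call, which is the implicit granularity adaptation the title refers to.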

    Multithreaded processor architecture with operational latency hiding
    5.
    Patent Application (In force)

    Publication No.: US20060230408A1

    Publication Date: 2006-10-12

    Application No.: US11101601

    Filing Date: 2005-04-07

    IPC Classes: G06F9/46

    Abstract: A method and processor architecture for achieving a high level of concurrency and latency hiding in an “infinite-thread processor architecture” with a limited number of hardware threads is disclosed. A preferred embodiment defines “fork” and “join” instructions for spawning new context-switched threads. Context switching is used to hide the latency of both memory-access operations (i.e., loads and stores) and arithmetic/logical operations. When an operation executing in a thread incurs a latency having the potential to delay the instruction pipeline, the latency is hidden by performing a context switch to a different thread. When the result of the operation becomes available, a context switch back to that thread is performed to allow the thread to continue.


    Spiral cache memory and method of operating a spiral cache
    6.
    Granted Patent (Expired)

    Publication No.: US08060699B2

    Publication Date: 2011-11-15

    Application No.: US12270095

    Filing Date: 2008-11-13

    IPC Classes: G06F12/00 G06F13/00 G06F13/28

    Abstract: A memory provides reduction in access latency for frequently-accessed values by self-organizing to always move a requested value to a front-most central storage element of a spiral. The occupant of the central location is swapped backward, which continues backward through the spiral until an empty location is swapped-to, or the last displaced value is cast out of the last location in the spiral. The elements in the spiral may be cache memories or single elements. The resulting cache memory is self-organizing and for the one-dimensional implementation has a worst-case access time proportional to N, where N is the number of tiles in the spiral. A k-dimensional spiral cache has a worst-case access time proportional to N^(1/k). Further, a spiral cache system provides a basis for a non-inclusive system of cache memory, which reduces the amount of space and power consumed by a cache memory of a given size.

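The move-to-front and backward-swap behavior can be illustrated with a toy one-dimensional model (class and method names invented; the real design is a tiled network with systolic data movement, which this sketch does not capture):

```python
class SpiralCache1D:
    """Toy one-dimensional model of the move-to-front behavior the
    abstract describes. Slot 0 is the front-most storage element. An
    access always moves the requested value to the front; displaced
    occupants shift backward until an empty slot absorbs the chain or
    the value in the last location is cast out."""
    def __init__(self, n_tiles):
        self.slots = [None] * n_tiles   # None marks an empty element

    def access(self, value):
        evicted = None
        if value in self.slots:
            self.slots.remove(value)    # hit: pull the value out of the spiral
        elif None in self.slots:
            self.slots.remove(None)     # miss with room: fill from backing store
        else:
            evicted = self.slots.pop()  # miss, full: cast out the last location
        self.slots.insert(0, value)     # swap chain: move-to-front
        return evicted

cache = SpiralCache1D(3)
for v in "abc":
    cache.access(v)                     # slots now ['c', 'b', 'a']
cache.access('a')                       # hit: 'a' returns to the front
victim = cache.access('d')              # full: the backmost value is cast out
```

Because every access reinserts at slot 0, frequently-accessed values cluster near the front, which is the source of the latency reduction claimed in the abstract.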

    SPIRAL CACHE MEMORY AND METHOD OF OPERATING A SPIRAL CACHE
    7.
    Patent Application (Expired)

    Publication No.: US20100122035A1

    Publication Date: 2010-05-13

    Application No.: US12270095

    Filing Date: 2008-11-13

    IPC Classes: G06F12/08 G06F12/00 G06F1/04

    Abstract: A spiral cache memory provides reduction in access latency for frequently-accessed values by self-organizing to always move a requested value to a front-most central storage element of the spiral. The occupant of the central location is swapped backward, which continues backward through the spiral until an empty location is swapped-to, or the last displaced value is cast out of the last location in the spiral. The elements in the spiral may be cache memories or single elements. The resulting cache memory is self-organizing and for the one-dimensional implementation has a worst-case access time proportional to N, where N is the number of tiles in the spiral. A k-dimensional spiral cache has a worst-case access time proportional to N^(1/k). Further, a spiral cache system provides a basis for a non-inclusive system of cache memory, which reduces the amount of space and power consumed by a cache memory of a given size.


    Cyclic segmented prefix circuits for mesh networks
    8.
    Patent Application (In force)

    Publication No.: US20070260663A1

    Publication Date: 2007-11-08

    Application No.: US11408099

    Filing Date: 2006-04-20

    IPC Classes: G06F7/38

    CPC Classes: G06F7/506 G06F2207/5063

    Abstract: Parallel prefix circuits for computing a cyclic segmented prefix operation with a mesh topology are disclosed. In one embodiment of the present invention, the elements (prefix nodes) of the mesh are arranged in row-major order. Values are accumulated toward the center of the mesh and partial results are propagated outward from the center of the mesh to complete the cyclic segmented prefix operation. This embodiment has been shown to be time-optimal. In another embodiment of the present invention, the prefix nodes are arranged such that the prefix node corresponding to the last element in the array is located at the center of the array. This alternative embodiment is not only time-optimal when accounting for wire-lengths (and therefore propagation delays), but it is also asymptotically optimal in terms of minimizing the number of segmented prefix operators.

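For readers unfamiliar with the operation itself, a sequential reference implementation of a cyclic segmented prefix follows (illustrative only; the patent concerns parallel mesh circuits that compute the same result in optimal time, and the function name and flag convention here are invented):

```python
def cyclic_segmented_prefix(values, heads, op=lambda a, b: a + b):
    """Sequential reference for a cyclic segmented prefix operation.
    heads[i] is True where a new segment begins. Output i is the
    reduction (under `op`) from i's segment head up to position i; the
    segment containing index 0 may wrap around from the end of the
    array, which is what makes the operation cyclic."""
    n = len(values)
    if not any(heads):
        raise ValueError("a cyclic segmented prefix needs at least one head")
    start = max(i for i in range(n) if heads[i])  # last head: scan starts here
    out = [None] * n
    acc = None
    for k in range(n):
        i = (start + k) % n
        acc = values[i] if heads[i] else op(acc, values[i])
        out[i] = acc
    return out

# Segments start at indices 1 and 4; the segment beginning at index 4
# wraps through index 5 and around to index 0.
result = cyclic_segmented_prefix([1, 2, 3, 4, 5, 6],
                                 [False, True, False, False, True, False])
```

The mesh circuits in the patent produce the same outputs, but accumulate toward the center of the mesh and propagate partial results outward instead of scanning sequentially.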

    Tiled storage array with systolic move-to-front reorganization
    9.
    Granted Patent (In force)

    Publication No.: US08527726B2

    Publication Date: 2013-09-03

    Application No.: US12270132

    Filing Date: 2008-11-13

    IPC Classes: G06F12/00 G06F13/00 G06F13/28

    Abstract: A tiled storage array provides reduction in access latency for frequently-accessed values by re-organizing to always move a requested value to a front-most storage element of the array. The previous occupant of the front-most location is moved backward according to a systolic pulse, and the new occupant is moved forward according to the systolic pulse, preserving the uniqueness of the stored values within the array, and providing for multiple in-flight access requests within the array. The placement heuristic that moves the values according to the systolic pulse can be implemented by control logic within identical tiles, so that the placement heuristic moves the values according to the position of the tiles within the array. The movement of the values can be performed via only next-neighbor connections of adjacent tiles within the array.

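The next-neighbor movement can be sketched in a heavily simplified form: a one-dimensional row of tiles and a single in-flight request (the patent's multiple in-flight requests and per-tile control logic are not modeled; the function name is invented):

```python
def move_to_front_systolic(tiles, value):
    """Minimal sketch of move-to-front via next-neighbor swaps only:
    on each systolic pulse the requested value advances one tile toward
    the front while the displaced occupant moves one tile backward.
    Returns the number of pulses used."""
    pos = tiles.index(value)
    pulses = 0
    while pos > 0:
        # One pulse: swap with the next-neighbor tile toward the front.
        tiles[pos - 1], tiles[pos] = tiles[pos], tiles[pos - 1]
        pos -= 1
        pulses += 1
    return pulses

tiles = ['a', 'b', 'c', 'd']            # tile 0 is the front-most element
pulses = move_to_front_systolic(tiles, 'c')
```

Each swap touches only two adjacent tiles, which is what allows the real array to implement the heuristic with identical tiles and next-neighbor wiring, and to pipeline several requests at once.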

    Cyclic segmented prefix circuits for mesh networks
    10.
    Granted Patent (In force)

    Publication No.: US07933940B2

    Publication Date: 2011-04-26

    Application No.: US11408099

    Filing Date: 2006-04-20

    IPC Classes: G06F15/00

    CPC Classes: G06F7/506 G06F2207/5063

    Abstract: Parallel prefix circuits for computing a cyclic segmented prefix operation with a mesh topology are disclosed. In one embodiment of the present invention, the elements (prefix nodes) of the mesh are arranged in row-major order. Values are accumulated toward the center of the mesh and partial results are propagated outward from the center of the mesh to complete the cyclic segmented prefix operation. This embodiment has been shown to be time-optimal. In another embodiment of the present invention, the prefix nodes are arranged such that the prefix node corresponding to the last element in the array is located at the center of the array. This alternative embodiment is not only time-optimal when accounting for wire-lengths (and therefore propagation delays), but it is also asymptotically optimal in terms of minimizing the number of segmented prefix operators.
