Parallel processor memory transfer system using parallel transfers
between processors and staging registers and sequential transfers
between staging registers and memory
    11.
    Granted invention patent
    Legal status: Expired

    Publication No.: US5581777A

    Publication date: 1996-12-03

    Application No.: US400411

    Filing date: 1995-03-03

    Abstract: A massively parallel processor is provided with a plurality of clusters. Each cluster includes a plurality of processor elements ("PEs") and a cluster memory. Each PE of the cluster has associated with it an address register, a stage register, an error register, a PE enable flag, a memory flag, and a grant request flag. A cluster data bus and an error bus connect each of the stage registers and error registers of the cluster to the memory. The grant request flags of the cluster are interconnected by a polling network, which polls only one of the grant request flags at a time. In response to a signal on the polling network and the state of the associated memory flag, the grant request flag determines an I/O operation between the associated data register and the cluster memory over the cluster data bus. Both data and error bits are associated with respective processor elements. The sequential memory operations proceed in parallel with parallel processor operations. The sequential memory operations also may be queued. Addressing modes include direct and indirect. In direct address mode, a PE addresses its own address space by appending its PE number to a broadcast partial address. The broadcast partial address is furnished over a broadcast bus, and the PE number is furnished on a cluster address bus. In indirect address mode, a PE addresses either its own address space or that of other PEs in its cluster by locally calculating a partial address, then appending to it either its own PE number or that of another PE in its cluster. The full address is furnished over the cluster address bus.
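
    The two addressing modes in this abstract can be illustrated with a minimal sketch. It is not taken from the patent: the 4-bit PE number width, the choice to place the PE number in the low-order bits, and the function names are all assumptions made for illustration.

```python
PE_BITS = 4  # assumption: 16 PEs per cluster, so a PE number fits in 4 bits


def full_address(partial: int, pe_number: int, pe_bits: int = PE_BITS) -> int:
    """Append a PE number to a partial address.

    Assumption: "appending" means the PE number occupies the low-order bits.
    """
    if not 0 <= pe_number < (1 << pe_bits):
        raise ValueError("PE number out of range for this cluster")
    return (partial << pe_bits) | pe_number


def direct_address(broadcast_partial: int, own_pe: int) -> int:
    """Direct mode: the partial address is broadcast; the PE appends its own number."""
    return full_address(broadcast_partial, own_pe)


def indirect_address(local_partial: int, target_pe: int) -> int:
    """Indirect mode: the PE computes the partial address locally and appends
    either its own number or that of another PE in the same cluster."""
    return full_address(local_partial, target_pe)
```

    In this sketch, direct mode always yields an address inside the issuing PE's own address space, while indirect mode can reach another PE's space simply by passing a different `target_pe`.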

    Input/output system for parallel processing arrays
    12.
    Granted invention patent
    Legal status: Expired

    Publication No.: US5243699A

    Publication date: 1993-09-07

    Application No.: US802944

    Filing date: 1991-12-06

    IPC classification: G06F15/173 G06F15/80

    CPC classification: G06F15/8007 G06F15/17393

    Abstract: A massively parallel processor includes an array of processor elements (20), or PEs, and a multi-stage router interconnection network (30), which is used both for I/O communications and for PE to PE communications. The I/O system (10) for the massively parallel processor is based on a globally shared addressable I/O RAM buffer memory (50) that has address and data buses (52) to the I/O devices (80, 82) and other address and data buses (42) which are coupled to a router I/O element array (40). The router I/O element array is in turn coupled to the router ports (e.g. S2_0_X0) of the second stage (430) of the router interconnection network. The router I/O array provides the corner turn conversion between the massive array of router lines (32) and the relatively few buses (52) to the I/O devices.

    Sharing data crossbar for reads and writes in a data cache
    14.
    Granted invention patent
    Legal status: In force

    Publication No.: US09286256B2

    Publication date: 2016-03-15

    Application No.: US12892862

    Filing date: 2010-09-28

    CPC classification: G06F13/4022 G06F13/4031

    Abstract: The invention sets forth an L1 cache architecture that includes a crossbar unit configured to transmit data associated with both read data requests and write data requests. Data associated with read data requests is retrieved from a cache memory and transmitted to the client subsystems. Similarly, data associated with write data requests is transmitted from the client subsystems to the cache memory. To allow for the transmission of both read and write data on the crossbar unit, an arbiter is configured to schedule the crossbar unit transmissions as well as arbitrate between data requests received from the client subsystems.

    SHARED SINGLE-ACCESS MEMORY WITH MANAGEMENT OF MULTIPLE PARALLEL REQUESTS
    16.
    Invention patent application
    Legal status: In force

    Publication No.: US20120221808A1

    Publication date: 2012-08-30

    Application No.: US13466057

    Filing date: 2012-05-07

    IPC classification: G06F12/00

    CPC classification: G06F12/084 Y02D10/13

    Abstract: A memory is used by concurrent threads in a multithreaded processor. Any addressable storage location is accessible by any of the concurrent threads, but only one location at a time is accessible. The memory is coupled to parallel processing engines that generate a group of parallel memory access requests, each specifying a target address that might be the same or different for different requests. Serialization logic selects one of the target addresses and determines which of the requests specify the selected target address. All such requests are allowed to proceed in parallel, while other requests are deferred. Deferred requests may be regenerated and processed through the serialization logic so that a group of requests can be satisfied by accessing each different target address in the group exactly once.
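
    The serialization behaviour described in this abstract can be sketched in software as follows. This is a simplified model, not the patented logic: the request representation as `(thread_id, address)` pairs, the function name, and the policy of selecting the address of the first pending request are assumptions for illustration.

```python
def serialize(requests):
    """Group parallel requests into passes, one pass per distinct target address.

    requests: list of (thread_id, target_address) pairs.
    Returns a list of (address, [thread_ids]) tuples in the order served.
    """
    pending = list(requests)
    passes = []
    while pending:
        # Assumption: the selected address is that of the first pending request.
        selected = pending[0][1]
        # Every request for the selected address proceeds in this pass.
        served = [tid for tid, addr in pending if addr == selected]
        # All other requests are deferred and resubmitted on the next pass.
        pending = [(tid, addr) for tid, addr in pending if addr != selected]
        passes.append((selected, served))
    return passes
```

    Because each pass removes every request for the selected address, the number of passes equals the number of distinct addresses in the group, which matches the "exactly once" property stated in the abstract.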

    Shared single-access memory with management of multiple parallel requests
    17.
    Granted invention patent
    Legal status: In force

    Publication No.: US08176265B2

    Publication date: 2012-05-08

    Application No.: US13165638

    Filing date: 2011-06-21

    IPC classification: G06F12/00

    CPC classification: G06F12/084 Y02D10/13

    Abstract: A memory is used by concurrent threads in a multithreaded processor. Any addressable storage location is accessible by any of the concurrent threads, but only one location at a time is accessible. The memory is coupled to parallel processing engines that generate a group of parallel memory access requests, each specifying a target address that might be the same or different for different requests. Serialization logic selects one of the target addresses and determines which of the requests specify the selected target address. All such requests are allowed to proceed in parallel, while other requests are deferred. Deferred requests may be regenerated and processed through the serialization logic so that a group of requests can be satisfied by accessing each different target address in the group exactly once.

    Shared memory with parallel access and access conflict resolution mechanism
    18.
    Granted invention patent
    Legal status: In force

    Publication No.: US08108625B1

    Publication date: 2012-01-31

    Application No.: US11554546

    Filing date: 2006-10-30

    IPC classification: G06F12/00

    CPC classification: G06F13/1663

    Abstract: Concurrent threads in a multithreaded processor share access to a memory, with any location in the shared memory being accessible by any thread. In one embodiment, the shared memory has multiple independently-addressable memory banks, and one location per bank can be accessed in parallel. Parallel processing engines executing the threads generate a group of parallel memory access requests. Address conflict logic determines whether the requests can be satisfied in parallel (e.g., based on bank access constraints) and serializes the requests to the extent needed to avoid conflicts. In some embodiments, data read from one address in the shared memory can be broadcast to multiple processing engines.
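
    A rough software model of the bank-conflict handling described in this abstract is given below. It is a sketch under stated assumptions, not the patented circuit: the bank count, the modulo mapping of addresses to banks, the read-only request model, and the function names are all hypothetical.

```python
NUM_BANKS = 4  # assumption: four independently addressable banks


def bank_of(address: int, num_banks: int = NUM_BANKS) -> int:
    """Assumption: consecutive addresses are interleaved across banks."""
    return address % num_banks


def schedule(requests, num_banks: int = NUM_BANKS):
    """Split a group of read requests into conflict-free cycles.

    requests: list of (thread_id, address) pairs, all treated as reads.
    Returns a list of cycles; each cycle is a list of (thread_id, address)
    pairs that touch at most one address per bank.
    """
    pending = list(requests)
    cycles = []
    while pending:
        granted = {}  # bank -> the single address served by that bank this cycle
        served, deferred = [], []
        for tid, addr in pending:
            bank = bank_of(addr, num_banks)
            # The first request to reach a bank claims it; later requests for
            # the same address share the access (read broadcast).
            if granted.setdefault(bank, addr) == addr:
                served.append((tid, addr))
            else:
                deferred.append((tid, addr))  # bank conflict: retry next cycle
        cycles.append(served)
        pending = deferred
    return cycles
```

    In this model a group with no two distinct addresses in the same bank completes in a single cycle, and extra cycles are added only for the requests that actually conflict.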

    Methods for scalably exploiting parallelism in a parallel processing system
    19.
    Granted invention patent
    Legal status: In force

    Publication No.: US08099584B2

    Publication date: 2012-01-17

    Application No.: US13099035

    Filing date: 2011-05-02

    IPC classification: G06F9/30

    Abstract: Parallelism in a parallel processing subsystem is exploited in a scalable manner. A problem to be solved can be hierarchically decomposed into at least two levels of sub-problems. Individual threads of program execution are defined to solve the lowest-level sub-problems. The threads are grouped into one or more thread arrays, each of which solves a higher-level sub-problem. The thread arrays are executable by processing cores, each of which can execute at least one thread array at a time. Thread arrays can be grouped into grids of independent thread arrays, which solve still higher-level sub-problems or an entire problem. Thread arrays within a grid, or entire grids, can be distributed across all of the available processing cores as available in a particular system implementation.
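
    The thread, thread array, and grid hierarchy in this abstract can be illustrated with a small sequential simulation. The example problem (a sum of squares), the round-robin assignment of thread arrays to cores, and all function names are assumptions chosen for illustration; nothing here runs in parallel.

```python
def thread(data, index):
    """Lowest-level sub-problem: one thread handles one element."""
    return data[index] * data[index]


def thread_array(data, start, size):
    """Higher-level sub-problem: a thread array reduces one tile of the input."""
    end = min(start + size, len(data))
    return sum(thread(data, i) for i in range(start, end))


def assign_to_cores(num_arrays, num_cores):
    """Assumption: thread arrays are dealt out to cores round-robin."""
    cores = [[] for _ in range(num_cores)]
    for array_id in range(num_arrays):
        cores[array_id % num_cores].append(array_id)
    return cores


def grid(data, array_size, num_cores):
    """Entire problem: a grid of independent thread arrays spread over the cores."""
    num_arrays = (len(data) + array_size - 1) // array_size  # ceiling division
    total = 0
    for core_arrays in assign_to_cores(num_arrays, num_cores):
        for array_id in core_arrays:
            total += thread_array(data, array_id * array_size, array_size)
    return total
```

    Because the thread arrays are independent, the result does not depend on `num_cores`; only the assignment of arrays to cores changes, which is the scalability property the abstract describes.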

    Methods for scalably exploiting parallelism in a parallel processing system
    20.
    Granted invention patent
    Legal status: In force

    Publication No.: US07937567B1

    Publication date: 2011-05-03

    Application No.: US11555623

    Filing date: 2006-11-01

    IPC classification: G06F9/30

    Abstract: Parallelism in a parallel processing subsystem is exploited in a scalable manner. A problem to be solved can be hierarchically decomposed into at least two levels of sub-problems. Individual threads of program execution are defined to solve the lowest-level sub-problems. The threads are grouped into one or more thread arrays, each of which solves a higher-level sub-problem. The thread arrays are executable by processing cores, each of which can execute at least one thread array at a time. Thread arrays can be grouped into grids of independent thread arrays, which solve still higher-level sub-problems or an entire problem. Thread arrays within a grid, or entire grids, can be distributed across all of the available processing cores as available in a particular system implementation.
