Methods for scalably exploiting parallelism in a parallel processing system
    21.
    Granted patent
    Legal status: in force

    Publication number: US08099584B2

    Publication date: 2012-01-17

    Application number: US13099035

    Filing date: 2011-05-02

    IPC classification: G06F9/30

    Abstract: Parallelism in a parallel processing subsystem is exploited in a scalable manner. A problem to be solved can be hierarchically decomposed into at least two levels of sub-problems. Individual threads of program execution are defined to solve the lowest-level sub-problems. The threads are grouped into one or more thread arrays, each of which solves a higher-level sub-problem. The thread arrays are executable by processing cores, each of which can execute at least one thread array at a time. Thread arrays can be grouped into grids of independent thread arrays, which solve still higher-level sub-problems or an entire problem. Thread arrays within a grid, or entire grids, can be distributed across all of the available processing cores as available in a particular system implementation.
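
    The hierarchy described in this abstract (threads, thread arrays, grids, cores) can be illustrated with a small sequential simulation. This is a minimal sketch and not the patented method: the sum-of-squares problem, the function names, and the round-robin placement of thread arrays on cores are all assumptions chosen for illustration.

```python
def thread_work(data, index):
    """Lowest-level sub-problem: one thread squares one element."""
    return data[index] * data[index]


def run_thread_array(data, start, threads_per_array):
    """Higher-level sub-problem: one thread array sums the squares of one tile."""
    end = min(start + threads_per_array, len(data))
    return sum(thread_work(data, i) for i in range(start, end))


def run_grid(data, threads_per_array, num_cores):
    """Whole problem: a grid of independent thread arrays spread over the cores."""
    starts = range(0, len(data), threads_per_array)
    per_core = [0] * num_cores
    for array_id, start in enumerate(starts):
        core = array_id % num_cores  # round-robin placement (an assumption)
        per_core[core] += run_thread_array(data, start, threads_per_array)
    return sum(per_core)


data = list(range(10))
result_one_core = run_grid(data, threads_per_array=4, num_cores=1)
result_four_cores = run_grid(data, threads_per_array=4, num_cores=4)
```

    Because the thread arrays are independent, the result does not depend on `num_cores`; only the placement changes, which is the sense in which the decomposition scales with the hardware.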

    Methods for scalably exploiting parallelism in a parallel processing system
    22.
    Granted patent
    Legal status: in force

    Publication number: US07937567B1

    Publication date: 2011-05-03

    Application number: US11555623

    Filing date: 2006-11-01

    IPC classification: G06F9/30

    Abstract: Parallelism in a parallel processing subsystem is exploited in a scalable manner. A problem to be solved can be hierarchically decomposed into at least two levels of sub-problems. Individual threads of program execution are defined to solve the lowest-level sub-problems. The threads are grouped into one or more thread arrays, each of which solves a higher-level sub-problem. The thread arrays are executable by processing cores, each of which can execute at least one thread array at a time. Thread arrays can be grouped into grids of independent thread arrays, which solve still higher-level sub-problems or an entire problem. Thread arrays within a grid, or entire grids, can be distributed across all of the available processing cores as available in a particular system implementation.

    PARALLEL DATA PROCESSING SYSTEMS AND METHODS USING COOPERATIVE THREAD ARRAYS
    23.
    Patent application
    Legal status: in force

    Publication number: US20110087860A1

    Publication date: 2011-04-14

    Application number: US12972361

    Filing date: 2010-12-17

    IPC classification: G06F15/16

    Abstract: Parallel data processing systems and methods use cooperative thread arrays (CTAs), i.e., groups of multiple threads that concurrently execute the same program on an input data set to produce an output data set. Each thread in a CTA has a unique identifier (thread ID) that can be assigned at thread launch time. The thread ID controls various aspects of the thread's processing behavior such as the portion of the input data set to be processed by each thread, the portion of an output data set to be produced by each thread, and/or sharing of intermediate results among threads. Mechanisms for loading and launching CTAs in a representative processing core and for synchronizing threads within a CTA are also described.
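
    To make the role of the thread ID concrete, the sketch below simulates one CTA sequentially. It is an illustration under stated assumptions, not the mechanism claimed in the application: the function `run_cta`, the `shared` list standing in for shared memory, and the tree reduction are all hypothetical, and the end of each loop plays the role of a barrier.

```python
def run_cta(input_data, num_threads):
    """Simulate one CTA that computes the sum of `input_data`."""
    chunk = (len(input_data) + num_threads - 1) // num_threads
    shared = [0] * num_threads  # intermediate results visible to all threads

    # Phase 1: each thread ID selects its own slice of the input.
    for tid in range(num_threads):
        lo = tid * chunk
        hi = min(lo + chunk, len(input_data))
        shared[tid] = sum(input_data[lo:hi])
    # End of loop acts as a barrier: every partial result has been written
    # before any thread reads a value produced by another thread.

    # Phase 2: tree reduction; the thread ID decides which threads add
    # and which stay idle in each step.
    stride = 1
    while stride < num_threads:
        for tid in range(0, num_threads, 2 * stride):
            if tid + stride < num_threads:
                shared[tid] += shared[tid + stride]
        stride *= 2  # barrier between reduction steps
    return shared[0]


total = run_cta(list(range(1, 101)), num_threads=8)
```

    Every thread runs the same program; only `tid` differs, and it alone determines which input slice a thread reads, which slot of `shared` it writes, and whether it participates in a given reduction step.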

    Address Mapping for a Parallel Thread Processor
    24.
    Patent application
    Legal status: in force

    Publication number: US20110078689A1

    Publication date: 2011-03-31

    Application number: US12890518

    Filing date: 2010-09-24

    IPC classification: G06F9/46

    Abstract: A method for thread address mapping in a parallel thread processor. The method includes receiving a thread address associated with a first thread in a thread group; computing an effective address based on a location of the thread address within a local window of a thread address space; computing a thread group address in an address space associated with the thread group based on the effective address and a thread identifier associated with a first thread; and computing a virtual address associated with the first thread based on the thread group address and a thread group identifier, where the virtual address is used to access a location in a memory associated with the thread address to load or store data.
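
    The abstract names three mapping steps but does not give the arithmetic. The sketch below shows one plausible way the steps could compose, purely as an assumption: all constants, the word-interleaved layout, and the function names are hypothetical and are not taken from the application.

```python
WORD = 4                          # bytes per interleaved unit (assumed)
THREADS_PER_GROUP = 32            # threads in one thread group (assumed)
LOCAL_WINDOW_BASE = 0x1000_0000   # start of the local window (assumed)
LOCAL_BYTES_PER_THREAD = 1024     # size of the local window (assumed)
LOCAL_MEMORY_BASE = 0x8000_0000   # virtual base of the backing memory (assumed)


def effective_address(thread_address):
    """Step 1: offset of the thread address inside the local window."""
    offset = thread_address - LOCAL_WINDOW_BASE
    if not 0 <= offset < LOCAL_BYTES_PER_THREAD:
        raise ValueError("address is outside the local window")
    return offset


def thread_group_address(effective, thread_id):
    """Step 2: interleave words so the same word of adjacent threads is adjacent."""
    word_index, byte_in_word = divmod(effective, WORD)
    return (word_index * THREADS_PER_GROUP + thread_id) * WORD + byte_in_word


def virtual_address(group_address, group_id):
    """Step 3: place each thread group's region after the previous group's."""
    group_bytes = THREADS_PER_GROUP * LOCAL_BYTES_PER_THREAD
    return LOCAL_MEMORY_BASE + group_id * group_bytes + group_address


def map_address(thread_address, thread_id, group_id):
    """Compose the three steps for one thread's load or store."""
    eff = effective_address(thread_address)
    return virtual_address(thread_group_address(eff, thread_id), group_id)
```

    Under these assumptions, thread 0 and thread 1 of group 0 accessing the same thread address `0x1000_0000` land on `0x8000_0000` and `0x8000_0004`, so every thread sees a private copy of the window while neighbouring threads touch neighbouring words.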

    Structured programming control flow in a SIMD architecture
    25.
    Granted patent
    Legal status: in force

    Publication number: US07877585B1

    Publication date: 2011-01-25

    Application number: US11845429

    Filing date: 2007-08-27

    Abstract: One embodiment of a computing system configured to manage divergent threads in a SIMD thread group includes a stack configured to store state information for processing control instructions. A parallel processing unit is configured to perform the steps of determining if one or more threads diverge during execution of a conditional control instruction. A disable mask allows for the use of conditional return and break instructions in a multithreaded SIMD architecture. Additional control instructions are used to set up thread processing target addresses for synchronization, breaks, and returns.
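
    A stack of saved masks is a common way to handle divergence in lockstep execution, and the sketch below simulates it for a single if/else. It is an illustration only: the function `run_if_else`, the `"sync"` and `"else"` labels, and the stack entry layout are assumptions, and the disable mask for conditional return and break mentioned in the abstract is not modelled.

```python
def run_if_else(values, predicate, then_op, else_op):
    """Run `then_op if predicate else else_op` on every lane in lockstep.

    Returns (results, diverged).
    """
    n = len(values)
    results = list(values)
    all_lanes = [True] * n
    taken = [predicate(v) for v in values]
    not_taken = [not t for t in taken]

    stack = []  # saved (active mask, path) entries still to be processed
    diverged = any(taken) and any(not_taken)
    if diverged:
        stack.append((all_lanes, "sync"))   # mask to restore at reconvergence
        stack.append((not_taken, "else"))   # deferred path
        active, path = taken, "then"
    else:
        active, path = all_lanes, ("then" if any(taken) else "else")

    while True:
        op = then_op if path == "then" else else_op
        for lane in range(n):
            if active[lane]:                # inactive lanes keep their value
                results[lane] = op(values[lane])
        if not stack:
            break
        active, path = stack.pop()
        if path == "sync":                  # all lanes active again
            break
    return results, diverged


mixed, mixed_diverged = run_if_else(
    [1, 2, 3, 4], lambda v: v % 2 == 0, lambda v: v * 10, lambda v: -v)
uniform, uniform_diverged = run_if_else(
    [2, 4], lambda v: v % 2 == 0, lambda v: v * 10, lambda v: -v)
```

    When all lanes agree on the condition, nothing is pushed and only one path executes; when they disagree, both paths run one after the other with complementary masks.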

    Distributing processing tasks within a processor
    26.
    Granted patent
    Legal status: in force

    Publication number: US07865894B1

    Publication date: 2011-01-04

    Application number: US11311997

    Filing date: 2005-12-19

    IPC classification: G06F9/46

    CPC classification: G06F9/5044

    Abstract: Embodiments of the present invention facilitate distributing processing tasks within a processor. In one embodiment, processing clusters keep track of resource requirements. If sufficient resources are available within a particular processing cluster, the available processing cluster asserts a ready signal to a dispatch unit. The dispatch unit is configured to pass a processing task (such as a cooperative thread array or CTA) to an available processing cluster that asserted a ready signal. In another embodiment, a processing task is passed around a ring of processing clusters until a processing cluster with sufficient resources available accepts the processing task.
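
    The two embodiments in this abstract (ready signals to a dispatch unit, and passing a task around a ring) can be contrasted in a few lines. This is a sketch under assumptions: the `Cluster` class, the use of free thread slots as the only tracked resource, and the single lap around the ring are hypothetical simplifications.

```python
class Cluster:
    """A processing cluster that tracks one resource: free thread slots."""

    def __init__(self, name, free_threads):
        self.name = name
        self.free_threads = free_threads
        self.tasks = []

    def ready(self, needed_threads):
        """Ready signal: asserted when enough resources are free."""
        return self.free_threads >= needed_threads

    def accept(self, task_name, needed_threads):
        self.free_threads -= needed_threads
        self.tasks.append(task_name)


def dispatch_by_ready_signal(clusters, task_name, needed_threads):
    """Dispatch unit hands the task to the first cluster asserting ready."""
    for cluster in clusters:
        if cluster.ready(needed_threads):
            cluster.accept(task_name, needed_threads)
            return cluster.name
    return None  # no cluster is ready; the task would have to wait


def dispatch_around_ring(clusters, start, task_name, needed_threads):
    """Pass the task from cluster to cluster until one accepts it."""
    n = len(clusters)
    for hop in range(n):  # one lap only, a simplification
        cluster = clusters[(start + hop) % n]
        if cluster.ready(needed_threads):
            cluster.accept(task_name, needed_threads)
            return cluster.name
    return None  # the task went all the way round without being accepted


clusters = [Cluster("A", 64), Cluster("B", 256), Cluster("C", 128)]
first = dispatch_by_ready_signal(clusters, "cta0", 128)
second = dispatch_around_ring(clusters, 2, "cta1", 128)
third = dispatch_around_ring(clusters, 2, "cta2", 128)
fourth = dispatch_around_ring(clusters, 2, "cta3", 128)
```

    In the first variant the decision is centralized in the dispatch unit; in the ring variant each cluster decides locally whether to accept or forward the task.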

    VIRTUAL ARCHITECTURE AND INSTRUCTION SET FOR PARALLEL THREAD COMPUTING
    28.
    Patent application
    Legal status: in force

    Publication number: US20080184211A1

    Publication date: 2008-07-31

    Application number: US11627892

    Filing date: 2007-01-26

    IPC classification: G06F9/45

    CPC classification: G06F8/456

    Abstract: A virtual architecture and instruction set support explicit parallel-thread computing. The virtual architecture defines a virtual processor that supports concurrent execution of multiple virtual threads with multiple levels of data sharing and coordination (e.g., synchronization) between different virtual threads, as well as a virtual execution driver that controls the virtual processor. A virtual instruction set architecture for the virtual processor is used to define behavior of a virtual thread and includes instructions related to parallel thread behavior, e.g., data sharing and synchronization. Using the virtual platform, programmers can develop application programs in which virtual threads execute concurrently to process data; virtual translators and drivers adapt the application code to particular hardware on which it is to execute, transparently to the programmer.

    Simulating multiported memories using lower port count memories
    29.
    Granted patent
    Legal status: in force

    Publication number: US07339592B2

    Publication date: 2008-03-04

    Application number: US10889730

    Filing date: 2004-07-13

    IPC classification: G06F12/02 G06F13/00 G09G5/36

    Abstract: An apparatus and method for simulating a multiported memory using lower port count memories as banks. A portion of memory is allocated for storing data associated with a thread. The portion of memory allocated to a thread may be stored in a single bank or in multiple banks. A collector unit coupled to each bank gathers source operands needed to process a program instruction as the source operands output from one or more banks. The collector unit outputs the source operands to an execution unit when all of the source operands needed to process the program instruction have been gathered.
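
    The effect of where a thread's operands are stored can be shown with a small cycle-counting model. This is a sketch under assumptions, not the patented apparatus: the `Bank` class, the single collector, and the rule that each bank serves one read per cycle are hypothetical simplifications.

```python
class Bank:
    """A single-ported bank: at most one read per cycle (assumed)."""

    def __init__(self, registers):
        self.registers = dict(registers)


def gather_operands(banks, operand_locations):
    """Collect all source operands of one instruction.

    operand_locations is a list of (bank_index, register_name) pairs,
    one per operand. Returns (operands, cycles_used); the operands are
    released to the execution unit only once all have been gathered.
    """
    collected = [None] * len(operand_locations)
    pending = list(range(len(operand_locations)))
    cycles = 0
    while pending:
        cycles += 1
        busy_banks = set()
        still_pending = []
        for slot in pending:
            bank_index, register = operand_locations[slot]
            if bank_index in busy_banks:
                still_pending.append(slot)  # bank conflict: retry next cycle
            else:
                busy_banks.add(bank_index)
                collected[slot] = banks[bank_index].registers[register]
        pending = still_pending
    return collected, cycles


banks = [Bank({"r0": 5, "r1": 7}), Bank({"r2": 11})]
conflict_operands, conflict_cycles = gather_operands(
    banks, [(0, "r0"), (1, "r2"), (0, "r1")])
spread_operands, spread_cycles = gather_operands(
    banks, [(0, "r0"), (1, "r2")])
```

    Operands that live in different banks are read in the same cycle, which is how several low-port-count banks can stand in for one multiported memory; operands that share a bank cost extra cycles.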
