EFFICIENT IMPLEMENTATION OF ARRAYS OF STRUCTURES ON SIMT AND SIMD ARCHITECTURES
    51.
    Invention application (in force)

    Publication No.: US20120089792A1

    Publication date: 2012-04-12

    Application No.: US13247855

    Filing date: 2011-09-28

    IPC class: G06F12/00

    Abstract: One embodiment of the present invention sets forth a technique providing an optimized way to allocate and access memory across a plurality of thread/data lanes. Specifically, the device driver receives an instruction targeted to a memory set up as an array of structures of arrays. The device driver computes an address within the memory using information about the number of thread/data lanes and parameters from the instruction itself. The result is a memory allocation and access approach where the device driver properly computes the target address in the memory. Advantageously, processing efficiency is improved where memory in a parallel processing subsystem is internally stored and accessed as an array of structures of arrays, proportional to the SIMT/SIMD group width (the number of threads or lanes per execution group).

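    A minimal CUDA sketch of the kind of address computation the abstract describes, assuming an "array of structures of arrays" layout whose inner dimension equals a SIMT group width of 32 and whose structure fields are 32-bit values; the names GROUP_WIDTH, loadField, and copyFirstField are hypothetical, not taken from the patent.

        // Hypothetical "array of structures of arrays" (AoSoA) addressing sketch.
        // The element index is split into a group index and a lane index so that
        // the same field of GROUP_WIDTH consecutive elements sits contiguously.
        #define GROUP_WIDTH 32   // assumed threads (lanes) per execution group

        __device__ float loadField(const float* base, int elemIdx, int fieldIdx, int numFields)
        {
            int group = elemIdx / GROUP_WIDTH;   // which group of GROUP_WIDTH elements
            int lane  = elemIdx % GROUP_WIDTH;   // position within that group
            // Each group occupies numFields * GROUP_WIDTH words; within a group,
            // field fieldIdx of all GROUP_WIDTH elements is stored contiguously.
            return base[group * numFields * GROUP_WIDTH + fieldIdx * GROUP_WIDTH + lane];
        }

        __global__ void copyFirstField(const float* aosoa, float* out, int numFields, int n)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n)
                out[i] = loadField(aosoa, i, 0, numFields);   // adjacent lanes read adjacent words
        }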

METHOD AND SYSTEM FOR PREDICATE-CONTROLLED MULTI-FUNCTION INSTRUCTIONS
    52.
    Invention application (pending, published)

    Publication No.: US20120084539A1

    Publication date: 2012-04-05

    Application No.: US13247833

    Filing date: 2011-09-28

    IPC class: G06F9/30

    Abstract: Techniques are disclosed for executing conditional computer instructions in an efficient manner that reduces bubbles and idle states. In one embodiment, dual-function instruction execution is disclosed where the dual-function instruction has two possible functions (or operations), the choice of which is controlled by a predicate value with a true or false value. Among other things, the disclosed techniques provide dynamic control for choosing which operation to execute, leading to more efficiently executed code.

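    The abstract describes a hardware instruction whose operation is selected by a predicate; the CUDA sketch below only emulates that visible behavior in software, with addition and multiplication as illustrative candidate operations and dualAddMul as a hypothetical name.

        // Software emulation of a predicate-controlled dual-function operation.
        // The hardware described would do this in a single instruction; here the
        // predicate simply selects between the two candidate operations.
        __device__ __forceinline__ int dualAddMul(bool pred, int a, int b)
        {
            return pred ? (a + b) : (a * b);   // predicate picks the function, no branch needed
        }

        __global__ void applyDualOp(const int* a, const int* b,
                                    const bool* preds, int* out, int n)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n)
                out[i] = dualAddMul(preds[i], a[i], b[i]);
        }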

LOCK MECHANISM TO ENABLE ATOMIC UPDATES TO SHARED MEMORY
    53.
    Invention application (in force)

    Publication No.: US20120036329A1

    Publication date: 2012-02-09

    Application No.: US13276224

    Filing date: 2011-10-18

    IPC class: G06F12/14

    Abstract: A system and method for locking and unlocking access to a shared memory for atomic operations provides immediate feedback indicating whether or not the lock was successful. Read data is returned to the requestor with the lock status. The lock status may be changed concurrently when locking during a read or unlocking during a write. Therefore, it is not necessary to check the lock status as a separate transaction prior to or during a read-modify-write operation. Additionally, a lock or unlock may be explicitly specified for each atomic memory operation. Therefore, lock operations are not performed for operations that do not modify the contents of a memory location.

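    The lock described in the abstract is a hardware mechanism that returns the lock status together with the read data; the CUDA sketch below only approximates the locked read-modify-write pattern in software, using a per-slot lock word and atomicCAS. SLOTS, the increment operation, and the kernel name are illustrative assumptions.

        // Software approximation of a locked read-modify-write on shared memory.
        #define SLOTS 128   // assumed number of shared-memory slots

        __global__ void lockedIncrement(const int* slotOf, int* counts, int n)
        {
            __shared__ int data[SLOTS];
            __shared__ int lock[SLOTS];   // 0 = unlocked, 1 = locked

            for (int s = threadIdx.x; s < SLOTS; s += blockDim.x) { data[s] = 0; lock[s] = 0; }
            __syncthreads();

            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) {
                int s = slotOf[i] % SLOTS;      // assumes non-negative slot identifiers
                bool done = false;
                while (!done) {
                    if (atomicCAS(&lock[s], 0, 1) == 0) {   // acquire the lock
                        data[s] += 1;                        // read-modify-write under the lock
                        __threadfence_block();
                        atomicExch(&lock[s], 0);             // release the lock
                        done = true;
                    }
                }
            }
            __syncthreads();

            for (int s = threadIdx.x; s < SLOTS; s += blockDim.x)
                counts[blockIdx.x * SLOTS + s] = data[s];    // write per-block results out
        }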

Parallel data processing systems and methods using cooperative thread arrays with unique thread identifiers as an input to compute an identifier of a location in a shared memory
    54.
    Granted invention (in force)

    Publication No.: US08112614B2

    Publication date: 2012-02-07

    Application No.: US12972361

    Filing date: 2010-12-17

    IPC class: G06F15/16

    Abstract: Parallel data processing systems and methods use cooperative thread arrays (CTAs), i.e., groups of multiple threads that concurrently execute the same program on an input data set to produce an output data set. Each thread in a CTA has a unique identifier (thread ID) that can be assigned at thread launch time. The thread ID controls various aspects of the thread's processing behavior such as the portion of the input data set to be processed by each thread, the portion of an output data set to be produced by each thread, and/or sharing of intermediate results among threads. Mechanisms for loading and launching CTAs in a representative processing core and for synchronizing threads within a CTA are also described.

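    A minimal CUDA sketch of the pattern the abstract describes, using the built-in thread index as the unique thread ID to choose each thread's input element and to compute the index of its location in shared memory; the tile size and the reversal example are illustrative only.

        #define CTA_SIZE 256   // assumed threads per cooperative thread array (thread block)

        __global__ void reverseTile(const float* in, float* out)
        {
            __shared__ float tile[CTA_SIZE];

            int tid  = threadIdx.x;              // unique thread ID within the CTA
            int base = blockIdx.x * CTA_SIZE;

            tile[tid] = in[base + tid];          // thread ID selects this thread's input element
            __syncthreads();                     // synchronize the threads of the CTA

            // The thread ID also determines which shared-memory location to read,
            // here sharing intermediate results between threads of the CTA.
            out[base + tid] = tile[CTA_SIZE - 1 - tid];
        }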

COOPERATIVE THREAD ARRAY REDUCTION AND SCAN OPERATIONS
    55.
    Invention application (in force)

    Publication No.: US20110078417A1

    Publication date: 2011-03-31

    Application No.: US12890227

    Filing date: 2010-09-24

    IPC class: G06F9/38

    Abstract: One embodiment of the present invention sets forth a technique for performing aggregation operations across multiple threads that execute independently. Aggregation is specified as part of a barrier synchronization or barrier arrival instruction, where in addition to performing the barrier synchronization or arrival, the instruction aggregates (using reduction or scan operations) values supplied by each thread. When a thread executes the barrier aggregation instruction, the thread contributes to a scan or reduction result, and waits to execute any more instructions until after all of the threads have executed the barrier aggregation instruction. A reduction result is communicated to each thread after all of the threads have executed the barrier aggregation instruction, and a scan result is communicated to each thread as the barrier aggregation instruction is executed by the thread.

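    CUDA exposes intrinsics that combine a block-wide barrier with a reduction, for example __syncthreads_count(), which is close in spirit to the barrier aggregation the abstract describes (scans are not covered by this intrinsic). The kernel below is an illustrative use and assumes the launch covers every element, so all threads reach the barrier.

        // __syncthreads_count() synchronizes all threads of the block and returns,
        // to every thread, the number of threads whose predicate was non-zero:
        // a reduction performed as part of barrier arrival.
        __global__ void countAboveThreshold(const float* data, int* blockCounts, float threshold)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            int pred = data[i] > threshold;          // each thread contributes one value

            int count = __syncthreads_count(pred);   // barrier + reduction in one step

            if (threadIdx.x == 0)
                blockCounts[blockIdx.x] = count;     // the same result is visible to every thread
        }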

Efficient Predicated Execution For Parallel Processors
    56.
    Invention application (pending, published)

    Publication No.: US20110078415A1

    Publication date: 2011-03-31

    Application No.: US12891629

    Filing date: 2010-09-27

    IPC class: G06F9/30, G06F9/46

    Abstract: The invention set forth herein describes a mechanism for predicated execution of instructions within a parallel processor executing multiple threads or data lanes. Each thread or data lane executing within the parallel processor is associated with a predicate register that stores a set of 1-bit predicates. Each of these predicates can be set using different types of predicate-setting instructions, where each predicate-setting instruction specifies one or more source operands, at least one operation to be performed on the source operands, and one or more destination predicates for storing the result of the operation. An instruction can be guarded by a predicate that may influence whether the instruction is executed for a particular thread or data lane or how the instruction is executed for a particular thread or data lane.

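    A small CUDA sketch using inline PTX, where setp is a predicate-setting instruction and @p guards the following instruction so that it only takes effect in lanes where the predicate is true; the minimum computation is an illustrative choice, not taken from the patent.

        // setp writes a 1-bit predicate register; the @p-guarded move executes only
        // where the predicate is true, so r keeps its old value in the other lanes.
        __device__ __forceinline__ int predicatedMin(int a, int b)
        {
            int r = b;                         // result if the predicate is false
            asm("{\n\t"
                ".reg .pred p;\n\t"
                "setp.lt.s32 p, %1, %2;\n\t"   // predicate-setting instruction: p = (a < b)
                "@p mov.s32 %0, %1;\n\t"       // guarded instruction: r = a where p is true
                "}"
                : "+r"(r) : "r"(a), "r"(b));
            return r;
        }

        __global__ void elementwiseMin(const int* a, const int* b, int* out, int n)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n)
                out[i] = predicatedMin(a[i], b[i]);
        }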

Processing an indirect branch instruction in a SIMD architecture
    58.
    Granted invention (in force)

    Publication No.: US07761697B1

    Publication date: 2010-07-20

    Application No.: US11557082

    Filing date: 2006-11-06

    IPC class: G06F7/38, G06F9/00, G06F9/44

    Abstract: One embodiment of a computing system configured to manage divergent threads in a thread group includes a stack configured to store at least one token and a multithreaded processing unit. The multithreaded processing unit is configured to perform the steps of fetching a program instruction, determining that the program instruction is an indirect branch instruction, and processing the indirect branch instruction as a sequence of two-way branches to execute an indirect branch instruction with multiple branch addresses. Indirect branch instructions may be used to allow greater flexibility since the branch address or multiple branch addresses do not need to be determined at compile time.

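    A host-side sketch (plain C++, compilable by the CUDA toolchain) of the serialization loop the abstract describes: while any active lanes remain, take the target of one pending lane as the next two-way branch and retire every lane whose target matches. The arrays, WIDTH, and the printf are illustrative stand-ins for the hardware's token stack, not the patent's mechanism.

        #include <cstdio>

        #define WIDTH 32   // assumed number of lanes in the SIMD group

        // Process an indirect branch with per-lane targets as a sequence of
        // two-way branches: pick one pending lane's target, branch the lanes
        // that match it, and repeat until no lanes are pending.
        void serializeIndirectBranch(const unsigned target[WIDTH], const bool active[WIDTH])
        {
            bool pending[WIDTH];
            int remaining = 0;
            for (int lane = 0; lane < WIDTH; ++lane) {
                pending[lane] = active[lane];
                if (active[lane]) ++remaining;
            }

            while (remaining > 0) {
                unsigned taken = 0;
                for (int lane = 0; lane < WIDTH; ++lane)
                    if (pending[lane]) { taken = target[lane]; break; }

                // Two-way branch: lanes whose target matches execute at 'taken';
                // the others remain pending (conceptually, a token kept on the stack).
                for (int lane = 0; lane < WIDTH; ++lane)
                    if (pending[lane] && target[lane] == taken) {
                        pending[lane] = false;
                        --remaining;
                    }

                printf("execute lanes with target 0x%x\n", taken);
            }
        }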

Single interconnect providing read and write access to a memory shared by concurrent threads
    59.
    Granted invention (in force)

    Publication No.: US07680988B1

    Publication date: 2010-03-16

    Application No.: US11554563

    Filing date: 2006-10-30

    IPC class: G06F13/16

    Abstract: A shared memory is usable by concurrent threads in a multithreaded processor, with any addressable storage location in the shared memory being readable and writeable by any of the threads. Processing engines that execute the threads are coupled to the shared memory via an interconnect that transfers data in only one direction (e.g., from the shared memory to the processing engines); the same interconnect supports both read and write operations. The interconnect advantageously supports multiple parallel read or write operations.

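    The interconnect itself is hardware and is not visible from CUDA; what a program sees is the behavior stated in the first sentence of the abstract, that any thread can read and write any shared-memory location. The gather below is only an illustrative example of that behavior; TILE and the index handling are assumptions.

        #define TILE 128   // assumed threads per block

        // Each thread writes its own shared-memory location, then reads an
        // arbitrary location written by another thread of the same block.
        __global__ void gatherViaShared(const int* values, const int* idx, int* out)
        {
            __shared__ int buf[TILE];

            int tid = threadIdx.x;
            int g   = blockIdx.x * TILE + tid;

            buf[tid] = values[g];                 // write: any location is writable
            __syncthreads();
            out[g] = buf[idx[g] % TILE];          // read: any location is readable
                                                  // (assumes non-negative indices)
        }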

Atomic memory operators in a parallel processor
    60.
    Granted invention (in force)

    Publication No.: US07627723B1

    Publication date: 2009-12-01

    Application No.: US11533896

    Filing date: 2006-09-21

    IPC class: G06F13/00, G06F13/28

    Abstract: Methods, apparatuses, and systems are presented for updating data in memory while executing multiple threads of instructions, involving receiving a single instruction from one of a plurality of concurrently executing threads of instructions, in response to the single instruction received, reading data from a specific memory location, performing an operation involving the data read from the memory location to generate a result, and storing the result to the specific memory location, without requiring separate load and store instructions, and in response to the single instruction received, precluding another one of the plurality of threads of instructions from altering data at the specific memory location while reading of the data from the specific memory location, performing the operation involving the data, and storing the result to the specific memory location.

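    CUDA exposes this single-instruction read-modify-write behavior through atomic intrinsics such as atomicAdd; the histogram below is an illustrative use, with BINS an assumed size and the bins array expected to be zeroed by the caller.

        #define BINS 64   // assumed histogram size

        // atomicAdd reads the location, adds, and stores the result as one atomic
        // operation, excluding other threads from modifying the location in between.
        __global__ void histogram(const unsigned char* keys, unsigned int* bins, int n)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n)
                atomicAdd(&bins[keys[i] % BINS], 1u);
        }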