SHARED SINGLE ACCESS MEMORY WITH MANAGEMENT OF MULTIPLE PARALLEL REQUESTS
    31.
    Patent Application
    SHARED SINGLE ACCESS MEMORY WITH MANAGEMENT OF MULTIPLE PARALLEL REQUESTS (In Force)

    Publication No.: US20110252204A1

    Publication Date: 2011-10-13

    Application No.: US13165638

    Filing Date: 2011-06-21

    IPC Class: G06F12/08

    CPC Class: G06F12/084 Y02D10/13

    Abstract: A memory is used by concurrent threads in a multithreaded processor. Any addressable storage location is accessible by any of the concurrent threads, but only one location at a time is accessible. The memory is coupled to parallel processing engines that generate a group of parallel memory access requests, each specifying a target address that might be the same or different for different requests. Serialization logic selects one of the target addresses and determines which of the requests specify the selected target address. All such requests are allowed to proceed in parallel, while other requests are deferred. Deferred requests may be regenerated and processed through the serialization logic so that a group of requests can be satisfied by accessing each different target address in the group exactly once.

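    For a rough feel of the serialization behavior described in the abstract, here is a minimal Python sketch (a software model, not the patented logic; the helper name serialize_accesses and the (thread_id, address) request format are assumptions made for illustration). Each pass selects one target address, services every pending request for that address with a single memory access, and defers the rest to a later pass:

def serialize_accesses(memory, requests):
    """Serve a group of parallel read requests against a single-access memory.

    requests: list of (thread_id, address) pairs. Each pass picks one target
    address, services every pending request for that address with a single
    memory access (the value is broadcast), and defers the rest to the next
    pass. Returns {thread_id: value} and the number of passes taken."""
    results = {}
    pending = list(requests)
    passes = 0
    while pending:
        selected_addr = pending[0][1]                      # select one target address
        matching = [r for r in pending if r[1] == selected_addr]
        pending = [r for r in pending if r[1] != selected_addr]
        value = memory[selected_addr]                      # one access serves all matches
        for thread_id, _ in matching:
            results[thread_id] = value
        passes += 1                                        # deferred requests go around again
    return results, passes

if __name__ == "__main__":
    memory = {0x10: "a", 0x20: "b", 0x30: "c"}
    # Eight threads, three distinct target addresses -> three passes.
    requests = list(enumerate([0x10, 0x20, 0x10, 0x30, 0x20, 0x10, 0x30, 0x10]))
    values, passes = serialize_accesses(memory, requests)
    print(values)
    print("memory accesses:", passes)                      # 3, one per distinct address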

    Register based queuing for texture requests
    32.
    Granted Patent
    Register based queuing for texture requests (In Force)

    Publication No.: US07864185B1

    Publication Date: 2011-01-04

    Application No.: US12256848

    Filing Date: 2008-10-23

    CPC Class: G06T11/60 G09G5/363

    Abstract: A graphics processing unit can queue a large number of texture requests to balance out the variability of texture requests without the need for a large texture request buffer. A dedicated texture request buffer queues the relatively small texture commands and parameters. Additionally, for each queued texture command, an associated set of texture arguments, which is typically much larger than the texture command, is stored in a general purpose register. The texture unit retrieves texture commands from the texture request buffer and then fetches the associated texture arguments from the appropriate general purpose register. The texture arguments may be stored in the general purpose register designated as the destination of the final texture value computed by the texture unit. Because the destination register must be allocated for the final texture value as texture commands are queued, storing the texture arguments in this register does not consume any additional registers.

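    As a small behavioral sketch of the queuing idea in Python (illustrative only; the RegisterFile class and helper names are hypothetical, not the patented hardware), the bulky texture arguments are parked in the register that will later receive the texture result, so only the small command needs a slot in the request queue:

from collections import deque

class RegisterFile:
    def __init__(self, size):
        self.regs = [None] * size
    def read(self, idx):
        return self.regs[idx]
    def write(self, idx, value):
        self.regs[idx] = value

def issue_texture_request(queue, regfile, dest_reg, texture_args):
    # Queue only the small command; park the bulky arguments in the register
    # that is already reserved for the final texture result.
    regfile.write(dest_reg, texture_args)
    queue.append(("TEX_FETCH", dest_reg))

def texture_unit_step(queue, regfile, sampler):
    # Pop one command, read its arguments from the destination register,
    # then overwrite that same register with the computed texture value.
    command, dest_reg = queue.popleft()
    u, v = regfile.read(dest_reg)              # arguments were parked here
    regfile.write(dest_reg, sampler(u, v))     # result reuses the same register

if __name__ == "__main__":
    rf = RegisterFile(16)
    q = deque()
    issue_texture_request(q, rf, dest_reg=3, texture_args=(0.25, 0.75))
    texture_unit_step(q, rf, sampler=lambda u, v: (u + v) / 2)   # stand-in filter
    print("r3 =", rf.read(3))                  # final texture value; no extra register used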

    Register file allocation
    33.
    Granted Patent
    Register file allocation (In Force)

    Publication No.: US07634621B1

    Publication Date: 2009-12-15

    Application No.: US11556677

    Filing Date: 2006-11-03

    IPC Class: G06F12/00

    Abstract: Circuits, methods, and apparatus that provide the die area and power savings of a single-ported memory with the performance advantages of a multiported memory are disclosed. One example provides register allocation methods for storing data in a multiple-bank register file. In a thin register allocation method, data for a process is stored in a single bank. In this way, different processes use different banks to avoid conflicts. In a fat register allocation method, processes store data in each bank. In this way, if one process uses a large number of registers, those registers are spread among the banks, avoiding a situation where one bank is filled and other processes are forced to share a reduced number of banks. In a hybrid register allocation method, processes store data in more than one bank, but fewer than all the banks. These methods may be combined in varying ways.

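    The three allocation policies can be sketched in a few lines of Python (an illustration under assumed parameters such as start_bank and span, not the patented circuits): thin keeps a process's registers in one bank, fat stripes them across all banks, and hybrid stripes them across a subset:

def allocate_registers(num_regs, num_banks, policy, start_bank=0, span=2):
    """Map a process's registers to (bank, entry) slots under three policies.

    thin:   all registers land in a single bank.
    fat:    registers are striped across every bank.
    hybrid: registers are striped across a subset of `span` banks."""
    if policy == "thin":
        banks = [start_bank]
    elif policy == "fat":
        banks = list(range(num_banks))
    elif policy == "hybrid":
        banks = [(start_bank + i) % num_banks for i in range(span)]
    else:
        raise ValueError(policy)
    return [(banks[r % len(banks)], r // len(banks)) for r in range(num_regs)]

if __name__ == "__main__":
    for policy in ("thin", "fat", "hybrid"):
        print(policy, allocate_registers(num_regs=6, num_banks=4,
                                         policy=policy, start_bank=1))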

    Lock Mechanism to Enable Atomic Updates to Shared Memory
    34.
    Patent Application
    Lock Mechanism to Enable Atomic Updates to Shared Memory (In Force)

    Publication No.: US20090240860A1

    Publication Date: 2009-09-24

    Application No.: US12054267

    Filing Date: 2008-03-24

    IPC Class: G06F12/14

    Abstract: A system and method for locking and unlocking access to a shared memory for atomic operations provide immediate feedback indicating whether or not the lock was successful. Read data is returned to the requestor with the lock status. The lock status may be changed concurrently when locking during a read or unlocking during a write. Therefore, it is not necessary to check the lock status as a separate transaction prior to or during a read-modify-write operation. Additionally, a lock or unlock may be explicitly specified for each atomic memory operation. Therefore, lock operations are not performed for operations that do not modify the contents of a memory location.

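    A minimal Python model of the lock-with-feedback behavior (illustrative only; the LockedSharedMemory class is an assumption, not the patented circuitry): a read may optionally take the per-location lock and returns the data together with the acquisition status, and a write may optionally release the lock:

class LockedSharedMemory:
    """Each location carries a lock bit. A locking read returns the data
    together with a flag saying whether the lock was acquired, so the
    requester gets immediate feedback instead of issuing a separate check."""

    def __init__(self, size):
        self.data = [0] * size
        self.locked = [False] * size

    def read(self, addr, lock=False):
        acquired = False
        if lock and not self.locked[addr]:
            self.locked[addr] = True           # lock is set as part of the read
            acquired = True
        return self.data[addr], acquired

    def write(self, addr, value, unlock=False):
        self.data[addr] = value
        if unlock:
            self.locked[addr] = False          # unlock as part of the write

if __name__ == "__main__":
    mem = LockedSharedMemory(8)
    old, ok = mem.read(5, lock=True)           # lock acquired; feedback arrives with the data
    _, contender = mem.read(5, lock=True)      # a concurrent request sees the lock held
    if ok:
        mem.write(5, old + 1, unlock=True)     # read-modify-write, then release
    print(mem.data[5], ok, contender)          # 1 True False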

    VIRTUAL ARCHITECTURE AND INSTRUCTION SET FOR PARALLEL THREAD COMPUTING
    35.
    Patent Application
    VIRTUAL ARCHITECTURE AND INSTRUCTION SET FOR PARALLEL THREAD COMPUTING (In Force)

    Publication No.: US20080184211A1

    Publication Date: 2008-07-31

    Application No.: US11627892

    Filing Date: 2007-01-26

    IPC Class: G06F9/45

    CPC Class: G06F8/456

    Abstract: A virtual architecture and instruction set support explicit parallel-thread computing. The virtual architecture defines a virtual processor that supports concurrent execution of multiple virtual threads with multiple levels of data sharing and coordination (e.g., synchronization) between different virtual threads, as well as a virtual execution driver that controls the virtual processor. A virtual instruction set architecture for the virtual processor is used to define behavior of a virtual thread and includes instructions related to parallel thread behavior, e.g., data sharing and synchronization. Using the virtual platform, programmers can develop application programs in which virtual threads execute concurrently to process data; virtual translators and drivers adapt the application code to particular hardware on which it is to execute, transparently to the programmer.

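    As a loose software analogy (not the actual virtual instruction set; the generator-based virtual_thread and virtual_execution_driver names are invented for illustration), the sketch below treats yield points as barrier instructions and steps every virtual thread to each barrier before any of them proceeds past it:

def virtual_thread(tid, nthreads, shared, data):
    """A virtual-thread body written against a tiny 'virtual ISA': it reads and
    writes shared storage and must meet its peers at each barrier (yield)."""
    shared[tid] = data[tid] * data[tid]        # store to shared storage
    yield "bar.sync"                           # barrier: wait for all peers
    left = shared[(tid - 1) % nthreads]        # load a neighbour's shared value
    yield "bar.sync"
    shared[tid] = left + shared[tid]           # store the combined result

def virtual_execution_driver(program, nthreads, data):
    """Step every virtual thread to each barrier before any proceeds past it,
    giving the coordinated execution the virtual architecture calls for."""
    shared = [0] * nthreads
    threads = [program(t, nthreads, shared, data) for t in range(nthreads)]
    live = True
    while live:
        live = False
        for t in threads:
            try:
                next(t)                        # run this virtual thread to its next barrier
                live = True
            except StopIteration:
                pass
    return shared

if __name__ == "__main__":
    print(virtual_execution_driver(virtual_thread, nthreads=4, data=[1, 2, 3, 4]))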

    Virtual architecture and instruction set for parallel thread computing
    36.
    Granted Patent
    Virtual architecture and instruction set for parallel thread computing (In Force)

    Publication No.: US08321849B2

    Publication Date: 2012-11-27

    Application No.: US11627892

    Filing Date: 2007-01-26

    IPC Class: G06F9/45

    CPC Class: G06F8/456

    Abstract: A virtual architecture and instruction set support explicit parallel-thread computing. The virtual architecture defines a virtual processor that supports concurrent execution of multiple virtual threads with multiple levels of data sharing and coordination (e.g., synchronization) between different virtual threads, as well as a virtual execution driver that controls the virtual processor. A virtual instruction set architecture for the virtual processor is used to define behavior of a virtual thread and includes instructions related to parallel thread behavior, e.g., data sharing and synchronization. Using the virtual platform, programmers can develop application programs in which virtual threads execute concurrently to process data; virtual translators and drivers adapt the application code to particular hardware on which it is to execute, transparently to the programmer.

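    Since this abstract matches the application above, the fragment below instead loosely illustrates the translator/driver side (again an analogy with invented names such as translate_for_width): the same per-thread program is mapped onto execution groups of whatever lane width the target hardware provides:

import math

def translate_for_width(scalar_kernel, lane_width):
    """Return a launcher that maps n virtual threads onto execution groups of
    `lane_width` lanes, the way a translator/driver might target one piece of
    hardware. The per-thread program itself stays unchanged."""
    def launch(n_threads, *args):
        n_groups = math.ceil(n_threads / lane_width)
        for group in range(n_groups):
            for lane in range(lane_width):             # one execution group
                tid = group * lane_width + lane
                if tid < n_threads:                    # mask off the excess lanes
                    scalar_kernel(tid, *args)
        return n_groups
    return launch

def saxpy_thread(tid, a, x, y, out):
    """Virtual-thread body: each virtual thread handles one element."""
    out[tid] = a * x[tid] + y[tid]

if __name__ == "__main__":
    n = 10
    x, y, out = list(range(n)), [1.0] * n, [0.0] * n
    launch = translate_for_width(saxpy_thread, lane_width=4)   # e.g. a 4-lane target
    groups = launch(n, 2.0, x, y, out)
    print(out, "groups issued:", groups)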

    Atomic memory operators in a parallel processor
    37.
    Granted Patent
    Atomic memory operators in a parallel processor (In Force)

    Publication No.: US07627723B1

    Publication Date: 2009-12-01

    Application No.: US11533896

    Filing Date: 2006-09-21

    IPC Class: G06F13/00 G06F13/28

    Abstract: Methods, apparatuses, and systems are presented for updating data in memory while executing multiple threads of instructions. A single instruction is received from one of a plurality of concurrently executing threads of instructions. In response to the single instruction received, data is read from a specific memory location, an operation involving the data read from that location is performed to generate a result, and the result is stored to the specific memory location, without requiring separate load and store instructions. Also in response to the single instruction received, another one of the plurality of threads of instructions is precluded from altering data at the specific memory location while the data is read from the specific memory location, the operation involving the data is performed, and the result is stored to the specific memory location.

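    The single-instruction read-modify-write semantics can be modeled in Python with a per-location lock (a software stand-in for illustration; the AtomicMemory class is an assumption, not the hardware mechanism):

import threading

class AtomicMemory:
    """Memory whose read-modify-write is one operation: while a thread updates
    a location, other threads are precluded from touching that location."""

    def __init__(self, size):
        self._data = [0] * size
        self._locks = [threading.Lock() for _ in range(size)]

    def atomic_op(self, addr, op, operand):
        with self._locks[addr]:                    # exclude other threads for this address
            old = self._data[addr]                 # load
            self._data[addr] = op(old, operand)    # modify and store
        return old                                 # many atomics return the prior value

    def load(self, addr):
        return self._data[addr]

if __name__ == "__main__":
    mem = AtomicMemory(4)
    def worker():
        for _ in range(10_000):
            mem.atomic_op(0, lambda a, b: a + b, 1)    # atomic add
    threads = [threading.Thread(target=worker) for _ in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(mem.load(0))    # 80000 every run; a separate load/add/store could lose updates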

    Efficient implementation of arrays of structures on SIMT and SIMD architectures
    38.
    Granted Patent
    Efficient implementation of arrays of structures on SIMT and SIMD architectures (In Force)

    Publication No.: US08751771B2

    Publication Date: 2014-06-10

    Application No.: US13247855

    Filing Date: 2011-09-28

    Abstract: One embodiment of the present invention sets forth a technique providing an optimized way to allocate and access memory across a plurality of thread/data lanes. Specifically, the device driver receives an instruction targeted to a memory set up as an array of structures of arrays. The device driver computes an address within the memory using information about the number of thread/data lanes and parameters from the instruction itself. The result is a memory allocation and access approach in which the device driver properly computes the target address in the memory. Advantageously, processing efficiency is improved where memory in a parallel processing subsystem is internally stored and accessed as an array of structures of arrays sized in proportion to the SIMT/SIMD group width (the number of threads or lanes per execution group).

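    A small Python sketch of one plausible array-of-structures-of-arrays address computation (the layout formula and the aosoa_offset helper are illustrative assumptions, not the driver's actual computation): elements are grouped by the lane width, and within a group each field is stored contiguously across the lanes:

def aosoa_offset(elem_index, field_index, num_fields, lane_width, field_size=4):
    """Byte offset of field `field_index` of element `elem_index` when data is
    laid out as an array of structures of arrays: elements are grouped
    `lane_width` at a time, and within a group each field occupies a
    contiguous run of `lane_width` slots."""
    group, lane = divmod(elem_index, lane_width)
    group_bytes = num_fields * lane_width * field_size    # one structure-of-arrays block
    return (group * group_bytes
            + field_index * lane_width * field_size       # field plane within the block
            + lane * field_size)                          # this lane's slot

if __name__ == "__main__":
    # 3-field structures (e.g. x, y, z), 8 lanes per execution group, 4-byte fields.
    for elem in (0, 1, 8, 9):
        print(elem, [aosoa_offset(elem, f, num_fields=3, lane_width=8) for f in range(3)])
    # Consecutive elements land in consecutive slots of each field plane, so the
    # lanes of one execution group touch contiguous addresses.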

    Thread group scheduler for computing on a parallel thread processor
    39.
    Granted Patent
    Thread group scheduler for computing on a parallel thread processor (In Force)

    Publication No.: US08732713B2

    Publication Date: 2014-05-20

    Application No.: US13247819

    Filing Date: 2011-09-28

    IPC Class: G06F9/46

    CPC Class: G06F9/4881 G06F2209/483

    Abstract: A parallel thread processor executes thread groups belonging to multiple cooperative thread arrays (CTAs). At each cycle of the parallel thread processor, an instruction scheduler selects a thread group to be issued for execution during a subsequent cycle. The instruction scheduler selects a thread group to issue for execution by (i) identifying a pool of available thread groups, (ii) identifying the CTA that has the greatest seniority value, and (iii) selecting the thread group that has the greatest credit value from within the CTA with the greatest seniority value.

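    The three-step selection reads almost directly as Python (an illustration; the ThreadGroup record and the seniority/credit dictionaries are assumed inputs, not the scheduler's actual state):

from collections import namedtuple

ThreadGroup = namedtuple("ThreadGroup", "name cta")

def select_thread_group(available, cta_seniority, group_credit):
    """available: thread groups ready to issue this cycle.
    cta_seniority: {cta_id: seniority value}; group_credit: {group name: credit}.
    Returns the group with the greatest credit inside the most senior CTA."""
    pool = list(available)                                 # (i) the pool of available groups
    if not pool:
        return None
    oldest_cta = max({g.cta for g in pool},
                     key=lambda c: cta_seniority[c])       # (ii) CTA with greatest seniority
    candidates = [g for g in pool if g.cta == oldest_cta]
    return max(candidates, key=lambda g: group_credit[g.name])   # (iii) greatest credit

if __name__ == "__main__":
    groups = [ThreadGroup("w0", cta=0), ThreadGroup("w1", cta=0),
              ThreadGroup("w2", cta=1), ThreadGroup("w3", cta=1)]
    seniority = {0: 7, 1: 3}                  # CTA 0 has been resident longer
    credit = {"w0": 2, "w1": 5, "w2": 9, "w3": 1}
    print(select_thread_group(groups, seniority, credit))  # w1: most senior CTA, top credit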

    Bit reversal methods for a parallel processor
    40.
    Granted Patent
    Bit reversal methods for a parallel processor (In Force)

    Publication No.: US07640284B1

    Publication Date: 2009-12-29

    Application No.: US11424514

    Filing Date: 2006-06-15

    IPC Class: G06F17/14

    CPC Class: G06F17/142 G06F7/76

    Abstract: Parallelism in a processor is exploited to permute a data set based on bit reversal of indices associated with data points in the data set. Permuted data can be stored in a memory having entries arranged in banks, where entries in different banks can be accessed in parallel. A destination location in the memory for a particular data point from the data set is determined based on the bit-reversed index associated with that data point. The bit-reversed index can be further modified so that at least some of the destination locations determined by different parallel processes are in different banks, allowing multiple points of the bit-reversed data set to be written in parallel.

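    A Python sketch of the bit-reversal permutation together with one simple bank-skewing modification (the skew shown is a common scheme picked for illustration, not necessarily the modification claimed): for 16 points, 4 banks, and 4 parallel lanes it removes all bank conflicts within each group of destinations:

def bit_reverse(index, nbits):
    """Reverse the low `nbits` bits of `index` (e.g. 0b0011 -> 0b1100)."""
    rev = 0
    for _ in range(nbits):
        rev = (rev << 1) | (index & 1)
        index >>= 1
    return rev

def bank_of(addr, num_banks, skew=False):
    """Map a destination address to a bank; the skewed variant offsets the bank
    by the row number so strided destinations spread across the banks."""
    return (addr + (addr // num_banks if skew else 0)) % num_banks

if __name__ == "__main__":
    nbits, num_banks, group = 4, 4, 4        # 16 points, 4 banks, 4 parallel lanes
    for skew in (False, True):
        worst = 0
        for base in range(0, 1 << nbits, group):
            dests = [bit_reverse(base + lane, nbits) for lane in range(group)]
            banks = [bank_of(d, num_banks, skew) for d in dests]
            worst = max(worst, max(banks.count(b) for b in banks))
        print("skew" if skew else "no skew", "-> worst conflicts per group:", worst)
    # no skew -> 4 (all four lanes hit one bank); skew -> 1 (fully parallel writes)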