Generating event signals for performance register control using non-operative instructions
    1.
    发明授权
    Generating event signals for performance register control using non-operative instructions 有权
    使用非操作指令生成用于性能寄存器控制的事件信号

    公开(公告)号:US07809928B1

    公开(公告)日:2010-10-05

    申请号:US11313872

    申请日:2005-12-20

    IPC分类号: G06F9/30 G06F17/00 G09G5/02

    摘要: One embodiment of an instruction decoder includes an instruction parser configured to process a first non-operative instruction and to generate a first event signal corresponding to the first non-operative instruction, and a first event multiplexer configured to receive the first event signal from the instruction parser, to select the first event signal from one or more event signals and to transmit the first event signal to an event logic block. The instruction decoder may be implemented in a multithreaded processing unit, such as a shader unit, and the occurrences of the first event signal may be tracked when one or more threads are executed within the processing unit. The resulting event signal count may provide a designer with a better understanding of the behavior of a program, such as a shader program, executed within the processing unit, thereby facilitating overall processing unit and program design.

    摘要翻译: 指令解码器的一个实施例包括:指令解析器,被配置为处理第一非操作指令并产生对应于第一非操作指令的第一事件信号;以及第一事件多路复用器,被配置为从指令接收第一事件信号 解析器,以从一个或多个事件信号中选择第一事件信号,并将第一事件信号发送到事件逻辑块。 指令解码器可以在诸如着色器单元的多线程处理单元中实现,并且当在处理单元内执行一个或多个线程时,可以跟踪第一事件信号的出现。 所得到的事件信号计数可以使设计者更好地理解在处理单元内执行的诸如着色器程序之类的程序的行为,从而有助于整体处理单元和程序设计。

    Unified addressing and instructions for accessing parallel memory spaces
    2.
    发明授权
    Unified addressing and instructions for accessing parallel memory spaces 有权
    统一寻址和访问并行存储空间的指令

    公开(公告)号:US08271763B2

    公开(公告)日:2012-09-18

    申请号:US12567637

    申请日:2009-09-25

    IPC分类号: G06F12/10

    摘要: One embodiment of the present invention sets forth a technique for unifying the addressing of multiple distinct parallel memory spaces into a single address space for a thread. A unified memory space address is converted into an address that accesses one of the parallel memory spaces for that thread. A single type of load or store instruction may be used that specifies the unified memory space address for a thread instead of using a different type of load or store instruction to access each of the distinct parallel memory spaces.

    摘要翻译: 本发明的一个实施例提出了一种用于将多个不同的并行存储器空间的寻址统一为用于线程的单个地址空间的技术。 统一的存储空间地址被转换为访问该线程的并行存储器空间之一的地址。 可以使用单一类型的加载或存储指令,其指定线程的统一存储器空间地址,而不是使用不同类型的加载或存储指令来访问每个不同的并行存储器空间。

    Unified Addressing and Instructions for Accessing Parallel Memory Spaces
    3.
    发明申请
    Unified Addressing and Instructions for Accessing Parallel Memory Spaces 有权
    统一寻址和访问并行内存空间的说明

    公开(公告)号:US20110078406A1

    公开(公告)日:2011-03-31

    申请号:US12567637

    申请日:2009-09-25

    IPC分类号: G06F12/10

    摘要: One embodiment of the present invention sets forth a technique for unifying the addressing of multiple distinct parallel memory spaces into a single address space for a thread. A unified memory space address is converted into an address that accesses one of the parallel memory spaces for that thread. A single type of load or store instruction may be used that specifies the unified memory space address for a thread instead of using a different type of load or store instruction to access each of the distinct parallel memory spaces.

    摘要翻译: 本发明的一个实施例提出了一种用于将多个不同的并行存储器空间的寻址统一为用于线程的单个地址空间的技术。 统一的存储空间地址被转换为访问该线程的并行存储器空间之一的地址。 可以使用单一类型的加载或存储指令,其指定线程的统一存储器空间地址,而不是使用不同类型的加载或存储指令来访问每个不同的并行存储器空间。

    Coalescing memory barrier operations across multiple parallel threads
    6.
    发明授权
    Coalescing memory barrier operations across multiple parallel threads 有权
    在多个并行线程之间合并记忆障碍操作

    公开(公告)号:US09223578B2

    公开(公告)日:2015-12-29

    申请号:US12887081

    申请日:2010-09-21

    IPC分类号: G06F9/46 G06F9/38 G06F9/30

    摘要: One embodiment of the present invention sets forth a technique for coalescing memory barrier operations across multiple parallel threads. Memory barrier requests from a given parallel thread processing unit are coalesced to reduce the impact to the rest of the system. Additionally, memory barrier requests may specify a level of a set of threads with respect to which the memory transactions are committed. For example, a first type of memory barrier instruction may commit the memory transactions to a level of a set of cooperating threads that share an L1 (level one) cache. A second type of memory barrier instruction may commit the memory transactions to a level of a set of threads sharing a global memory. Finally, a third type of memory barrier instruction may commit the memory transactions to a system level of all threads sharing all system memories. The latency required to execute the memory barrier instruction varies based on the type of memory barrier instruction.

    摘要翻译: 本发明的一个实施例提出了一种用于在多个并行线程之间聚合存储器屏障操作的技术。 来自给定并行线程处理单元的存储器屏障请求被合并以减少对系统其余部分的影响。 此外,存储器屏障请求可以指定针对其提交内存事务的一组线程的级别。 例如,第一类型的存储器障碍指令可以将存储器事务提交到共享L1(一级)高速缓存的一组协作线程的级别。 第二种类型的存储器障碍指令可以将存储器事务提交到共享全局存储器的一组线程的级别。 最后,第三种类型的存储器障碍指令可以将存储器事务提交到共享所有系统存储器的所有线程的系统级。 执行存储器屏障指令所需的延迟基于存储器屏障指令的类型而变化。

    Register based queuing for texture requests
    8.
    发明授权
    Register based queuing for texture requests 有权
    基于注册排队的纹理请求

    公开(公告)号:US07456835B2

    公开(公告)日:2008-11-25

    申请号:US11339937

    申请日:2006-01-25

    CPC分类号: G06T11/60 G09G5/363

    摘要: A graphics processing unit can queue a large number of texture requests to balance out the variability of texture requests without the need for a large texture request buffer. A dedicated texture request buffer queues the relatively small texture commands and parameters. Additionally, for each queued texture command, an associated set of texture arguments, which are typically much larger than the texture command, are stored in a general purpose register. The texture unit retrieves texture commands from the texture request buffer and then fetches the associated texture arguments from the appropriate general purpose register. The texture arguments may be stored in the general purpose register designated as the destination of the final texture value computed by the texture unit. Because the destination register must be allocated for the final texture value as texture commands are queued, storing the texture arguments in this register does not consume any additional registers.

    摘要翻译: 图形处理单元可以排队大量纹理请求,以平衡纹理请求的可变性,而不需要大的纹理请求缓冲区。 专用纹理请求缓冲区排队相对较小的纹理命令和参数。 另外,对于每个排队的纹理命令,通常比纹理命令大得多的一组相关的纹理参数存储在通用寄存器中。 纹理单元从纹理请求缓冲区中检索纹理命令,然后从相应的通用寄存器获取相关的纹理参数。 纹理参数可以存储在指定为由纹理单元计算的最终纹理值的目的地的通用寄存器中。 因为当纹理命令排队时,必须为目标寄存器分配最终纹理值,所以将纹理参数存储在该寄存器中不消耗任何其他寄存器。

    SHARED SINGLE-ACCESS MEMORY WITH MANAGEMENT OF MULTIPLE PARALLEL REQUESTS
    10.
    发明申请
    SHARED SINGLE-ACCESS MEMORY WITH MANAGEMENT OF MULTIPLE PARALLEL REQUESTS 有权
    具有多个并行请求管理的共享单访存储器

    公开(公告)号:US20120221808A1

    公开(公告)日:2012-08-30

    申请号:US13466057

    申请日:2012-05-07

    IPC分类号: G06F12/00

    CPC分类号: G06F12/084 Y02D10/13

    摘要: A memory is used by concurrent threads in a multithreaded processor. Any addressable storage location is accessible by any of the concurrent threads, but only one location at a time is accessible. The memory is coupled to parallel processing engines that generate a group of parallel memory access requests, each specifying a target address that might be the same or different for different requests. Serialization logic selects one of the target addresses and determines which of the requests specify the selected target address. All such requests are allowed to proceed in parallel, while other requests are deferred. Deferred requests may be regenerated and processed through the serialization logic so that a group of requests can be satisfied by accessing each different target address in the group exactly once.

    摘要翻译: 多线程处理器中的并发线程使用内存。 任何可寻址的存储位置都可以由任何并发线程访问,但一次只能访问一个位置。 存储器耦合到并行处理引擎,其产生一组并行存储器访问请求,每个指定对于不同请求可能相同或不同的目标地址。 序列化逻辑选择一个目标地址,并确定哪个请求指定所选择的目标地址。 允许所有这些请求并行进行,而其他请求被推迟。 可以通过序列化逻辑重新生成和处理延迟请求,以便通过一次访问组中的每个不同的目标地址来满足一组请求。