Cache miss processing using a defer/replay mechanism
    1.
    Granted invention patent
    Legal status: in force

    Publication No.: US08266383B1

    Publication date: 2012-09-11

    Application No.: US12650189

    Filing date: 2009-12-30

    CPC classification: G06F12/0859 G06F12/084

    Abstract: One embodiment of the present invention sets forth a technique for processing cache misses resulting from a request received from one of the multiple clients of an L1 cache. The L1 cache services multiple clients with diverse latency and bandwidth requirements, including at least one client whose requests cannot be stalled. The L1 cache includes storage to buffer pending requests for cache misses. When an entry is available to store a pending request, a request causing a cache miss is accepted. When the data for a read request becomes available, the cache instructs the client to resubmit the read request to receive the data. When an entry is not available to store a pending request, a request causing a cache miss is deferred and the cache provides the client with status information that is used to determine when the request should be resubmitted.

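    Illustrative sketch (not taken from the patent): the accept/defer/replay behavior described in the abstract can be modeled in a few lines of Python. The class name `ToyL1Cache`, the `Status` values, the per-request entry list, and the shape of the status information (`entries_in_use`) are all assumptions made for this example; the abstract does not specify them.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    HIT = "hit"            # data returned immediately
    ACCEPTED = "accepted"  # miss recorded in a pending-request entry
    DEFERRED = "deferred"  # no free entry; client must resubmit later


@dataclass
class ToyL1Cache:
    """Hypothetical model; names and structure are illustrative only."""
    num_pending_entries: int
    lines: dict = field(default_factory=dict)    # address -> data
    pending: list = field(default_factory=list)  # (client, address) entries

    def read(self, client, addr):
        if addr in self.lines:
            return Status.HIT, self.lines[addr]
        if len(self.pending) < self.num_pending_entries:
            # An entry is free: accept the miss and remember who asked.
            self.pending.append((client, addr))
            return Status.ACCEPTED, None
        # No entry is free: do not stall the client. Hand back status
        # information (assumed format) so it can decide when to resubmit.
        return Status.DEFERRED, {"entries_in_use": len(self.pending)}

    def fill(self, addr, data):
        """Data arrives for addr; return the clients that should replay."""
        self.lines[addr] = data
        replay = [c for c, a in self.pending if a == addr]
        self.pending = [(c, a) for c, a in self.pending if a != addr]
        return replay
```

    In this sketch a client that receives `ACCEPTED` waits until `fill` names it and then resubmits the read, which now hits; a client that receives `DEFERRED` retries on its own schedule using the returned status.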

    Reordering operands assigned to each one of read request ports concurrently accessing multibank register file to avoid bank conflict
    2.
    Granted invention patent
    Legal status: in force

    Publication No.: US08533435B2

    Publication date: 2013-09-10

    Application No.: US12875843

    Filing date: 2010-09-03

    IPC classification: G06F9/34

    Abstract: One embodiment of the present invention sets forth a technique for collecting operands specified by an instruction. As a sequence of instructions is received, the operands specified by the instructions are assigned to ports, so that each one of the operands specified by a single instruction is assigned to a different port. Reading of the operands from a multi-bank register file is scheduled by selecting an operand from each one of the different ports to produce an operand read request and ensuring that no two of the selected operands are stored in the same bank of the multi-bank register file. The operands specified by the operand read request are read from the multi-bank register file in a single clock cycle. Each instruction is then executed as the operands specified by the instruction are read from the multi-bank register file and collected over one or more clock cycles.

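    Illustrative sketch (not taken from the patent): the port assignment and conflict-free read scheduling described in the abstract might look like the following. The function names, the greedy head-of-queue selection policy, and the bank mapping (register number modulo the bank count) are assumptions chosen for this example.

```python
from collections import deque


def assign_to_ports(instructions, num_ports):
    """Place the i-th operand of every instruction on port i, so that one
    instruction's operands always sit on different ports."""
    ports = [deque() for _ in range(num_ports)]
    for instr_id, regs in enumerate(instructions):
        if len(regs) > num_ports:
            raise ValueError("instruction has more operands than ports")
        for port, reg in zip(ports, regs):
            port.append((instr_id, reg))
    return ports


def schedule_reads(ports, num_banks):
    """Build one operand read request per cycle.

    Each cycle takes the operand at the head of every port unless its bank
    was already claimed in that cycle; a skipped port simply waits.
    Assumed bank mapping: bank = register number % num_banks.
    """
    cycles = []
    while any(ports):
        used_banks, request = set(), []
        for port in ports:
            if port and port[0][1] % num_banks not in used_banks:
                instr_id, reg = port.popleft()
                used_banks.add(reg % num_banks)
                request.append((instr_id, reg))
        cycles.append(request)
    return cycles
```

    With two instructions reading registers (0, 4) and (1, 5) and four banks, registers 0 and 4 collide in bank 0, so the second operand of the first instruction is held back for a cycle while the other port keeps advancing.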

    Unified Collector Structure for Multi-Bank Register File
    3.
    Invention patent application
    Legal status: in force

    Publication No.: US20110072243A1

    Publication date: 2011-03-24

    Application No.: US12875843

    Filing date: 2010-09-03

    IPC classification: G06F9/30

    Abstract: One embodiment of the present invention sets forth a technique for collecting operands specified by an instruction. As a sequence of instructions is received, the operands specified by the instructions are assigned to ports, so that each one of the operands specified by a single instruction is assigned to a different port. Reading of the operands from a multi-bank register file is scheduled by selecting an operand from each one of the different ports to produce an operand read request and ensuring that no two of the selected operands are stored in the same bank of the multi-bank register file. The operands specified by the operand read request are read from the multi-bank register file in a single clock cycle. Each instruction is then executed as the operands specified by the instruction are read from the multi-bank register file and collected over one or more clock cycles.


    Opcode-specified predicatable warp post-synchronization
    6.
    Granted invention patent
    Legal status: in force

    Publication No.: US08850436B2

    Publication date: 2014-09-30

    Application No.: US12892887

    Filing date: 2010-09-28

    IPC classification: G06F9/46 G06F9/38 G06F9/30

    Abstract: One embodiment of the present invention sets forth a technique for performing a method for synchronizing divergent executing threads. The method includes receiving a plurality of instructions that includes at least one set-synchronization instruction and at least one instruction that includes a synchronization command, and determining an active mask that indicates which threads in a plurality of threads are active and which threads in the plurality of threads are disabled. For each instruction included in the plurality of instructions, the instruction is transmitted to each of the active threads included in the plurality of threads. If the instruction is a set-synchronization instruction, then a synchronization token, the active mask, and the synchronization point are each pushed onto a stack. Or, if the instruction is a predicated instruction that includes a synchronization command, then each active thread that executes the predicated instruction is monitored to determine when the active mask has been updated to indicate that each active thread, after executing the predicated instruction, has been disabled.

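    Illustrative sketch (not taken from the patent): a toy model of the stack push and active-mask monitoring described in the abstract. The abstract does not say what happens once every active thread has been disabled; this sketch assumes the stack entry is popped, the saved mask is restored, and execution resumes at the synchronization point. The class name, the token value, and the bitmask encoding are likewise hypothetical.

```python
from dataclasses import dataclass, field

SYNC_TOKEN = "SYNC"  # hypothetical token name


@dataclass
class WarpState:
    active_mask: int                           # bit i set -> thread i active
    stack: list = field(default_factory=list)  # (token, mask, sync point)

    def set_sync(self, sync_point):
        """Set-synchronization instruction: push the token, the current
        active mask, and the synchronization point."""
        self.stack.append((SYNC_TOKEN, self.active_mask, sync_point))

    def predicated_sync(self, predicate_mask):
        """Predicated instruction that carries a synchronization command.

        Active threads whose predicate bit is set execute it and are
        disabled. Returns the synchronization point once no active thread
        remains, otherwise None.
        """
        self.active_mask &= ~predicate_mask
        if self.active_mask != 0:
            return None  # some threads are still on the divergent path
        # Assumption: pop the entry and resume every recorded thread.
        _token, saved_mask, sync_point = self.stack.pop()
        self.active_mask = saved_mask
        return sync_point
```

    The caller would invoke `predicated_sync` each time the predicated instruction issues and treat a non-None result as the address at which the reconverged threads continue.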

    Programmable graphics processor for multithreaded execution of programs
    7.
    Granted invention patent
    Legal status: in force

    Publication No.: US08405665B2

    Publication date: 2013-03-26

    Application No.: US13466043

    Filing date: 2012-05-07

    CPC classification: G06T15/005

    Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.


    SHARED SINGLE-ACCESS MEMORY WITH MANAGEMENT OF MULTIPLE PARALLEL REQUESTS
    8.
    Invention patent application
    Legal status: in force

    Publication No.: US20120221808A1

    Publication date: 2012-08-30

    Application No.: US13466057

    Filing date: 2012-05-07

    IPC classification: G06F12/00

    CPC classification: G06F12/084 Y02D10/13

    Abstract: A memory is used by concurrent threads in a multithreaded processor. Any addressable storage location is accessible by any of the concurrent threads, but only one location at a time is accessible. The memory is coupled to parallel processing engines that generate a group of parallel memory access requests, each specifying a target address that might be the same or different for different requests. Serialization logic selects one of the target addresses and determines which of the requests specify the selected target address. All such requests are allowed to proceed in parallel, while other requests are deferred. Deferred requests may be regenerated and processed through the serialization logic so that a group of requests can be satisfied by accessing each different target address in the group exactly once.

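    Illustrative sketch (not taken from the patent): the serialization step described in the abstract can be simulated as a loop that serves all requests for one address per pass and defers the rest. The function name `serialize` and the selection policy (take the address of the first pending request) are assumptions; the abstract only says that one target address is selected.

```python
def serialize(requests):
    """Group parallel requests so each distinct address is accessed once.

    requests: list of (thread_id, address) pairs generated in parallel.
    Returns a list of passes, each (address, [thread_ids served together]).
    """
    passes = []
    pending = list(requests)
    while pending:
        # Assumed selection policy: address of the first pending request.
        target = pending[0][1]
        # Requests that match the selected address proceed in parallel.
        served = [tid for tid, addr in pending if addr == target]
        # Everything else is deferred and goes through the logic again.
        pending = [(tid, addr) for tid, addr in pending if addr != target]
        passes.append((target, served))
    return passes
```

    For four threads requesting addresses 0x10, 0x20, 0x10, and 0x30, the sketch needs three passes rather than four, because threads 0 and 2 share the single access to 0x10.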

    Scoreboard having size indicators for tracking sequential destination register usage in a multi-threaded processor
    9.
    Granted invention patent
    Legal status: in force

    Publication No.: US08225076B1

    Publication date: 2012-07-17

    Application No.: US12233515

    Filing date: 2008-09-18

    IPC classification: G06F9/30

    Abstract: A scoreboard memory for a processing unit has separate memory regions allocated to each of the multiple threads to be processed. For each thread, the scoreboard memory stores register identifiers of registers that have pending writes. When an instruction is added to an instruction buffer, the register identifiers of the registers specified in the instruction are compared with the register identifiers stored in the scoreboard memory for that instruction's thread, and a multi-bit value representing the comparison result is generated. The multi-bit value is stored with the instruction in the instruction buffer and may be updated as instructions belonging to the same thread complete their execution. Before the instruction is issued for execution, this multi-bit value is checked. If this multi-bit value indicates that none of the registers specified in the instruction have pending writes, the instruction is allowed to issue for execution.

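    Illustrative sketch (not taken from the patent): a per-thread scoreboard and the multi-bit dependency value described in the abstract. The encoding used here (one bit per register named by the instruction) is an assumption, as are the names `Scoreboard`, `dependency_bits`, and `clear_completed`. The size indicators mentioned in the title are not modeled.

```python
class Scoreboard:
    """Hypothetical model: one set of pending-write register ids per thread."""

    def __init__(self, num_threads):
        self.pending = [set() for _ in range(num_threads)]

    def mark_pending(self, thread, dest_reg):
        """An issued instruction will write dest_reg for this thread."""
        self.pending[thread].add(dest_reg)

    def complete(self, thread, dest_reg):
        """The write to dest_reg has finished."""
        self.pending[thread].discard(dest_reg)

    def dependency_bits(self, thread, regs):
        """Compare an instruction's registers with the thread's scoreboard
        region. Bit i is set when regs[i] still has a pending write."""
        bits = 0
        for i, reg in enumerate(regs):
            if reg in self.pending[thread]:
                bits |= 1 << i
        return bits


def clear_completed(bits, regs, completed_reg):
    """Update a buffered instruction's value when a write completes."""
    for i, reg in enumerate(regs):
        if reg == completed_reg:
            bits &= ~(1 << i)
    return bits


def can_issue(bits):
    """The instruction may issue once no register has a pending write."""
    return bits == 0
```

    The value is computed once when the instruction enters the buffer and then only has bits cleared, so the issue check reduces to a comparison with zero.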

    Shared single-access memory with management of multiple parallel requests
    10.
    Granted invention patent
    Legal status: in force

    Publication No.: US08176265B2

    Publication date: 2012-05-08

    Application No.: US13165638

    Filing date: 2011-06-21

    IPC classification: G06F12/00

    CPC classification: G06F12/084 Y02D10/13

    Abstract: A memory is used by concurrent threads in a multithreaded processor. Any addressable storage location is accessible by any of the concurrent threads, but only one location at a time is accessible. The memory is coupled to parallel processing engines that generate a group of parallel memory access requests, each specifying a target address that might be the same or different for different requests. Serialization logic selects one of the target addresses and determines which of the requests specify the selected target address. All such requests are allowed to proceed in parallel, while other requests are deferred. Deferred requests may be regenerated and processed through the serialization logic so that a group of requests can be satisfied by accessing each different target address in the group exactly once.
