Method and system for connecting multiple shaders
    1.
    Type: granted invention patent
    Legal status: in force

    Publication (announcement) number: US08223158B1

    Publication (announcement) date: 2012-07-17

    Application number: US11613018

    Filing date: 2006-12-19

    IPC classification: G06T1/20

    CPC classification: G06T1/20

    Abstract: A method and system for connecting multiple shaders are disclosed. Specifically, one embodiment of the present invention sets forth a method that includes the steps of configuring a set of shaders in a user-defined sequence within a modular pipeline (MPipe), allocating resources to execute the programming instructions of each of the set of shaders in the user-defined sequence to operate on the data unit, and directing the output of the MPipe to an external sink.
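
The three steps named in this abstract (configure shaders in a user-defined order, run each on the data unit, direct the result to an external sink) can be illustrated with a minimal Python sketch. The class name `MPipe`, the dict-based data unit, and the two toy shaders are assumptions made for illustration; resource allocation is not modeled, and nothing here reflects the patented implementation.

```python
from typing import Callable, Iterable

Shader = Callable[[dict], dict]  # assumed shape: a shader maps a data unit to a data unit


class MPipe:
    """Hypothetical modular pipeline that runs shaders in a user-defined order."""

    def __init__(self, shaders: Iterable[Shader], sink: Callable[[dict], None]):
        self.shaders = list(shaders)  # order given by the user is preserved
        self.sink = sink              # external consumer of the pipeline output

    def run(self, data_unit: dict) -> None:
        for shader in self.shaders:
            data_unit = shader(data_unit)  # output of one shader feeds the next
        self.sink(data_unit)


def scale(unit: dict) -> dict:
    return {**unit, "x": unit["x"] * 2}


def offset(unit: dict) -> dict:
    return {**unit, "x": unit["x"] + 1}


results = []
MPipe([scale, offset], results.append).run({"x": 3})
print(results)  # [{'x': 7}]
```

Swapping the order to `[offset, scale]` gives a different result, which is the point of a user-defined sequence.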

    Reordering operands assigned to each one of read request ports concurrently accessing multibank register file to avoid bank conflict
    2.
    Type: granted invention patent
    Legal status: in force

    Publication (announcement) number: US08533435B2

    Publication (announcement) date: 2013-09-10

    Application number: US12875843

    Filing date: 2010-09-03

    IPC classification: G06F9/34

    Abstract: One embodiment of the present invention sets forth a technique for collecting operands specified by an instruction. As a sequence of instructions is received, the operands specified by the instructions are assigned to ports, so that each one of the operands specified by a single instruction is assigned to a different port. Reading of the operands from a multi-bank register file is scheduled by selecting an operand from each one of the different ports to produce an operand read request and ensuring that no two of the selected operands are stored in the same bank of the multi-bank register file. The operands specified by the operand read request are read from the multi-bank register file in a single clock cycle. Each instruction is then executed as the operands specified by the instruction are read from the multi-bank register file and collected over one or more clock cycles.
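
The scheduling idea in this abstract can be sketched in Python as follows. The bank mapping (`reg % NUM_BANKS`), the greedy port-order selection, and all function names are assumptions for illustration only; the abstract does not say how the hardware picks among conflicting operands.

```python
from collections import deque

NUM_BANKS = 4  # assumed bank count


def bank_of(reg: int) -> int:
    """Assumed mapping from register number to register-file bank."""
    return reg % NUM_BANKS


def assign_to_ports(instructions, num_ports):
    """Give each operand of one instruction its own port, as the abstract describes."""
    ports = [deque() for _ in range(num_ports)]
    for instr_id, regs in enumerate(instructions):
        assert len(regs) <= num_ports, "one port per operand of an instruction"
        for port, reg in zip(ports, regs):
            port.append((instr_id, reg))
    return ports


def schedule_reads(ports):
    """Build one read request per cycle with at most one operand per bank."""
    cycles = []
    while any(ports):
        used_banks = set()
        request = []
        for port in ports:
            # Take the head of this port only if its bank is still free this cycle.
            if port and bank_of(port[0][1]) not in used_banks:
                instr_id, reg = port.popleft()
                used_banks.add(bank_of(reg))
                request.append((instr_id, reg))
        cycles.append(request)
    return cycles


ports = assign_to_ports([[0, 4, 1], [2, 5, 8]], num_ports=3)
for cycle, request in enumerate(schedule_reads(ports)):
    print(cycle, request)
```

In this example registers 0 and 4 share a bank, so the first instruction's operands are collected over two cycles while an operand of the second instruction fills the otherwise idle slot.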

    Unified Collector Structure for Multi-Bank Register File
    3.
    Type: invention patent application
    Legal status: in force

    Publication (announcement) number: US20110072243A1

    Publication (announcement) date: 2011-03-24

    Application number: US12875843

    Filing date: 2010-09-03

    IPC classification: G06F9/30

    Abstract: One embodiment of the present invention sets forth a technique for collecting operands specified by an instruction. As a sequence of instructions is received, the operands specified by the instructions are assigned to ports, so that each one of the operands specified by a single instruction is assigned to a different port. Reading of the operands from a multi-bank register file is scheduled by selecting an operand from each one of the different ports to produce an operand read request and ensuring that no two of the selected operands are stored in the same bank of the multi-bank register file. The operands specified by the operand read request are read from the multi-bank register file in a single clock cycle. Each instruction is then executed as the operands specified by the instruction are read from the multi-bank register file and collected over one or more clock cycles.

    Coalescing memory barrier operations across multiple parallel threads
    4.
    Type: granted invention patent
    Legal status: in force

    Publication (announcement) number: US09223578B2

    Publication (announcement) date: 2015-12-29

    Application number: US12887081

    Filing date: 2010-09-21

    IPC classification: G06F9/46 G06F9/38 G06F9/30

    Abstract: One embodiment of the present invention sets forth a technique for coalescing memory barrier operations across multiple parallel threads. Memory barrier requests from a given parallel thread processing unit are coalesced to reduce the impact on the rest of the system. Additionally, memory barrier requests may specify a level of a set of threads with respect to which the memory transactions are committed. For example, a first type of memory barrier instruction may commit the memory transactions to a level of a set of cooperating threads that share an L1 (level one) cache. A second type of memory barrier instruction may commit the memory transactions to a level of a set of threads sharing a global memory. Finally, a third type of memory barrier instruction may commit the memory transactions to a system level of all threads sharing all system memories. The latency required to execute the memory barrier instruction varies based on the type of memory barrier instruction.
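
The three barrier levels can be modeled as an ordered scope, as in the Python sketch below. The coalescing policy shown (merge all pending requests from one processing unit into a single request at the widest requested scope) is an assumption made for illustration; the abstract states that requests are coalesced but not how. The names `Scope` and `coalesce` are hypothetical.

```python
from enum import IntEnum


class Scope(IntEnum):
    """Barrier levels from the abstract, ordered from narrowest to widest."""
    CTA = 1     # cooperating threads that share an L1 cache
    GLOBAL = 2  # threads that share a global memory
    SYSTEM = 3  # all threads sharing all system memories


def coalesce(requests):
    """Merge pending (thread_id, Scope) requests into one outgoing request.

    Assumed policy: a single barrier at the widest requested scope also
    satisfies every narrower request, so one request is sent instead of many.
    """
    requests = list(requests)
    if not requests:
        return None
    widest = max(scope for _, scope in requests)
    waiting_threads = sorted(thread_id for thread_id, _ in requests)
    return widest, waiting_threads


pending = [(0, Scope.CTA), (3, Scope.GLOBAL), (1, Scope.CTA)]
print(coalesce(pending))
```

Under this policy the narrower requests pay the latency of the widest one, which is one reason a real design might instead coalesce per level.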

    Distributed stream output in a parallel processing unit
    5.
    Type: granted invention patent
    Legal status: in force

    Publication (announcement) number: US08817031B2

    Publication (announcement) date: 2014-08-26

    Application number: US12894001

    Filing date: 2010-09-29

    IPC classification: G06F15/80

    CPC classification: G06T1/00

    Abstract: A technique for performing stream output operations in a parallel processing system is disclosed. A stream synchronization unit is provided that enables the parallel processing unit to track batches of vertices being processed in a graphics processing pipeline. A plurality of stream output units is also provided, where each stream output unit writes vertex attribute data to one or more stream output buffers for a portion of the batches of vertices. A messaging protocol is implemented between the stream synchronization unit and the plurality of stream output units that ensures that each of the stream output units writes vertex attribute data for the particular batch of vertices distributed to that particular stream output unit in the same order in the stream output buffers as the order in which the batch of vertices was received from a device driver by the parallel processing unit.
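
The ordering guarantee described here (batches finish on different units in any order, yet land in the buffer in the order received) can be sketched in Python. This models only the guarantee, using a simple "next expected batch" counter; the actual messaging protocol between the synchronization unit and the output units is not described in the abstract, and the function name is hypothetical.

```python
def write_in_order(completions):
    """Append vertex data to the output buffer in original batch order.

    completions: iterable of (batch_index, vertices) in arbitrary arrival
    order, where batch_index is the order in which batches were received.
    """
    buffer = []
    pending = {}    # finished batches that are waiting for their turn
    next_batch = 0  # index of the batch allowed to write next
    for batch_index, vertices in completions:
        pending[batch_index] = vertices
        # Drain every batch whose predecessors have all been written.
        while next_batch in pending:
            buffer.extend(pending.pop(next_batch))
            next_batch += 1
    return buffer


print(write_in_order([(1, ["c"]), (0, ["a", "b"]), (2, ["d"])]))
```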

    GRID WALK SAMPLING
    6.
    Type: invention patent application
    Legal status: under examination (published)

    Publication (announcement) number: US20120280992A1

    Publication (announcement) date: 2012-11-08

    Application number: US13461666

    Filing date: 2012-05-01

    IPC classification: G06T17/00

    CPC classification: G06T11/40

    Abstract: The grid walk sampling technique is an efficient sampling algorithm aimed at optimizing the cost of triangle rasterization for modern graphics workloads. Grid walk sampling is an iterative rasterization algorithm that intelligently tests the intersection of triangle edges with multi-cell grids, determining coverage for a grid cell while identifying other cells in the grid that are either fully covered or fully uncovered by the triangle. Grid walk sampling rasterizes triangles using fewer and simpler computations than conventional highly parallel rasterizers. Therefore, a rasterizer employing grid walk sampling may compute sample coverage of triangles more efficiently in terms of power and circuitry die area compared with conventional highly parallel rasterizers.
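
The notion of a grid cell being "fully covered" or "fully uncovered" by a triangle can be illustrated with the standard edge-function corner test, sketched below in Python. This is a generic conservative classification, not the grid walk algorithm itself, whose iteration order is not given in the abstract. It assumes a counter-clockwise triangle and axis-aligned square cells; the function names are hypothetical.

```python
def edge(a, b, p):
    """Signed area test: positive when p lies to the left of the edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def classify_cell(tri, x0, y0, size):
    """Classify a square cell against a counter-clockwise triangle.

    Returns "covered" if every corner is inside all three edges,
    "uncovered" if every corner is outside one edge, else "partial".
    A cell outside the triangle near a vertex may be reported as "partial";
    the test is conservative, never wrong about "covered" or "uncovered".
    """
    corners = [(x0, y0), (x0 + size, y0), (x0, y0 + size), (x0 + size, y0 + size)]
    edges = [(tri[0], tri[1]), (tri[1], tri[2]), (tri[2], tri[0])]
    all_inside = True
    for a, b in edges:
        values = [edge(a, b, c) for c in corners]
        if all(v < 0 for v in values):
            return "uncovered"  # whole cell lies outside this edge
        if any(v < 0 for v in values):
            all_inside = False  # this edge cuts through the cell
    return "covered" if all_inside else "partial"


triangle = [(0, 0), (8, 0), (0, 8)]  # counter-clockwise
for cell in [(1, 1), (3, 4), (6, 6)]:
    print(cell, classify_cell(triangle, cell[0], cell[1], 1))
```

Only cells reported as "partial" need per-sample tests, which is where the savings over testing every sample come from.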

    Context switching using halt sequencing protocol
    7.
    Type: granted invention patent
    Legal status: in force

    Publication (announcement) number: US07512773B1

    Publication (announcement) date: 2009-03-31

    Application number: US11252855

    Filing date: 2005-10-18

    IPC classification: G06F9/46

    CPC classification: G06F9/485 G06F9/4881

    Abstract: A halt sequencing protocol permits a context switch to occur in a processing pipeline even before all units of the processing pipeline are idle. The context switch method based on the halt sequencing protocol includes the steps of issuing a halt request signal to the units of a processing pipeline, monitoring the status of each of the units, and freezing the states of all of the units when they are either idle or halted. Then, the states of the units, which pertain to the thread that has been halted, are dumped into memory, and the units are restored with states corresponding to a different thread that is to be executed after the context switch.
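
The sequence in this abstract (request halt, monitor status, freeze once every unit is idle or halted, dump state, restore the next thread) can be walked through with a small Python simulation. The `Unit` model, its `work_left` counter, and the rule that a busy unit halts at its next step are assumptions made for illustration.

```python
class Unit:
    """Hypothetical pipeline unit with an opaque state and some remaining work."""

    def __init__(self, name, state, work_left):
        self.name, self.state, self.work_left = name, state, work_left
        self.halt_requested = False
        self.status = "idle" if work_left == 0 else "busy"

    def step(self):
        if self.status != "busy":
            return
        if self.halt_requested:
            self.status = "halted"  # stop at a safe point with work still pending
            return
        self.work_left -= 1
        if self.work_left == 0:
            self.status = "idle"


def context_switch(units, saved_contexts, old_thread, new_thread):
    for unit in units:  # 1. issue the halt request to every unit
        unit.halt_requested = True
    # 2. monitor until each unit reports idle or halted
    while not all(u.status in ("idle", "halted") for u in units):
        for unit in units:
            unit.step()
    # 3. states are now frozen: dump them to memory for the old thread
    saved_contexts[old_thread] = {u.name: (u.state, u.work_left) for u in units}
    # 4. restore each unit with the state of the thread to run next
    for unit in units:
        unit.state, unit.work_left = saved_contexts[new_thread][unit.name]
        unit.halt_requested = False
        unit.status = "idle" if unit.work_left == 0 else "busy"


units = [Unit("front_end", "A-front", 0), Unit("raster", "A-raster", 5)]
saved = {"B": {"front_end": ("B-front", 0), "raster": ("B-raster", 2)}}
context_switch(units, saved, old_thread="A", new_thread="B")
print(saved["A"], [(u.name, u.state, u.status) for u in units])
```

The raster unit is switched out with five steps of work still pending, which is the case the protocol is meant to allow: the switch does not wait for the pipeline to drain.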

    Superscalar processor with multiple register windows and speculative return address generation
    8.
    Type: granted invention patent
    Legal status: lapsed

    Publication (announcement) number: US5896528A

    Publication (announcement) date: 1999-04-20

    Application number: US522845

    Filing date: 1995-09-01

    IPC classification: G06F9/32 G06F9/38 G06F9/42

    Abstract: A superscalar processor capable of executing multiple instructions concurrently. The processor includes a program counter which identifies instructions for execution by multiple execution units. Further included is a register file made up of multiple register windows. A current window pointer selects one of the multiple register windows. In response to the value of the current window pointer, a return prediction table provides a speculative program counter value, indicative of a return address of an instruction for a subroutine, corresponding to the selected register window. A watchpoint register stores the speculative program counter value. A fetch program counter, in response to the speculative program counter value, stores the instructions for execution after they have been identified by the program counter.

    Programmable instruction trap system and method
    9.
    Type: granted invention patent
    Legal status: lapsed

    Publication (announcement) number: US5896526A

    Publication (announcement) date: 1999-04-20

    Application number: US25511

    Filing date: 1998-02-18

    CPC classification: G06F11/3648

    Abstract: A system and method providing a programmable hardware device within a CPU. The programmable hardware device permits a plurality of instructions to be trapped before they are executed. The instructions that are to be trapped are programmable to provide flexibility during CPU debugging and to ensure that a variety of application programs can be properly executed by the CPU. The system must also provide a means for permitting a trapped instruction to be emulated and/or to be executed serially.
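
A programmable trap of this kind can be mimicked in software with a table that is consulted before each instruction executes, as in the Python sketch below. The class `TrapUnit`, the `(opcode, operand)` instruction format, and the emulation handlers are hypothetical; the abstract describes a hardware device and does not specify its interface, and serial execution of trapped instructions is not modeled.

```python
class TrapUnit:
    """Hypothetical programmable trap table keyed by opcode."""

    def __init__(self):
        self.trapped = {}

    def program(self, opcode, handler):
        self.trapped[opcode] = handler  # trap this opcode from now on

    def clear(self, opcode):
        self.trapped.pop(opcode, None)  # stop trapping this opcode


def run(program, trap_unit, execute):
    """Check every instruction against the trap table before executing it."""
    log = []
    for opcode, operand in program:
        handler = trap_unit.trapped.get(opcode)
        if handler is not None:
            log.append(("trapped", opcode, handler(operand)))  # emulate instead
        else:
            log.append(("executed", opcode, execute(opcode, operand)))
    return log


def native(opcode, operand):
    return operand + 1 if opcode == "inc" else operand


traps = TrapUnit()
traps.program("sqrt", lambda operand: operand ** 0.5)  # emulate in software
print(run([("inc", 1), ("sqrt", 9)], traps, native))
```

Because the table is reprogrammable at run time, the same mechanism serves both debugging (trap an opcode to inspect it) and compatibility (emulate an opcode that the hardware handles incorrectly).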

    Processor structure and method for watchpoint of plural simultaneous unresolved branch evaluation
    10.
    Type: granted invention patent
    Legal status: lapsed

    Publication (announcement) number: US5655115A

    Publication (announcement) date: 1997-08-05

    Application number: US482075

    Filing date: 1995-06-07

    IPC classification: G06F9/312 G06F9/38 G06F11/14

    Abstract: Time-out checkpoints are formed based on a predetermined time-out condition or interval since the last checkpoint was formed, rather than forming a checkpoint to store current processor state based merely on decoded instruction attributes. Such time-out conditions may include the number of instructions issued or the number of clock cycles elapsed, for example. Time-out checkpointing limits the maximum number of instructions within a checkpoint boundary and bounds the time period for recovery from an exception condition. The processor can restore time-out based checkpointed state faster than an instruction decode based checkpoint technique in the event of an exception so long as the instruction window size is greater than the maximum number of instructions within a checkpoint boundary, and such method eliminates processor state restoration dependency on instruction window size. Time-out checkpoints may be implemented with conventional checkpoints, or in a novel logical and physical register rename map checkpointing technique. Time-out checkpoint formation may be used with conventional processor backup techniques as well as with a novel backtracking technique including processor backup and backstepping.
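
The time-out condition can be made concrete with a short Python sketch that uses the number of instructions issued since the last checkpoint as the trigger. The function name, the way decode-based checkpoints are supplied, and the 1-based instruction numbering are assumptions for illustration; a cycle-count trigger would follow the same pattern.

```python
def checkpoint_positions(num_instructions, decode_checkpoints, max_gap):
    """Return the instruction numbers (1-based) at which a checkpoint is formed.

    decode_checkpoints: instruction numbers that force a conventional,
    decode-based checkpoint. A time-out checkpoint is added whenever max_gap
    instructions have issued since the previous checkpoint of either kind.
    """
    positions = []
    last = 0  # instruction number of the most recent checkpoint
    forced = set(decode_checkpoints)
    for i in range(1, num_instructions + 1):
        if i in forced or i - last >= max_gap:
            positions.append(i)
            last = i
    return positions


print(checkpoint_positions(10, decode_checkpoints={3}, max_gap=4))  # [3, 7]
```

No stretch between consecutive checkpoints exceeds `max_gap` instructions, which is the bound on recovery work that the abstract attributes to time-out checkpointing.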
