Halt context switching method and system
    1.
    Granted invention patent
    Status: in force

    Publication No.: US07916146B1

    Publication date: 2011-03-29

    Application No.: US11292471

    Filing date: 2005-12-02

    IPC classification: G06T1/20

    CPC classification: G06F9/461 G06T1/20

    Abstract: In a processing pipeline having a plurality of units, an interface unit is provided between a first, upstream pipeline unit that needs to be drained prior to a context switch and a second, downstream pipeline unit that might halt prior to a context switch. The interface unit redirects data that are drained from the first pipeline unit and to be received by the second pipeline unit, to a buffer memory provided in the front end of the processing pipeline. The contents of the buffer memory are subsequently dumped into memory reserved for the context that is being stored. When the processing pipeline is restored with this context, the data that were dumped into memory are retrieved back into the buffer memory and provided to the interface unit. The interface unit receives these commands and directs them to the second pipeline unit.
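
The save and restore flow in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `InterfaceUnit`, `save_context`, `restore_context`, and the `"drained"` key are hypothetical names, and queues stand in for hardware units and buffer memory.

```python
from collections import deque


class InterfaceUnit:
    """Hypothetical model of the unit between the upstream and downstream units."""

    def __init__(self):
        self.redirect = False
        self.front_end_buffer = deque()  # stand-in for the front-end buffer memory
        self.downstream_inbox = deque()  # stand-in for the second pipeline unit's input

    def accept(self, item):
        # Forward downstream normally; divert to the buffer during a context save.
        if self.redirect:
            self.front_end_buffer.append(item)
        else:
            self.downstream_inbox.append(item)


def save_context(interface, upstream_items, context_memory):
    """Drain the upstream unit through the interface, then dump the buffer."""
    interface.redirect = True
    while upstream_items:
        interface.accept(upstream_items.popleft())
    context_memory["drained"] = list(interface.front_end_buffer)
    interface.front_end_buffer.clear()


def restore_context(interface, context_memory):
    """Reload the dumped data into the buffer and replay it downstream."""
    interface.front_end_buffer.extend(context_memory.pop("drained", []))
    interface.redirect = False
    while interface.front_end_buffer:
        interface.accept(interface.front_end_buffer.popleft())
```

The downstream unit never sees the drained data while it may be halted; it receives them, in their original order, only after the context is restored.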

    Context switching using halt sequencing protocol
    2.
    Granted invention patent
    Status: in force

    Publication No.: US07512773B1

    Publication date: 2009-03-31

    Application No.: US11252855

    Filing date: 2005-10-18

    IPC classification: G06F9/46

    CPC classification: G06F9/485 G06F9/4881

    Abstract: A halt sequencing protocol permits a context switch to occur in a processing pipeline even before all units of the processing pipeline are idle. The context switch method based on the halt sequencing protocol includes the steps of issuing a halt request signal to the units of a processing pipeline, monitoring the status of each of the units, and freezing the states of all of the units when they are either idle or halted. Then, the states of the units, which pertain to the thread that has been halted, are dumped into memory, and the units are restored with states corresponding to a different thread that is to be executed after the context switch.
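
The steps listed in this abstract (halt request, status monitoring, freeze, dump, restore) can be sketched as follows. This is a simplified model, not the patented implementation: `Unit`, `Status`, the `can_halt` flag, and the use of a single `work` counter as the unit state are all assumptions made for illustration.

```python
from enum import Enum


class Status(Enum):
    BUSY = "busy"
    IDLE = "idle"
    HALTED = "halted"


class Unit:
    """Hypothetical pipeline unit whose whole state is a count of pending work."""

    def __init__(self, name, work, can_halt):
        self.name = name
        self.work = work
        self.can_halt = can_halt  # assumed: some units can stop mid-work, others must drain
        self.halt_requested = False

    def status(self):
        if self.work == 0:
            return Status.IDLE
        if self.halt_requested and self.can_halt:
            return Status.HALTED
        return Status.BUSY

    def tick(self):
        # Only busy units make progress; idle and halted units hold their state.
        if self.status() is Status.BUSY:
            self.work -= 1


def context_switch(units, saved, old_thread, new_thread):
    for u in units:
        u.halt_requested = True                           # 1. issue the halt request
    while any(u.status() is Status.BUSY for u in units):  # 2. monitor unit status
        for u in units:
            u.tick()
    saved[old_thread] = {u.name: u.work for u in units}   # 3. freeze and dump state
    incoming = saved.get(new_thread, {})
    for u in units:                                       # 4. restore the other thread
        u.work = incoming.get(u.name, 0)
        u.halt_requested = False
```

In this model a unit that can halt keeps its unfinished work in the saved state, so the switch does not wait for the whole pipeline to go idle.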

    Hardware warning protocol for processing units
    3.
    Granted invention patent
    Status: in force

    Publication No.: US08127181B1

    Publication date: 2012-02-28

    Application No.: US11934732

    Filing date: 2007-11-02

    IPC classification: G06F11/00 G06F11/36

    Abstract: Processing units are configured to capture the unit state in unit level error status registers when a runtime error event is detected in order to facilitate debugging of runtime errors. The reporting of warnings may be disabled or enabled to selectively monitor each processing unit. Warnings for each processing unit are propagated to an exception register in a front end monitoring unit. The warnings are then aggregated and propagated to an interrupt register in a front end monitoring unit in order to selectively generate an interrupt and facilitate debugging. A debugging application may be used to query the interrupt, exception, and unit level error status registers to determine the cause of the error. A default error handling behavior that overrides error conditions may be used in conjunction with the hardware warning protocol to allow the processing units to continue operating and to facilitate the debugging of runtime errors.
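
The register hierarchy in this abstract (unit level error status, exception register, interrupt register) can be modeled roughly as below. The class and field names, the one-bit-per-unit layout, and the choice to capture state even when a unit's warnings are disabled are assumptions for illustration only.

```python
class FrontEndMonitor:
    """Hypothetical model of the warning path from units to the front end."""

    def __init__(self, unit_names):
        self.unit_index = {name: i for i, name in enumerate(unit_names)}
        self.error_status = {name: None for name in unit_names}    # unit level registers
        self.warning_enable = {name: True for name in unit_names}  # per-unit reporting switch
        self.exception_reg = 0       # assumed layout: one bit per unit
        self.interrupt_enable = True
        self.interrupt_reg = 0

    def report_error(self, unit, captured_state):
        # The unit captures its state when the runtime error is detected.
        self.error_status[unit] = captured_state
        # The warning propagates only if reporting is enabled for this unit.
        if self.warning_enable[unit]:
            self.exception_reg |= 1 << self.unit_index[unit]
        # Aggregate all warnings into a single interrupt bit.
        if self.exception_reg and self.interrupt_enable:
            self.interrupt_reg = 1

    def diagnose(self):
        """What a debugging application might read back after an interrupt."""
        return {
            name: self.error_status[name]
            for name, i in self.unit_index.items()
            if (self.exception_reg >> i) & 1
        }
```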

    Unit status reporting protocol
    4.
    Granted invention patent
    Status: in force

    Publication No.: US08019978B1

    Publication date: 2011-09-13

    Application No.: US11837933

    Filing date: 2007-08-13

    IPC classification: G06F9/30 G06F9/00

    Abstract: A unit status reporting protocol may also be used for context switching, debugging, and removing deadlock conditions in a processing unit. A processing unit is in one of five states: empty, active, stalled, quiescent, and halted. The state that a processing unit is in is reported to a front end monitoring unit to enable the front end monitoring unit to determine when a context switch may be performed or when a deadlock condition exists. The front end monitoring unit can issue a halt command to perform a context switch or take action to remove a deadlock condition and allow processing to resume.
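
The five reported states lend themselves to a small decision sketch. The abstract does not say which state combinations permit a switch or indicate deadlock, so both rules below are assumptions chosen for illustration, as are the function names.

```python
from enum import Enum


class UnitState(Enum):
    EMPTY = "empty"
    ACTIVE = "active"
    STALLED = "stalled"
    QUIESCENT = "quiescent"
    HALTED = "halted"


# Assumed rule: a switch is safe once no unit is active or stalled.
SWITCHABLE = {UnitState.EMPTY, UnitState.QUIESCENT, UnitState.HALTED}


def can_context_switch(reported_states):
    """True when every reported state is one the front end treats as safe."""
    return all(state in SWITCHABLE for state in reported_states)


def deadlock_suspected(reported_states):
    """Assumed rule: some unit is stalled and no unit is making progress."""
    states = set(reported_states)
    return UnitState.STALLED in states and UnitState.ACTIVE not in states
```

A front end monitor built this way would poll the reported states, issue a halt command while `can_context_switch` is false, and escalate when `deadlock_suspected` becomes true.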

    Coalescing memory barrier operations across multiple parallel threads
    5.
    Granted invention patent
    Status: in force

    Publication No.: US09223578B2

    Publication date: 2015-12-29

    Application No.: US12887081

    Filing date: 2010-09-21

    IPC classification: G06F9/46 G06F9/38 G06F9/30

    Abstract: One embodiment of the present invention sets forth a technique for coalescing memory barrier operations across multiple parallel threads. Memory barrier requests from a given parallel thread processing unit are coalesced to reduce the impact to the rest of the system. Additionally, memory barrier requests may specify a level of a set of threads with respect to which the memory transactions are committed. For example, a first type of memory barrier instruction may commit the memory transactions to a level of a set of cooperating threads that share an L1 (level one) cache. A second type of memory barrier instruction may commit the memory transactions to a level of a set of threads sharing a global memory. Finally, a third type of memory barrier instruction may commit the memory transactions to a system level of all threads sharing all system memories. The latency required to execute the memory barrier instruction varies based on the type of memory barrier instruction.
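
The three barrier levels form an ordering from narrowest to widest. The abstract does not describe how requests are merged, so the sketch below assumes one plausible policy: pending requests from a single processing unit collapse into one request at the widest level any of them asked for. `Scope` and `coalesce` are hypothetical names.

```python
from enum import IntEnum


class Scope(IntEnum):
    CTA = 1     # cooperating threads that share an L1 cache
    GLOBAL = 2  # threads sharing a global memory
    SYSTEM = 3  # all threads sharing all system memories


def coalesce(pending_requests):
    """Merge pending barrier requests from one processing unit into a single request.

    Assumed policy: the widest requested scope also satisfies every narrower one.
    Returns None when nothing is pending.
    """
    return max(pending_requests) if pending_requests else None
```

Under this policy the rest of the system handles one barrier instead of several, at the cost of possibly committing at a wider (and, per the abstract, slower) level than some of the requesting threads needed.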

    Distributed stream output in a parallel processing unit
    6.
    Granted invention patent
    Status: in force

    Publication No.: US08817031B2

    Publication date: 2014-08-26

    Application No.: US12894001

    Filing date: 2010-09-29

    IPC classification: G06F15/80

    CPC classification: G06T1/00

    Abstract: A technique for performing stream output operations in a parallel processing system is disclosed. A stream synchronization unit is provided that enables the parallel processing unit to track batches of vertices being processed in a graphics processing pipeline. A plurality of stream output units is also provided, where each stream output unit writes vertex attribute data to one or more stream output buffers for a portion of the batches of vertices. A messaging protocol is implemented between the stream synchronization unit and the plurality of stream output units that ensures that each of the stream output units writes vertex attribute data for the particular batch of vertices distributed to that particular stream output unit in the same order in the stream output buffers as the order in which the batch of vertices was received from a device driver by the parallel processing unit.
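
The ordering guarantee can be illustrated with a simple stand-in technique. The abstract does not describe the messages exchanged, so this sketch substitutes offset reservation: a hypothetical `StreamSyncUnit` reserves buffer space in arrival order, and output units may then write in any order.

```python
class StreamSyncUnit:
    """Hypothetical tracker that reserves buffer offsets in batch arrival order."""

    def __init__(self):
        self.next_offset = 0

    def reserve(self, vertex_count):
        offset = self.next_offset
        self.next_offset += vertex_count
        return offset


def stream_output(batches, completion_order):
    """Write each batch at its reserved offset, whatever order the units finish in."""
    sync = StreamSyncUnit()
    offsets = [sync.reserve(len(batch)) for batch in batches]  # arrival order
    buffer = [None] * sync.next_offset
    for i in completion_order:  # stream output units may complete out of order
        buffer[offsets[i]:offsets[i] + len(batches[i])] = batches[i]
    return buffer
```

Because each batch's position is fixed when it arrives, the buffer contents match the order in which the driver submitted the batches regardless of which output unit finishes first.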

    GRID WALK SAMPLING
    7.
    Invention patent application
    Status: under examination (published)

    Publication No.: US20120280992A1

    Publication date: 2012-11-08

    Application No.: US13461666

    Filing date: 2012-05-01

    IPC classification: G06T17/00

    CPC classification: G06T11/40

    Abstract: The grid walk sampling technique is an efficient sampling algorithm aimed at optimizing the cost of triangle rasterization for modern graphics workloads. Grid walk sampling is an iterative rasterization algorithm that intelligently tests the intersection of triangle edges with multi-cell grids, determining coverage for a grid cell while identifying other cells in the grid that are either fully covered or fully uncovered by the triangle. Grid walk sampling rasterizes triangles using fewer computations and simpler computations compared with conventional highly parallel rasterizers. Therefore, a rasterizer employing grid walk sampling may compute sample coverage of triangles more efficiently in terms of power and circuitry die area compared with conventional highly parallel rasterizers.

Superscalar processor with multiple register windows and speculative return address generation
    8.
    Granted invention patent
    Status: lapsed

    Publication No.: US5896528A

    Publication date: 1999-04-20

    Application No.: US522845

    Filing date: 1995-09-01

    IPC classification: G06F9/32 G06F9/38 G06F9/42

    Abstract: A superscalar processor capable of executing multiple instructions concurrently. The processor includes a program counter which identifies instructions for execution by multiple execution units. Further included is a register file made up of multiple register windows; a current window pointer selects one of the multiple register windows. In response to the value of the current window pointer, a return prediction table provides a speculative program counter value, indicative of a return address of an instruction for a subroutine, corresponding to the selected register window. A watchpoint register stores the speculative program counter value. A fetch program counter, in response to the speculative program counter value, stores the instructions for execution after they have been identified by the program counter.

    Programmable instruction trap system and method
    9.
    Granted invention patent
    Status: lapsed

    Publication No.: US5896526A

    Publication date: 1999-04-20

    Application No.: US25511

    Filing date: 1998-02-18

    CPC classification: G06F11/3648

    Abstract: A system and method providing a programmable hardware device within a CPU. The programmable hardware device permits a plurality of instructions to be trapped before they are executed. The instructions that are to be trapped are programmable to provide flexibility during CPU debugging and to ensure that a variety of application programs can be properly executed by the CPU. The system must also provide a means for permitting a trapped instruction to be emulated and/or to be executed serially.
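
A software analogy of the programmable trap is sketched below. `InstructionTrap`, its methods, and the `execute`/`emulate` callbacks are hypothetical; the real device is hardware, and serial execution of trapped instructions is not modeled.

```python
class InstructionTrap:
    """Hypothetical trap table that can be reprogrammed at run time."""

    def __init__(self):
        self.trapped_opcodes = set()

    def program(self, opcode):
        """Mark an opcode to be trapped before it executes."""
        self.trapped_opcodes.add(opcode)

    def clear(self, opcode):
        """Stop trapping an opcode; does nothing if it was not trapped."""
        self.trapped_opcodes.discard(opcode)

    def dispatch(self, opcode, execute, emulate):
        # Trapped instructions are diverted to the emulation path;
        # everything else runs on the normal execution path.
        if opcode in self.trapped_opcodes:
            return emulate(opcode)
        return execute(opcode)
```

Making the trap set programmable means a faulty or unsupported instruction can be diverted during debugging without changing the rest of the dispatch logic.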

Processor structure and method for watchpoint of plural simultaneous unresolved branch evaluation
    10.
    Granted invention patent
    Status: lapsed

    Publication No.: US5655115A

    Publication date: 1997-08-05

    Application No.: US482075

    Filing date: 1995-06-07

    IPC classification: G06F9/312 G06F9/38 G06F11/14

    Abstract: Time-out checkpoints are formed based on a predetermined time-out condition or interval since the last checkpoint was formed rather than forming a checkpoint to store current processor state based merely on decoded instruction attributes. Such time-out conditions may include the number of instructions issued or the number of clock cycles elapsed, for example. Time-out checkpointing limits the maximum number of instructions within a checkpoint boundary and bounds the time period for recovery from an exception condition. The processor can restore time-out based checkpointed state faster than an instruction decode based checkpoint technique in the event of an exception so long as the instruction window size is greater than the maximum number of instructions within a checkpoint boundary, and such method eliminates processor state restoration dependency on instruction window size. Time-out checkpoints may be implemented with conventional checkpoints, or in a novel logical and physical register rename map checkpointing technique. Time-out checkpoint formation may be used with conventional processor backup techniques as well as with a novel backtracking technique including processor backup and backstepping.
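
The bound on recovery work can be shown with a small model that uses the instruction count as the time-out condition. The function names and the representation of a checkpoint as an instruction index are assumptions for illustration.

```python
def form_timeout_checkpoints(instruction_count, max_per_checkpoint):
    """Return the instruction indices at which checkpoints are formed.

    A new checkpoint is forced once max_per_checkpoint instructions have
    issued since the previous one, regardless of instruction type.
    """
    checkpoints = [0]
    since_last = 0
    for index in range(instruction_count):
        if since_last == max_per_checkpoint:
            checkpoints.append(index)
            since_last = 0
        since_last += 1
    return checkpoints


def replay_distance(checkpoints, faulting_index):
    """Instructions to re-execute after restoring the nearest earlier checkpoint."""
    base = max(c for c in checkpoints if c <= faulting_index)
    return faulting_index - base
```

With a limit of 4 instructions per checkpoint, no exception in this model requires replaying more than 3 instructions, which is the bounded recovery period the abstract describes.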
