REORDERING BUFFER FOR MEMORY ACCESS LOCALITY
    2.
    Invention Application (In Force)

    Publication No.: US20140164743A1

    Publication Date: 2014-06-12

    Application No.: US13710004

    Filing Date: 2012-12-10

    Abstract: Systems and methods for scheduling instructions for execution on a multi-core processor reorder the execution of different threads to ensure that instructions specified as having localized memory access behavior are executed over one or more sequential clock cycles to benefit from memory access locality. At compile time, code sequences including memory access instructions that may be localized are delineated into separate batches. A scheduling unit ensures that multiple parallel threads are processed over one or more sequential scheduling cycles to execute the batched instructions. The scheduling unit waits to schedule execution of instructions that are not included in the particular batch until execution of the batched instructions is done so that memory access locality is maintained for the particular batch. In between the separate batches, instructions that are not included in a batch are scheduled so that threads executing non-batched instructions are also processed and not starved.
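The batching scheme above can be sketched in miniature. This is a hypothetical model, not the patented implementation: each thread's instruction stream is a queue of `(opcode, batch_id)` pairs, where `batch_id` is `None` for non-batched instructions, and the compiler is assumed to have tagged localized memory accesses with matching batch ids across threads.

```python
from collections import deque

def schedule(threads):
    """Issue instructions so that a pending batch drains on all threads
    over consecutive scheduling steps before non-batched work resumes."""
    issued = []
    while any(threads):
        # Peek at the batch id of each runnable thread's next instruction.
        heads = [t[0][1] for t in threads if t]
        batch_ids = {b for b in heads if b is not None}
        if batch_ids:
            # A batch is pending: drain it on every thread first so the
            # batched memory accesses execute back-to-back (locality kept).
            bid = min(batch_ids)
            for t in threads:
                while t and t[0][1] == bid:
                    issued.append(t.popleft()[0])
        else:
            # Between batches, round-robin non-batched instructions so
            # threads running non-batched code are not starved.
            for t in threads:
                if t and t[0][1] is None:
                    issued.append(t.popleft()[0])
    return issued
```

Running this on two threads where batch 1 spans both streams issues the batched accesses of each thread contiguously while still interleaving the unbatched instructions in between.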

    UNIFIED CACHE FOR DIVERSE MEMORY TRAFFIC
    4.
    Invention Application

    Publication No.: US20180322078A1

    Publication Date: 2018-11-08

    Application No.: US15716461

    Filing Date: 2017-09-26

    Abstract: A unified cache subsystem includes a data memory configured as both a shared memory and a local cache memory. The unified cache subsystem processes different types of memory transactions using different data pathways. To process memory transactions that target shared memory, the unified cache subsystem includes a direct pathway to the data memory. To process memory transactions that do not target shared memory, the unified cache subsystem includes a tag processing pipeline configured to identify cache hits and cache misses. When the tag processing pipeline identifies a cache hit for a given memory transaction, the transaction is rerouted to the direct pathway to data memory. When the tag processing pipeline identifies a cache miss for a given memory transaction, the transaction is pushed into a first-in first-out (FIFO) until miss data is returned from external memory. The tag processing pipeline is also configured to process texture-oriented memory transactions.
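The two pathways described in this abstract (and repeated in the related application below) can be sketched as follows. All names here (`UnifiedCache`, `access`, `fill`, the dict-backed memories) are illustrative assumptions, not the patent's design: shared-memory transactions skip the tag check entirely, non-shared transactions consult the tags, hits are rerouted to the direct pathway, and misses park in a FIFO until fill data arrives.

```python
from collections import deque

class UnifiedCache:
    def __init__(self, external_memory):
        self.data = {}            # unified data memory (shared + cache lines)
        self.tags = set()         # tags of resident cache lines
        self.miss_fifo = deque()  # transactions waiting on external memory
        self.external = external_memory

    def access(self, addr, targets_shared):
        if targets_shared:
            # Shared-memory traffic takes the direct pathway: no tag lookup.
            return self.data.get(addr)
        if addr in self.tags:
            # Tag pipeline reports a hit: reroute to the direct pathway.
            return self.data.get(addr)
        # Miss: queue the transaction until miss data is returned.
        self.miss_fifo.append(addr)
        return None

    def fill(self):
        # Model external memory returning miss data in FIFO order.
        while self.miss_fifo:
            addr = self.miss_fifo.popleft()
            self.data[addr] = self.external[addr]
            self.tags.add(addr)
```

A first non-shared access to an address misses and queues; after `fill()` delivers the line, the same access hits the tag pipeline and is served from the data memory, while shared-memory accesses never touch the tags at all.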

    UNIFIED CACHE FOR DIVERSE MEMORY TRAFFIC
    5.
    Invention Application

    Publication No.: US20180322077A1

    Publication Date: 2018-11-08

    Application No.: US15587213

    Filing Date: 2017-05-04

    Abstract: A unified cache subsystem includes a data memory configured as both a shared memory and a local cache memory. The unified cache subsystem processes different types of memory transactions using different data pathways. To process memory transactions that target shared memory, the unified cache subsystem includes a direct pathway to the data memory. To process memory transactions that do not target shared memory, the unified cache subsystem includes a tag processing pipeline configured to identify cache hits and cache misses. When the tag processing pipeline identifies a cache hit for a given memory transaction, the transaction is rerouted to the direct pathway to data memory. When the tag processing pipeline identifies a cache miss for a given memory transaction, the transaction is pushed into a first-in first-out (FIFO) until miss data is returned from external memory. The tag processing pipeline is also configured to process texture-oriented memory transactions.

HIERARCHICAL STAGING AREAS FOR SCHEDULING THREADS FOR EXECUTION
    6.
    Invention Application (Pending, Published)

    Publication No.: US20150113538A1

    Publication Date: 2015-04-23

    Application No.: US14061170

    Filing Date: 2013-10-23

    CPC classification number: G06F9/5011 G06F2209/507

    Abstract: One embodiment of the present invention is a computer-implemented method for scheduling a thread group for execution on a processing engine that includes identifying a first thread group included in a first set of thread groups that can be issued for execution on the processing engine, where the first thread group includes one or more threads. The method also includes transferring the first thread group from the first set of thread groups to a second set of thread groups, allocating hardware resources to the first thread group, and selecting the first thread group from the second set of thread groups for execution on the processing engine. One advantage of the disclosed technique is that a scheduler only allocates limited hardware resources to thread groups that are, in fact, ready to be issued for execution, thereby conserving those resources in a manner that is generally more efficient than conventional techniques.
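The two-set staging flow can be sketched as a small model. Everything here is a hypothetical illustration (the `Scheduler` class, the dict-based thread groups, the scalar `cost` budget standing in for hardware resources), not the claimed hardware: groups wait in a first set with no resources, are moved to a second set and allocated resources only once they are ready to issue, and are then selected from the second set for execution.

```python
class Scheduler:
    def __init__(self, resource_budget):
        self.pending = []   # first set: no hardware resources allocated yet
        self.staged = []    # second set: resources held, issuable
        self.budget = resource_budget

    def submit(self, group):
        self.pending.append(group)

    def stage_ready_groups(self):
        # Transfer ready groups to the second set, allocating resources
        # lazily so groups that cannot issue never tie up the budget.
        for group in list(self.pending):
            if group["ready"] and self.budget >= group["cost"]:
                self.pending.remove(group)
                self.budget -= group["cost"]
                self.staged.append(group)

    def issue(self):
        # Select a staged group for execution and release its resources.
        if not self.staged:
            return None
        group = self.staged.pop(0)
        self.budget += group["cost"]
        return group["name"]
```

In this model a non-ready group stays in the first set and consumes nothing, which mirrors the stated advantage: only groups that are actually ready to issue hold the limited hardware resources.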
