Cache arbitration between multiple clients
    2.
    Granted invention patent
    Cache arbitration between multiple clients (in force)

    Publication number: US08335892B1

    Publication date: 2012-12-18

    Application number: US12650226

    Filing date: 2009-12-30

    IPC classification: G06F12/00 G06F13/00 G06F13/28

    CPC classification: G06F12/084 G06F12/0857

    Abstract: One embodiment of the present invention sets forth a technique for arbitrating requests received by an L1 cache from multiple clients. The L1 cache outputs bubble requests to a first one of the multiple clients that cause the first one of the multiple clients to insert bubbles into the request stream, where a bubble is the absence of a request. The bubbles allow the L1 cache to grant access to another one of the multiple clients without stalling the first one of the multiple clients. The L1 cache services multiple clients with diverse latency and bandwidth requirements and may be reconfigured to provide memory spaces for clients executing multiple parallel threads, where the memory spaces each have a different scope.

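The bubble mechanism in the abstract above can be illustrated with a small cycle-level sketch. Everything here is a hypothetical model, not the patented design: the function name `arbitrate`, the one-request-per-cycle slot, the `bubble_latency` parameter, and the policy of requesting a bubble whenever the second client is waiting are all assumptions made for illustration.

```python
from collections import deque

def arbitrate(primary_requests, secondary_requests, bubble_latency=1):
    """Toy model: one request slot per cycle, owned by the primary client.

    When the secondary client is waiting, the cache sends a bubble request
    to the primary client. After `bubble_latency` further cycles (assumed),
    the primary leaves one slot empty and the cache grants that slot to the
    secondary client, so the primary is never stalled.
    """
    primary = deque(primary_requests)
    secondary = deque(secondary_requests)
    schedule = []        # (client, request) serviced in each cycle
    bubble_due = None    # cycle in which the primary will emit a bubble
    cycle = 0
    while primary or secondary:
        if bubble_due == cycle:
            bubble_due = None
            slot_free = True          # the primary inserted a bubble
        else:
            slot_free = not primary   # primary stream is exhausted
        if slot_free and secondary:
            schedule.append(("secondary", secondary.popleft()))
        elif primary:
            schedule.append(("primary", primary.popleft()))
        # Ask for a bubble if the secondary is waiting and none is outstanding.
        if secondary and primary and bubble_due is None:
            bubble_due = cycle + 1 + bubble_latency
        cycle += 1
    return schedule
```

With `arbitrate(["a0", "a1", "a2", "a3"], ["b0"])`, the secondary request is serviced in the third slot, between `a1` and `a2`, and the primary stream simply resumes afterwards.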

    Cache miss processing using a defer/replay mechanism
    5.
    Granted invention patent
    Cache miss processing using a defer/replay mechanism (in force)

    Publication number: US08266383B1

    Publication date: 2012-09-11

    Application number: US12650189

    Filing date: 2009-12-30

    CPC classification: G06F12/0859 G06F12/084

    Abstract: One embodiment of the present invention sets forth a technique for processing cache misses resulting from a request received from one of the multiple clients of an L1 cache. The L1 cache services multiple clients with diverse latency and bandwidth requirements, including at least one client whose requests cannot be stalled. The L1 cache includes storage to buffer pending requests for cache misses. When an entry is available to store a pending request, a request causing a cache miss is accepted. When the data for a read request becomes available, the cache instructs the client to resubmit the read request to receive the data. When an entry is not available to store a pending request, a request causing a cache miss is deferred and the cache provides the client with status information that is used to determine when the request should be resubmitted.

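The accept/defer/replay flow described in the abstract above might be modeled as follows. This is a minimal sketch under stated assumptions: the class `L1CacheModel`, its method names, the contents of the status dictionary, and the merging of repeated misses to the same address are all hypothetical and are not taken from the patent.

```python
class L1CacheModel:
    """Toy model of miss handling with a bounded pending-request table."""

    def __init__(self, pending_entries):
        self.lines = {}                  # address -> data for resident lines
        self.pending = set()             # addresses with an outstanding miss
        self.capacity = pending_entries  # number of pending-request entries

    def request(self, address):
        """Return a (status, payload) pair; the client is never stalled."""
        if address in self.lines:
            return ("hit", self.lines[address])
        if address in self.pending:
            # Assumption: a repeated miss shares the existing entry.
            return ("accepted", None)
        if len(self.pending) < self.capacity:
            self.pending.add(address)
            return ("accepted", None)
        # No free entry: defer, and hand back status information the client
        # could use to decide when to resubmit (contents are hypothetical).
        return ("deferred", {"entries_in_use": len(self.pending)})

    def fill(self, address, data):
        """Data arrived from memory: free the entry and ask for a replay."""
        self.lines[address] = data
        self.pending.discard(address)
        return ("replay", address)
```

In this model a client that receives `"replay"` resubmits the same read and gets a `"hit"`, while a client that received `"deferred"` retries once an entry has been freed.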

    PRE-SCHEDULED REPLAYS OF DIVERGENT OPERATIONS
    6.
    Patent application
    PRE-SCHEDULED REPLAYS OF DIVERGENT OPERATIONS (under examination, published)

    Publication number: US20130212364A1

    Publication date: 2013-08-15

    Application number: US13370173

    Filing date: 2012-02-09

    IPC classification: G06F9/38 G06F9/312

    Abstract: One embodiment of the present disclosure sets forth an optimized way to execute pre-scheduled replay operations for divergent operations in a parallel processing subsystem. Specifically, a streaming multiprocessor (SM) includes a multi-stage pipeline configured to insert pre-scheduled replay operations into a multi-stage pipeline. A pre-scheduled replay unit detects whether the operation associated with the current instruction is accessing a common resource. If the threads are accessing data which are distributed across multiple cache lines, then the pre-scheduled replay unit inserts pre-scheduled replay operations behind the current instruction. The multi-stage pipeline executes the instruction and the associated pre-scheduled replay operations sequentially. If additional threads remain unserviced after execution of the instruction and the pre-scheduled replay operations, then additional replay operations are inserted via the replay loop, until all threads are serviced. One advantage of the disclosed technique is that divergent operations requiring one or more replay operations execute with reduced latency.

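To make the abstract above concrete, the sketch below plans the passes needed to service a divergent memory access. It rests on several assumptions that the abstract does not state: each pass services exactly one cache line, the number of pre-scheduled replays is a fixed parameter, and the names `plan_passes`, `line_size`, and `prescheduled` are hypothetical.

```python
def plan_passes(thread_addresses, line_size=128, prescheduled=1):
    """Label each pass needed to service every thread's address.

    The first cache line is serviced by the instruction itself, up to
    `prescheduled` further lines by replays inserted directly behind it,
    and any remaining lines by replays that go through the replay loop.
    """
    lines = []
    for address in thread_addresses:
        line = address // line_size
        if line not in lines:     # keep first-seen order, drop duplicates
            lines.append(line)
    plan = []
    for index, line in enumerate(lines):
        if index == 0:
            kind = "instruction"
        elif index <= prescheduled:
            kind = "prescheduled_replay"
        else:
            kind = "loop_replay"
        plan.append((kind, line))
    return plan
```

For addresses `[0, 4, 128, 256, 260]` with 128-byte lines, three lines are touched: the instruction covers line 0, one pre-scheduled replay covers line 1, and line 2 falls back to the replay loop. A non-divergent access that touches a single line needs no replay at all.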

    Sharing data crossbar for reads and writes in a data cache
    8.
    Granted invention patent
    Sharing data crossbar for reads and writes in a data cache (in force)

    Publication number: US09286256B2

    Publication date: 2016-03-15

    Application number: US12892862

    Filing date: 2010-09-28

    CPC classification: G06F13/4022 G06F13/4031

    Abstract: The invention sets forth an L1 cache architecture that includes a crossbar unit configured to transmit data associated with both read data requests and write data requests. Data associated with read data requests is retrieved from a cache memory and transmitted to the client subsystems. Similarly, data associated with write data requests is transmitted from the client subsystems to the cache memory. To allow for the transmission of both read and write data on the crossbar unit, an arbiter is configured to schedule the crossbar unit transmissions as well as arbitrate between data requests received from the client subsystems.

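One way to picture the arbiter described above is as a scheduler that hands out slots on a single shared crossbar. The sketch is illustrative only: the round-robin policy, the one-transfer-per-slot model, and the name `schedule_crossbar` are assumptions, since the abstract does not say how the arbiter chooses between clients.

```python
from collections import deque

def schedule_crossbar(client_queues):
    """Assign crossbar slots to read and write requests, round-robin.

    `client_queues` maps a client name to a list of (kind, address)
    requests, where kind is "read" or "write". Reads move data from the
    cache to the client; writes move data from the client to the cache.
    Both directions share the same crossbar, one transfer per slot.
    """
    queues = {client: deque(reqs) for client, reqs in client_queues.items()}
    slots = []
    while any(queues.values()):
        for client, queue in queues.items():
            if queue:
                kind, address = queue.popleft()
                if kind == "read":
                    direction = "cache->client"
                else:
                    direction = "client->cache"
                slots.append((client, kind, address, direction))
    return slots
```

Calling `schedule_crossbar({"A": [("read", 0), ("write", 4)], "B": [("write", 8)]})` interleaves the two clients, so read and write transfers alternate on the same crossbar rather than needing separate data paths.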

    BATCHED REPLAYS OF DIVERGENT OPERATIONS
    10.
    Patent application
    BATCHED REPLAYS OF DIVERGENT OPERATIONS (in force)

    Publication number: US20130159684A1

    Publication date: 2013-06-20

    Application number: US13329066

    Filing date: 2011-12-16

    IPC classification: G06F9/38 G06F9/312

    CPC classification: G06F9/3851 G06F9/3861

    Abstract: One embodiment of the present invention sets forth an optimized way to execute replay operations for divergent operations in a parallel processing subsystem. Specifically, the streaming multiprocessor (SM) includes a multistage pipeline configured to batch two or more replay operations for processing via a replay loop. A logic element within the multistage pipeline detects whether the current pipeline stage is accessing a shared resource, such as loading data from a shared memory. If the threads are accessing data which are distributed across multiple cache lines, then the multistage pipeline batches two or more replay operations, where the replay operations are inserted into the pipeline back-to-back. Advantageously, divergent operations requiring two or more replay operations operate with reduced latency. Where memory access operations require transfer of more than two cache lines to service all threads, the number of clock cycles required to complete all replay operations is reduced.

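The latency benefit claimed in the abstract above can be illustrated with a rough cycle-count model. The model is an assumption made for illustration, not a figure from the patent: each trip around the replay loop is taken to cost `loop_latency` cycles, and each additional replay inserted back-to-back in the same batch is taken to cost one more cycle. The function name `replay_cycles` is hypothetical.

```python
import math

def replay_cycles(num_replays, loop_latency, batch_size=1):
    """Estimate cycles to finish all replays under the assumed model.

    With batch_size=1 every replay pays a full loop trip. With larger
    batches, replays in the same batch enter the pipeline on consecutive
    cycles, so only the first replay of each batch pays the loop latency.
    """
    trips = math.ceil(num_replays / batch_size)
    back_to_back = num_replays - trips   # replays that follow another one
    return trips * loop_latency + back_to_back
```

Under these assumptions, four replays with a six-cycle loop take 24 cycles unbatched and 14 cycles when batched in pairs. Real cycle counts depend on the pipeline depth and on how many replays the hardware can batch.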