NAMED AND CLUSTER BARRIERS
    Invention publication

    Publication No.: US20240134719A1

    Publication date: 2024-04-25

    Application No.: US17973234

    Filing date: 2022-10-24

    Applicant: Intel Corporation

    IPC classification: G06F9/52 G06F9/48

    CPC classification: G06F9/522 G06F9/4881

    Abstract: Embodiments described herein provide a technique to facilitate the synchronization of workgroups executed on multiple graphics cores of a graphics core cluster. One embodiment provides a graphics processor including a cache memory and a graphics core coupled with the cache memory. The graphics core includes execution resources to execute an instruction via a plurality of hardware threads and barrier circuitry to synchronize execution of the plurality of hardware threads, wherein the barrier circuitry is configured to provide a plurality of re-usable named barriers.
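
As a rough software analogy for re-usable named barriers, the sketch below keeps a small pool of barriers addressed by ID and lets each group of threads wait on its own ID across several phases. It is an illustration only, not the patented circuitry: `NamedBarrierPool`, `configure`, `wait`, and `run_demo` are hypothetical names, and Python's `threading.Barrier` stands in for the hardware.

```python
import threading


class NamedBarrierPool:
    """Software stand-in for a fixed set of re-usable named barriers (hypothetical)."""

    def __init__(self, num_barriers):
        self._barriers = [None] * num_barriers

    def configure(self, barrier_id, num_threads):
        # Bind a barrier ID to the number of threads that must arrive.
        self._barriers[barrier_id] = threading.Barrier(num_threads)

    def wait(self, barrier_id):
        # threading.Barrier resets once all parties arrive, so the same
        # ID can be waited on again in the next phase.
        self._barriers[barrier_id].wait()


def run_demo():
    pool = NamedBarrierPool(num_barriers=2)
    pool.configure(0, 2)  # threads 0 and 2 share barrier ID 0
    pool.configure(1, 2)  # threads 1 and 3 share barrier ID 1
    log, log_lock = [], threading.Lock()

    def worker(tid):
        barrier_id = tid % 2
        for phase in range(2):
            with log_lock:
                log.append((phase, barrier_id, tid))
            pool.wait(barrier_id)  # same ID re-used in every phase

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Within each barrier ID, every phase-0 entry in the returned log precedes every phase-1 entry, because no thread passes the barrier until its whole group has arrived.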

    LOCALLY BIASED CACHE REPLACEMENT FOR CLUSTERED CACHE ARCHITECTURE

    Publication No.: US20240220420A1

    Publication date: 2024-07-04

    Application No.: US18148994

    Filing date: 2022-12-30

    Applicant: Intel Corporation

    IPC classification: G06F12/121 G06F12/0895

    CPC classification: G06F12/121 G06F12/0895

    Abstract: Locally biased cache replacement for a clustered cache architecture is described. An example of an apparatus includes clusters of cores; a clustered cache including multiple cache partitions for the clusters of cores, each cache partition including multiple cachelines; and a computer memory including memory partitions, each of the cache partitions being associated with a respective local memory partition, wherein each cacheline of the cache partitions includes a cacheline tag, each cacheline tag including a local tag to indicate whether data stored in the cacheline is local data stored in the local memory partition or remote data stored in a remote memory partition, and a used tag to indicate whether data stored in the cacheline is recently accessed; and wherein the clustered cache includes circuitry to select cachelines for cache replacement in a cache partition based on values of the tags of the cachelines.
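
Victim selection from a local tag and a used tag might look like the sketch below. The abstract does not state the priority order, so `EVICTION_ORDER` is purely an assumption chosen to favor keeping local, recently used data; `CacheLine` and `select_victim` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    addr: int
    local: bool  # True: data comes from the partition's local memory
    used: bool   # True: data was recently accessed


# ASSUMED priority (not given in the abstract): evict remote and unused
# lines first, and local, recently used lines last.
EVICTION_ORDER = [(False, False), (True, False), (False, True), (True, True)]


def select_victim(lines):
    """Return the index of the line to replace in one cache set."""
    for local, used in EVICTION_ORDER:
        for idx, line in enumerate(lines):
            if line.local == local and line.used == used:
                return idx
    raise ValueError("cache set is empty")
```

For example, in a set holding a local used line, a remote used line, and a local unused line, this assumed order picks the local unused line, since no remote unused line is present.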

    PREFETCH AWARE LRU CACHE REPLACEMENT POLICY
    Invention publication

    Publication No.: US20240104025A1

    Publication date: 2024-03-28

    Application No.: US17951914

    Filing date: 2022-09-23

    Applicant: Intel Corporation

    IPC classification: G06F12/123 G06F12/0862

    Abstract: A prefetch-aware LRU cache replacement policy is described. An example of an apparatus includes one or more processors including a graphics processor, the graphics processor including a load store cache having multiple cache lines (CLs), each including bits for a cache line level (CL level) and one or more sectors for data storage; wherein the graphics processor is to receive one or more data elements for storage in the cache; set a CL level to track each CL receiving data, including setting CL level 1 for a CL receiving data in response to a miss in the cache and setting a CL level 2 for a CL receiving prefetched data in response to a prefetch request; and, upon determining that space is required in the cache to store data, apply a cache replacement policy, the policy being based at least in part on set CL levels for the CLs.
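
The CL-level bookkeeping can be sketched as a small LRU cache that records level 1 for demand fills and level 2 for prefetch fills. The abstract only says the policy depends on the CL levels, so the rule used here (evict the oldest level-2 line first, otherwise fall back to plain LRU) is an assumption, as are the names `LevelAwareLRU`, `access`, and `prefetch`.

```python
from collections import OrderedDict

CL_LEVEL_DEMAND = 1    # line filled in response to a cache miss
CL_LEVEL_PREFETCH = 2  # line filled in response to a prefetch request


class LevelAwareLRU:
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # addr -> CL level, least recent first
        self.evicted = []           # eviction history, for inspection

    def _evict(self):
        # ASSUMPTION: prefer the least recently used prefetched line;
        # fall back to the overall least recently used line.
        victim = next(
            (a for a, lvl in self.lines.items() if lvl == CL_LEVEL_PREFETCH),
            next(iter(self.lines)),
        )
        del self.lines[victim]
        self.evicted.append(victim)

    def _fill(self, addr, level):
        if len(self.lines) >= self.capacity:
            self._evict()
        self.lines[addr] = level

    def access(self, addr):
        """Demand access; returns True on a hit."""
        if addr in self.lines:
            self.lines.move_to_end(addr)  # level left unchanged in this sketch
            return True
        self._fill(addr, CL_LEVEL_DEMAND)
        return False

    def prefetch(self, addr):
        if addr not in self.lines:
            self._fill(addr, CL_LEVEL_PREFETCH)
```

With a capacity of three, the sequence `access(1)`, `prefetch(2)`, `access(3)`, `access(4)` evicts line 2 under this assumed rule, even though line 1 is older.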

    FORWARD PROGRESS GUARANTEE USING SINGLE-LEVEL SYNCHRONIZATION AT INDIVIDUAL THREAD GRANULARITY

    Publication No.: US20230153176A1

    Publication date: 2023-05-18

    Application No.: US17528386

    Filing date: 2021-11-17

    Applicant: Intel Corporation

    IPC classification: G06F9/52 G06F9/48

    CPC classification: G06F9/522 G06F9/48

    Abstract: An apparatus to facilitate a forward progress guarantee using single-level synchronization at individual thread granularity is disclosed. The apparatus includes a processor comprising barrier synchronization hardware circuitry to assign a set of global named barrier identifiers (IDs) to individual execution threads of a plurality of execution threads and synchronize execution of the individual execution threads on a single level via the set of global named barrier IDs; and a plurality of processing resources to execute the plurality of execution threads and comprising divergent barrier scheduling hardware circuitry to facilitate execution flow switching from a first divergent branch executed by a first thread to a second divergent branch executed by a second thread, the execution flow switching performed responsive to the first thread stalling to wait on a named barrier of the set of global named barrier IDs.
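
The idea of switching execution flow when a thread stalls on a named barrier can be mimicked with a toy cooperative scheduler. In this sketch each thread is a Python generator that yields the barrier ID it waits on; the scheduler then runs another thread instead of blocking. It is a simplified model, not the hardware scheduling described in the application, and `run_with_switching` and `demo` are hypothetical names.

```python
from collections import deque


def run_with_switching(threads, parties):
    """threads: name -> generator yielding the barrier ID it stalls on.
    parties: barrier ID -> number of threads that must arrive."""
    arrived = {bid: [] for bid in parties}
    ready = deque(threads)
    stalled = set()
    while ready:
        name = ready.popleft()
        try:
            bid = next(threads[name])  # run until the thread stalls
        except StopIteration:
            continue  # this branch has finished
        arrived[bid].append(name)
        stalled.add(name)
        if len(arrived[bid]) == parties[bid]:
            # Barrier satisfied: every waiter becomes runnable again.
            for waiter in arrived[bid]:
                stalled.discard(waiter)
                ready.append(waiter)
            arrived[bid] = []
    if stalled:
        raise RuntimeError(f"no forward progress: {sorted(stalled)}")


def demo():
    trace = []

    def branch(name, barrier_id):
        trace.append(f"{name}: before barrier")
        yield barrier_id  # stall on the named barrier
        trace.append(f"{name}: after barrier")

    threads = {"t0": branch("t0", 0), "t1": branch("t1", 0)}
    run_with_switching(threads, parties={0: 2})
    return trace
```

Here `t0` stalls on barrier 0, the scheduler switches to `t1`, and both resume once `t1` arrives. Running `t0`'s branch to completion before starting `t1` would never satisfy the barrier.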

    DETERMINISTIC BROADCASTING FROM SHARED MEMORY

    Publication No.: US20240111534A1

    Publication date: 2024-04-04

    Application No.: US17957486

    Filing date: 2022-09-30

    Applicant: Intel Corporation

    IPC classification: G06F9/30 G06F9/54

    Abstract: Embodiments described herein provide a technique to enable a broadcast load from an L1 cache or shared local memory to register files associated with hardware threads of a graphics core. One embodiment provides a graphics processor comprising a cache memory and a graphics core coupled with the cache memory. The graphics core includes a plurality of hardware threads and memory access circuitry to facilitate access to memory by the plurality of hardware threads. The graphics core is configurable to process a plurality of load requests from the plurality of hardware threads, detect duplicate load requests within the plurality of load requests, perform a single read from the cache memory in response to the duplicate load requests, and transmit data associated with the duplicate load requests to requesting hardware threads.
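
The duplicate-detection step can be illustrated in software by grouping load requests by address, reading each distinct address once, and fanning the value out to every requester. This is a minimal sketch under simplifying assumptions (a dictionary models the cache, and `broadcast_load` is a hypothetical name); it says nothing about how the hardware detects duplicates.

```python
def broadcast_load(memory, requests):
    """memory: address -> value (stands in for the cache).
    requests: list of (thread_id, address) pairs.
    Returns (values delivered per thread, number of reads performed)."""
    requesters = {}
    for thread_id, addr in requests:
        # Duplicate requests collapse onto the same address entry.
        requesters.setdefault(addr, []).append(thread_id)

    delivered, reads = {}, 0
    for addr, thread_ids in requesters.items():
        value = memory[addr]  # single read per distinct address
        reads += 1
        for thread_id in thread_ids:
            delivered[thread_id] = value  # broadcast to every requester
    return delivered, reads
```

Four requests that touch only two distinct addresses therefore cost two reads while still delivering a value to all four threads.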