SHARED LOCAL REGISTERS FOR THREAD TEAM PROCESSING

    Publication No.: US20240112295A1

    Publication Date: 2024-04-04

    Application No.: US17958216

    Filing Date: 2022-09-30

    Applicant: Intel Corporation

    IPC Classification: G06T1/20 G06F9/30 G06F9/38

    Abstract: Shared local registers for thread team processing are described. An example of an apparatus includes one or more processors including a graphics processor having multiple processing resources; and memory for storage of data, the graphics processor to allocate a first thread team to a first processing resource, the first thread team including hardware threads to be executed solely by the first processing resource; allocate a shared local register (SLR) space, which may be directly referenced in ISA instructions, to the first processing resource, the SLR space being accessible to the threads of the thread team and being inaccessible to threads outside of the thread team; and allocate individual register spaces to the thread team, each of the individual register spaces being accessible to a respective thread of the thread team.
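
The access rules in this abstract can be illustrated with a small software model. This is only a sketch of the visibility semantics (team-wide SLR space, per-thread individual register spaces), not the hardware design; the class name `ProcessingResource`, its methods, and the register sizes are hypothetical names chosen for illustration.

```python
class ProcessingResource:
    """Toy model of one processing resource hosting a single thread team.

    All names here are illustrative assumptions, not taken from the patent.
    """

    def __init__(self, team_thread_ids, slr_size, private_size):
        # Threads of the team run solely on this processing resource.
        self.team = frozenset(team_thread_ids)
        # One shared local register (SLR) space for the whole team.
        self.slr = [0] * slr_size
        # One individual register space per thread of the team.
        self.private = {tid: [0] * private_size for tid in self.team}

    def _check_member(self, tid):
        # Threads outside the team cannot touch this resource's registers.
        if tid not in self.team:
            raise PermissionError(f"thread {tid} is outside the thread team")

    def write_slr(self, tid, index, value):
        self._check_member(tid)
        self.slr[index] = value

    def read_slr(self, tid, index):
        self._check_member(tid)
        return self.slr[index]

    def write_private(self, tid, index, value):
        # A thread writes only its own individual register space.
        self._check_member(tid)
        self.private[tid][index] = value

    def read_private(self, tid, index):
        # A thread reads only its own individual register space.
        self._check_member(tid)
        return self.private[tid][index]
```

In this model, a value written to the SLR space by one team thread is visible to every other thread of the same team, while a thread ID outside the team raises `PermissionError`.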

    CROSS-THREAD REGISTER SHARING FOR MATRIX MULTIPLICATION COMPUTE

    Publication No.: US20240168807A1

    Publication Date: 2024-05-23

    Application No.: US18056949

    Filing Date: 2022-11-18

    Applicant: Intel Corporation

    Abstract: An apparatus to facilitate cross-thread register sharing for matrix multiplication compute is disclosed. The apparatus includes matrix acceleration hardware comprising a plurality of data processing units, wherein the plurality of data processing units are to: receive a decoded instruction for a first thread having a first register space, wherein the decoded instruction is for a matrix multiplication operation and comprises an indication to utilize a second register space of a second thread for an operand of the decoded instruction for the first thread; access the second register space of the second thread to obtain data for the operand of the decoded instruction; and perform the matrix multiplication operation for the first thread using the data for the operand from the second register space of the second thread.
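
The operand-sourcing behavior described above can be sketched in software as follows. This is a minimal model under stated assumptions: register spaces are represented as dictionaries keyed by thread ID, and the function `execute_matmul` and its `b_owner` parameter (standing in for the instruction's "use another thread's register space" indication) are hypothetical names, not the patent's instruction encoding.

```python
def matmul(a, b):
    """Plain matrix product of two lists-of-lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def execute_matmul(register_files, tid, dst, src_a, src_b, b_owner=None):
    """Run a matrix multiply for thread `tid`.

    register_files maps thread ID -> {register name -> matrix}.
    If b_owner is given (hypothetical stand-in for the instruction's
    indication), operand B is read from that thread's register space
    instead of the issuing thread's own space.
    """
    own = register_files[tid]
    b_space = register_files[b_owner] if b_owner is not None else own
    # The result is always written to the issuing thread's register space.
    own[dst] = matmul(own[src_a], b_space[src_b])
    return own[dst]
```

For example, thread 0 can multiply its own `r0` by thread 1's `r1` without first copying `r1` into its own register space, which is the kind of duplication the sharing indication is meant to avoid.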

    DETERMINISTIC BROADCASTING FROM SHARED MEMORY

    Publication No.: US20240111534A1

    Publication Date: 2024-04-04

    Application No.: US17957486

    Filing Date: 2022-09-30

    Applicant: Intel Corporation

    IPC Classification: G06F9/30 G06F9/54

    Abstract: Embodiments described herein provide a technique to enable a broadcast load from an L1 cache or shared local memory to register files associated with hardware threads of a graphics core. One embodiment provides a graphics processor comprising a cache memory and a graphics core coupled with the cache memory. The graphics core includes a plurality of hardware threads and memory access circuitry to facilitate access to memory by the plurality of hardware threads. The graphics core is configurable to process a plurality of load requests from the plurality of hardware threads, detect duplicate load requests within the plurality of load requests, perform a single read from the cache memory in response to the duplicate load requests, and transmit data associated with the duplicate load requests to requesting hardware threads.
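
The duplicate-detection and single-read steps in this abstract can be sketched as a simple coalescing routine. This is an illustrative model only: the function `broadcast_load`, the dictionary used as the cache, and the assumption that each hardware thread issues at most one request per batch are all hypothetical simplifications, not details from the patent.

```python
from collections import defaultdict


def broadcast_load(cache, requests):
    """Serve a batch of load requests with one cache read per unique address.

    cache: dict mapping address -> data (stand-in for L1 cache or SLM).
    requests: list of (thread_id, address); assumes one request per thread.
    Returns (delivered, reads): data per requesting thread, and the number
    of cache reads actually performed.
    """
    # Detect duplicates by grouping requesting threads per address.
    by_address = defaultdict(list)
    for tid, addr in requests:
        by_address[addr].append(tid)

    delivered = {}
    reads = 0
    for addr, tids in by_address.items():
        data = cache[addr]  # single read for all duplicate requests
        reads += 1
        for tid in tids:    # broadcast the data to every requester
            delivered[tid] = data
    return delivered, reads
```

With four requests where three target the same address, the model performs two cache reads instead of four while every requesting thread still receives its data.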