Patent search: ap:("Intel Corporation") AND inv:"Maxim Kazakov" (page 1)

1.

Published invention application
SCALABLE AND CONFIGURABLE CLUSTERED SYSTOLIC ARRAY (pending, published)

Publication No.: US20240220448A1

Publication date: 2024-07-04

Application No.: US18148998

Filing date: 2022-12-30

Applicant: Intel Corporation

Inventors: Chunhui Mei, Jiasheng Chen, Ben J. Ashbaugh, Fangwen Fu, Hong Jiang, Guei-Yuan Lueh, Rama S.B. Harihara, Maxim Kazakov

IPC classification: G06F15/80, G06F13/16

CPC classification: G06F15/8046, G06F13/1668

Abstract: A scalable and configurable clustered systolic array is described. An example of an apparatus includes a cluster including multiple cores; and a cache memory coupled with the cluster, wherein each core includes multiple processing resources, a memory coupled with the plurality of processing resources, a systolic array coupled with the memory, and one or more interconnects with one or more other cores of the plurality of cores; and wherein the systolic arrays of the cores are configurable by the apparatus to form a logically combined systolic array for processing of an operation by a cooperative group of threads running on one or more of the plurality of cores in the cluster.

2.

Published invention application
SYNCHRONIZATION FOR DATA MULTICAST IN COMPUTE CORE CLUSTERS (pending, published)

Publication No.: US20240220335A1

Publication date: 2024-07-04

Application No.: US18148993

Filing date: 2022-12-30

Applicant: Intel Corporation

Inventors: Chunhui Mei, Yongsheng Liu, John A. Wiegert, Vasanth Ranganathan, Ben J. Ashbaugh, Fangwen Fu, Hong Jiang, Guei-Yuan Lueh, James Valerio, Alan M. Curtis, Maxim Kazakov

IPC classification: G06F9/52, G06F9/38, G06F9/50

CPC classification: G06F9/522, G06F9/3877, G06F9/5072, G06F9/3887

Abstract: Synchronization for data multicast in compute core clusters is described. An example of an apparatus includes one or more processors including at least a graphics processing unit (GPU), the GPU including one or more clusters of cores and a memory, wherein each cluster of cores includes a plurality of cores, each core including one or more processing resources, shared local memory, and gateway circuitry, wherein the GPU is to initiate broadcast of a data element from a producer core to one or more consumer cores, and synchronize the broadcast of the data element utilizing the gateway circuitry of the producer core and the one or more consumer cores, and wherein synchronizing the broadcast of the data element includes establishing a multi-core barrier for broadcast of the data element.

3.

Published invention application
SHARED LOCAL REGISTERS FOR THREAD TEAM PROCESSING (pending, published)

Publication No.: US20240112295A1

Publication date: 2024-04-04

Application No.: US17958216

Filing date: 2022-09-30

Applicant: Intel Corporation

Inventors: Biju George, Fangwen Fu, Supratim Pal, Jorge Parra, Chunhui Mei, Maxim Kazakov, Joydeep Ray

IPC classification: G06T1/20, G06F9/30, G06F9/38

CPC classification: G06T1/20, G06F9/30098, G06F9/3836

Abstract: Shared local registers for thread team processing is described. An example of an apparatus includes one or more processors including a graphics processor having multiple processing resources; and memory for storage of data, the graphics processor to allocate a first thread team to a first processing resource, the first thread team including hardware threads to be executed solely by the first processing resource; allocate a shared local register (SLR) space that may be directly referenced in the ISA instructions to the first processing resource, the SLR space being accessible to the threads of the thread team and being inaccessible to threads outside of the thread team; and allocate individual register spaces to the thread team, each of the individual register spaces being accessible to a respective thread of the thread team.

4.

Published invention application
CROSS-THREAD REGISTER SHARING FOR MATRIX MULTIPLICATION COMPUTE (pending, published)

Publication No.: US20240168807A1

Publication date: 2024-05-23

Application No.: US18056949

Filing date: 2022-11-18

Applicant: Intel Corporation

Inventors: Jorge Eduardo Parra Osorio, Guei-Yuan Lueh, Maxim Kazakov, Fangwen Fu, Supratim Pal, Kaiyu Chen

IPC classification: G06F9/50, G06F9/48, G06F9/52, G06F15/80

CPC classification: G06F9/5027, G06F9/48, G06F9/522, G06F15/8046

Abstract: An apparatus to facilitate cross-thread register sharing for matrix multiplication compute is disclosed. The apparatus includes matrix acceleration hardware comprising a plurality of data processing units, wherein the respective plurality of data processing units are to: receive a decoded instruction for a first thread having a first register space, wherein the decoded instruction is for a matrix multiplication operation and comprises an indication to utilize a second register space of a second thread for an operand of the decoded instruction for the first thread; access the second register space of the second thread to obtain data for the operand of the decoded instruction; and perform the matrix multiplication operation for the first thread using the data for the operand from the second register space of the second thread.

5.

Published invention application
DETERMINISTIC BROADCASTING FROM SHARED MEMORY (pending, published)

Publication No.: US20240111534A1

Publication date: 2024-04-04

Application No.: US17957486

Filing date: 2022-09-30

Applicant: Intel Corporation

Inventors: Fangwen Fu, Chunhui Mei, Maxim Kazakov, Biju George, Jorge Parra, Supratim Pal

IPC classification: G06F9/30, G06F9/54

CPC classification: G06F9/30047, G06F9/3009, G06F9/542

Abstract: Embodiments described herein provide a technique to enable a broadcast load from an L1 cache or shared local memory to register files associated with hardware threads of a graphics core. One embodiment provides a graphics processor comprising a cache memory and a graphics core coupled with the cache memory. The graphics core includes a plurality of hardware threads and memory access circuitry to facilitate access to memory by the plurality of hardware threads. The graphics core is configurable to process a plurality of load requests from the plurality of hardware threads, detect duplicate load requests within the plurality of load requests, perform a single read from the cache memory in response to the duplicate load requests, and transmit data associated with the duplicate load requests to requesting hardware threads.

6.

Published invention application
DATA MULTICAST IN COMPUTE CORE CLUSTERS (pending, published)

Publication No.: US20240220254A1

Publication date: 2024-07-04

Application No.: US18148997

Filing date: 2022-12-30

Applicant: Intel Corporation

Inventors: Chunhui Mei, Yongsheng Liu, John A. Wiegert, Vasanth Ranganathan, Ben J. Ashbaugh, Fangwen Fu, Hong Jiang, Guei-Yuan Lueh, James Valerio, Alan M. Curtis, Maxim Kazakov

IPC classification: G06F9/30, G06F9/38, G06F9/50, G06F9/54

CPC classification: G06F9/30087, G06F9/3877, G06F9/5072, G06F9/544

Abstract: Data multicast in compute core clusters is described. An example of an apparatus includes one or more processors including at least a first processor, the first processor including one or more clusters of cores and a memory, wherein each cluster of cores includes multiple cores, each core including one or more processing resources, shared memory, and broadcast circuitry; and wherein a first core in a first cluster of cores is to request a data element, determine whether any additional cores in the first cluster require the data element, and, upon determining that one or more additional cores in the first cluster require the data element, broadcast the data element to the one or more additional cores via interconnects between the broadcast circuitry of the cores of the first core cluster.
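
Illustrative note on result 5 (US20240111534A1): the abstract describes detecting duplicate load requests, performing a single cache read for them, and delivering the data to every requesting hardware thread. The Python sketch below is only a software model of that idea, not the patented hardware or any Intel API. The function name `coalesce_loads`, the dictionary standing in for the cache, and the per-thread dictionaries standing in for register files are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def coalesce_loads(requests, cache):
    """Software model of duplicate-load coalescing (hypothetical names).

    requests: list of (thread_id, address) load requests.
    cache:    dict mapping address -> value, standing in for L1/SLM.
    Returns (register_files, read_count), where register_files maps
    thread_id -> {address: value}.
    """
    # Group requesting threads by address; an address with more than
    # one waiter corresponds to a set of duplicate load requests.
    waiters = defaultdict(list)
    for thread_id, address in requests:
        waiters[address].append(thread_id)

    register_files = {}
    read_count = 0
    for address, thread_ids in waiters.items():
        value = cache[address]  # one read per distinct address
        read_count += 1
        # Broadcast the single result to every thread that asked for it.
        for thread_id in thread_ids:
            register_files.setdefault(thread_id, {})[address] = value
    return register_files, read_count


# Three threads load the same address; a fourth loads a different one.
requests = [(0, 0x100), (1, 0x100), (2, 0x100), (3, 0x140)]
cache = {0x100: 7, 0x140: 9}
register_files, read_count = coalesce_loads(requests, cache)
print(read_count)  # 2 reads serve 4 requests
```

In this model the four requests are served by two reads, one per distinct address, which is the saving the abstract attributes to handling duplicates with a single read.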
