Patent search: ap:("Intel Corporation") AND inv:"Chunhui Mei" (page 1)

1.

Invention publication
NAMED AND CLUSTER BARRIERS (status: under examination, published)

Publication number: US20240134719A1

Publication date: 2024-04-25

Application number: US17973234

Filing date: 2022-10-24

Applicant: Intel Corporation

Inventors: Fangwen Fu, Chunhui Mei, John A. Wiegert, Yongsheng Liu, Ben J. Ashbaugh

IPC classification: G06F9/52, G06F9/48

CPC classification: G06F9/522, G06F9/4881

Abstract: Embodiments described herein provide a technique to facilitate the synchronization of workgroups executed on multiple graphics cores of a graphics core cluster. One embodiment provides a graphics core including a cache memory and a graphics core coupled with the cache memory. The graphics core includes execution resources to execute an instruction via a plurality of hardware threads and barrier circuitry to synchronize execution of the plurality of hardware threads, wherein the barrier circuitry is configured to provide a plurality of re-usable named barriers.
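
Illustrative sketch (not part of the patent record): the abstract describes barrier circuitry that offers several re-usable named barriers. A rough software analogue, assuming a fixed pool of barrier slots addressed by integer ID, might look like the following. The class `NamedBarriers`, its methods, and `run_demo` are hypothetical names chosen for illustration; the hardware interface is not described in the abstract.

```python
import threading


class NamedBarriers:
    """Hypothetical software analogue of a pool of re-usable named barriers."""

    def __init__(self, num_barriers):
        # Slot i stays None until some group of threads configures barrier i.
        self._slots = [None] * num_barriers
        self._lock = threading.Lock()

    def init(self, barrier_id, participants):
        # Configure a named barrier for a given number of participating threads.
        with self._lock:
            self._slots[barrier_id] = threading.Barrier(participants)

    def wait(self, barrier_id):
        # Block until every participant of this barrier has arrived.
        # threading.Barrier resets after each release, so the ID can be reused.
        self._slots[barrier_id].wait()


def run_demo():
    barriers = NamedBarriers(num_barriers=4)
    barriers.init(0, participants=2)  # group A synchronizes on barrier 0
    barriers.init(1, participants=3)  # group B synchronizes on barrier 1
    log, log_lock = [], threading.Lock()

    def worker(name, barrier_id, rounds):
        for r in range(rounds):
            barriers.wait(barrier_id)  # same barrier ID reused every round
            with log_lock:
                log.append((name, barrier_id, r))

    threads = [threading.Thread(target=worker, args=("a%d" % i, 0, 2)) for i in range(2)]
    threads += [threading.Thread(target=worker, args=("b%d" % i, 1, 2)) for i in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

The two groups never wait on each other because each uses its own barrier ID, which is the property that distinguishes named barriers from a single workgroup-wide barrier.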

2.

Invention application
RANDOM SPARSITY HANDLING IN A SYSTOLIC ARRAY (status: in force)

Publication number: US20220309124A1

Publication date: 2022-09-29

Application number: US17211627

Filing date: 2021-03-24

Applicant: Intel Corporation

Inventors: Chunhui Mei, Hong Jiang, Jiasheng Chen, Yongsheng Liu, Yan Li

IPC classification: G06F17/16, G06F17/11, G06F15/80, G06F7/544, G06F9/30

Abstract: Matrix multiply units can take advantage of input sparsity by zero gating ALUs, which saves power consumption, but compute throughput does not increase. To improve compute throughput from sparsity, processing resources in a matrix accelerator can skip computation with zero involved in input or output. If zeros in input can be skipped, the processing units can focus calculations on generating meaningful non-zero output.
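
Illustrative sketch (not part of the patent record): the difference between zero gating and zero skipping can be shown with a toy dot product that counts how many processing steps are consumed. The functions `dot_zero_gated` and `dot_zero_skipping` are hypothetical, and the step counter is only a stand-in for hardware cycles.

```python
def dot_zero_gated(a, b):
    """Visit every element pair; pairs with a zero are gated but still use a step."""
    total, steps = 0, 0
    for x, y in zip(a, b):
        steps += 1  # the slot is consumed even when the multiply is gated off
        if x != 0 and y != 0:
            total += x * y
    return total, steps


def dot_zero_skipping(a, b):
    """Drop pairs that contain a zero first, then spend steps only on the rest."""
    pairs = [(x, y) for x, y in zip(a, b) if x != 0 and y != 0]
    total = sum(x * y for x, y in pairs)
    return total, len(pairs)
```

Both functions return the same result, but only the skipping variant reduces the step count, which mirrors the abstract's point that gating saves power while skipping is what raises throughput.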

3.

Invention publication
LOCALLY BIASED CACHE REPLACEMENT FOR CLUSTERED CACHE ARCHITECTURE (status: under examination, published)

Publication number: US20240220420A1

Publication date: 2024-07-04

Application number: US18148994

Filing date: 2022-12-30

Applicant: Intel Corporation

Inventors: Chunhui Mei, Doddaballapur Jayasimha, Aravindh V. Anantaraman, Yongsheng Liu, Hong Jiang

IPC classification: G06F12/121, G06F12/0895

CPC classification: G06F12/121, G06F12/0895

Abstract: Locally biased cache replacement for a clustered cache architecture is described. An example of an apparatus includes clusters of cores; a clustered cache including multiple cache partitions for the clusters of cores, each cache partition including multiple cachelines; and a computer memory including memory partitions, each of the cache partitions being associated with a respective local memory partition, wherein each cacheline of the cache partitions includes a cacheline tag, each cacheline tag including a local tag to indicate whether data stored in the cacheline is local data stored in the local memory partition or remote data stored in a remote memory partition, and a used tag to indicate whether data stored in the cacheline is recently accessed; and wherein the clustered cache includes circuitry to select cachelines for cache replacement in a cache partition based on values of the tags of the cachelines.
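
Illustrative sketch (not part of the patent record): the abstract says victims are chosen from the local tag and the used tag but does not state the priority order. The sketch below assumes one plausible ordering: lines that were not recently used go first, and within each group remote lines go before local ones. `CacheLine`, `eviction_rank`, and `select_victim` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    addr: int
    local: bool  # True if the data lives in this partition's local memory
    used: bool   # True if the line was recently accessed


def eviction_rank(line):
    # Lower rank is evicted first. ASSUMED ordering, not taken from the abstract:
    # not-recently-used before recently-used, then remote before local.
    return (line.used, line.local)


def select_victim(lines):
    # Pick the line with the lowest rank; ties go to the earliest line.
    return min(lines, key=eviction_rank)
```

Swapping the two tuple elements in `eviction_rank` would give a policy that always evicts remote data before local data regardless of recency, which is another ordering consistent with the abstract.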

4.

Invention publication
DATA MULTICAST IN COMPUTE CORE CLUSTERS (status: under examination, published)

Publication number: US20240220254A1

Publication date: 2024-07-04

Application number: US18148997

Filing date: 2022-12-30

Applicant: Intel Corporation

Inventors: Chunhui Mei, Yongsheng Liu, John A. Wiegert, Vasanth Ranganathan, Ben J. Ashbaugh, Fangwen Fu, Hong Jiang, Guei-Yuan Lueh, James Valerio, Alan M. Curtis, Maxim Kazakov

IPC classification: G06F9/30, G06F9/38, G06F9/50, G06F9/54

CPC classification: G06F9/30087, G06F9/3877, G06F9/5072, G06F9/544

Abstract: Data multicast in compute core clusters is described. An example of an apparatus includes one or more processors including at least a first processor, the first processor including one or more clusters of cores and a memory, wherein each cluster of cores includes multiple cores, each core including one or more processing resources, shared memory, and broadcast circuitry; and wherein a first core in a first cluster of cores is to request a data element, determine whether any additional cores in the first cluster require the data element, and, upon determining that one or more additional cores in the first cluster require the data element, broadcast the data element to the one or more additional cores via interconnects between the broadcast circuitry of the cores of the first core cluster.

5.

Granted invention patent
Utilizing structured sparsity in systolic arrays (status: in force)

Publication number: US11977885B2

Publication date: 2024-05-07

Application number: US17107823

Filing date: 2020-11-30

Applicant: Intel Corporation

Inventors: Subramaniam Maiyuran, Jorge Parra, Ashutosh Garg, Chandra Gurram, Chunhui Mei, Durgesh Borkar, Shubra Marwaha, Supratim Pal, Varghese George, Wei Xiong, Yan Li, Yongsheng Liu, Dipankar Das, Sasikanth Avancha, Dharma Teja Vooturi, Naveen K. Mellempudi

IPC classification: G06F9/30, G06F9/38, G06F15/80

CPC classification: G06F9/30036, G06F9/3001, G06F9/30101, G06F9/3893, G06F15/8046

Abstract: An apparatus to facilitate utilizing structured sparsity in systolic arrays is disclosed. The apparatus includes a processor comprising a systolic array to receive data from a plurality of source registers, the data comprising unpacked source data, structured source data that is packed based on sparsity, and metadata corresponding to the structured source data; identify portions of the unpacked source data to multiply with the structured source data, the portions of the unpacked source data identified based on the metadata; and output, to a destination register, a result of multiplication of the portions of the unpacked source data and the structured source data.
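
Illustrative sketch (not part of the patent record): the abstract describes multiplying packed structured data with the portions of an unpacked operand that the metadata selects. The sketch below assumes a 2-out-of-4 pattern, in which each group of four elements keeps at most two non-zeros and the metadata records their offsets within the group. The pattern, the constants `GROUP` and `KEEP`, and the function names are assumptions, not details from the abstract, and the input length is assumed to be a multiple of four.

```python
GROUP, KEEP = 4, 2  # assumed pattern: at most 2 non-zeros per group of 4


def pack_structured(dense):
    """Pack a structured-sparse vector into (values, metadata offsets)."""
    values, metadata = [], []
    for start in range(0, len(dense), GROUP):
        group = dense[start:start + GROUP]
        kept = [i for i, v in enumerate(group) if v != 0]
        if len(kept) > KEEP:
            raise ValueError("group violates the assumed 2:4 pattern")
        # Pad with zero-valued positions so each group stores exactly KEEP entries.
        spare = [i for i in range(len(group)) if i not in kept]
        kept = sorted(kept + spare[:KEEP - len(kept)])
        values.extend(group[i] for i in kept)
        metadata.extend(kept)
    return values, metadata


def sparse_dot(values, metadata, unpacked):
    """Multiply packed values with the unpacked elements the metadata selects."""
    total = 0
    for n, (value, offset) in enumerate(zip(values, metadata)):
        group_start = (n // KEEP) * GROUP
        total += value * unpacked[group_start + offset]
    return total
```

Only `KEEP` multiplications per group are performed, so the packed operand needs half the storage and half the multiplies of the dense form under this assumed pattern.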

6.

Invention publication
PREFETCH AWARE LRU CACHE REPLACEMENT POLICY (status: under examination, published)

Publication number: US20240104025A1

Publication date: 2024-03-28

Application number: US17951914

Filing date: 2022-09-23

Applicant: Intel Corporation

Inventors: Biju George, Zamshed I. Chowdhury, Prathamesh Raghunath Shinde, Chunhui Mei, Fangwen Fu

IPC classification: G06F12/123, G06F12/0862

CPC classification: G06F12/123, G06F12/0862, G06F2212/1021

Abstract: Prefetch aware LRU cache replacement policy is described. An example of an apparatus includes one or more processors including a graphics processor, the graphics processor including a load store cache having multiple cache lines (CLs), each including bits for a cache line level (CL level) and one or more sectors for data storage; wherein the graphics processor is to receive one or more data elements for storage in the cache; set a CL level to track each CL receiving data, including setting CL level 1 for a CL receiving data in response to a miss in the cache and setting a CL level 2 for a CL receiving prefetched data in response to a prefetch request, and, upon determining that space is required in the cache to store data, apply a cache replacement policy, the policy being based at least in part on set CL levels for the CLs.
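
Illustrative sketch (not part of the patent record): the abstract assigns CL level 1 to lines filled on a demand miss and CL level 2 to lines filled by a prefetch, and bases replacement at least partly on those levels. How the levels rank victims is not stated. The sketch below assumes that lower levels are evicted first, with least-recently-used order breaking ties, so that prefetched lines are not pushed out before they are used. `LevelAwareCache` and its methods are hypothetical, and hit handling is omitted for brevity.

```python
from dataclasses import dataclass

MISS_LEVEL = 1      # line filled in response to a demand miss
PREFETCH_LEVEL = 2  # line filled in response to a prefetch request


@dataclass
class Line:
    tag: int
    level: int
    last_use: int


class LevelAwareCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = {}
        self.clock = 0

    def _fill(self, tag, level):
        # Insert a line, evicting one first if the cache is full.
        self.clock += 1
        evicted = None
        if tag not in self.lines and len(self.lines) >= self.capacity:
            # ASSUMED policy: lowest CL level first, then least recently used.
            victim = min(self.lines.values(), key=lambda l: (l.level, l.last_use))
            evicted = victim.tag
            del self.lines[victim.tag]
        self.lines[tag] = Line(tag, level, self.clock)
        return evicted

    def demand_miss(self, tag):
        return self._fill(tag, MISS_LEVEL)

    def prefetch(self, tag):
        return self._fill(tag, PREFETCH_LEVEL)
```

Under plain LRU the oldest line would be evicted regardless of how it was filled; here the level takes precedence, which is the sense in which the policy is prefetch aware.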

7.

Invention publication
FORWARD PROGRESS GUARANTEE USING SINGLE-LEVEL SYNCHRONIZATION AT INDIVIDUAL THREAD GRANULARITY (status: under examination, published)

Publication number: US20230153176A1

Publication date: 2023-05-18

Application number: US17528386

Filing date: 2021-11-17

Applicant: Intel Corporation

Inventors: Chunhui Mei, James Valerio, Supratim Pal, Guei-Yuan Lueh, Hong Jiang

IPC classification: G06F9/52, G06F9/48

CPC classification: G06F9/522, G06F9/48

Abstract: An apparatus to facilitate a forward progress guarantee using single-level synchronization at individual thread granularity is disclosed. The apparatus includes a processor comprising a barrier synchronization hardware circuitry to assign a set of global named barrier identifiers (IDs) to individual execution threads of a plurality of execution threads and synchronize execution of the individual execution threads on a single level via the set of global named barrier IDs; and a plurality of processing resources to execute the plurality of execution threads and comprising divergent barrier scheduling hardware circuitry to facilitate execution flow switching from a first divergent branch executed by a first thread to a second divergent branch executed by a second thread, the execution flow switching performed responsive to the first thread stalling to wait on a named barrier of the set of global named barrier IDs.

8.

Invention application
DUAL PIPELINE PARALLEL SYSTOLIC ARRAY (status: in force)

Publication number: US20220414054A1

Publication date: 2022-12-29

Application number: US17304797

Filing date: 2021-06-25

Applicant: Intel Corporation

Inventors: Jorge Parra, Jiasheng Chen, Supratim Pal, Fangwen Fu, Sabareesh Ganapathy, Chandra Gurram, Chunhui Mei, Yue Qi

IPC classification: G06F15/80, G06F9/38

Abstract: A processing apparatus described herein includes a general-purpose parallel processing engine comprising a systolic array having multiple pipelines, each of the multiple pipelines including multiple pipeline stages, wherein the multiple pipelines include a first pipeline, a second pipeline, and a common input shared between the first pipeline and the second pipeline.

9.

Invention publication
DETERMINISTIC BROADCASTING FROM SHARED MEMORY (status: under examination, published)

Publication number: US20240111534A1

Publication date: 2024-04-04

Application number: US17957486

Filing date: 2022-09-30

Applicant: Intel Corporation

Inventors: Fangwen Fu, Chunhui Mei, Maxim Kazakov, Biju George, Jorge Parra, Supratim Pal

IPC classification: G06F9/30, G06F9/54

CPC classification: G06F9/30047, G06F9/3009, G06F9/542

Abstract: Embodiments described herein provide a technique to enable a broadcast load from an L1 cache or shared local memory to register files associated with hardware threads of a graphics core. One embodiment provides a graphics processor comprising a cache memory and a graphics core coupled with the cache memory. The graphics core includes a plurality of hardware threads and memory access circuitry to facilitate access to memory by the plurality of hardware threads. The graphics core is configurable to process a plurality of load requests from the plurality of hardware threads, detect duplicate load requests within the plurality of load requests, perform a single read from the cache memory in response to the duplicate load requests, and transmit data associated with the duplicate load requests to requesting hardware threads.
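
Illustrative sketch (not part of the patent record): the duplicate-detection step in the abstract can be modeled as grouping load requests by address, reading each distinct address once, and delivering the value to every requester. The function `broadcast_loads`, the dictionary standing in for cache memory, and the read counter are hypothetical simplifications.

```python
from collections import defaultdict


def broadcast_loads(requests, memory):
    """requests: list of (thread_id, address). Returns (data per thread, read count)."""
    waiting = defaultdict(list)
    for thread_id, address in requests:
        waiting[address].append(thread_id)  # duplicates collapse onto one address

    delivered, reads = {}, 0
    for address, thread_ids in waiting.items():
        value = memory[address]  # single read per distinct address
        reads += 1
        for thread_id in thread_ids:
            delivered[thread_id] = value  # same data sent to every requester
    return delivered, reads
```

With four requests of which three target the same address, the sketch performs two reads instead of four.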

10.

Granted invention patent
Conversion hardware mechanism (status: in force)

Publication number: US11494163B2

Publication date: 2022-11-08

Application number: US16562979

Filing date: 2019-09-06

Applicant: Intel Corporation

Inventors: Naveen Mellempudi, Dipankar Das, Chunhui Mei, Kristopher Wong, Dhiraj D. Kalamkar, Hong H. Jiang, Subramaniam Maiyuran, Varghese George

IPC classification: G06F7/499, G06F17/16, G06T1/20, G06N3/04, G06N3/08

Abstract: An apparatus to facilitate a computer number format conversion is disclosed. The apparatus comprises a control unit to receive data format information indicating a first precision data format in which input data is to be received and converter hardware to receive the input data and convert the first precision data format to a second precision data format based on the data format information.
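
Illustrative sketch (not part of the patent record): the abstract describes a converter that is told the incoming precision format and re-encodes the data in a second format. A software stand-in, assuming IEEE 754 half, single, and double precision as the candidate formats, can be built on Python's `struct` module. The `FORMATS` table and the `convert` function are hypothetical; the formats the patent actually covers are not listed in the abstract.

```python
import struct

# Hypothetical format table: name -> struct code (IEEE 754 half, single, double).
FORMATS = {"fp16": "e", "fp32": "f", "fp64": "d"}


def convert(raw, src_format, dst_format):
    """Decode little-endian bytes as src_format and re-encode them as dst_format."""
    src, dst = FORMATS[src_format], FORMATS[dst_format]
    count = len(raw) // struct.calcsize("<" + src)
    values = struct.unpack("<%d%s" % (count, src), raw)
    # struct.pack raises OverflowError if a value does not fit the target format.
    return struct.pack("<%d%s" % (count, dst), *values)
```

The format names passed to `convert` play the role of the data format information received by the control unit.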
