Patent search: ap:("Intel Corporation") AND inv:"James Valerio"

1.

Invention publication
SYNCHRONIZATION FOR DATA MULTICAST IN COMPUTE CORE CLUSTERS [Status: pending, published]

Publication (announcement) number: US20240220335A1

Publication (announcement) date: 2024-07-04

Application number: US18148993

Application date: 2022-12-30

Applicant: Intel Corporation

Inventor: Chunhui Mei , Yongsheng Liu , John A. Wiegert , Vasanth Ranganathan , Ben J. Ashbaugh , Fangwen Fu , Hong Jiang , Guei-Yuan Lueh , James Valerio , Alan M. Curtis , Maxim Kazakov

IPC: G06F9/52 , G06F9/38 , G06F9/50

CPC classification number: G06F9/522 , G06F9/3877 , G06F9/5072 , G06F9/3887

Abstract: Synchronization for data multicast in compute core clusters is described. An example of an apparatus includes one or more processors including at least a graphics processing unit (GPU), the GPU including one or more clusters of cores and a memory, wherein each cluster of cores includes a plurality of cores, each core including one or more processing resources, shared local memory, and gateway circuitry, wherein the GPU is to initiate broadcast of a data element from a producer core to one or more consumer cores, and synchronize the broadcast of the data element utilizing the gateway circuitry of the producer core and the one or more consumer cores, and wherein synchronizing the broadcast of the data element includes establishing a multi-core barrier for broadcast of the data element.
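As a loose software analogy only (the abstract describes GPU gateway circuitry, not threads), the producer-to-consumer broadcast guarded by a multi-core barrier might be sketched as follows. The function name `multicast`, the per-consumer `slm` slots, and the use of Python threads to stand in for cores are all assumptions made for illustration.

```python
import threading

def multicast(value, num_consumers):
    """Hypothetical sketch: one producer 'core' broadcasts a value to N consumer 'cores'."""
    # One slot per consumer, standing in for that core's shared local memory.
    slm = [None] * num_consumers
    received = [None] * num_consumers
    # The barrier spans the producer plus every consumer.
    barrier = threading.Barrier(num_consumers + 1)

    def producer():
        for i in range(num_consumers):
            slm[i] = value          # write the data element to each consumer's slot
        barrier.wait()              # signal that the broadcast is complete

    def consumer(i):
        barrier.wait()              # do not read until the producer has arrived
        received[i] = slm[i]

    threads = [threading.Thread(target=producer)]
    threads += [threading.Thread(target=consumer, args=(i,)) for i in range(num_consumers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

Because every consumer waits on the same barrier as the producer, no consumer can read its slot before the producer has finished writing all of them.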

2.

Invention grant
Hierarchical thread scheduling based on multiple barriers [Status: in force]

Publication (announcement) number: US11977895B2

Publication (announcement) date: 2024-05-07

Application number: US17131647

Application date: 2020-12-22

Applicant: Intel Corporation

Inventor: Sabareesh Ganapathy , Fangwen Fu , Hong Jiang , James Valerio

IPC: G06F9/38 , G06F9/48 , G06F9/54 , G06T1/20

CPC classification number: G06F9/3838 , G06F9/4881 , G06F9/544 , G06T1/20

Abstract: Examples described herein relate to a graphics processing unit (GPU) coupled to the memory device, the GPU configured to: execute an instruction thread; determine if a dual directional signal barrier is associated with the instruction thread; and based on clearance of the dual directional signal barrier for a particular signal barrier identifier and a mode of operation, indicate a clearance of the dual directional signal barrier for the mode of operation, wherein the dual directional signal barrier is to provide a single barrier to gate activity of one or more producers based on activity of one or more consumers or gate activity of one or more consumers based on activity of one or more producers.

3.

Invention publication
SYNCHRONIZATION UTILIZING LOCAL TEAM BARRIERS FOR THREAD TEAM PROCESSING [Status: pending, published]

Publication (announcement) number: US20240111609A1

Publication (announcement) date: 2024-04-04

Application number: US17958213

Application date: 2022-09-30

Applicant: Intel Corporation

Inventor: Biju George , Supratim Pal , James Valerio , Vasanth Ranganathan , Fangwen Fu , Chunhui Mei

IPC: G06F9/52 , G06F9/30

CPC classification number: G06F9/522 , G06F9/30098

Abstract: Low-latency synchronization utilizing local team barriers for thread team processing is described. An example of an apparatus includes one or more processors including a graphics processor, the graphics processor including a plurality of processing resources; and memory for storage of data including data for graphics processing, wherein the graphics processor is to receive a request for establishment of a local team barrier for a thread team, the thread team being allocated to a first processing resource, the thread team including multiple threads; determine requirements and designated threads for the local team barrier; and establish the local team barrier in a local register of the first processing resource based at least in part on the requirements and designated threads for the local barrier.

4.

Invention publication
SYSTEMS AND METHODS FOR UPDATING MEMORY SIDE CACHES IN A MULTI-GPU CONFIGURATION [Status: pending, published]

Publication (announcement) number: US20240086357A1

Publication (announcement) date: 2024-03-14

Application number: US18516716

Application date: 2023-11-21

Applicant: Intel Corporation

Inventor: Altug Koker , Joydeep Ray , Aravindh Anantaraman , Valentin Andrei , Abhishek Appu , Sean Coleman , Nicolas Galoppo Von Borries , Varghese George , Pattabhiraman K , SungYe Kim , Mike Macpherson , Subramaniam Maiyuran , Elmoustapha Ould-Ahmed-Vall , Vasanth Ranganathan , James Valerio

IPC: G06F15/78 , G06F7/544 , G06F7/575 , G06F7/58 , G06F9/30 , G06F9/38 , G06F9/50 , G06F12/02 , G06F12/06 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/80 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46

CPC classification number: G06F15/7839 , G06F7/5443 , G06F7/575 , G06F7/588 , G06F9/3001 , G06F9/30014 , G06F9/30036 , G06F9/3004 , G06F9/30043 , G06F9/30047 , G06F9/30065 , G06F9/30079 , G06F9/3887 , G06F9/5011 , G06F9/5077 , G06F12/0215 , G06F12/0238 , G06F12/0246 , G06F12/0607 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/8046 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46 , G06T15/06

Abstract: Systems and methods for updating remote memory side caches in a multi-GPU configuration are disclosed herein. In one embodiment, a graphics processor for a multi-tile architecture includes a first graphics processing unit (GPU) having a first memory, a first memory side cache memory, a first communication fabric, and a first memory management unit (MMU). The graphics processor includes a second graphics processing unit (GPU) having a second memory, a second memory side cache memory, a second memory management unit (MMU), and a second communication fabric that is communicatively coupled to the first communication fabric. The first MMU is configured to control memory requests for the first memory, to update content in the first memory, to update content in the first memory side cache memory, and to determine whether to update the content in the second memory side cache memory.

5.

Invention application
WORKLOAD SCHEDULING AND DISTRIBUTION ON A DISTRIBUTED GRAPHICS DEVICE [Status: in force]

Publication (announcement) number: US20230039853A1

Publication (announcement) date: 2023-02-09

Application number: US17968469

Application date: 2022-10-18

Applicant: Intel Corporation

Inventor: Balaji Vembu , Brandon Fliflet , James Valerio , Michael Apodaca , Ben Ashbaugh , Hema Nalluri , Ankur Shah , Murali Ramadoss , David Puffer , Altug Koker , Aditya Navale , Abhishek R. Appu , Joydeep Ray , Travis Schluessler

IPC: G06T1/20 , G06F9/48 , G06F9/50 , G06F9/52 , G06T1/60

Abstract: Embodiments described herein provide a graphics, media, and compute device having a tiled architecture composed of a number of tiles of smaller graphics devices. The work distribution infrastructure for such device enables the distribution of workloads across multiple tiles of the device. Work items can be submitted to any one or more of the multiple tiles, with workloads able to span multiple tiles. Additionally, upon completion of a work item, graphics, media, and/or compute engines within the device can readily acquire new work items for execution with minimal latency.

6.

Invention grant
Workload scheduling and distribution on a distributed graphics device [Status: in force]

Publication (announcement) number: US11481864B2

Publication (announcement) date: 2022-10-25

Application number: US17234039

Application date: 2021-04-19

Applicant: Intel Corporation

Inventor: Balaji Vembu , Brandon Fliflet , James Valerio , Michael Apodaca , Ben Ashbaugh , Hema Nalluri , Ankur Shah , Murali Ramadoss , David Puffer , Altug Koker , Aditya Navale , Abhishek R. Appu , Joydeep Ray , Travis Schluessler

IPC: G06T1/20 , G06F9/48 , G06F9/50 , G06F9/52 , G06T1/60

Abstract: Embodiments described herein provide a graphics, media, and compute device having a tiled architecture composed of a number of tiles of smaller graphics devices. The work distribution infrastructure for such device enables the distribution of workloads across multiple tiles of the device. Work items can be submitted to any one or more of the multiple tiles, with workloads able to span multiple tiles. Additionally, upon completion of a work item, graphics, media, and/or compute engines within the device can readily acquire new work items for execution with minimal latency.

7.

Invention grant
Multiple independent synchronization named barrier within a thread group [Status: in force]

Publication (announcement) number: US11409579B2

Publication (announcement) date: 2022-08-09

Application number: US16798603

Application date: 2020-02-24

Applicant: Intel Corporation

Inventor: James Valerio , Vasanth Ranganathan , Joydeep Ray

IPC: G06F9/52 , G06F9/54 , G06F9/30 , G06F9/38 , G06F9/48 , G06T1/20 , G06F15/78 , G06N20/00

Abstract: An apparatus to facilitate thread barrier synchronization is disclosed. The apparatus includes a plurality of processing resources to execute a plurality of execution threads included in a thread workgroup and barrier synchronization hardware to assign a first named barrier to a first set of the plurality of execution threads in the thread workgroup, assign a second named barrier to a second set of the plurality of execution threads in the thread workgroup, synchronize execution of the first set of execution threads via the first named barrier and synchronize execution of the second set of execution threads via the second named barrier.
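The idea of two independent named barriers inside one workgroup can be illustrated in software, with the caveat that the abstract describes barrier synchronization hardware and this is only a thread-level analogy. The function `run_workgroup`, the `teams` mapping, and the log format are hypothetical names chosen for this sketch.

```python
import threading

def run_workgroup(teams):
    """Hypothetical sketch. teams maps a barrier name to the thread ids assigned to it."""
    # One independent barrier per name, sized to the threads assigned to that name.
    barriers = {name: threading.Barrier(len(tids)) for name, tids in teams.items()}
    log = []
    lock = threading.Lock()

    def worker(name, tid):
        with lock:
            log.append((name, "before", tid))
        barriers[name].wait()       # waits only for threads sharing this named barrier
        with lock:
            log.append((name, "after", tid))

    threads = [
        threading.Thread(target=worker, args=(name, tid))
        for name, tids in teams.items()
        for tid in tids
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Calling `run_workgroup({"A": [0, 1, 2], "B": [3, 4]})` synchronizes threads 0 to 2 among themselves and threads 3 and 4 among themselves; neither set waits for the other.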

8.

Invention application
PARTIAL WRITE MANAGEMENT IN A MULTI-TILED COMPUTE ENGINE [Status: in force]

Publication (announcement) number: US20210056028A1

Publication (announcement) date: 2021-02-25

Application number: US17068754

Application date: 2020-10-12

Applicant: Intel Corporation

Inventor: Joydeep Ray , James Valerio , Ben Ashbaugh , Lakshminarayanan Striramassarma

IPC: G06F12/0811 , G06F3/06 , G06F9/38 , G06F9/54

Abstract: Embodiments described herein provide a general purpose graphics processor comprising a plurality of tiles, each tile of the plurality of tiles comprising at least one execution unit, a local cache, and a cache control unit, and a high bandwidth memory communicatively coupled to the plurality of tiles, wherein the high bandwidth memory is shared between the plurality of tiles. The cache control unit is to implement a partial write management protocol to receive a partial write operation directed to a cache line in the local cache, the partial write operation comprising write data, write the data associated with the partial write operation to the local cache when the cache line is in a modified state, and forward the write data associated with the partial write operation to the high bandwidth memory when the partial write operation triggers a cache miss or when the cache line is in an exclusive state or a shared state. Other embodiments may be described and claimed.
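The routing decision in this abstract (write locally when the line is modified, otherwise forward to the shared memory) can be sketched as a small state check. This is an illustrative model only: the class `TileCache`, the 64-byte `LINE_SIZE`, the dictionary-backed `hbm`, and the return strings are assumptions, and the sketch deliberately does not model what happens to a local exclusive or shared copy after forwarding, since the abstract does not say.

```python
from enum import Enum

LINE_SIZE = 64  # assumed cache-line size in bytes, for illustration only

class LineState(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"

class TileCache:
    """Hypothetical per-tile local cache in front of a shared high bandwidth memory."""

    def __init__(self, hbm):
        self.hbm = hbm      # dict: line address -> bytearray (shared between tiles)
        self.lines = {}     # dict: line address -> (LineState, bytearray)

    def partial_write(self, line_addr, offset, data):
        entry = self.lines.get(line_addr)
        if entry is not None and entry[0] is LineState.MODIFIED:
            # Hit on a modified line: apply the partial write in the local cache.
            entry[1][offset:offset + len(data)] = data
            return "local"
        # Miss, or hit on an exclusive/shared line: forward the write data to memory.
        line = self.hbm.setdefault(line_addr, bytearray(LINE_SIZE))
        line[offset:offset + len(data)] = data
        return "forwarded"
```

Returning a tag from `partial_write` is purely a convenience for inspecting which path was taken; a hardware protocol would have no such return value.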

9.

Invention grant
Microcontroller-based flexible thread scheduling launching in computing environments [Status: in force]

Publication (announcement) number: US10402224B2

Publication (announcement) date: 2019-09-03

Application number: US15860708

Application date: 2018-01-03

Applicant: Intel Corporation

Inventor: Kiran C. Veernapu , Kamlesh Pillai , James Valerio , Joydeep Ray , Abhishek Appu

IPC: G06F9/48 , G06F9/54 , G06T1/20 , G06F9/22 , G06T15/00

Abstract: A mechanism is described to facilitate microcontroller-based flexible thread scheduling launching in computing environments. An apparatus of embodiments, as described herein, includes facilitating a graphics processor hosting a microcontroller having a thread scheduling unit, and detection and observation logic to detect a scheduling algorithm associated with an application at the apparatus. The apparatus may further include reading and dispatching logic to facilitate the microcontroller to prepare a flexible dispatch routine based on the scheduling algorithm. The apparatus may further include scheduling and launching logic to facilitate the thread scheduling unit to dynamically schedule and launch threads based on the flexible dispatch routine, where the threads are hosted by the graphics processor.

10.

Invention grant
Instruction prefetch based on thread dispatch commands [Status: in force]

Publication (announcement) number: US12124852B2

Publication (announcement) date: 2024-10-22

Application number: US18347964

Application date: 2023-07-06

Applicant: Intel Corporation

Inventor: James Valerio , Vasanth Ranganathan , Joydeep Ray , Pradeep Ramani

IPC: G06F9/44 , G06F9/38 , G06F13/28 , G06T1/20

CPC classification number: G06F9/3802 , G06F13/28 , G06T1/20

Abstract: A graphics processing device is provided that includes a set of compute units to execute a workload, a cache coupled with the set of compute units, and circuitry coupled with the cache and the set of compute units. The circuitry is configured to, in response to a cache miss for the read from a first cache, broadcast an event within the graphics processor device to identify data associated with the cache miss, receive the event at a second compute unit in the set of compute units, and prefetch the data identified by the event into a second cache that is local to the second compute unit before an attempt to read the instruction or data by the second thread.
