Patent search ap:("INTEL CORPORATION") AND inv:"Balaji Vembu" Page 1

1.

发明授权
Efficient thread group scheduling 有权

公开(公告)号：US12210900B2

公开(公告)日：2025-01-28

申请号：US17746201

申请日：2022-05-17

Applicant: Intel Corporation

Inventor： Joydeep Ray , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Rajkishore Barik , Eriko Nurvitadhi , Nicolas Galoppo Von Borries , Tsung-Han Lin , Sanjeev Jahagirdar , Vasanth Ranganathan

IPC: G06F9/48 , G06T1/20

Abstract: A mechanism is described for facilitating intelligent thread scheduling at autonomous machines. A method of embodiments, as described herein, includes detecting dependency information relating to a plurality of threads corresponding to a plurality of workloads associated with tasks relating to a processor including a graphics processor. The method may further include generating a tree of thread groups based on the dependency information, where each thread group includes multiple threads, and scheduling one or more of the thread groups associated a similar dependency to avoid dependency conflicts.

2.

发明公开
Regional Adjustment of Render Rate 审中-公开

公开(公告)号：US20240354043A1

公开(公告)日：2024-10-24

申请号：US18648737

申请日：2024-04-29

Applicant: Intel Corporation

Inventor： Eric J. Asperheim , Subramaniam Maiyuran , Kiran C. Veernapu , Sanjeev S. Jahagirdar , Balaji Vembu , Devan Burke , Philip R. Laws , Kamal Sinha , Abhishek R. Appu , Elmoustapha Ould-Ahmed-Vall , Peter L. Doyle , Joydeep Ray , Travis T. Schluessler , John H. Feit , Nikos Kaburlasos , Jacek Kwiatkowski , Altug Koker

IPC: G06F3/14 , G06F3/01 , G06F3/0484 , G09G5/00 , G09G5/391

CPC classification number: G06F3/1438 , G06F3/013 , G06F3/0484 , G09G5/391 , G09G5/001 , G09G2340/0435 , G09G2352/00 , G09G2354/00 , G09G2360/08 , G09G2360/121

Abstract: In accordance with some embodiments, the render rate is varied across and/or up and down the display screen. This may be done based on where the user is looking in order to reduce power consumption and/or increase performance. Specifically the screen display is separated into regions, such as quadrants. Each of these regions is rendered at a rate determined by at least one of what the user is currently looking at, what the user has looked at in the past and/or what it is predicted that the user will look at next. Areas of less focus may be rendered at a lower rate, reducing power consumption in some embodiments.

3.

发明授权
Processor power management 有权

公开(公告)号：US12124310B2

公开(公告)日：2024-10-22

申请号：US18339827

申请日：2023-06-22

Applicant: INTEL CORPORATION

Inventor： Altug Koker , Abhishek R. Appu , Kiran C. Veernapu , Joydeep Ray , Balaji Vembu , Prasoonkumar Surti , Kamal Sinha , Eric J. Hoekstra , Wenyin Fu , Nikos Kaburlasos , Bhushan M. Borole , Travis T. Schluessler , Ankur N. Shah , Jonathan Kennedy

IPC: G09G3/06 , G06F1/3203 , G06F1/3209 , G06F1/3212 , G06F1/3218 , G06F1/3231 , G06F1/324 , G06F3/01 , G06F11/07 , G06F11/30 , H04W52/02 , H04M1/72448

CPC classification number: G06F1/3209 , G06F1/3203 , G06F1/3212 , G06F1/3218 , G06F1/3231 , G06F1/324 , G06F3/01 , G06F11/0781 , G06F11/3062 , H04W52/0258 , H04M1/72448 , Y02D10/00 , Y02D30/70

Abstract: Methods and apparatus relating to techniques for avoiding cache lookup for cold cache. In an example, an apparatus comprises logic, at least partially comprising hardware logic, to collect user information for a user of a data processing device, generate a user profile for the user of the data processing device from the user information, and set a power profile a processor in the data processing device using the user profile. Other embodiments are also disclosed and claimed.

4.

发明公开
DYNAMICALLY POWER ON/OFF PROCESSING CLUSTERS DURING EXECUTION 审中-公开

公开(公告)号：US20240288927A1

公开(公告)日：2024-08-29

申请号：US18633932

申请日：2024-04-12

Applicant: Intel Corporation

Inventor： Balaji Vembu , Josh B. Mastronarde , Nikos Kaburlasos

IPC: G06F1/3287 , G06F1/26 , G06F1/3296 , G06F9/50 , G06F13/40

CPC classification number: G06F1/3287 , G06F1/3296 , G06F9/5083 , G06F9/5088 , G06F9/5094 , G06F13/4022 , G06F1/26

Abstract: In an example, an apparatus comprises logic, at least partially comprising hardware logic, to power on a first set of processing clusters, dispatch a workload to the first set of processing clusters, detect a full operating state of the first set of processing clusters, and in response to the detection of a full operating state of the first set of processing clusters, to power on a second set of processing clusters. Other embodiments are also disclosed and claimed.

5.

发明公开
COMPUTE OPTIMIZATION MECHANISM FOR DEEP NEURAL NETWORKS 审中-公开

公开(公告)号：US20240257294A1

公开(公告)日：2024-08-01

申请号：US18436494

申请日：2024-02-08

Applicant: Intel Corporation

Inventor： Prasoonkumar Surti , Narayan Srinivasa , Feng Chen , Joydeep Ray , Ben J. Ashbaugh , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Balaji Vembu , Tsung-Han Lin , Kamal Sinha , Rajkishore Barik , Sara S. Baghsorkhi , Justin E. Gottschlich , Altug Koker , Nadathur Rajagopalan Satish , Farshad Akhbari , Dukhwan Kim , Wenyin Fu , Travis T. Schluessler , Josh B. Mastronarde , Linda L. Hurd , John H. Feit , Jeffery S. Boles , Adam T. Lake , Karthik Vaidyanathan , Devan Burke , Subramaniam Maiyuran , Abhishek R. Appu

IPC: G06T1/20 , G06F8/41 , G06F9/455 , G06F9/50 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/084

CPC classification number: G06T1/20 , G06F9/45533 , G06F9/5061 , G06F9/5094 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/084 , G06F8/41 , G06F2009/45583

Abstract: Embodiments provide mechanisms to facilitate compute operations for deep neural networks. One embodiment comprises a graphics processing unit comprising one or more multiprocessors, at least one of the one or more multiprocessors including a register file to store a plurality of different types of operands and a plurality of processing cores. The plurality of processing cores includes a first set of processing cores of a first type and a second set of processing cores of a second type. The first set of processing cores are associated with a first memory channel and the second set of processing cores are associated with a second memory channel.

6.

发明授权
Specialized fixed function hardware for efficient convolution 有权

公开(公告)号：US12050984B2

公开(公告)日：2024-07-30

申请号：US17083080

申请日：2020-10-28

Applicant: Intel Corporation

Inventor： Rajkishore Barik , Elmoustapha Ould-Ahmed-Vall , Xiaoming Chen , Dhawal Srivastava , Anbang Yao , Kevin Nealis , Eriko Nurvitadhi , Sara S. Baghsorkhi , Balaji Vembu , Tatiana Shpeisman , Ping T. Tang

IPC: G06N3/06 , G06F9/30 , G06F9/38 , G06F16/17 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/084 , G06T1/20

CPC classification number: G06N3/063 , G06F9/3001 , G06F9/3017 , G06F9/3851 , G06F9/3887 , G06F9/3895 , G06F16/17 , G06N3/044 , G06N3/045 , G06N3/084 , G06T1/20

Abstract: One embodiment provides for a general-purpose graphics processing unit including a scheduler to schedule multiple matrix operations for execution by a general-purpose graphics processing unit. The multiple matrix operations are determined based on a single machine learning compute instruction. The single machine learning compute instruction is a convolution instruction and the multiple matrix operations are associated with a convolution operation.

7.

发明授权
Instructions and logic to perform floating point and integer operations for machine learning 有权

公开(公告)号：US12039331B2

公开(公告)日：2024-07-16

申请号：US17967283

申请日：2022-10-17

Applicant: Intel Corporation

Inventor： Himanshu Kaul , Mark A. Anders , Sanu K. Mathew , Anbang Yao , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Rajkishore Barik , Tsung-Han Lin , Vasanth Ranganathan , Sanjeev Jahagirdar

IPC: G06F9/30 , G06F7/483 , G06F7/544 , G06F9/38 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/08 , G09G5/393 , G06F1/16 , G06N20/00 , G06T15/00

CPC classification number: G06F9/3001 , G06F7/483 , G06F7/5443 , G06F9/30014 , G06F9/30036 , G06F9/3851 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/08 , G09G5/393 , G06F1/16 , G06F9/30025 , G06F9/3013 , G06F2207/3824 , G06N20/00 , G06T15/005

Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute an intermediate product of 16-bit operands and to compute a 32-bit sum based on the intermediate product.

8.

发明公开
SECTOR CACHE FOR COMPRESSION 审中-公开

公开(公告)号：US20240232094A1

公开(公告)日：2024-07-11

申请号：US18405933

申请日：2024-01-05

Applicant: Intel Corporation

Inventor： Abhishek R. Appu , Altug Koker , Joydeep Ray , David Puffer , Prasoonkumar Surti , Lakshminarayanan Striramassarma , Vasanth Ranganathan , Kiran C. Veernapu , Balaji Vembu , Pattabhiraman K

IPC: G06F12/0877 , G06F12/0802 , G06F12/0806 , G06F12/0846 , G06F12/0855 , G06F12/0868 , G06F12/0893 , G06F12/126 , G06T1/60

CPC classification number: G06F12/0877 , G06F12/0802 , G06F12/0806 , G06F12/0848 , G06F12/0855 , G06F12/0868 , G06F12/126 , G06T1/60 , G06F12/0893

Abstract: One embodiment provides circuitry coupled with cache memory and a memory interface, the circuitry to compress compute data at multiple cache line granularity, and a processing resource coupled with the memory interface and the cache memory. The processing resource is configured to perform a general-purpose compute operation on compute data associated with multiple cache lines of the cache memory. The circuitry is configured to compress the compute data before a write of the compute data via the memory interface to the memory bus, in association with a read of the compute data associated with the multiple cache lines via the memory interface, decompress the compute data, and provide the decompressed compute data to the processing resource.

9.

发明公开
MACHINE LEARNING SPARSE COMPUTATION MECHANISM 审中-公开

公开(公告)号：US20240078629A1

公开(公告)日：2024-03-07

申请号：US18466991

申请日：2023-09-14

Applicant: Intel Corporation

Inventor： Eriko Nurvitadhi , Balaji Vembu , Tsung-Han Lin , Kamal Sinha , Rajkishore Barik , Nicolas C. Galoppo Von Borries

IPC: G06T1/20 , G06F9/30 , G06F9/38 , G06F9/48 , G06F12/02 , G06F12/0811 , G06F12/0815 , G06F12/0831 , G06F12/0888 , G06F17/16 , G06F18/2136 , G06N3/04 , G06N3/08 , G06N20/00 , G06T1/60 , G06T15/00 , H03M7/30

CPC classification number: G06T1/20 , G06F9/3001 , G06F9/3885 , G06F9/4881 , G06F12/0207 , G06F12/0811 , G06F12/0815 , G06F12/0831 , G06F12/0888 , G06F17/16 , G06F18/2136 , G06N3/04 , G06N3/08 , G06N20/00 , G06T1/60 , G06T15/005 , H03M7/30 , G06F2212/1024 , G06F2212/302 , G06F2212/401 , G06F2212/621 , G06T2200/28

Abstract: Techniques to improve performance of matrix multiply operations are described in which a compute kernel can specify one or more element-wise operations to perform on output of the compute kernel before the output is transferred to higher levels of a processor memory hierarchy.

10.

发明授权
Compute optimization mechanism for deep neural networks 有权

公开(公告)号：US11922535B2

公开(公告)日：2024-03-05

申请号：US18168207

申请日：2023-02-13

Applicant: Intel Corporation

Inventor： Prasoonkumar Surti , Narayan Srinivasa , Feng Chen , Joydeep Ray , Ben J. Ashbaugh , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Balaji Vembu , Tsung-Han Lin , Kamal Sinha , Rajkishore Barik , Sara S. Baghsorkhi , Justin E. Gottschlich , Altug Koker , Nadathur Rajagopalan Satish , Farshad Akhbari , Dukhwan Kim , Wenyin Fu , Travis T. Schluessler , Josh B. Mastronarde , Linda L. Hurd , John H. Feit , Jeffery S. Boles , Adam T. Lake , Karthik Vaidyanathan , Devan Burke , Subramaniam Maiyuran , Abhishek R. Appu

IPC: G06T1/20 , G06F9/455 , G06F9/50 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/084 , G06F8/41

CPC classification number: G06T1/20 , G06F9/45533 , G06F9/5061 , G06F9/5094 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/084 , G06F8/41 , G06F2009/45583

Abstract: Embodiments provide mechanisms to facilitate compute operations for deep neural networks. One embodiment comprises a graphics processing unit comprising one or more multiprocessors, at least one of the one or more multiprocessors including a register file to store a plurality of different types of operands and a plurality of processing cores. The plurality of processing cores includes a first set of processing cores of a first type and a second set of processing cores of a second type. The first set of processing cores are associated with a first memory channel and the second set of processing cores are associated with a second memory channel.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification