Patent search ap:("Intel Corporation") AND inv:"Sanjeev Jahagirdar" Page 1

1.

发明授权
Instructions and logic to perform floating point and integer operations for machine learning 有权

公开(公告)号：US12217053B2

公开(公告)日：2025-02-04

申请号：US18528340

申请日：2023-12-04

Applicant: Intel Corporation

Inventor： Himanshu Kaul , Mark A. Anders , Sanu K. Mathew , Anbang Yao , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Rajkishore Barik , Tsung-Han Lin , Vasanth Ranganathan , Sanjeev Jahagirdar

IPC: G06F9/30 , G06F7/483 , G06F7/544 , G06F9/38 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/08 , G09G5/393 , G06F1/16 , G06F17/16 , G06N20/00 , G06T15/00

Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute an intermediate product of 16-bit operands and to compute a 32-bit sum based on the intermediate product.

2.

发明授权
Enabling product SKUs based on chiplet configurations 有权

公开(公告)号：US12141890B2

公开(公告)日：2024-11-12

申请号：US17685117

申请日：2022-03-02

Applicant: Intel Corporation

Inventor： Altug Koker , Lance Cheney , Eric Finley , Varghese George , Sanjeev Jahagirdar , Josh Mastronarde , Naveen Matam , Iqbal Rajwani , Lakshminarayanan Striramassarma , Melaku Teshome , Vikranth Vemulapalli , Binoj Xavier

IPC: G06T1/20 , G06F13/40

Abstract: A disaggregated processor package can be configured to accept interchangeable chiplets. Interchangeability is enabled by specifying a standard physical interconnect for chiplets that can enable the chiplet to interface with a fabric or bridge interconnect. Chiplets from different IP designers can conform to the common interconnect, enabling such chiplets to be interchangeable during assembly. The fabric and bridge interconnects logic on the chiplet can then be configured to confirm with the actual interconnect layout of the on-board logic of the chiplet. Additionally, data from chiplets can be transmitted across an inter-chiplet fabric using encapsulation, such that the actual data being transferred is opaque to the fabric, further enable interchangeability of the individual chiplets. With such an interchangeable design, cache or DRAM memory can be inserted into memory chiplet slots, while compute or graphics chiplets with a higher or lower core count can be inserted into logic chiplet slots.

3.

发明授权
Systems and methods for cache optimization 有权

公开(公告)号：US12056059B2

公开(公告)日：2024-08-06

申请号：US17590362

申请日：2022-02-01

Applicant: Intel Corporation

Inventor： Altug Koker , Joydeep Ray , Elmoustapha Ould-Ahmed-Vall , Abhishek Appu , Aravindh Anantaraman , Valentin Andrei , Durgaprasad Bilagi , Varghese George , Brent Insko , Sanjeev Jahagirdar , Scott Janus , Pattabhiraman K , SungYe Kim , Subramaniam Maiyuran , Vasanth Ranganathan , Lakshminarayanan Striramassarma , Xinmin Tian

IPC: G06F12/00 , G06F12/0875 , G06F12/0891 , G06F12/123 , G06T1/60

CPC classification number: G06F12/123 , G06F12/0875 , G06F12/0891 , G06T1/60 , G06F2212/302

Abstract: Systems and methods for cache utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache memory that is coupled to the processing resources. The cache controller is configured to set an initial aging policy using an aging field based on age of cache lines within the cache memory and to determine whether a hint or an instruction to indicate a level of aging has been received. In one embodiment, the cache memory configured to be partitioned into multiple cache regions, wherein the multiple cache regions include a first cache region having a cache eviction policy with a configurable level of data persistence.

4.

发明授权
Barriers and synchronization for machine learning at autonomous machines 有权

公开(公告)号：US12001209B2

公开(公告)日：2024-06-04

申请号：US17750917

申请日：2022-05-23

Applicant: Intel Corporation

Inventor： Abhishek R. Appu , Altug Koker , Joydeep Ray , Balaji Vembu , John C. Weast , Mike B. Macpherson , Dukhwan Kim , Linda L. Hurd , Sanjeev Jahagirdar , Vasanth Ranganathan

IPC: G06F9/48 , G05D1/00 , G06F9/52 , G06N3/04 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/08 , G06N3/084 , G06F9/46 , G06T1/20

CPC classification number: G05D1/0088 , G06F9/4881 , G06F9/522 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/084 , G06F9/46 , G06T1/20

Abstract: A method of embodiments, as described herein, includes detecting thread groups relating to machine learning associated with one or more processing devices. The method may further include facilitating barrier synchronization of the thread groups across multiple dies such that each thread in a thread group is scheduled across a set of compute elements associated with the multiple dies, where each die represents a processing device of the one or more processing devices, the processing device including a graphics processor.

5.

发明授权
Dynamic power budget allocation in multi-processor system 有权

公开(公告)号：US11874715B2

公开(公告)日：2024-01-16

申请号：US17966151

申请日：2022-10-14

Applicant: Intel Corporation

Inventor： Nikos Kaburlasos , Iqbal Rajwani , Bhushan Borole , Kamal Sinha , Sanjeev Jahagirdar

IPC: G06F1/00 , G06F1/28 , G06F1/3203 , G06F1/3206

CPC classification number: G06F1/28 , G06F1/3203 , G06F1/3206

Abstract: Dynamic power budget allocation in a multi-processor system is described. In an example, an apparatus includes a plurality of processor units; and a power control component, the power control component to monitor power utilization of each of the plurality of processor units, wherein power consumed by the plurality of processor units is limited by a global power budget. The apparatus is to assign a workload to each of the processor units and is to establish an initial power budget for operation of each of the processor units, and, upon the apparatus determining that one or more processor units require an increased power budget based on one or more criteria, the apparatus is to dynamically reallocate an amount of the global power budget to the one or more processor units.

6.

发明授权
Coordination and increased utilization of graphics processors during inference 有权

公开(公告)号：US11748841B2

公开(公告)日：2023-09-05

申请号：US17871781

申请日：2022-07-22

Applicant: Intel Corporation

Inventor： Abhishek R. Appu , Altug Koker , John C. Weast , Mike B. Macpherson , Linda L. Hurd , Sara S. Baghsorkhi , Justin E. Gottschlich , Prasoonkumar Surti , Chandrasekaran Sakthivel , Liwei Ma , Elmoustapha Ould-Ahmed-Vall , Kamal Sinha , Joydeep Ray , Balaji Vembu , Sanjeev Jahagirdar , Vasanth Ranganathan , Dukhwan Kim

IPC: G06T1/20 , G06N3/063 , G06F9/46 , G06N3/045 , G06N3/08 , G06N3/084 , G06N3/044

CPC classification number: G06T1/20 , G06F9/46 , G06N3/045 , G06N3/063 , G06N3/08 , G06N3/044 , G06N3/084

Abstract: A mechanism is described for facilitating inference coordination and processing utilization for machine learning. A method of embodiments, as described herein, includes limiting execution of workloads for the respective contexts of a plurality of contexts to a specified subset of a plurality of processing resources of a processing system according to physical resource slices of the processing system that are associated with the respective contexts of the plurality of contexts.

7.

发明申请
INSTRUCTIONS AND LOGIC TO PERFORM FLOATING POINT AND INTEGER OPERATIONS FOR MACHINE LEARNING 有权

公开(公告)号：US20220357945A1

公开(公告)日：2022-11-10

申请号：US17834482

申请日：2022-06-07

Applicant: Intel Corporation

Inventor： Himanshu Kaul , Mark A. Anders , Sanu K. Mathew , Anbang Yao , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Rajkishore Barik , Tsung-Han Lin , Vasanth Ranganathan , Sanjeev Jahagirdar

IPC: G06F9/30 , G09G5/393 , G06F9/38 , G06F7/483 , G06F7/544 , G06N3/04 , G06N3/063 , G06N3/08

Abstract: One embodiment provides a graphics processor comprising a memory controller and a graphics processing resource coupled with the memory controller. The graphics processing resource includes circuitry configured to execute an instruction to perform a matrix operation on first input including weight data and second input including input activation data, generate intermediate data based on a result of the matrix operation, quantize the intermediate data to a floating-point format determined based on a statistical distribution of first output data, and output, as second output data, quantized intermediate data in a determined floating-point format.

8.

发明授权
Apparatus and method of balancing input power from multiple sources 有权

公开(公告)号：US11474547B2

公开(公告)日：2022-10-18

申请号：US16792073

申请日：2020-02-14

Applicant: Intel Corporation

Inventor： Darryl Tschirhart , Alan Wu , Jason Lee Pack , Yvan Large , Sanjeev Jahagirdar

IPC: H03M1/34 , G05F1/565 , H02J1/10 , G05F1/575 , G06F1/28

Abstract: A scheme is provided for dynamically adjusting an amount of power drawn from individual power sources to optimize the power usage without violating power limits. Coarse adjustment is provided through dynamic phase reallocation while a fine adjustment is provided through dynamic current steering. By adding a control loop around current steering techniques in digital voltage regulator controllers, power drawn from multiple input rails is balanced. The apparatus allows users to maximize the power delivered to discrete graphics cards without violating PCIe specifications. This allows maximum performance with minimal bill-of-material (BOM) cost.

9.

发明授权
Coordination and increased utilization of graphics processors during inference 有权

公开(公告)号：US11430082B2

公开(公告)日：2022-08-30

申请号：US17143805

申请日：2021-01-07

Applicant: Intel Corporation

Inventor： Abhishek R. Appu , Altug Koker , John C. Weast , Mike B. Macpherson , Linda L. Hurd , Sara S. Baghsorkhi , Justin E. Gottschlich , Prasoonkumar Surti , Chandrasekaran Sakthivel , Liwei Ma , Elmoustapha Ould-Ahmed-Vall , Kamal Sinha , Joydeep Ray , Balaji Vembu , Sanjeev Jahagirdar , Vasanth Ranganathan , Dukhwan Kim

IPC: G06T1/20 , G06F9/46 , G06N3/04 , G06N3/063 , G06N3/08

Abstract: A mechanism is described for facilitating inference coordination and processing utilization for machine learning at autonomous machines. A method of embodiments, as described herein, includes detecting, at training time, information relating to one or more tasks to be performed according to a training dataset relating to a processor including a graphics processor. The method may further include analyzing the information to determine one or more portions of hardware relating to the processor capable of supporting the one or more tasks, and configuring the hardware to pre-select the one or more portions to perform the one or more tasks, while other portions of the hardware remain available for other tasks.

10.

发明授权
Intelligent thread dispatch and vectorization of atomic operations 有权

公开(公告)号：US11379235B2

公开(公告)日：2022-07-05

申请号：US17128972

申请日：2020-12-21

Applicant: Intel Corporation

Inventor： Feng Chen , Narayan Srinivasa , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Joydeep Ray , Nicolas C. Galoppo Von Borries , Prasoonkumar Surti , Ben J. Ashbaugh , Sanjeev Jahagirdar , Vasanth Ranganathan

IPC: G06T15/00 , G06F9/30 , G06F9/38 , G06F12/0862 , G06F9/50 , G06N3/04 , G06N3/08 , G06N3/063

Abstract: A mechanism is described for facilitating intelligent dispatching and vectorizing at autonomous machines. A method of embodiments, as described herein, includes detecting a plurality of threads corresponding to a plurality of workloads associated with tasks relating to a graphics processor. The method may further include determining a first set of threads of the plurality of threads that are similar to each other or have adjacent surfaces, and physically clustering the first set of threads close together using a first set of adjacent compute blocks.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification