Patent search ap:("Intel Corporation") AND inv:"Balaji Vembu" Page 32

311.

Invention grant
Graphics processor with encrypted kernels (Active)

Publication (announcement) No.: US11018863B2

Publication (announcement) date: 2021-05-25

Application No.: US16435083

Application date: 2019-06-07

Applicant: Intel Corporation

Inventor: Balaji Vembu , Vidhya Krishnan , Sandeep S. Sodhi , Scott Janus , Daniel Nemiroff

IPC: H04L9/14 , G06F21/74 , G06F21/75

Abstract: An embodiment of a graphics apparatus may include a graphics processor including a kernel executor, and a security engine communicatively coupled to the graphics processor. The security engine may be configured to create a kernel security key, encrypt an executable kernel for the kernel executor in accordance with the kernel security key, and share the kernel security key with the graphics processor.

312.

Invention application
INSTRUCTIONS AND LOGIC TO PERFORM FLOATING POINT AND INTEGER OPERATIONS FOR MACHINE LEARNING (Active)

Publication (announcement) No.: US20210124579A1

Publication (announcement) date: 2021-04-29

Application No.: US17115989

Application date: 2020-12-09

Applicant: Intel Corporation

Inventor: Himanshu Kaul , Mark A. Anders , Sanu K. Mathew , Anbang Yao , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Rajkishore Barik , Tsung-Han Lin , Vasanth Ranganathan , Sanjeev Jahagirdar

IPC: G06F9/30 , G09G5/393 , G06F9/38 , G06F7/483 , G06F7/544 , G06N3/04 , G06N3/063 , G06N3/08

Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute a 32-bit intermediate product of 16-bit operands and to compute a 32-bit sum based on the 32-bit intermediate product.

313.

Invention grant
Handling pipeline submissions across many compute units (Active)

Publication (announcement) No.: US10977762B2

Publication (announcement) date: 2021-04-13

Application No.: US16834902

Application date: 2020-03-30

Applicant: Intel Corporation

Inventor: Balaji Vembu , Altug Koker , Joydeep Ray

IPC: G06T1/20 , G06T15/00

Abstract: One embodiment provides for a general-purpose graphics processing unit comprising multiple processing elements having a single instruction, multiple thread (SIMT) architecture configured to perform hardware multithreading during execution of a plurality of thread groups. The plurality of thread groups can include one or more sub-groups of threads, with a first sub-group associated with a first thread group and a second sub-group associated with a second thread group. Data dependencies can be used to trigger the launch of threads, such that when a first thread in the second sub-group has a data dependency upon a first thread in the first sub-group, circuitry in the general-purpose graphics processing unit can launch at least the first thread in the second sub-group to execute in response to satisfaction of the data dependency.

314.

Invention grant
Apparatus and method for dynamic provisioning, quality of service, and prioritization in a graphics processor (Active)

Publication (announcement) No.: US10937123B2

Publication (announcement) date: 2021-03-02

Application No.: US16505555

Application date: 2019-07-08

Applicant: INTEL CORPORATION

Inventor: Abhishek R. Appu , Joydeep Ray , Altug Koker , Balaji Vembu , Pattabhiraman K , Matthew B. Callaway

IPC: G06F13/14 , G06T1/60 , G06T15/00 , G06F9/455 , G06F9/50 , G06F9/48

Abstract: An apparatus and method for dynamic provisioning, quality of service, and prioritization in a graphics processor. For example, one embodiment of an apparatus comprises a graphics processing unit (GPU) comprising a plurality of graphics processing resources; slice configuration hardware logic to logically subdivide the graphics processing resources into a plurality of slices; and slice allocation hardware logic to allocate a designated number of slices to each virtual machine (VM) of a plurality of VMs running in a virtualized execution environment, the slice allocation hardware logic to allocate different numbers of slices to different VMs based on graphics processing requirements and/or priorities of each of the VMs.

315.

Invention application
PROGRAMMABLE COARSE GRAINED AND SPARSE MATRIX COMPUTE HARDWARE WITH ADVANCED SCHEDULING (Active)

Publication (announcement) No.: US20210035255A1

Publication (announcement) date: 2021-02-04

Application No.: US16928353

Application date: 2020-07-14

Applicant: Intel Corporation

Inventor: Eriko Nurvitadhi , Balaji Vembu , Nicolas C. Galoppo Von Borries , Rajkishore Barik , Tsung-Han Lin , Kamal Sinha , Nadathur Rajagopalan Satish , Jeremy Bottleson , Farshad Akhbari , Altug Koker , Narayan Srinivasa , Dukhwan Kim , Sara S. Baghsorkhi , Justin E. Gottschlich , Feng Chen , Elmoustapha Ould-Ahmed-Vall , Kevin Nealis , Xiaoming Chen , Anbang Yao

IPC: G06T1/20 , G06N3/04 , G06N3/063 , G06F9/38 , G06F9/30 , G06N3/08

Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to cause the compute apparatus to perform a complex machine learning compute operation.

316.

Invention grant
Compute optimization mechanism for deep neural networks (Active)

Publication (announcement) No.: US10902547B2

Publication (announcement) date: 2021-01-26

Application No.: US15819093

Application date: 2017-11-21

Applicant: Intel Corporation

Inventor: Prasoonkumar Surti , Narayan Srinivasa , Feng Chen , Joydeep Ray , Ben J. Ashbaugh , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Balaji Vembu , Tsung-Han Lin , Kamal Sinha , Rajkishore Barik , Sara S. Baghsorkhi , Justin E. Gottschlich , Altug Koker , Nadathur Rajagopalan Satish , Farshad Akhbari , Dukhwan Kim , Wenyin Fu , Travis T. Schluessler , Josh B. Mastronarde , Linda L. Hurd , John H. Feit , Jeffery S. Boles , Adam T. Lake , Karthik Vaidyanathan , Devan Burke , Subramaniam Maiyuran , Abhishek R. Appu

IPC: G06T1/20 , G06N3/04 , G06F9/455 , G06F9/50 , G06N3/063 , G06N3/08 , G06F8/41

Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes a plurality of processing units each comprising a plurality of execution units (EUs), wherein the plurality of EUs comprise a first EU type and a second EU type.

317.

Invention grant
Handling pipeline submissions across many compute units (Active)

Publication (announcement) No.: US10896479B2

Publication (announcement) date: 2021-01-19

Application No.: US16446946

Application date: 2019-06-20

Applicant: Intel Corporation

Inventor: Balaji Vembu , Altug Koker , Joydeep Ray

IPC: G06T1/20 , G06T15/00

Abstract: One embodiment provides for a general-purpose graphics processing unit comprising multiple processing elements having a single instruction, multiple thread (SIMT) architecture, the multiple processing elements to perform hardware multithreading during execution of multiple warps of threads, wherein a warp is a group of parallel threads; a scheduler to schedule a set of sub-warps to the multiple processing elements at sub-warp granularity, wherein a sub-warp is a sub-group of parallel threads, a warp includes multiple sub-warps, and the scheduler is to schedule threads in a first sub-warp of a first warp of threads to execute concurrently with the threads in a second sub-warp of a second warp of threads; and a logic unit including hardware or firmware logic, the logic unit to group active threads for execution on the multiple processing elements.

318.

Invention grant
Coordination and increased utilization of graphics processors during inference (Active)

Publication (announcement) No.: US10891707B2

Publication (announcement) date: 2021-01-12

Application No.: US16377315

Application date: 2019-04-08

Applicant: Intel Corporation

Inventor: Abhishek R. Appu , Altug Koker , John C. Weast , Mike B. Macpherson , Linda L. Hurd , Sara S. Baghsorkhi , Justin E. Gottschlich , Prasoonkumar Surti , Chandrasekaran Sakthivel , Liwei Ma , Elmoustapha Ould-Ahmed-Vall , Kamal Sinha , Joydeep Ray , Balaji Vembu , Sanjeev Jahagirdar , Vasanth Ranganathan , Dukhwan Kim

IPC: G06T1/20 , G06F9/46 , G06N3/04 , G06N3/063 , G06N3/08

Abstract: A mechanism is described for facilitating inference coordination and processing utilization for machine learning at autonomous machines. A method of embodiments, as described herein, includes detecting, at training time, information relating to one or more tasks to be performed according to a training dataset relating to a processor including a graphics processor. The method may further include analyzing the information to determine one or more portions of hardware relating to the processor capable of supporting the one or more tasks, and configuring the hardware to pre-select the one or more portions to perform the one or more tasks, while other portions of the hardware remain available for other tasks.

319.

Invention grant
Dual path sequential element to reduce toggles in data path (Active)

Publication (announcement) No.: US10852806B2

Publication (announcement) date: 2020-12-01

Application No.: US16661803

Application date: 2019-10-23

Applicant: Intel Corporation

Inventor: Subramaniam Maiyuran , Sanjeev S. Jahagirdar , Kiran C. Veernapu , Eric J. Asperheim , Altug Koker , Balaji Vembu , Joydeep Ray , Abhishek R. Appu

IPC: G06F15/00 , G06F1/3234 , G06F9/46 , G06F1/329

Abstract: Methods and apparatus relating to techniques for a dual path sequential element to reduce toggles in data path are described. In an embodiment, switching logic causes signals for a single data path of a processor to be directed to at least two separate data paths. At least one of the two separate data paths is power gated to reduce signal toggles in the at least one data path. Other embodiments are also disclosed and claimed.

320.

Invention grant
Register spill/fill using shared local memory space (Active)

Publication (announcement) No.: US10796667B2

Publication (announcement) date: 2020-10-06

Application No.: US16599175

Application date: 2019-10-11

Applicant: Intel Corporation

Inventor: Joydeep Ray , Altug Koker , Balaji Vembu , Murali Ramadoss , Guei-Yuan Lueh , James A. Valerio , Prasoonkumar Surti , Abhishek R. Appu , Vasanth Ranganathan , Kalyan K. Bhiravabhatla , Arthur D. Hunter, Jr. , Wei-Yu Chen , Subramaniam M. Maiyuran

IPC: G09G5/36 , G06F12/0875 , G06F9/46 , G09G5/00 , G06F12/084 , G06F12/0811

Abstract: A mechanism is described for facilitating use of a shared local memory for register spilling/filling relating to graphics processors at computing devices. A method of embodiments, as described herein, includes reserving one or more spaces of a shared local memory (SLM) to perform one or more of spilling and filling relating to registers associated with a graphics processor of a computing device.
