-
公开(公告)号:US11018863B2
公开(公告)日:2021-05-25
申请号:US16435083
申请日:2019-06-07
Applicant: Intel Corporation
Inventor: Balaji Vembu , Vidhya Krishnan , Sandeep S. Sodhi , Scott Janus , Daniel Nemiroff
Abstract: An embodiment of a graphics apparatus may include a graphics processor including a kernel executor, and a security engine communicatively coupled to the graphics processor. The security engine may be configured to create a kernel security key, encrypt an executable kernel for the kernel executor in accordance with the kernel security key, and share the kernel security key with the graphics processor.
-
312.
公开(公告)号:US20210124579A1
公开(公告)日:2021-04-29
申请号:US17115989
申请日:2020-12-09
Applicant: Intel Corporation
Inventor: Himanshu Kaul , Mark A. Anders , Sanu K. Mathew , Anbang Yao , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Rajkishore Barik , Tsung-Han Lin , Vasanth Ranganathan , Sanjeev Jahagirdar
Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute a 32-bit intermediate product of 16-bit operands and to compute a 32-bit sum based on the 32-bit intermediate product.
-
公开(公告)号:US10977762B2
公开(公告)日:2021-04-13
申请号:US16834902
申请日:2020-03-30
Applicant: Intel Corporation
Inventor: Balaji Vembu , Altug Koker , Joydeep Ray
Abstract: One embodiment provides for a general-purpose graphics processing unit multiple processing elements having a single instruction, multiple thread (SIMT) architecture configured to perform hardware multithreading during execution of a plurality of thread groups. The plurality of thread groups can include one or more sub-groups of threads, with a first sub-group is associated with a first thread group and a second sub-group associated with a second thread group. Data dependencies can be used to trigger the launch of threads, such that when a first thread in the second sub-group has a data dependency upon a first thread in the first sub-group, circuitry in the general-purpose graphics processing unit can launch at least the first thread in the second sub-group to execute in response to satisfaction of the data dependency.
-
314.
公开(公告)号:US10937123B2
公开(公告)日:2021-03-02
申请号:US16505555
申请日:2019-07-08
Applicant: INTEL CORPORATION
Inventor: Abhishek R. Appu , Joydeep Ray , Altug Koker , Balaji Vembu , Pattabhiraman K , Matthew B. Callaway
Abstract: An apparatus and method for dynamic provisioning, quality of service, and prioritization in a graphics processor. For example, one embodiment of an apparatus comprises a graphics processing unit (GPU) comprising a plurality of graphics processing resources; slice configuration hardware logic to logically subdivide the graphics processing resources into a plurality of slices; and slice allocation hardware logic to allocate a designated number of slices to each virtual machine (VM) of a plurality of VMs running in a virtualized execution environment, the slice allocation hardware logic to allocate different numbers of slices to different VMs based on graphics processing requirements and/or priorities of each of the VMs.
-
315.
公开(公告)号:US20210035255A1
公开(公告)日:2021-02-04
申请号:US16928353
申请日:2020-07-14
Applicant: Intel Corporation
Inventor: Eriko Nurvitadhi , Balaji Vembu , Nicolas C. Galoppo Von Borries , Rajkishore Barik , Tsung-Han Lin , Kamal Sinha , Nadathur Rajagopalan Satish , Jeremy Bottleson , Farshad Akhbari , Altug Koker , Narayan Srinivasa , Dukhwan Kim , Sara S. Baghsorkhi , Justin E. Gottschlich , Feng Chen , Elmoustapha Ould-Ahmed-Vall , Kevin Nealis , Xiaoming Chen , Anbang Yao
Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to cause the compute apparatus to perform a complex machine learning compute operation.
-
公开(公告)号:US10902547B2
公开(公告)日:2021-01-26
申请号:US15819093
申请日:2017-11-21
Applicant: Intel Corporation
Inventor: Prasoonkumar Surti , Narayan Srinivasa , Feng Chen , Joydeep Ray , Ben J. Ashbaugh , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Balaji Vembu , Tsung-Han Lin , Kamal Sinha , Rajkishore Barik , Sara S. Baghsorkhi , Justin E. Gottschlich , Altug Koker , Nadathur Rajagopalan Satish , Farshad Akhbari , Dukhwan Kim , Wenyin Fu , Travis T. Schluessler , Josh B. Mastronarde , Linda L. Hurd , John H. Feit , Jeffery S. Boles , Adam T. Lake , Karthik Vaidyanathan , Devan Burke , Subramaniam Maiyuran , Abhishek R. Appu
Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes a plurality of processing units each comprising a plurality of execution units (EUs), wherein the plurality of EUs comprise a first EU type and a second EU type.
-
公开(公告)号:US10896479B2
公开(公告)日:2021-01-19
申请号:US16446946
申请日:2019-06-20
Applicant: Intel Corporation
Inventor: Balaji Vembu , Altug Koker , Joydeep Ray
Abstract: One embodiment provides for a general-purpose graphics processing unit multiple processing elements having a single instruction, multiple thread (SIMT) architecture, the multiple processing elements to perform hardware multithreading during execution of multiple warps of threads, wherein a warp is a group of parallel threads; a scheduler to schedule a set of sub-warps to the multiple processing elements at sub-warp granularity, wherein a sub-warp is a sub-group of parallel threads, a warp includes multiple sub-warps, and the scheduler is to schedule threads in a first sub-warp of a first warp of threads to execute concurrently with the threads in a second sub-warp of a second warp of threads; and a logic unit including hardware or firmware logic, the logic unit to group active threads for execution on the multiple processing elements.
-
公开(公告)号:US10891707B2
公开(公告)日:2021-01-12
申请号:US16377315
申请日:2019-04-08
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , Altug Koker , John C. Weast , Mike B. Macpherson , Linda L. Hurd , Sara S. Baghsorkhi , Justin E. Gottschlich , Prasoonkumar Surti , Chandrasekaran Sakthivel , Liwei Ma , Elmoustapha Ould-Ahmed-Vall , Kamal Sinha , Joydeep Ray , Balaji Vembu , Sanjeev Jahagirdar , Vasanth Ranganathan , Dukhwan Kim
Abstract: A mechanism is described for facilitating inference coordination and processing utilization for machine learning at autonomous machines. A method of embodiments, as described herein, includes detecting, at training time, information relating to one or more tasks to be performed according to a training dataset relating to a processor including a graphics processor. The method may further include analyzing the information to determine one or more portions of hardware relating to the processor capable of supporting the one or more tasks, and configuring the hardware to pre-select the one or more portions to perform the one or more tasks, while other portions of the hardware remain available for other tasks.
-
公开(公告)号:US10852806B2
公开(公告)日:2020-12-01
申请号:US16661803
申请日:2019-10-23
Applicant: Intel Corporation
Inventor: Subramaniam Maiyuran , Sanjeev S. Jahagirdar , Kiran C. Veernapu , Eric J. Asperheim , Altug Koker , Balaji Vembu , Joydeep Ray , Abhishek R. Appu
IPC: G06F15/00 , G06F1/3234 , G06F9/46 , G06F1/329
Abstract: Methods and apparatus relating to techniques for a dual path sequential element to reduce toggles in data path are described. In an embodiment, switching logic causes signals for a single data path of a processor to be directed to at least two separate data paths. At least one of the two separate data paths is power gated to reduce signal toggles in the at least one data path. Other embodiments are also disclosed and claimed.
-
公开(公告)号:US10796667B2
公开(公告)日:2020-10-06
申请号:US16599175
申请日:2019-10-11
Applicant: Intel Corporation
Inventor: Joydeep Ray , Altug Koker , Balaji Vembu , Murali Ramadoss , Guei-Yuan Lueh , James A. Valerio , Prasoonkumar Surti , Abhishek R. Appu , Vasanth Ranganathan , Kalyan K. Bhiravabhatla , Arthur D. Hunter, Jr. , Wei-Yu Chen , Subramaniam M. Maiyuran
IPC: G09G5/36 , G06F12/0875 , G06F9/46 , G09G5/00 , G06F12/084 , G06F12/0811
Abstract: A mechanism is described for facilitating using of a shared local memory for register spilling/filling relating to graphics processors at computing devices. A method of embodiments, as described herein, includes reserving one or more spaces of a shared local memory (SLM) to perform one or more of spilling and filling relating to registers associated with a graphics processor of a computing device.
-
-
-
-
-
-
-
-
-