-
Publication No.: US10824938B2
Publication Date: 2020-11-03
Application No.: US15494723
Filing Date: 2017-04-24
Applicant: Intel Corporation
Inventor: Rajkishore Barik, Elmoustapha Ould-Ahmed-Vall, Xiaoming Chen, Dhawal Srivastava, Anbang Yao, Kevin Nealis, Eriko Nurvitadhi, Sara S. Baghsorkhi, Balaji Vembu, Tatiana Shpeisman, Ping T. Tang
Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the apparatus comprising a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to perform one or more machine learning operations, wherein the decode unit, based on parameters of the one or more machine learning operations, is to request a scheduler to schedule the one or more machine learning operations to one of an array of programmable compute units and a fixed function compute unit.
-
Publication No.: US10824529B2
Publication Date: 2020-11-03
Application No.: US15857885
Filing Date: 2017-12-29
Applicant: Intel Corporation
Inventor: Prashant Chaudhari, Michael Derr, Gustavo Espinosa, Balaji Vembu, Richard Shannon, Bradley Coffman, Daniel Knollmueller
IPC: G06F11/22, G06F11/263
Abstract: Systems, apparatuses and methods may provide for technology that detects a startup of a system on chip (SoC) and injects, during the startup, one or more domain startup errors into a plurality of domains on the SoC. Additionally, the technology may determine whether the domain startup error(s) were detected during the startup. In one example, the plurality of domains include one or more fabric interfaces.
-
Publication No.: US20200327637A1
Publication Date: 2020-10-15
Application No.: US16791482
Filing Date: 2020-02-14
Applicant: Intel Corporation
Inventor: Balaji Vembu, Murali Ramadoss, Guei-Yuan Lueh, Subramaniam M. Maiyuran, Abhishek R. Appu, Joydeep Ray, Altug Koker, James A. Valerio, Eric J. Hoekstra, Arthur D. Hunter, Jr.
Abstract: An apparatus to facilitate data intelligent dispatching is disclosed. The apparatus includes one or more processing units including a plurality of execution units (EUs) to execute a plurality of processing threads and collection logic to collect statistics data for threads executed at the processing unit during execution of an application, and dispatch logic to dispatch the threads to be executed at a subset of the plurality of EUs during a subsequent execution of the application based on the statistics data.
-
Publication No.: US20200258186A1
Publication Date: 2020-08-13
Application No.: US16834902
Filing Date: 2020-03-30
Applicant: Intel Corporation
Inventor: Balaji Vembu, Altug Koker, Joydeep Ray
Abstract: One embodiment provides for a general-purpose graphics processing unit comprising multiple processing elements having a single instruction, multiple thread (SIMT) architecture configured to perform hardware multithreading during execution of a plurality of thread groups. The plurality of thread groups can include one or more sub-groups of threads, with a first sub-group associated with a first thread group and a second sub-group associated with a second thread group. Data dependencies can be used to trigger the launch of threads, such that when a first thread in the second sub-group has a data dependency upon a first thread in the first sub-group, circuitry in the general-purpose graphics processing unit can launch at least the first thread in the second sub-group to execute in response to satisfaction of the data dependency.
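The dependency-triggered launch described in this abstract can be illustrated in software. The following is only a minimal sketch, not the patented circuitry: it assumes each thread is named by a string and that its data dependencies are given as a set of other thread names. The function name `launch_order` and the thread labels are hypothetical.

```python
from collections import defaultdict, deque


def launch_order(deps):
    """Return the order in which threads launch.

    deps maps each thread name to the set of threads it depends on.
    A thread launches only once all of its dependencies have completed
    (here, "completed" simply means "already launched and run").
    """
    pending = {thread: set(needs) for thread, needs in deps.items()}

    # Reverse index: producer thread -> threads waiting on it.
    dependents = defaultdict(list)
    for thread, needs in deps.items():
        for producer in needs:
            dependents[producer].append(thread)

    # Threads with no dependencies can launch immediately.
    ready = deque(thread for thread in deps if not pending[thread])
    order = []
    while ready:
        thread = ready.popleft()
        order.append(thread)
        # Completing this thread may satisfy the last dependency of others.
        for waiter in dependents[thread]:
            pending[waiter].discard(thread)
            if not pending[waiter]:
                ready.append(waiter)
    return order


# Hypothetical example: thread 0 of sub-group B waits on thread 0 of sub-group A.
example = {"A.t0": set(), "A.t1": set(), "B.t0": {"A.t0"}}
order = launch_order(example)
```

In this example `B.t0` is launched only after `A.t0` has run, which mirrors the "launch in response to satisfaction of the data dependency" wording of the abstract.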
-
Publication No.: US20200219223A1
Publication Date: 2020-07-09
Application No.: US16243624
Filing Date: 2019-01-09
Applicant: Intel Corporation
Inventor: Balaji Vembu, Brandon Fliflet, James Valerio, Michael Apodaca, Ben Ashbaugh, Hema Nalluri, Ankur Shah, Murali Ramadoss, David Puffer, Altug Koker, Aditya Navale, Abhishek R. Appu, Joydeep Ray, Travis Schluessler
Abstract: Embodiments described herein provide a graphics, media, and compute device having a tiled architecture composed of a number of tiles of smaller graphics devices. The work distribution infrastructure for such device enables the distribution of workloads across multiple tiles of the device. Work items can be submitted to any one or more of the multiple tiles, with workloads able to span multiple tiles. Additionally, upon completion of a work item, graphics, media, and/or compute engines within the device can readily acquire new work items for execution with minimal latency.
-
Publication No.: US10706498B2
Publication Date: 2020-07-07
Application No.: US16417132
Filing Date: 2019-05-20
Applicant: Intel Corporation
Inventor: Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Nicolas C. Galoppo Von Borries
IPC: G06F17/16, H03M7/30, G06K9/62, G06T1/20, G06F9/30, G06F9/38, G06F12/0811, G06F12/0815, G06F12/0831, G06F12/0888, G06F9/48, G06N3/04, G06N3/08, G06T1/60, G06T15/00
Abstract: An apparatus to facilitate processing of a sparse matrix is disclosed. The apparatus includes a plurality of processing units each comprising one or more processing elements, including logic to read operands, a multiplication unit to multiply two or more operands and a scheduler to identify operands having a zero value and prevent scheduling of the operands having the zero value at the multiplication unit.
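The zero-skipping idea in this abstract can be shown with a small software analogue. This is a sketch under stated assumptions, not the hardware design: the "scheduler" is modeled as a filter over operand pairs, and the "multiplication unit" as an ordinary multiply. The names `schedule_nonzero` and `sparse_dot` are hypothetical.

```python
def schedule_nonzero(operand_pairs):
    """Keep only the operand pairs in which neither operand is zero.

    Pairs containing a zero would contribute nothing to a sum of
    products, so they are never handed to the multiplier.
    """
    return [(a, b) for a, b in operand_pairs if a != 0 and b != 0]


def sparse_dot(row, col):
    """Dot product that multiplies only the scheduled (non-zero) pairs.

    Returns the result and the number of multiplications performed.
    """
    scheduled = schedule_nonzero(zip(row, col))
    result = sum(a * b for a, b in scheduled)
    return result, len(scheduled)


# Four operand pairs, but only one has two non-zero operands.
value, multiplies = sparse_dot([1, 0, 3, 0], [4, 5, 0, 7])
```

Here a four-element dot product needs a single multiplication, because three of the four operand pairs contain a zero and are filtered out before reaching the multiplier.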
-
Publication No.: US20200210338A1
Publication Date: 2020-07-02
Application No.: US16727127
Filing Date: 2019-12-26
Applicant: Intel Corporation
Inventor: Chandrasekaran Sakthivel, Prasoonkumar Surti, John C. Weast, Sara S. Baghsorkhi, Justin E. Gottschlich, Abhishek R. Appu, Nicolas C. Galoppo Von Borries, Joydeep Ray, Narayan Srinivasa, Feng Chen, Ben J. Ashbaugh, Rajkishore Barik, Tsung-Han Lin, Kamal Sinha, Eriko Nurvitadhi, Balaji Vembu, Altug Koker
IPC: G06F12/0837, G06N3/08, G06N20/00, G06T1/20, G06F12/0815, G06N3/063, G06N3/04
Abstract: In an example, an apparatus comprises a plurality of processing unit cores, a plurality of cache memory modules associated with the plurality of processing unit cores, and a machine learning model communicatively coupled to the plurality of processing unit cores, wherein the plurality of cache memory modules share cache coherency data with the machine learning model. Other embodiments are also disclosed and claimed.
-
Publication No.: US20200073810A1
Publication Date: 2020-03-05
Application No.: US16566188
Filing Date: 2019-09-10
Applicant: Intel Corporation
Inventor: Andrew T. Lauritzen, Gabor Liktor, Tomer Bar-On, Hugues Labbe, John G. Gierach, Joydeep Ray, Travis T. Schluessler, John H. Feit, Nikos Kaburlasos, Jacek Kwiatkowski, Abhishek R. Appu, Balaji Vembu, Altug Koker
IPC: G06F12/0862, G06F12/0875, G06F9/30, G06T1/20, G06F9/38, G06F12/0855, G06F12/0811
Abstract: Systems, apparatuses and methods may provide a way to track graphics pipeline operations. More particularly, the systems, apparatuses and methods may provide a way to track operation dependencies between graphics pipeline operations for blocks of pixel samples and stall one or more of the pipeline operations based on the operation dependencies. The systems, apparatuses and methods may further provide cache pre-fetch hardware to monitor processing of blocks of pixel samples and fetch a next block of the pixel samples from the memory into a cache before completion of processing a current block of pixel samples based on one or more of the pipeline operations or a surface state of one or more regions of a screen space.
-
Publication No.: US10565675B2
Publication Date: 2020-02-18
Application No.: US16252379
Filing Date: 2019-01-18
Applicant: Intel Corporation
Inventor: Balaji Vembu, Murali Ramadoss, Guei-Yuan Lueh, Subramaniam M. Maiyuran, Abhishek R. Appu, Joydeep Ray, Altug Koker, James A. Valerio, Eric J. Hoekstra, Arthur D. Hunter, Jr.
Abstract: An apparatus to facilitate data intelligent dispatching is disclosed. The apparatus includes one or more processing units including a plurality of execution units (EUs) to execute a plurality of processing threads and collection logic to collect statistics data for threads executed at the processing unit during execution of an application, and dispatch logic to dispatch the threads to be executed at a subset of the plurality of EUs during a subsequent execution of the application based on the statistics data.
-
Publication No.: US20200034946A1
Publication Date: 2020-01-30
Application No.: US16531763
Filing Date: 2019-08-05
Applicant: Intel Corporation
Inventor: Prasoonkumar Surti, Narayan Srinivasa, Feng Chen, Joydeep Ray, Ben J. Ashbaugh, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Sara S. Baghsorkhi, Justin E. Gottschlich, Altug Koker, Nadathur Rajagopalan Satish, Farshad Akhbari, Dukhwan Kim, Wenyin Fu, Travis T. Schluessler, Josh B. Mastronarde, Linda L. Hurd, John H. Feit, Jeffery S. Boles, Adam T. Lake, Karthik Vaidyanathan, Devan Burke, Subramaniam Maiyuran, Abhishek R. Appu
Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes a memory device including a first integrated circuit (IC) including a plurality of memory channels and a second IC including a plurality of processing units, each coupled to a memory channel in the plurality of memory channels.
-