-
71.
Publication No.: US20210124579A1
Publication Date: 2021-04-29
Application No.: US17115989
Filing Date: 2020-12-09
Applicant: Intel Corporation
Inventors: Himanshu Kaul , Mark A. Anders , Sanu K. Mathew , Anbang Yao , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Rajkishore Barik , Tsung-Han Lin , Vasanth Ranganathan , Sanjeev Jahagirdar
Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute a 32-bit intermediate product of 16-bit operands and to compute a 32-bit sum based on the 32-bit intermediate product.
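The mixed-precision arithmetic in this abstract can be illustrated in software. The following is a minimal sketch, not the patented hardware: it assumes the 16-bit operands are IEEE 754 half-precision floats and the 32-bit product and sum are single-precision floats, which the abstract does not state. The helper names `to_f16`, `to_f32`, and `matmul_accumulate` are hypothetical.

```python
import struct


def to_f16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_f32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def matmul_accumulate(a, b, c):
    """Compute D = A x B + C for matrices given as lists of rows.

    Assumption: elements of A and B are rounded to 16 bits, while each
    product and each running sum is held in 32 bits.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    d = [[to_f32(c[i][j]) for j in range(cols)] for i in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = d[i][j]
            for k in range(inner):
                # 32-bit intermediate product of two 16-bit operands
                prod = to_f32(to_f16(a[i][k]) * to_f16(b[k][j]))
                # 32-bit sum based on the intermediate product
                acc = to_f32(acc + prod)
            d[i][j] = acc
    return d
```

Keeping the product and the accumulator at 32 bits avoids the rounding loss that would occur if every partial sum were rounded back to 16 bits.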
-
Publication No.: US10909652B2
Publication Date: 2021-02-02
Application No.: US16355303
Filing Date: 2019-03-15
Applicant: Intel Corporation
Inventors: Altug Koker , Lance Cheney , Eric Finley , Varghese George , Sanjeev Jahagirdar , Josh Mastronarde , Naveen Matam , Iqbal Rajwani , Lakshminarayanan Striramassarma , Melaku Teshome , Vikranth Vemulapalli , Binoj Xavier
Abstract: A disaggregated processor package can be configured to accept interchangeable chiplets. Interchangeability is enabled by specifying a standard physical interconnect for chiplets that can enable the chiplet to interface with a fabric or bridge interconnect. Chiplets from different IP designers can conform to the common interconnect, enabling such chiplets to be interchangeable during assembly. The fabric and bridge interconnect logic on the chiplet can then be configured to conform with the actual interconnect layout of the on-board logic of the chiplet. Additionally, data from chiplets can be transmitted across an inter-chiplet fabric using encapsulation, such that the actual data being transferred is opaque to the fabric, further enabling interchangeability of the individual chiplets. With such an interchangeable design, higher or lower density memory can be inserted into memory chiplet slots, while compute or graphics chiplets with a higher or lower core count can be inserted into logic chiplet slots.
-
Publication No.: US10891707B2
Publication Date: 2021-01-12
Application No.: US16377315
Filing Date: 2019-04-08
Applicant: Intel Corporation
Inventors: Abhishek R. Appu , Altug Koker , John C. Weast , Mike B. Macpherson , Linda L. Hurd , Sara S. Baghsorkhi , Justin E. Gottschlich , Prasoonkumar Surti , Chandrasekaran Sakthivel , Liwei Ma , Elmoustapha Ould-Ahmed-Vall , Kamal Sinha , Joydeep Ray , Balaji Vembu , Sanjeev Jahagirdar , Vasanth Ranganathan , Dukhwan Kim
Abstract: A mechanism is described for facilitating inference coordination and processing utilization for machine learning at autonomous machines. A method of embodiments, as described herein, includes detecting, at training time, information relating to one or more tasks to be performed according to a training dataset relating to a processor including a graphics processor. The method may further include analyzing the information to determine one or more portions of hardware relating to the processor capable of supporting the one or more tasks, and configuring the hardware to pre-select the one or more portions to perform the one or more tasks, while other portions of the hardware remain available for other tasks.
-
Publication No.: US10803548B2
Publication Date: 2020-10-13
Application No.: US16355377
Filing Date: 2019-03-15
Applicant: Intel Corporation
Inventors: Naveen Matam , Lance Cheney , Eric Finley , Varghese George , Sanjeev Jahagirdar , Altug Koker , Josh Mastronarde , Iqbal Rajwani , Lakshminarayanan Striramassarma , Melaku Teshome , Vikranth Vemulapalli , Binoj Xavier
Abstract: Embodiments described herein provide techniques to disaggregate an architecture of a system on a chip integrated circuit into multiple distinct chiplets that can be packaged onto a common chassis. In one embodiment, a graphics processing unit or parallel processor is composed from diverse silicon chiplets that are separately manufactured. A chiplet is an at least partially packaged integrated circuit that includes distinct units of logic that can be assembled with other chiplets into a larger package. A diverse set of chiplets with different IP core logic can be assembled into a single device.
-
75.
Publication No.: US10474458B2
Publication Date: 2019-11-12
Application No.: US15787129
Filing Date: 2017-10-18
Applicant: Intel Corporation
Inventors: Himanshu Kaul , Mark A. Anders , Sanu K. Mathew , Anbang Yao , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Rajkishore Barik , Tsung-Han Lin , Vasanth Ranganathan , Sanjeev Jahagirdar
IPC Classification: G09G5/00 , G06F9/30 , G09G5/393 , G06F9/38 , G06F7/483 , G06F7/544 , G06N3/04 , G06N3/063 , G06N3/08 , G06T15/00 , G06N20/00
Abstract: One embodiment provides for a machine-learning hardware accelerator comprising a compute unit having an adder and a multiplier that are shared between an integer datapath and a floating-point datapath, the upper bits of input operands to the multiplier to be gated during floating-point operation.
-
Publication No.: US10346166B2
Publication Date: 2019-07-09
Application No.: US15581080
Filing Date: 2017-04-28
Applicant: Intel Corporation
Inventors: Feng Chen , Narayan Srinivasa , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Joydeep Ray , Nicolas C. Galoppo Von Borries , Prasoonkumar Surti , Ben J. Ashbaugh , Sanjeev Jahagirdar , Vasanth Ranganathan
IPC Classification: G06T1/00 , G06F9/30 , G06F9/38 , G06F12/0862 , G06F12/0875 , G06F9/50
Abstract: A mechanism is described for facilitating intelligent dispatching and vectorizing at autonomous machines. A method of embodiments, as described herein, includes detecting a plurality of threads corresponding to a plurality of workloads associated with tasks relating to a graphics processor. The method may further include determining a first set of threads of the plurality of threads that are similar to each other or have adjacent surfaces, and physically clustering the first set of threads close together using a first set of adjacent compute blocks.
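The clustering idea in this abstract can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the method claimed in the patent: similarity is reduced to an equal workload key, compute blocks are modeled as consecutive integer indices, and the function name `cluster_threads` is hypothetical.

```python
from collections import defaultdict


def cluster_threads(threads, num_blocks):
    """Place similar threads on adjacent compute blocks.

    threads: list of (thread_id, workload_key) pairs; threads sharing a
    key are treated as "similar" (an assumption made for this sketch).
    Returns a dict mapping thread_id to a block index.
    """
    groups = defaultdict(list)
    for thread_id, key in threads:
        groups[key].append(thread_id)

    placement = {}
    next_block = 0
    for key in sorted(groups):
        for thread_id in groups[key]:
            # Threads of one group receive consecutive block indices;
            # indices wrap around when threads outnumber blocks.
            placement[thread_id] = next_block % num_blocks
            next_block += 1
    return placement
```

For example, `cluster_threads([(0, "a"), (1, "b"), (2, "a"), (3, "b")], 4)` places threads 0 and 2 on blocks 0 and 1, and threads 1 and 3 on blocks 2 and 3.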
-
Publication No.: US20180314521A1
Publication Date: 2018-11-01
Application No.: US15581080
Filing Date: 2017-04-28
Applicant: Intel Corporation
Inventors: Feng Chen , Narayan Srinivasa , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Joydeep Ray , Nicolas C. Galoppo Von Borries , Prasoonkumar Surti , Ben J. Ashbaugh , Sanjeev Jahagirdar , Vasanth Ranganathan
IPC Classification: G06F9/30 , G06F9/38 , G06F12/0862 , G06F12/0875
CPC Classification: G06F9/3009 , G06F9/30036 , G06F9/30145 , G06F9/3836 , G06F9/3867 , G06F9/3887 , G06F9/5033 , G06F9/5066 , G06F12/0862 , G06F12/0875 , G06F2212/452 , G06F2212/602
Abstract: A mechanism is described for facilitating intelligent dispatching and vectorizing at autonomous machines. A method of embodiments, as described herein, includes detecting a plurality of threads corresponding to a plurality of workloads associated with tasks relating to a graphics processor. The method may further include determining a first set of threads of the plurality of threads that are similar to each other or have adjacent surfaces, and physically clustering the first set of threads close together using a first set of adjacent compute blocks.
-
Publication No.: US20180307985A1
Publication Date: 2018-10-25
Application No.: US15495112
Filing Date: 2017-04-24
Applicant: Intel Corporation
Inventors: Abhishek R. Appu , Altug Koker , Joydeep Ray , Balaji Vembu , John C. Weast , Mike B. Macpherson , Dukhwan Kim , Linda L. Hurd , Sanjeev Jahagirdar , Vasanth Ranganathan
CPC Classification: G06N3/08 , G05D1/0088 , G06F9/522 , G06N3/063
Abstract: A mechanism is described for facilitating barriers and synchronization for machine learning at autonomous machines. A method of embodiments, as described herein, includes detecting thread groups relating to machine learning associated with one or more processing devices. The method may further include facilitating barrier synchronization of the thread groups across multiple dies such that each thread in a thread group is scheduled across a set of compute elements associated with the multiple dies, where each die represents a processing device of the one or more processing devices, the processing device including a graphics processor.
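Barrier synchronization in general can be shown with ordinary CPU threads. The sketch below uses Python's standard `threading.Barrier` and does not model multiple dies or GPU compute elements; it only illustrates the property that no thread in a group proceeds past the barrier until every thread has reached it. The function name `run_with_barrier` is hypothetical.

```python
import threading


def run_with_barrier(num_threads):
    """Run num_threads workers that all wait at one barrier.

    Returns a log of (phase, worker_index) tuples in the order recorded.
    """
    barrier = threading.Barrier(num_threads)
    log = []
    lock = threading.Lock()

    def worker(index):
        with lock:
            log.append(("before", index))
        # Block here until all num_threads workers have arrived.
        barrier.wait()
        with lock:
            log.append(("after", index))

    workers = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return log
```

Every "before" entry is recorded ahead of every "after" entry, regardless of how the threads are scheduled.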
-
Publication No.: US20180293491A1
Publication Date: 2018-10-11
Application No.: US15482798
Filing Date: 2017-04-09
Applicant: Intel Corporation
Inventors: Liwei Ma , Nadathur Rajagopalan Satish , Jeremy Bottleson , Farshad Akhbari , Eriko Nurvitadhi , Abhishek R. Appu , Altug Koker , Kamal Sinha , Joydeep Ray , Balaji Vembu , Vasanth Ranganathan , Sanjeev Jahagirdar
Abstract: A mechanism is described for facilitating fast data operations for machine learning at autonomous machines. A method of embodiments, as described herein, includes detecting input data to be used in computational tasks by a computation component of a compute pipeline of a processor including a graphics processor. The method may further include determining one or more frequently-used data values (FDVs) from the data, and pushing the one or more frequent data values to bypass the computational tasks.
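One way to picture the FDV bypass is as a lookup that skips recomputation for the most common input values. The sketch below is an interpretation, not the patented pipeline: it assumes the results for frequent values are computed once and then reused, and the names `compute_with_fdv_bypass` and `top_k` are hypothetical.

```python
from collections import Counter


def compute_with_fdv_bypass(data, fn, top_k=1):
    """Apply fn to every item, bypassing fn for frequently-used values.

    Returns (results, calls), where calls counts how often fn actually ran.
    """
    # Determine the top_k most frequent values in the input.
    fdvs = [value for value, _ in Counter(data).most_common(top_k)]
    # Compute each frequent value once, ahead of the main loop.
    precomputed = {value: fn(value) for value in fdvs}
    calls = len(fdvs)

    results = []
    for x in data:
        if x in precomputed:
            results.append(precomputed[x])  # bypass: no computation
        else:
            results.append(fn(x))
            calls += 1
    return results, calls
```

With the input `[0, 0, 3, 0, 5, 0]` and a squaring function, the function runs three times instead of six, because the frequent value 0 is computed only once.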
-
Publication No.: US20170269672A9
Publication Date: 2017-09-21
Application No.: US14966708
Filing Date: 2015-12-11
Applicant: Intel Corporation
Inventors: Sanjeev Jahagirdar , Varghese George , John B. Conrad , Robert Milstrey , Stephen A. Fischer , Alon Naveh , Shai Rotem
CPC Classification: G06F1/3287 , G06F1/3203 , G06F1/324 , G06F1/3243 , G06F1/3246 , G06F1/3275 , G06F1/3293 , G06F1/3296 , G06F9/4418 , G06F11/1441 , G06F12/084 , G06F12/0875 , G06F2212/281 , G06F2212/305 , G06F2212/314 , G11C7/1072 , Y02B70/123 , Y02B70/126 , Y02B70/32 , Y02D10/152 , Y02D10/172 , Y02D50/20 , Y02P80/11 , Y10T307/305 , Y10T307/406 , Y10T307/582 , Y10T307/826
Abstract: Embodiments of the invention relate to a method and apparatus for a zero voltage processor sleep state. A processor may include a dedicated cache memory. A voltage regulator may be coupled to the processor to provide an operating voltage to the processor. During a transition to a zero voltage power management state for the processor, the operational voltage applied to the processor by the voltage regulator may be reduced to approximately zero and the state variables associated with the processor may be saved to the dedicated cache memory.
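The save-and-restore sequence described in this abstract can be modeled as a small state machine. This is a toy software model under stated assumptions: the class `Processor`, its attribute names, and the restore step on wake-up are illustrative and not taken from the patent text, which describes only the transition into the sleep state.

```python
class Processor:
    """Toy model of a processor with a zero-voltage sleep state."""

    def __init__(self, operating_voltage):
        self.operating_voltage = operating_voltage
        self.voltage = operating_voltage
        self.state = {}               # architectural state variables
        self._dedicated_cache = None  # assumed to stay powered during sleep

    def enter_zero_voltage_sleep(self):
        # Save state variables to the dedicated cache before power-down.
        self._dedicated_cache = dict(self.state)
        # Core state is lost once the operating voltage is removed.
        self.state = {}
        self.voltage = 0.0

    def exit_zero_voltage_sleep(self):
        # Assumed wake-up path: restore voltage, then reload saved state.
        self.voltage = self.operating_voltage
        self.state = dict(self._dedicated_cache)
        self._dedicated_cache = None
```

A short usage example: after setting `cpu.state = {"pc": 4096}`, calling `enter_zero_voltage_sleep()` leaves `cpu.voltage` at 0.0 with empty state, and `exit_zero_voltage_sleep()` brings back both the voltage and the saved program counter.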