-
公开(公告)号:US10521349B2
公开(公告)日:2019-12-31
申请号:US16277267
申请日:2019-02-15
Applicant: Intel Corporation
Inventor: Chandrasekaran Sakthivel , Prasoonkumar Surti , John C. Weast , Sara S. Baghsorkhi , Justin E. Gottschlich , Abhishek R. Appu , Nicolas C. Galoppo Von Borries , Joydeep Ray , Narayan Srinivasa , Feng Chen , Ben J. Ashbaugh , Rajkishore Barik , Tsung-Han Lin , Kamal Sinha , Eriko Nurvitadhi , Balaji Vembu , Altug Koker
IPC: G06F12/0837 , G06N3/08 , G06N20/00 , G06T1/20 , G06F12/0815 , G06N3/04 , G06N3/063
Abstract: In an example, an apparatus comprises a plurality of processing unit cores, a plurality of cache memory modules associated with the plurality of processing unit cores, and a machine learning model communicatively coupled to the plurality of processing unit cores, wherein the plurality of cache memory modules share cache coherency data with the machine learning model. Other embodiments are also disclosed and claimed.
-
公开(公告)号:US10482562B2
公开(公告)日:2019-11-19
申请号:US15493522
申请日:2017-04-21
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , Balaji Vembu , Altug Koker , Bryan R. White , David J. Cowperthwaite , Joydeep Ray , Murali Ramadoss
Abstract: An apparatus to facilitate partitioning of a graphics device is disclosed. The apparatus includes a plurality of engines and logic to partition the plurality of engines to facilitate independent access to each engine within the plurality of engines.
-
公开(公告)号:US20190332903A1
公开(公告)日:2019-10-31
申请号:US16505012
申请日:2019-07-08
Applicant: Intel Corporation
Inventor: Kevin Nealis , Anbang Yao , Xiaoming Chen , Elmoustapha Ould-Ahmed-Vall , Sara S. Baghsorkhi , Eriko Nurvitadhi , Balaji Vembu , Nicolas C. Galoppo Von Borries , Rajkishore Barik , Tsung-Han Lin , Kamal Sinha
Abstract: One embodiment provides for a compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction that specifies multiple operands including a multi-bit input value and a bipolar binary weight associated with a neural network and an arithmetic logic unit including a multiplier, an adder, and an accumulator register. To execute the decoded instruction, the multiplier is to perform a multiplication operation on the multi-bit input based on the bipolar binary weight to generate an intermediate product and the adder is to add the intermediate product to a value stored in the accumulator register and update the value stored in the accumulator register.
-
244.
公开(公告)号:US10452552B2
公开(公告)日:2019-10-22
申请号:US15488988
申请日:2017-04-17
Applicant: Intel Corporation
Inventor: Andrew T. Lauritzen , Gabor Liktor , Tomer Bar-On , Hugues Labbe , John G. Gierach , Joydeep Ray , Travis T. Schluessler , John H. Feit , Nikos Kaburlasos , Jacek Kwiatkowski , Abhishek R. Appu , Balaji Vembu , Altug Koker
IPC: G06F12/0862 , G06F9/30 , G06F12/0875 , G06F12/0811 , G06F12/0855 , G06F9/38 , G06T1/20
Abstract: Systems, apparatuses and methods may provide a way to track graphics pipeline operations. More particularly, the systems, apparatuses and methods may provide a way to track operation dependencies between graphics pipeline operations for blocks of pixel samples and stall one or more of the pipeline operations based on the operation dependencies. The systems, apparatuses and methods may further provide cache pre-fetch hardware to monitor processing of blocks of pixel samples and fetch a next block of the pixel samples from the memory into a cache before completion of processing a current block of pixel samples based on one or more of the pipeline operations or a surface state of one or more regions of a screen space.
-
公开(公告)号:US10452397B2
公开(公告)日:2019-10-22
申请号:US15477022
申请日:2017-04-01
Applicant: Intel Corporation
Inventor: Joydeep Ray , Altug Koker , Balaji Vembu , Abhishek R. Appu , Kamal Sinha , Prasoonkumar Surti , Kiran C. Veernapu
Abstract: Methods and apparatus relating to techniques for avoiding cache lookup for cold cache. In an example, an apparatus comprises logic, at least partially comprising hardware logic, to determine a first number of threads to be scheduled for each context of a plurality of contexts in a multi-context processing system, allocate a second number of streaming multiprocessors (SMs) to the respective plurality of contexts, and dispatch threads from the plurality of contexts only to the streaming multiprocessor(s) allocated to the respective plurality of contexts. Other embodiments are also disclosed and claimed.
-
公开(公告)号:US20190296909A1
公开(公告)日:2019-09-26
申请号:US16435083
申请日:2019-06-07
Applicant: Intel Corporation
Inventor: Balaji Vembu , Vidhya Krishnan , Sandeep S. Sodhi , Scott Janus , Daniel Nemiroff
Abstract: An embodiment of a graphics apparatus may include a graphics processor including a kernel executor, and a security engine communicatively coupled to the graphics processor. The security engine may be configured to create a kernel security key, encrypt an executable kernel for the kernel executor in accordance with the kernel security key, and share the kernel security key with the graphics processor.
-
公开(公告)号:US10410098B2
公开(公告)日:2019-09-10
申请号:US15494710
申请日:2017-04-24
Applicant: Intel Corporation
Inventor: Kevin Nealis , Anbang Yao , Xiaoming Chen , Elmoustapha Ould-Ahmed-Vall , Sara S. Baghsorkhi , Eriko Nurvitadhi , Balaji Vembu , Nicolas C. Galoppo Von Borries , Rajkishore Barik , Tsung-Han Lin , Kamal Sinha
IPC: G06F9/30 , G06K9/66 , G06K9/00 , G06F9/38 , G06F9/46 , G06T1/20 , G06N3/04 , G06N3/063 , G06N3/08
Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the apparatus comprising a decode unit to decode a single instruction into a decoded instruction that specifies multiple operands including an input value and a quantized weight value associated with a neural network and an arithmetic logic unit including a barrel shifter, an adder, and an accumulator register, wherein to execute the decoded instruction, the barrel shifter is to shift the input value by the quantized weight value to generate a shifted input value and the adder is to add the shifted input value to a value stored in the accumulator register and update the value stored in the accumulator register.
-
公开(公告)号:US20190251033A1
公开(公告)日:2019-08-15
申请号:US16277114
申请日:2019-02-15
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , Altug Koker , Joydeep Ray , Prasoonkumar Surti , Kamal Sinha , Kiran C. Veernapu , Balaji Vembu
IPC: G06F12/0888 , G06F13/40 , G06T1/60 , G06T1/20 , G06F13/42
CPC classification number: G06F12/0888 , G06F12/0895 , G06F13/4022 , G06F13/4282 , G06F2212/1024 , G06F2212/1028 , G06F2212/6032 , G06F2213/0026 , G06T1/20 , G06T1/60
Abstract: Methods and apparatus relating to techniques for avoiding cache lookup for cold cache. In an example, an apparatus comprises logic, at least partially comprising hardware logic, to receive, in a read/modify/write (RMW) pipeline, a cache access request from a requestor, wherein the cache request comprises a cache set identifier associated with requested data in the cache set, determine whether the cache set associated with the cache set identifier is in an inaccessible invalid state, and in response to a determination that the cache set is in an inaccessible state or an invalid state, to terminate the cache access request. Other embodiments are also disclosed and claimed.
-
公开(公告)号:US10380713B2
公开(公告)日:2019-08-13
申请号:US16150012
申请日:2018-10-02
Applicant: Intel Corporation
Inventor: Balaji Vembu , Altug Koker , Joydeep Ray
Abstract: One embodiment provides for a general-purpose graphics processing unit comprising multiple processing elements having a single instruction, multiple thread architecture, the multiple processing elements enabled to perform hardware multithreading, wherein execution context for threads to be executed is maintained on-chip during execution, a scheduler to schedule a warp to the multiple processing elements, wherein the warp is a group of parallel threads, the warp includes multiple sub-warps, and threads within the warp diverge at sub-warp granularity, and a logic unit including hardware or firmware logic, the logic unit to group active threads from the warp for execution on the multiple processing elements.
-
公开(公告)号:US10332320B2
公开(公告)日:2019-06-25
申请号:US15488914
申请日:2017-04-17
Applicant: Intel Corporation
Inventor: Barath Lakshamanan , Linda L. Hurd , Ben J. Ashbaugh , Elmoustapha Ould-Ahmed-Vall , Liwei Ma , Jingyi Jin , Justin E. Gottschlich , Chandrasekaran Sakthivel , Michael S. Strickland , Brian T. Lewis , Lindsey Kuper , Altug Koker , Abhishek R. Appu , Prasoonkumar Surti , Joydeep Ray , Balaji Vembu , Javier S. Turek , Naila Farooqui
IPC: G01C22/00 , G07C5/00 , G05D1/00 , G01C21/34 , G08G1/01 , H04W28/08 , G06N20/00 , G06F9/50 , G08G1/052 , G01S19/13 , G05D1/02 , H04L29/08 , H04L12/26
Abstract: One embodiment provides for a computing device within an autonomous vehicle, the compute device comprising a wireless network device to enable a wireless data connection with an autonomous vehicle network, a set of multiple processors including a general-purpose processor and a general-purpose graphics processor, the set of multiple processors to execute a compute manager to manage execution of compute workloads associated with the autonomous vehicle, the compute workload associated with autonomous operations of the autonomous vehicle, and offload logic configured to execute on the set of multiple processors, the offload logic to determine to offload one or more of the compute workloads to one or more autonomous vehicles within range of the wireless network device.
-
-
-
-
-
-
-
-
-