-
Publication No.: US20220413869A1
Publication Date: 2022-12-29
Application No.: US17900230
Application Date: 2022-08-31
Applicant: Intel Corporation
Inventor: Balaji Vembu , Abhishek R. Appu , Joydeep Ray , Altug Koker
IPC: G06F9/38 , G06F9/48 , G06F12/0866 , G06F9/46 , G06F9/54 , G06F15/76 , G06F12/0897 , G06F9/52 , G06F15/16 , G06T1/20 , G06F9/50 , G06T1/60
Abstract: An apparatus to facilitate thread scheduling is disclosed. In one embodiment the apparatus includes a processor comprising a plurality of multiprocessors comprising single-instruction multiple thread (SIMT) execution circuitry to simultaneously execute multiple threads, a shared local memory to be shared by the multiple threads, and scheduling hardware logic to schedule the multiple threads in a thread group for execution across the plurality of multiprocessors in accordance with barrier data. The instructions of the multiple threads are to produce shared data to be stored in the shared local memory when executed by the plurality of multiprocessors, wherein additional instructions of at least a first thread of the multiple threads are to use the shared data, and wherein, in accordance with the barrier data, the first thread is to wait for other threads of the multiple threads to finish producing the shared data before executing the additional instructions.
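The producer/barrier/consumer pattern described in this abstract can be illustrated with a minimal, hedged sketch: threads of one group write into a shared buffer (standing in for shared local memory), wait at a barrier, and only then does a consumer thread read the combined data. This is not the patented scheduling hardware; thread counts and names are invented for illustration.

```cpp
// Minimal sketch: producers fill a shared buffer, all threads wait at a
// barrier, then the "first thread" consumes the shared data.
#include <barrier>
#include <cstdio>
#include <numeric>
#include <thread>
#include <vector>

int main() {
    constexpr int kThreads = 8;
    std::vector<int> shared_local(kThreads);   // stands in for shared local memory
    std::barrier sync(kThreads);               // stands in for the barrier data

    std::vector<std::jthread> group;
    for (int tid = 0; tid < kThreads; ++tid) {
        group.emplace_back([&, tid] {
            shared_local[tid] = tid * tid;     // produce shared data
            sync.arrive_and_wait();            // wait for all producers to finish
            if (tid == 0) {                    // first thread uses the shared data
                int sum = std::accumulate(shared_local.begin(), shared_local.end(), 0);
                std::printf("sum of shared data: %d\n", sum);
            }
        });
    }
}
```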
-
Publication No.: US20220405876A1
Publication Date: 2022-12-22
Application No.: US17738254
Application Date: 2022-05-06
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , Balaji Vembu , Altug Koker , Bryan R. White , David J. Cowperthwaite , Joydeep Ray , Murali Ramadoss
Abstract: An apparatus to facilitate partitioning of a graphics device is disclosed. The apparatus includes a plurality of engines and logic to partition the plurality of engines to facilitate independent access to each engine within the plurality of engines.
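As a rough host-side illustration of the partitioning idea (not the disclosed hardware logic), the sketch below splits a fixed set of engines between two clients so that each client only sees its own subset. Engine counts and client names are hypothetical.

```cpp
// Illustrative sketch: partition a device's engines so each client gets
// independent access to its own subset.
#include <cstdio>
#include <map>
#include <string>
#include <vector>

struct EnginePartition {
    std::vector<int> engine_ids;   // engines reserved for one client
};

int main() {
    const int total_engines = 8;
    std::map<std::string, EnginePartition> partitions;

    // Split the engines between two clients; neither can touch the other's set.
    for (int e = 0; e < total_engines; ++e)
        partitions[e < 4 ? "clientA" : "clientB"].engine_ids.push_back(e);

    for (const auto& [client, part] : partitions) {
        std::printf("%s:", client.c_str());
        for (int e : part.engine_ids) std::printf(" engine%d", e);
        std::printf("\n");
    }
}
```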
-
Publication No.: US11531510B2
Publication Date: 2022-12-20
Application No.: US17399103
Application Date: 2021-08-11
Applicant: Intel Corporation
Inventor: Eric J. Asperheim , Subramaniam M. Maiyuran , Kiran C. Veernapu , Sanjeev S. Jahagirdar , Balaji Vembu , Devan Burke , Philip R. Laws , Kamal Sinha , Abhishek R. Appu , Elmoustapha Ould-Ahmed-Vall , Peter L. Doyle , Joydeep Ray , Travis T. Schluessler , John H. Feit , Nikos Kaburlasos , Jacek Kwiatkowski , Altug Koker
IPC: G06F3/14 , G06F3/01 , G09G5/391 , G06F3/0484 , G09G5/00
Abstract: In accordance with some embodiments, the render rate is varied across and/or up and down the display screen. This may be done based on where the user is looking in order to reduce power consumption and/or increase performance. Specifically, the screen display is separated into regions, such as quadrants. Each of these regions is rendered at a rate determined by at least one of what the user is currently looking at, what the user has looked at in the past, and/or what the user is predicted to look at next. Areas of less focus may be rendered at a lower rate, reducing power consumption in some embodiments.
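A hedged sketch of the region-based idea follows: the screen is split into quadrants and a per-quadrant render rate is chosen from the current gaze position. The specific rates, the quadrant layout, and the gaze source are placeholders, not values from the patent.

```cpp
// Sketch: pick a render rate per screen quadrant from where the user looks.
#include <array>
#include <cstdio>

struct Gaze { float x, y; };                 // normalized gaze position in [0,1]

std::array<int, 4> quadrant_rates_hz(Gaze g) {
    // Quadrant layout: 0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right.
    int focused = (g.x < 0.5f ? 0 : 1) + (g.y < 0.5f ? 0 : 2);
    std::array<int, 4> rates;
    rates.fill(30);                          // areas of less focus: lower rate
    rates[focused] = 120;                    // focused quadrant: full rate
    return rates;
}

int main() {
    auto rates = quadrant_rates_hz({0.8f, 0.2f});   // user looks at the top-right
    for (int q = 0; q < 4; ++q)
        std::printf("quadrant %d renders at %d Hz\n", q, rates[q]);
}
```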
-
Publication No.: US20220357742A1
Publication Date: 2022-11-10
Application No.: US17750917
Application Date: 2022-05-23
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , Altug Koker , Joydeep Ray , Balaji Vembu , John C. Weast , Mike B. Macpherson , Dukhwan Kim , Linda L. Hurd , Sanjeev Jahagirdar , Vasanth Ranganathan
Abstract: A mechanism is described for facilitating barriers and synchronization for machine learning at autonomous machines. A method of embodiments, as described herein, includes detecting thread groups relating to machine learning associated with one or more processing devices. The method may further include facilitating barrier synchronization of the thread groups across multiple dies such that each thread in a thread group is scheduled across a set of compute elements associated with the multiple dies, where each die represents a processing device of the one or more processing devices, the processing device including a graphics processor.
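The key point of the abstract is a single synchronization point spanning compute elements on several dies. The sketch below emulates that with one barrier covering threads grouped per "die"; it is not Intel's mechanism, and die and thread counts are arbitrary.

```cpp
// Sketch: one barrier spans all threads on all "dies", so no thread enters
// phase 2 until every thread on every die has finished phase 1.
#include <barrier>
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    constexpr int kDies = 2, kThreadsPerDie = 4;
    std::barrier group_barrier(kDies * kThreadsPerDie);   // spans all dies

    std::vector<std::jthread> workers;
    for (int die = 0; die < kDies; ++die) {
        for (int t = 0; t < kThreadsPerDie; ++t) {
            workers.emplace_back([&, die, t] {
                std::printf("die %d thread %d: phase 1\n", die, t);
                group_barrier.arrive_and_wait();          // cross-die synchronization
                std::printf("die %d thread %d: phase 2\n", die, t);
            });
        }
    }
}
```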
-
Publication No.: US20220335562A1
Publication Date: 2022-10-20
Application No.: US17741934
Application Date: 2022-05-11
Applicant: Intel Corporation
Inventor: Prasoonkumar Surti , Narayan Srinivasa , Feng Chen , Joydeep Ray , Ben J. Ashbaugh , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Balaji Vembu , Tsung-Han Lin , Kamal Sinha , Rajkishore Barik , Sara S. Baghsorkhi , Justin E. Gottschlich , Altug Koker , Nadathur Rajagopalan Satish , Farshad Akhbari , Dukhwan Kim , Wenyin Fu , Travis T. Schluessler , Josh B. Mastronarde , Linda L. Hurd , John H. Feit , Jeffery S. Boles , Adam T. Lake , Karthik Vaidyanathan , Devan Burke , Subramaniam Maiyuran , Abhishek R. Appu
Abstract: Embodiments provide mechanisms to facilitate compute operations for deep neural networks. One embodiment comprises a graphics processing unit comprising one or more multiprocessors, at least one of the one or more multiprocessors including a register file to store a plurality of different types of operands and a plurality of processing cores. The plurality of processing cores includes a first set of processing cores of a first type and a second set of processing cores of a second type. The first set of processing cores are associated with a first memory channel and the second set of processing cores are associated with a second memory channel.
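Purely as an illustrative data model (not the actual hardware), the structure described here can be pictured as a multiprocessor holding a register file of mixed operand types plus two core sets, each bound to its own memory channel. All type names, core counts, and channel numbers below are invented.

```cpp
// Illustrative model: a multiprocessor with a register file of mixed operand
// types and two core sets, each tied to its own memory channel.
#include <cstdio>
#include <variant>
#include <vector>

using Operand = std::variant<float, int>;      // "different types of operands"

struct CoreSet { const char* type; int memory_channel; int core_count; };

struct Multiprocessor {
    std::vector<Operand> register_file;
    CoreSet set_a{"type-A", /*memory_channel=*/0, 8};
    CoreSet set_b{"type-B", /*memory_channel=*/1, 8};
};

int main() {
    Multiprocessor mp;
    mp.register_file = {1.5f, 2, 3.25f};
    std::printf("%s cores use channel %d, %s cores use channel %d\n",
                mp.set_a.type, mp.set_a.memory_channel,
                mp.set_b.type, mp.set_b.memory_channel);
    std::printf("register file holds %zu operands\n", mp.register_file.size());
}
```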
-
Publication No.: US11475623B2
Publication Date: 2022-10-18
Application No.: US17141431
Application Date: 2021-01-05
Applicant: Intel Corporation
Inventor: Joydeep Ray , Abhishek R. Appu , Pattabhiraman K , Balaji Vembu , Altug Koker , Niranjan L. Cooray , Josh B. Mastronarde
IPC: G06T15/00 , G06F9/455 , G06T1/60 , G09G5/36 , G09G5/00 , G09G5/393 , G06F9/48 , G06F9/50 , G06T15/04 , G06T15/80 , G06T17/10 , G06T17/20
Abstract: An apparatus and method are described for allocating local memories to virtual machines. For example, one embodiment of an apparatus comprises: a command streamer to queue commands from a plurality of virtual machines (VMs) or applications, the commands to be distributed from the command streamer and executed by graphics processing resources of a graphics processing unit (GPU); a tile cache to store graphics data associated with the plurality of VMs or applications as the commands are executed by the graphics processing resources; and tile cache allocation hardware logic to allocate a first portion of the tile cache to a first VM or application and a second portion of the tile cache to a second VM or application; the tile cache allocation hardware logic to further allocate a first region in system memory to store spill-over data when the first portion of the tile cache and/or the second portion of the tile cache becomes full.
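A simplified sketch of the allocation idea follows: each VM gets a fixed portion of a tile cache, and tiles that do not fit spill over into a region that stands in for system memory. Capacities and the tile representation are invented; this is not the described hardware logic.

```cpp
// Sketch: per-VM tile cache portions with spill-over to a system-memory region.
#include <cstdio>
#include <vector>

struct TileCachePortion {
    size_t capacity;                 // tiles reserved in the cache for this VM
    size_t used = 0;
    std::vector<int> spill;          // stands in for the system-memory region
    void store_tile(int tile_id) {
        if (used < capacity) ++used;         // fits in the VM's cache portion
        else spill.push_back(tile_id);       // cache portion full: spill over
    }
};

int main() {
    TileCachePortion vm0{/*capacity=*/4}, vm1{/*capacity=*/2};
    for (int t = 0; t < 6; ++t) { vm0.store_tile(t); vm1.store_tile(t); }
    std::printf("VM0: %zu cached, %zu spilled\n", vm0.used, vm0.spill.size());
    std::printf("VM1: %zu cached, %zu spilled\n", vm1.used, vm1.spill.size());
}
```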
-
Publication No.: US11475286B2
Publication Date: 2022-10-18
Application No.: US17558285
Application Date: 2021-12-21
Applicant: Intel Corporation
Inventor: Rajkishore Barik , Elmoustapha Ould-Ahmed-Vall , Xiaoming Chen , Dhawal Srivastava , Anbang Yao , Kevin Nealis , Eriko Nurvitadhi , Sara S. Baghsorkhi , Balaji Vembu , Tatiana Shpeisman , Ping T. Tang
Abstract: One embodiment provides an apparatus comprising an instruction cache to store a plurality of instructions, a scheduler unit coupled to the instruction cache, the scheduler unit to schedule the plurality of instructions for execution, an instruction fetch and decode unit to decode the plurality of instructions to determine a set of operations to perform in response, one or more compute blocks to perform parallel multiply-accumulate operations based on the instruction fetch and decode unit decoding a first instruction of the plurality of instructions, and matrix multiplication logic to perform matrix multiplication operations based on the instruction fetch and decode unit decoding a second instruction of the plurality of instructions.
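The dispatch idea (one decoded instruction drives parallel multiply-accumulate lanes, another drives a matrix multiplication path) can be sketched as a simple opcode switch. The opcodes, data shapes, and values below are made up for illustration and do not reflect the actual instruction encodings.

```cpp
// Sketch: route decoded instructions either to per-lane MAC work or to a
// matrix multiplication path.
#include <array>
#include <cstdio>

enum class Opcode { ParallelMAC, MatMul };

void execute(Opcode op) {
    switch (op) {
    case Opcode::ParallelMAC: {
        std::array<float, 4> a{1, 2, 3, 4}, b{5, 6, 7, 8}, acc{};
        for (size_t i = 0; i < acc.size(); ++i) acc[i] += a[i] * b[i];   // per-lane MAC
        std::printf("MAC lanes: %.0f %.0f %.0f %.0f\n", acc[0], acc[1], acc[2], acc[3]);
        break;
    }
    case Opcode::MatMul: {
        float a[2][2] = {{1, 2}, {3, 4}}, b[2][2] = {{5, 6}, {7, 8}}, c[2][2] = {};
        for (int i = 0; i < 2; ++i)
            for (int j = 0; j < 2; ++j)
                for (int k = 0; k < 2; ++k) c[i][j] += a[i][k] * b[k][j];  // matrix path
        std::printf("MatMul: %.0f %.0f / %.0f %.0f\n", c[0][0], c[0][1], c[1][0], c[1][1]);
        break;
    }
    }
}

int main() {
    execute(Opcode::ParallelMAC);
    execute(Opcode::MatMul);
}
```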
-
Publication No.: US11430083B2
Publication Date: 2022-08-30
Application No.: US17193658
Application Date: 2021-03-05
Applicant: Intel Corporation
Inventor: Eriko Nurvitadhi , Balaji Vembu , Tsung-Han Lin , Kamal Sinha , Rajkishore Barik , Nicolas C. Galoppo Von Borries
IPC: G06F17/16 , G06T1/20 , G06F9/30 , G06F9/38 , G06F12/0811 , G06F12/0815 , G06F12/0831 , G06F12/0888 , H03M7/30 , G06K9/62 , G06N20/00 , G06F12/02 , G06F9/48 , G06N3/04 , G06N3/08 , G06T1/60 , G06T15/00
Abstract: Techniques to improve performance of matrix multiply operations are described in which a compute kernel can specify one or more element-wise operations to perform on output of the compute kernel before the output is transferred to higher levels of a processor memory hierarchy.
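A minimal sketch of the fusion idea, not the described kernel interface: the element-wise operation (here ReLU, chosen arbitrarily) is applied to each output value while it is still local to the kernel, before it is written to the result buffer that stands in for higher levels of the memory hierarchy.

```cpp
// Sketch: matrix multiply with a fused element-wise op applied before the
// result is stored to the output buffer.
#include <algorithm>
#include <cstdio>
#include <vector>

void matmul_fused_relu(const std::vector<float>& a, const std::vector<float>& b,
                       std::vector<float>& c, int n) {
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            float acc = 0.0f;                       // stays local to the kernel
            for (int k = 0; k < n; ++k) acc += a[i * n + k] * b[k * n + j];
            c[i * n + j] = std::max(acc, 0.0f);     // fused element-wise op before store
        }
}

int main() {
    const int n = 2;
    std::vector<float> a{1, -2, 3, 4}, b{5, 6, -7, 8}, c(n * n);
    matmul_fused_relu(a, b, c, n);
    std::printf("%g %g\n%g %g\n", c[0], c[1], c[2], c[3]);
}
```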
-
Publication No.: US11360808B2
Publication Date: 2022-06-14
Application No.: US15482801
Application Date: 2017-04-09
Applicant: Intel Corporation
Inventor: Joydeep Ray , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Rajkishore Barik , Eriko Nurvitadhi , Nicolas Galoppo Von Borries , Tsung-Han Lin , Sanjeev Jahagirdar , Vasanth Ranganathan
Abstract: A mechanism is described for facilitating intelligent thread scheduling at autonomous machines. A method of embodiments, as described herein, includes detecting dependency information relating to a plurality of threads corresponding to a plurality of workloads associated with tasks relating to a processor including a graphics processor. The method may further include generating a tree of thread groups based on the dependency information, where each thread group includes multiple threads, and scheduling one or more of the thread groups associated with a similar dependency to avoid dependency conflicts.
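As an illustrative sketch only, the grouping step can be pictured as bucketing workloads by the resource they depend on and scheduling each bucket as one thread group, so workloads sharing a dependency do not conflict. Workload and dependency names are invented, and the tree structure from the abstract is flattened here into simple buckets.

```cpp
// Sketch: bucket workloads into thread groups by shared dependency, then
// schedule one group at a time.
#include <cstdio>
#include <map>
#include <string>
#include <vector>

struct Workload { std::string name; std::string depends_on; };

int main() {
    std::vector<Workload> workloads = {
        {"conv1", "weightsA"}, {"conv2", "weightsA"},
        {"pool1", "bufferB"},  {"fc1",   "bufferB"},
    };

    // Group by dependency; each bucket becomes one thread group.
    std::map<std::string, std::vector<std::string>> thread_groups;
    for (const auto& w : workloads) thread_groups[w.depends_on].push_back(w.name);

    // Scheduling groups back-to-back avoids conflicts on the shared dependency.
    for (const auto& [dep, members] : thread_groups) {
        std::printf("schedule group for '%s':", dep.c_str());
        for (const auto& m : members) std::printf(" %s", m.c_str());
        std::printf("\n");
    }
}
```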
-
Publication No.: US11354770B2
Publication Date: 2022-06-07
Application No.: US17182256
Application Date: 2021-02-23
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , Joydeep Ray , Altug Koker , Balaji Vembu , Pattabhiraman K , Matthew B. Callaway
Abstract: An apparatus and method for dynamic provisioning, quality of service, and prioritization in a graphics processor. For example, one embodiment of an apparatus comprises a graphics processing unit (GPU) comprising a plurality of graphics processing resources; slice configuration hardware logic to logically subdivide the graphics processing resources into a plurality of slices; and slice allocation hardware logic to allocate a designated number of slices to each virtual machine (VM) of a plurality of VMs running in a virtualized execution environment, the slice allocation hardware logic to allocate different numbers of slices to different VMs based on graphics processing requirements and/or priorities of each of the VMs.
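A hedged sketch of the provisioning idea follows: a fixed pool of slices is divided among VMs in proportion to their priorities. VM names, priorities, the slice count, and the leftover-handling rule are placeholders chosen for illustration.

```cpp
// Sketch: allocate a fixed number of slices to VMs in proportion to priority.
#include <cstdio>
#include <string>
#include <vector>

struct Vm { std::string name; int priority; int slices = 0; };

int main() {
    const int total_slices = 16;
    std::vector<Vm> vms = {{"vm-render", 4}, {"vm-compute", 3}, {"vm-background", 1}};

    int total_priority = 0;
    for (const auto& vm : vms) total_priority += vm.priority;

    int assigned = 0;
    for (auto& vm : vms) {
        vm.slices = total_slices * vm.priority / total_priority;   // proportional share
        assigned += vm.slices;
    }
    vms.front().slices += total_slices - assigned;   // give any rounding leftover to the first VM

    for (const auto& vm : vms)
        std::printf("%s (priority %d): %d slices\n", vm.name.c_str(), vm.priority, vm.slices);
}
```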