-
Publication No.: US20190286563A1
Publication date: 2019-09-19
Application No.: US15922809
Filing date: 2018-03-15
Applicant: Intel Corporation
Inventor: Bharath Narasimha Swamy , Joydeep Ray , Rama Kishan Malladi , James Valerio , Abhishek Appu
IPC: G06F12/0855
Abstract: Apparatus and method for improved cache utilization and efficiency on a many-core processor. An apparatus comprising: a plurality of execution units to generate cache access requests responsive to executing instructions; a pending request queue to store pending cache access requests generated by the execution units; pending queue management circuitry to compare a current cache access request with entries in the pending request queue to determine whether the current cache access request can be merged with an entry in the pending request queue and, if so, to merge the current cache access request with the entry.
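The merge step described in this abstract can be illustrated with a minimal software sketch. The abstract does not state the merge criterion, so the rule used here (same cache line and same access type) is an assumption, as are the 64-byte line size and every name in the code (`PendingEntry`, `PendingRequestQueue`, `submit`).

```python
from dataclasses import dataclass, field

LINE_BYTES = 64  # assumed cache-line size; not stated in the abstract


@dataclass
class PendingEntry:
    line_addr: int                                   # cache-line-aligned address
    is_write: bool
    requesters: list = field(default_factory=list)   # execution units waiting on this line


class PendingRequestQueue:
    """Hypothetical model of a pending request queue with merge-on-match."""

    def __init__(self):
        self.entries = []

    def submit(self, eu_id, addr, is_write=False):
        """Return True if the request merged into an existing entry, False if newly queued."""
        line_addr = addr - (addr % LINE_BYTES)
        for entry in self.entries:
            # Assumed merge rule: same cache line and same access type.
            if entry.line_addr == line_addr and entry.is_write == is_write:
                entry.requesters.append(eu_id)
                return True
        self.entries.append(PendingEntry(line_addr, is_write, [eu_id]))
        return False


queue = PendingRequestQueue()
merged_flags = [
    queue.submit(eu_id=0, addr=0x1000),
    queue.submit(eu_id=1, addr=0x1020),  # falls in the same 64-byte line as 0x1000
    queue.submit(eu_id=2, addr=0x1040),  # falls in the next line
]
```

In this sketch the second request merges with the first, so only two entries are queued for three requests.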
-
Publication No.: US20250036412A1
Publication date: 2025-01-30
Application No.: US18358308
Filing date: 2023-07-25
Applicant: Intel Corporation
Inventor: Supratim Pal , Jiasheng Chen , Christopher Spencer , Jorge E. Parra Osorio , Kevin Hurd , Guei-Yuan Lueh , Pradeep K. Golconda , Fangwen Fu , Wei Xiong , Hongzheng Li , James Valerio , Mukundan Swaminathan , Nicholas Murphy , Shuai Mu , Clifford Gibson , Buqi Cheng
Abstract: Described herein is a graphics processor comprising a memory interface and a graphics processing cluster coupled with the memory interface. The graphics processing cluster includes a plurality of processing resources. A processing resource of the plurality of processing resources includes a source crossbar communicatively coupled with a register file, the source crossbar to reorder data elements of a source operand and a format conversion pipeline to convert a plurality of input data elements specified by the source operand from a first format of a plurality of datatype formats to a second format of the plurality of datatype formats, the plurality of datatype formats including integer and floating-point formats.
-
Publication No.: US20250036361A1
Publication date: 2025-01-30
Application No.: US18358304
Filing date: 2023-07-25
Applicant: Intel Corporation
Inventor: Supratim Pal , Jiasheng Chen , Kevin Hurd , Jorge E. Parra Osorio , Christopher Spencer , Guei-Yuan Lueh , Pradeep K. Golconda , Fangwen Fu , Wei Xiong , Hongzheng Li , James Valerio , Mukundan Swaminathan , Nicholas Murphy , Shuai Mu , Clifford Gibson , Buqi Cheng
IPC: G06F7/483
Abstract: Described herein is a graphics processor comprising a memory interface and a graphics processing cluster coupled with the memory interface. The graphics processing cluster includes a multi-lane parallel floating-point unit and a multi-lane parallel integer unit. The multi-lane parallel integer unit includes an integer pipeline including a plurality of parallel integer logic units configured to perform integer compute operations on a plurality of input data elements and a format conversion pipeline including a plurality of parallel format conversion units configured to convert a plurality of input data elements from a first one of a plurality of datatype formats to a second one of the plurality of datatype formats, the plurality of datatype formats including integer and floating-point formats.
-
Publication No.: US20250004981A1
Publication date: 2025-01-02
Application No.: US18793247
Filing date: 2024-08-02
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , Altug Koker , Aravindh Anantaraman , Elmoustapha Ould-Ahmed-Vall , Valentin Andrei , Nicolas Galoppo Von Borries , Varghese George , Mike Macpherson , Subramaniam Maiyuran , Joydeep Ray , Lakshminarayana Striramassarma , Scott Janus , Brent Insko , Vasanth Ranganathan , Kamal Sinha , Arthur Hunter , Prasoonkumar Surti , David Puffer , James Valerio , Ankur N. Shah
IPC: G06F15/78 , G06F7/544 , G06F7/575 , G06F7/58 , G06F9/30 , G06F9/38 , G06F9/50 , G06F12/02 , G06F12/06 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/80 , G06F17/16 , G06F17/18 , G06N3/08 , G06T1/20 , G06T1/60 , G06T15/06 , H03M7/46
Abstract: Methods and apparatus relating to techniques for multi-tile memory management. In an example, a graphics processor includes an interposer, a first chiplet coupled with the interposer, the first chiplet including a graphics processing resource and an interconnect network coupled with the graphics processing resource, cache circuitry coupled with the graphics processing resource via the interconnect network, and a second chiplet coupled with the first chiplet via the interposer, the second chiplet including a memory-side cache and a memory controller coupled with the memory-side cache. The memory controller is configured to enable access to a high-bandwidth memory (HBM) device, the memory-side cache is configured to cache data associated with a memory access performed via the memory controller, and the cache circuitry is logically positioned between the graphics processing resource and a chiplet interface.
-
Publication No.: US12182062B1
Publication date: 2024-12-31
Application No.: US17961833
Filing date: 2022-10-07
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , Altug Koker , Aravindh Anantaraman , Elmoustapha Ould-Ahmed-Vall , Valentin Andrei , Nicolas Galoppo Von Borries , Varghese George , Mike Macpherson , Subramaniam Maiyuran , Joydeep Ray , Lakshminarayanan Striramassarma , Scott Janus , Brent Insko , Vasanth Ranganathan , Kamal Sinha , Arthur Hunter , Prasoonkumar Surti , David Puffer , James Valerio , Ankur N. Shah
IPC: G06F15/78 , G06F7/544 , G06F7/575 , G06F7/58 , G06F9/30 , G06F9/38 , G06F9/50 , G06F12/02 , G06F12/06 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/80 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46 , G06N3/08 , G06T15/06
Abstract: Methods and apparatus relating to techniques for multi-tile memory management. In an example, a graphics processor includes an interposer, a first chiplet coupled with the interposer, the first chiplet including a graphics processing resource and an interconnect network coupled with the graphics processing resource, cache circuitry coupled with the graphics processing resource via the interconnect network, and a second chiplet coupled with the first chiplet via the interposer, the second chiplet including a memory-side cache and a memory controller coupled with the memory-side cache. The memory controller is configured to enable access to a high-bandwidth memory (HBM) device, the memory-side cache is configured to cache data associated with a memory access performed via the memory controller, and the cache circuitry is logically positioned between the graphics processing resource and a chiplet interface.
-
Publication No.: US20240330001A1
Publication date: 2024-10-03
Application No.: US18620217
Filing date: 2024-03-28
Applicant: Intel Corporation
Inventor: John Wiegert , Joydeep Ray , Timothy Bauer , James Valerio
CPC classification number: G06F9/3887 , G06F9/355 , G06F15/7839 , G06F9/30036 , G06F9/30043
Abstract: Embodiments described herein provide a technique to decompose 64-bit per-lane virtual addresses to access a plurality of data elements on behalf of a multi-lane parallel processing execution resource of a graphics or compute accelerator. The 64-bit per-lane addresses are decomposed into a base address and a plurality of per-lane offsets for transmission to memory access circuitry. The memory access circuitry then combines the base address and the per-lane offsets to reconstruct the per-lane addresses.
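The decompose-and-reconstruct round trip in this abstract can be sketched as follows. The abstract does not say how the base address is chosen or how wide the offsets are; taking the minimum lane address as the base and limiting offsets to 32 bits are assumptions made for illustration, and the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1
OFFSET_BITS = 32  # assumed per-lane offset width; not stated in the abstract


def decompose(lane_addrs):
    """Split per-lane 64-bit addresses into one base plus per-lane offsets."""
    base = min(lane_addrs)  # assumed choice of base address
    offsets = [addr - base for addr in lane_addrs]
    if any(off >= (1 << OFFSET_BITS) for off in offsets):
        raise ValueError("lane addresses span more than the assumed offset width")
    return base, offsets


def reconstruct(base, offsets):
    """Rebuild the per-lane addresses on the memory-access side."""
    return [(base + off) & MASK64 for off in offsets]


# Eight lanes reading consecutive 4-byte elements.
lanes = [0x7FFF_0000_1000 + 4 * i for i in range(8)]
base, offsets = decompose(lanes)
rebuilt = reconstruct(base, offsets)
```

Under these assumptions, eight 64-bit addresses are transmitted as one 64-bit base and eight small offsets, and `rebuilt` equals the original `lanes`.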
-
Publication No.: US20240086064A1
Publication date: 2024-03-14
Application No.: US17944500
Filing date: 2022-09-14
Applicant: Intel Corporation
Inventor: John Wiegert , Joydeep Ray , Timothy Bauer , James Valerio
CPC classification number: G06F3/0604 , G06F3/0644 , G06F3/0673 , G06T1/20 , G06T1/60
Abstract: Embodiments described herein enable the offload of address calculations required to access a data element within an array of data elements from primary compute resources of a graphics processor to the memory access circuitry of the graphics processor. The memory access circuitry is configured to receive a message to access a data element of an array of data elements in the memory, the message to include an index of the data element in the array of data elements, calculate a byte address for the data element based in part on the index of the data element in the array of data elements, and submit a memory access request to the memory to access the data element at the byte address.
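The address calculation that this abstract moves into the memory access circuitry can be illustrated with a small sketch. The abstract only says the byte address is based in part on the element index; the message fields (`base_addr`, `element_bytes`, `index`) and the linear formula below are assumptions, not the claimed message layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayAccessMessage:
    """Hypothetical message sent from the compute resource to memory access circuitry."""
    base_addr: int      # assumed: byte address of element 0
    element_bytes: int  # assumed: size of one array element in bytes
    index: int          # index of the requested element (named in the abstract)


def byte_address(msg):
    """Compute the byte address on the memory-access side (assumed linear layout)."""
    return msg.base_addr + msg.index * msg.element_bytes


msg = ArrayAccessMessage(base_addr=0x2000, element_bytes=4, index=10)
addr = byte_address(msg)
```

The compute resource sends only the index; the multiply-add that turns it into a byte address happens where the message is received.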
-
Publication No.: US20220180467A1
Publication date: 2022-06-09
Application No.: US17428534
Filing date: 2020-03-14
Applicant: Intel Corporation
Inventor: Altug Koker , Joydeep Ray , Aravindh Anantaraman , Valentin Andrei , Abhishek Appu , Sean Coleman , Nicolas Galoppo Von Borries , Varghese George , Pattabhiraman K , SungYe Kim , Mike Macpherson , Subramaniam Maiyuran , Elmoustapha Ould-Ahmed-Vall , Vasanth Ranganathan , James Valerio
IPC: G06T1/20 , G06F12/0804 , G06F12/0811 , G06T1/60
Abstract: Systems and methods for updating remote memory side caches in a multi-GPU configuration are disclosed herein. In one embodiment, a graphics processor for a multi-tile architecture includes a first graphics processing unit (GPU) having a first memory, a first memory side cache memory, a first communication fabric, and a first memory management unit (MMU). The graphics processor includes a second graphics processing unit (GPU) having a second memory, a second memory side cache memory, a second memory management unit (MMU), and a second communication fabric that is communicatively coupled to the first communication fabric. The first MMU is configured to control memory requests for the first memory, to update content in the first memory, to update content in the first memory side cache memory, and to determine whether to update the content in the second memory side cache memory.
-
Publication No.: US11321262B2
Publication date: 2022-05-03
Application No.: US17014023
Filing date: 2020-09-08
Applicant: Intel Corporation
Inventor: Hema Chand Nalluri , Ankur Shah , Joydeep Ray , Aditya Navale , Altug Koker , Murali Ramadoss , Niranjan L. Cooray , Jeffery S. Boles , Aravindh Anantaraman , David Puffer , James Valerio , Vasanth Ranganathan
IPC: G06F9/52 , G06F12/14 , G06F13/40 , G06F13/16 , G06F12/0888 , G06F12/0837 , G06F9/30
Abstract: An apparatus to facilitate memory barriers is disclosed. The apparatus comprises an interconnect, a device memory, a plurality of processing resources, coupled to the device memory, to execute a plurality of execution threads as memory data producers and memory data consumers to a device memory and a system memory and fence hardware to generate fence operations to enforce data ordering on memory operations issued to the device memory and a system memory coupled via the interconnect.
-
Publication No.: US11194722B2
Publication date: 2021-12-07
Application No.: US15922809
Filing date: 2018-03-15
Applicant: Intel Corporation
Inventor: Bharath Narasimha Swamy , Joydeep Ray , Rama Kishan Malladi , James Valerio , Abhishek Appu
IPC: G06F12/0842 , G06F12/0855
Abstract: Apparatus and method for improved cache utilization and efficiency on a many-core processor. An apparatus comprising: a plurality of execution units to generate cache access requests responsive to executing instructions; a pending request queue to store pending cache access requests generated by the execution units; pending queue management circuitry to compare a current cache access request with entries in the pending request queue to determine whether the current cache access request can be merged with an entry in the pending request queue and, if so, to merge the current cache access request with the entry.