Patent search: ap:("Intel Corporation") AND inv:"Guei-Yuan Lueh" (page 1)

1.

Published application
CROSS-THREAD REGISTER SHARING FOR MATRIX MULTIPLICATION COMPUTE (Pending, published)

Publication No.: US20240168807A1

Publication date: 2024-05-23

Application No.: US18056949

Filing date: 2022-11-18

Applicant: Intel Corporation

Inventors: Jorge Eduardo Parra Osorio, Guei-Yuan Lueh, Maxim Kazakov, Fangwen Fu, Supratim Pal, Kaiyu Chen

IPC classification: G06F9/50, G06F9/48, G06F9/52, G06F15/80

CPC classification: G06F9/5027, G06F9/48, G06F9/522, G06F15/8046

Abstract: An apparatus to facilitate cross-thread register sharing for matrix multiplication compute is disclosed. The apparatus includes matrix acceleration hardware comprising a plurality of data processing units, wherein the respective plurality of data processing units are to: receive a decoded instruction for a first thread having a first register space, wherein the decoded instruction is for a matrix multiplication operation and comprises an indication to utilize a second register space of a second thread for an operand of the decoded instruction for the first thread; access the second register space of the second thread to obtain data for the operand of the decoded instruction; and perform the matrix multiplication operation for the first thread using the data for the operand from the second register space of the second thread.

2.

Published application
VIRTUAL ADDRESS ACCESS TO GPU SURFACE AND SAMPLER STATES (Pending, published)

Publication No.: US20240134527A1

Publication date: 2024-04-25

Application No.: US17971290

Filing date: 2022-10-20

Applicant: Intel Corporation

Inventors: Joydeep Ray, Michael Apodaca, Yoav Harel, Guei-Yuan Lueh, John A. Wiegert

IPC classification: G06F3/06, G06T1/60

CPC classification: G06F3/061, G06F3/0655, G06F3/0679, G06T1/60

Abstract: Embodiments described herein provide a technique to enable access to entries in a surface state or sampler state using 64-bit virtual addresses. One embodiment provides a graphics core that includes memory access circuitry configured to facilitate access to the memory by functional units of the graphics core. The memory access circuitry is configured to receive a message to access an entry in a surface state or a sampler state associated with a parallel processing operation. The message specifies a base address for a surface state entry or sampler state entry. The circuitry can add the base address and an offset to determine a 64-bit virtual address for the entry in the surface state or the sampler state and submit a memory access request to the memory to access the entry of the surface state or sampler state.
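The address computation described in this abstract can be illustrated with a minimal sketch. The function name `state_entry_address` and the wrap-around to 64 bits are assumptions made for illustration; the abstract does not say where the offset comes from or how overflow is handled.

```python
MASK_64 = (1 << 64) - 1  # keep results within a 64-bit address space


def state_entry_address(base_address: int, offset: int) -> int:
    """Add a base address and an offset to form a 64-bit virtual address.

    Hypothetical helper: wrapping on overflow is an assumption, not
    something the abstract specifies.
    """
    if base_address < 0 or offset < 0:
        raise ValueError("base_address and offset must be non-negative")
    return (base_address + offset) & MASK_64
```

For example, `state_entry_address(0x00007F0000001000, 0x40)` yields `0x00007F0000001040`.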

3.

Granted patent
Hybrid low power homogenous grapics processing units (Granted)

Publication No.: US11762696B2

Publication date: 2023-09-19

Application No.: US17520583

Filing date: 2021-11-05

Applicant: Intel Corporation

Inventors: Abhishek R Appu, Altug Koker, Balaji Vembu, Joydeep Ray, Kamal Sinha, Prasoonkumar Surti, Kiran C. Veernapu, Subramaniam Maiyuran, Sanjeev S. Jahagirdar, Eric J. Asperheim, Guei-Yuan Lueh, David Puffer, Wenyin Fu, Nikos Kaburlasos, Bhushan M. Borole, Josh B. Mastronarde, Linda L. Hurd, Travis T. Schluessler, Tomasz Janczak, Abhishek Venkatesh, Kai Xiao, Slawomir Grajewski

IPC classification: G06F9/50, G06F1/329, G06F9/48, G06T1/20, G06T1/60, G06T15/00

CPC classification: G06F9/5016, G06F1/329, G06F9/4893, G06F9/5044, G06T1/20, G06T1/60, G06T15/005, G06T2200/28, Y02D10/00

Abstract: In an example, an apparatus comprises a plurality of execution units comprising at least a first type of execution unit and a second type of execution unit and logic, at least partially including hardware logic, to analyze a workload and assign the workload to one of the first type of execution unit or the second type of execution unit. Other embodiments are also disclosed and claimed.

4.

Granted patent
Engine to enable high speed context switching via on-die storage (Granted)

Publication No.: US11748302B2

Publication date: 2023-09-05

Application No.: US17561427

Filing date: 2021-12-23

Applicant: Intel Corporation

Inventors: Altug Koker, Prasoonkumar Surti, David Puffer, Subramaniam Maiyuran, Guei-Yuan Lueh, Abhishek R. Appu, Joydeep Ray, Balaji Vembu, Tomer Bar-On, Andrew T. Lauritzen, Hugues Labbe, John G. Gierach, Gabor Liktor

IPC classification: G06F16/13, G06F9/38, G06F9/30, G06F16/11, G06F16/172, G06F9/46, G06F12/1036, G06F12/1045, G06F12/0831

CPC classification: G06F16/13, G06F9/30, G06F9/38, G06F9/3836, G06F9/461, G06F16/113, G06F16/172, G06F12/0831, G06F12/1036, G06F12/1045, G06F2201/84

Abstract: In an example, an apparatus comprises a plurality of execution units, and a first memory communicatively coupled to the plurality of execution units, wherein the first shared memory is shared by the plurality of execution units and a copy engine to copy context state data from at least a first of the plurality of execution units to the first shared memory. Other embodiments are also disclosed and claimed.

5.

Published application
GRAPHICS PROCESSORS AND GRAPHICS PROCESSING UNITS HAVING DOT PRODUCT ACCUMULATE INSTRUCTION FOR HYBRID FLOATING POINT FORMAT (Pending, published)

Publication No.: US20230195685A1

Publication date: 2023-06-22

Application No.: US18170900

Filing date: 2023-02-17

Applicant: Intel Corporation

Inventors: Subramaniam Maiyuran, Shubra Marwaha, Ashutosh Garg, Supratim Pal, Jorge Parra, Chandra Gurram, Varghese George, Darin Starkey, Guei-Yuan Lueh

IPC classification: G06F15/78, G06F9/30, G06F12/128, G06F17/16, G06F12/0811, G06F12/02, G06F12/0866, G06F7/544, G06F9/50, G06F17/18, G06F9/38, G06F12/0891, G06F12/06, G06F12/0888, G06F12/0802, G06T1/60, G06F12/0871, G06T1/20, H03M7/46, G06F12/0875, G06F12/0862, G06F15/80, G06F12/0897, G06F12/0893, G06F12/0804, G06F12/0882, G06F7/575, G06F12/1009, G06F12/0895, G06F7/58, G06T15/06, G06N3/08

CPC classification: G06F15/7839, G06F9/30043, G06F12/128, G06F17/16, G06F12/0811, G06F12/0238, G06F12/0866, G06F9/30014, G06F7/5443, G06F9/5077, G06F12/0246, G06F17/18, G06F9/3887, G06F12/0891, G06F12/0607, G06F12/0888, G06F12/0802, G06T1/60, G06F9/30079, G06F12/0871, G06F9/30036, G06T1/20, H03M7/46, G06F12/0215, G06F12/0875, G06F12/0862, G06F15/8046, G06F9/30047, G06F9/30065, G06F12/0897, G06F9/5011, G06F12/0893, G06F12/0804, G06F12/0882, G06F9/3001, G06F7/575, G06F12/1009, G06F9/3004, G06F12/0895, G06F7/588, G06F2212/401, G06F2212/1044, G06F9/3867, G06F9/3818, G06F9/3802, G06F2212/455, G06F2212/1021, G06F2212/60, G06F2212/1008, G06T15/06, G06N3/08, G06F2212/302

Abstract: Described herein is a graphics processing unit (GPU) configured to receive an instruction having multiple operands, where the instruction is a single instruction multiple data (SIMD) instruction configured to use a bfloat16 (BF16) number format and the BF16 number format is a sixteen-bit floating point format having an eight-bit exponent. The GPU can process the instruction using the multiple operands, where to process the instruction includes to perform a multiply operation, perform an addition to a result of the multiply operation, and apply a rectified linear unit function to a result of the addition.
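The multiply, add, and rectified linear unit sequence in this abstract can be emulated in software as a rough sketch. The helper names `to_bf16` and `mad_relu` are hypothetical, and several choices here are assumptions rather than anything the abstract states: only the multiply inputs are rounded to BF16 (round to nearest, ties to even), the addend and the result keep Python's native float precision, and inputs are finite values within float32 range.

```python
import struct


def to_bf16(x: float) -> float:
    """Round a finite float to the nearest BF16 value (ties to even).

    BF16 keeps the sign, the 8 exponent bits, and the top 7 mantissa
    bits of the float32 encoding, so it is the upper 16 bits of float32.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add a rounding bias, then clear the 16 low bits that BF16 drops.
    bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def mad_relu(a, b, c):
    """Per lane: relu(bf16(a) * bf16(b) + c), one result per SIMD lane."""
    if not (len(a) == len(b) == len(c)):
        raise ValueError("all operands must have the same number of lanes")
    return [max(0.0, to_bf16(x) * to_bf16(y) + z) for x, y, z in zip(a, b, c)]
```

With these assumptions, `to_bf16(3.14159)` gives `3.140625`, and `mad_relu([1.0, 2.0], [3.0, -4.0], [0.5, 1.0])` gives `[3.5, 0.0]`, the second lane being clamped by the rectified linear unit.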

6.

Granted patent
Instruction and logic for systolic dot product with accumulate (Granted)

Publication No.: US11640297B2

Publication date: 2023-05-02

Application No.: US17304153

Filing date: 2021-06-15

Applicant: Intel Corporation

Inventors: Subramaniam Maiyuran, Guei-Yuan Lueh, Supratim Pal, Ashutosh Garg, Chandra S. Gurram, Jorge E. Parra, Junjie Gu, Konrad Trifunovic, Hong Bin Liao, Mike B. MacPherson, Shubh B. Shah, Shubra Marwaha, Stephen Junkins, Timothy R. Bauer, Varghese George, Weiyu Chen

IPC classification: G06F9/30, G06T1/20, G06F9/38

Abstract: Embodiments described herein provide for an instruction and associated logic to enable GPGPU program code to access special purpose hardware logic to accelerate dot product operations. One embodiment provides for a graphics processing unit comprising a fetch unit to fetch an instruction for execution and a decode unit to decode the instruction into a decoded instruction. The decoded instruction is a matrix instruction to cause the graphics processing unit to perform a parallel dot product operation. The GPGPU also includes systolic dot product circuitry to execute the decoded instruction across one or more SIMD lanes using multiple systolic layers, wherein to execute the decoded instruction, a dot product computed at a first systolic layer is to be output to a second systolic layer, wherein each systolic layer includes one or more sets of interconnected multipliers and adders, each set of multipliers and adders to generate a dot product.
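The layer-to-layer data flow in this abstract can be modeled with a small software sketch. It is one reading of the text, not the patented circuit: the function name `systolic_dot_accumulate`, the even split of the operands across layers, and the choice to have each layer add its partial dot product to the value handed on by the previous layer are all assumptions.

```python
def systolic_dot_accumulate(acc, a, b, depth):
    """Compute acc + dot(a, b) as a chain of `depth` layers.

    Each layer forms the dot product of its own slice of `a` and `b`,
    adds it to the value received from the previous layer, and passes
    the sum on to the next layer.
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    if depth <= 0 or len(a) % depth != 0:
        raise ValueError("len(a) must be a positive multiple of depth")
    chunk = len(a) // depth
    value = acc  # input to the first layer
    for layer in range(depth):
        lo = layer * chunk
        partial = sum(x * y for x, y in zip(a[lo:lo + chunk], b[lo:lo + chunk]))
        value += partial  # output of this layer, input to the next
    return value
```

For instance, `systolic_dot_accumulate(10, [1, 2, 3, 4], [5, 6, 7, 8], depth=2)` returns `80`: the first layer contributes 17, the second 53, on top of the initial accumulator value 10.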

7.

Granted patent
Instructions and logic for vector multiply add with zero skipping (Granted)

Publication No.: US11314515B2

Publication date: 2022-04-26

Application No.: US16724831

Filing date: 2019-12-23

Applicant: Intel Corporation

Inventors: Supratim Pal, Sasikanth Avancha, Ishwar Bhati, Wei-Yu Chen, Dipankar Das, Ashutosh Garg, Chandra S. Gurram, Junjie Gu, Guei-Yuan Lueh, Subramaniam Maiyuran, Jorge E. Parra, Sudarshan Srinivasan, Varghese George

IPC classification: G06F9/38, G06F9/30

Abstract: Embodiments described herein provide for an instruction and associated logic to enable a vector multiply add instruction with automatic zero skipping for sparse input. One embodiment provides for a general-purpose graphics processor comprising logic to perform operations comprising fetching a hardware macro instruction having a predicate mask, a repeat count, and a set of initial operands, where the initial operands include a destination operand and multiple source operands. The hardware macro instruction is configured to perform one or more multiply/add operations on input data associated with a set of matrices.
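A repeated multiply/add with zero skipping might look like the following sketch. The abstract names a predicate mask, a repeat count, a destination operand, and multiple source operands, but not how they interact, so the semantics here are assumptions: the repeat count is taken to be the number of scalars, bit `i` of the mask enables step `i`, and a step is also skipped when its scalar is zero. The function name `mad_zero_skip` is hypothetical.

```python
def mad_zero_skip(dst, src_rows, scalars, predicate_mask):
    """Accumulate scalars[i] * src_rows[i] into a copy of dst.

    Returns the accumulated vector and the number of multiply/add
    steps actually executed, so the effect of skipping is visible.
    """
    if len(src_rows) != len(scalars):
        raise ValueError("need one source row per scalar")
    result = list(dst)
    executed = 0
    for i, (row, s) in enumerate(zip(src_rows, scalars)):
        enabled = (predicate_mask >> i) & 1
        if not enabled or s == 0:
            continue  # skipped: contributes nothing to the destination
        if len(row) != len(result):
            raise ValueError("source rows must match the destination width")
        result = [d + s * x for d, x in zip(result, row)]
        executed += 1
    return result, executed
```

With `dst=[1, 1]`, rows `[[1, 2], [3, 4], [5, 6]]`, scalars `[2, 0, 1]`, and mask `0b111`, the call returns `([8, 11], 2)`: the middle step is skipped because its scalar is zero.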

8.

Granted patent
Instruction and logic for systolic dot product with accumulate (Granted)

Publication No.: US11042370B2

Publication date: 2021-06-22

Application No.: US15957728

Filing date: 2018-04-19

Applicant: Intel Corporation

Inventors: Subramaniam Maiyuran, Guei-Yuan Lueh, Supratim Pal, Ashutosh Garg, Chandra S. Gurram, Jorge E. Parra, Junjie Gu, Konrad Trifunovic, Hong Bin Liao, Mike B. Macpherson, Shubh B. Shah, Shubra Marwaha, Stephen Junkins, Timothy R. Bauer, Varghese George, Weiyu Chen

IPC classification: G06F9/30, G06T1/20, G06F9/38

Abstract: Embodiments described herein provide for an instruction and associated logic to enable GPGPU program code to access special purpose hardware logic to accelerate dot product operations. One embodiment provides for a graphics processing unit comprising a fetch unit to fetch an instruction for execution and a decode unit to decode the instruction into a decoded instruction. The decoded instruction is a matrix instruction to cause the graphics processing unit to perform a parallel dot product operation. The GPGPU also includes a systolic dot product unit to execute the decoded instruction across one or more SIMD lanes using multiple systolic layers, wherein to execute the decoded instruction, a dot product computed at a first systolic layer is to be output to a second systolic layer, wherein each systolic layer includes one or more sets of interconnected multipliers and adders, each set of multipliers and adders to generate a dot product.

9.

Patent application
GRAPHICS PROCESSING UNIT PROCESSING AND CACHING IMPROVEMENTS (Granted)

Publication No.: US20210150663A1

Publication date: 2021-05-20

Application No.: US17095590

Filing date: 2020-11-11

Applicant: Intel Corporation

Inventors: Subramaniam Maiyuran, Durgaprasad Bilagi, Joydeep Ray, Scott Janus, Sanjeev Jahagirdar, Brent Insko, Lidong Xu, Abhishek R. Appu, James Holland, Vasanth Ranganathan, Nikos Kaburlasos, Altug Koker, Xinmin Tian, Guei-Yuan Lueh, Changliang Wang

IPC classification: G06T1/60, G06T1/20, G06N5/04, G06F12/0802

Abstract: Embodiments described herein are generally directed to improvements relating to power, latency, bandwidth and/or performance issues relating to GPU processing/caching. According to one embodiment, a system includes a producer intellectual property (IP) (e.g., a media IP), a compute core (e.g., a GPU or an AI-specific core of the GPU), and a streaming buffer logically interposed between the producer IP and the compute core. The producer IP is operable to consume data from memory and output results to the streaming buffer. The compute core is operable to perform AI inference processing based on data consumed from the streaming buffer and output AI inference processing results to the memory.

10.

Patent application
REGISTER SPILL/FILL USING SHARED LOCAL MEMORY SPACE (Granted)

Publication No.: US20210125581A1

Publication date: 2021-04-29

Application No.: US17062871

Filing date: 2020-10-05

Applicant: Intel Corporation

Inventors: Joydeep Ray, Altug Koker, Balaji Vembu, Murali Ramadoss, Guei-Yuan Lueh, James A. Valerio, Prasoonkumar Surti, Abhishek R. Appu, Vasanth Ranganathan, Kalyan K. Bhiravabhatla, Arthur D. Hunter Jr., Wei-Yu Chen, Subramaniam M. Maiyuran

IPC classification: G09G5/36, G06F12/0875, G06F9/46, G09G5/00

Abstract: A mechanism is described for facilitating use of a shared local memory for register spilling/filling relating to graphics processors at computing devices. A method of embodiments, as described herein, includes reserving one or more spaces of a shared local memory (SLM) to perform one or more of spilling and filling relating to registers associated with a graphics processor of a computing device.
