Patent search ap:("INTEL Corporation") AND inv:"Mu Page Shuai"

1.

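Published invention application
SUPPORTING 8-BIT FLOATING POINT FORMAT OPERANDS IN A COMPUTING ARCHITECTURE (status: under examination, published)

Publication number: EP4064040A1

Publication date: 2022-09-28

Application number: EP22153430.8

Filing date: 2022-01-26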

Applicant: Intel Corporation

Inventors: Mellempudi, Naveen; Maiyuran, Subramaniam; George, Varghese; Fu, Fangwen; Mu, Shuai; Pal, Supratim; Xiong, Wei

IPC: G06F9/30 , G06F9/38

Abstract: An apparatus to facilitate supporting 8-bit floating point format operands in a computing architecture is disclosed. The apparatus includes a processor comprising: a decoder to decode an instruction fetched for execution into a decoded instruction, wherein the decoded instruction is a matrix instruction that operates on 8-bit floating point operands to cause the processor to perform a parallel dot product operation; a controller to schedule the decoded instruction and provide input data for the 8-bit floating point operands in accordance with an 8-bit floating point data format indicated by the decoded instruction; and systolic dot product circuitry to execute the decoded instruction using systolic layers, each systolic layer comprising one or more sets of interconnected multipliers, shifters, and adders, each set of multipliers, shifters, and adders to generate a dot product of the 8-bit floating point operands.
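The abstract describes FP8 operands flowing through layers of multipliers, shifters, and adders to form dot products. The Python sketch below is a hedged software model of that arithmetic, not Intel's circuit. It assumes the OCP FP8 encodings E4M3 (bias 7) and E5M2 (bias 15), which the abstract does not name, and the names `decode_fp8`, `systolic_dot`, and `lanes_per_layer` are hypothetical. Python float arithmetic stands in for the hardware multipliers, alignment shifters, and adders.

```python
import math
from typing import Callable, Sequence


def decode_fp8(byte: int, exp_bits: int, man_bits: int, bias: int, ieee_specials: bool) -> float:
    """Decode one 8-bit float: 1 sign bit, exp_bits exponent bits, man_bits mantissa bits."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    all_ones_exp = (1 << exp_bits) - 1
    if ieee_specials and exp == all_ones_exp:  # E5M2 (assumed): IEEE-style Inf/NaN
        return sign * math.inf if man == 0 else math.nan
    if not ieee_specials and exp == all_ones_exp and man == (1 << man_bits) - 1:
        return math.nan  # E4M3 (assumed): only S.1111.111 encodes NaN
    if exp == 0:  # subnormal: no implicit leading 1
        return sign * man * 2.0 ** (1 - bias - man_bits)
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)


def e4m3(byte: int) -> float:
    return decode_fp8(byte, 4, 3, 7, ieee_specials=False)


def e5m2(byte: int) -> float:
    return decode_fp8(byte, 5, 2, 15, ieee_specials=True)


def systolic_dot(a: Sequence[int], b: Sequence[int], decode: Callable[[int], float],
                 lanes_per_layer: int = 2, acc: float = 0.0) -> float:
    """Dot product of two FP8 operand vectors, computed as a chain of layers."""
    if len(a) != len(b):
        raise ValueError("operand vectors must have equal length")
    for start in range(0, len(a), lanes_per_layer):
        # One layer: multiply its slice of element pairs and sum the products.
        layer_sum = sum(decode(x) * decode(y)
                        for x, y in zip(a[start:start + lanes_per_layer],
                                        b[start:start + lanes_per_layer]))
        acc += layer_sum  # hand the partial result to the next layer
    return acc


# E4M3 bytes: 0x38 = 1.0, 0x40 = 2.0, 0x44 = 3.0, 0x48 = 4.0, 0x4C = 6.0
print(systolic_dot([0x38] * 4, [0x40, 0x44, 0x48, 0x4C], e4m3))  # 15.0
```

Each loop iteration plays the role of one systolic layer: it reduces its slice of element pairs and passes the running sum onward, so the number of layers is the vector length divided by `lanes_per_layer`.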

2.

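Published invention application
MULTIPLE REGISTER ALLOCATION SIZES FOR THREADS (status: under examination, published)

Publication number: EP4109243A1

Publication date: 2022-12-28

Application number: EP22162082.6

Filing date: 2022-03-15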

Applicant: Intel Corporation

Inventors: Gurram, Chandra; Chen, Wei-Yu; Vemulapalli, Vikranth; Maiyuran, Subramaniam; Parra, Jorge; Mu, Shuai; Lueh, Guei-Yuan; Pal, Supratim

IPC: G06F9/30 , G06F9/38

Abstract: Provision of multiple register allocation sizes for threads is described. An example of a system includes one or more processors including a graphics processor, the graphics processor including at least a first local thread dispatcher (TDL) and multiple processing resources, each processing resource including a plurality of registers; and memory for storage of data for processing, wherein the one or more processors are to determine a register size for a first thread; identify one or more processing resources having sufficient register space for the first thread; select a processing resource of the one or more processing resources having sufficient register space to assign the first thread; select an available thread slot of the selected processing resource for the first thread; and allocate registers of the selected processing resource for the first thread.
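The allocation steps listed in the abstract (determine the thread's register size, find processing resources with enough register space, pick one, take a free thread slot, allocate registers) can be illustrated with a minimal Python sketch. `ProcessingResource`, `dispatch`, the register counts, and the "most free registers" selection policy are all hypothetical; the abstract does not say how the resource is chosen, and `dispatch` only loosely stands in for the local thread dispatcher (TDL).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ProcessingResource:
    """Hypothetical model of one processing resource with a register file and thread slots."""
    name: str
    total_registers: int
    thread_slots: int
    used_registers: int = 0
    slots: List[Optional[int]] = field(init=False)

    def __post_init__(self) -> None:
        self.slots = [None] * self.thread_slots  # None marks a free slot

    def free_registers(self) -> int:
        return self.total_registers - self.used_registers

    def free_slot(self) -> Optional[int]:
        return next((i for i, t in enumerate(self.slots) if t is None), None)


def dispatch(thread_id: int, register_size: int,
             resources: List[ProcessingResource]) -> Optional[Tuple[str, int]]:
    """Place a thread needing `register_size` registers; return (resource name, slot) or None."""
    # Identify resources with enough free registers and at least one free thread slot.
    candidates = [r for r in resources
                  if r.free_registers() >= register_size and r.free_slot() is not None]
    if not candidates:
        return None  # no room anywhere; a real dispatcher would wait and retry
    # Selection policy is an assumption: prefer the resource with the most free registers.
    chosen = max(candidates, key=lambda r: r.free_registers())
    slot = chosen.free_slot()
    chosen.slots[slot] = thread_id          # occupy the thread slot
    chosen.used_registers += register_size  # allocate registers for the thread
    return chosen.name, slot


eus = [ProcessingResource("EU0", 512, 4), ProcessingResource("EU1", 512, 4)]
print(dispatch(1, 256, eus))  # ('EU0', 0)
print(dispatch(2, 256, eus))  # ('EU1', 0)
print(dispatch(3, 128, eus))  # ('EU0', 1)
```

Because `register_size` is a per-thread argument, threads with different allocation sizes (128 and 256 registers here, both assumed values) can share one resource as long as its register file still has room.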

3.

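Published invention application
FLOATING-POINT CONVERSION VIA AN INTEGER UNIT (status: under examination, published)

Publication number: EP4498238A1

Publication date: 2025-01-29

Application number: EP23208987.0

Filing date: 2023-11-10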

Applicant: Intel Corporation

Inventors: Pal, Supratim; Chen, Jiasheng; Hurd, Kevin; Parra Osorio, Jorge Eduardo; Spencer, Christopher; Lueh, Guei-Yuan; Golconda, Pradeep K.; Fu, Fangwen; Xiong, Wei; Li, Hongzheng; Valerio, James; Swaminathan, Mukundan; Murphy, Nicholas; Mu, Shuai; Gibson, Clifford; Cheng, Buqi

IPC: G06F9/30 , G06F9/38

Abstract: Described herein is a graphics processor comprising a memory interface and a graphics processing cluster coupled with the memory interface. The graphics processing cluster includes a multi-lane parallel floating-point unit and a multi-lane parallel integer unit. The multi-lane parallel integer unit includes an integer pipeline including a plurality of parallel integer logic units configured to perform integer compute operations on a plurality of input data elements and a format conversion pipeline including a plurality of parallel format conversion units configured to convert a plurality of input data elements from a first one of a plurality of datatype formats to a second one of the plurality of datatype formats, the plurality of datatype formats including integer and floating-point formats.
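One reason a format conversion pipeline can sit inside an integer unit is that some floating-point conversions reduce to integer shifts, masks, and adds on raw bit patterns. The Python sketch below shows this with FP32 to BF16 round-to-nearest-even, a well-known bit trick chosen here as an assumed example; the abstract names no specific formats, and the function names are hypothetical.

```python
import struct
from typing import List, Sequence


def f32_bits(x: float) -> int:
    """Reinterpret a float as its IEEE 754 binary32 bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bf16_bits_to_float(h: int) -> float:
    """Widen a BF16 bit pattern back to a float (exact: append 16 zero bits)."""
    return struct.unpack("<f", struct.pack("<I", (h & 0xFFFF) << 16))[0]


def f32_to_bf16_rne(bits: int) -> int:
    """FP32 -> BF16 using only integer shifts, masks, and adds (round to nearest even)."""
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return ((bits >> 16) | 0x0040) & 0xFFFF  # NaN: keep sign, force a quiet NaN
    lsb = (bits >> 16) & 1  # lowest bit that survives truncation
    # Adding 0x7FFF + lsb rounds up above the halfway point and breaks ties toward even.
    return ((bits + 0x7FFF + lsb) >> 16) & 0xFFFF


def convert_lanes(values: Sequence[float]) -> List[int]:
    """Apply the conversion to every lane, as a multi-lane conversion unit would."""
    return [f32_to_bf16_rne(f32_bits(v)) for v in values]


print([hex(h) for h in convert_lanes([1.0, -2.0, 1.01171875])])  # ['0x3f80', '0xc000', '0x3f82']
```

The tie case is visible in the last lane: 1.01171875 lies exactly halfway between two BF16 values and rounds to the one with an even final bit. The only floating-point knowledge the integer logic needs is where the exponent field sits, to detect NaN.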

4.

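Published invention application
AVOIDING THE USE OF A RESULT CROSSBAR WHEN DOWN CONVERTING TO PACKED REGISTER FORMATS (status: under examination, published)

Publication number: EP4498236A1

Publication date: 2025-01-29

Application number: EP23208102.6

Filing date: 2023-11-07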

Applicant: Intel Corporation

Inventors: Pal, Supratim; Chen, Jiasheng; Spencer, Christopher; Parra Osorio, Jorge E.; Hurd, Kevin; Lueh, Guei-Yuan; Golconda, Pradeep K.; Fu, Fangwen; Xiong, Wei; Li, Hongzheng; Valerio, James; Swaminathan, Mukundan; Murphy, Nicholas; Mu, Shuai; Gibson, Clifford; Cheng, Buqi

IPC: G06F9/30

Abstract: Described herein is a graphics processor comprising a memory interface and a graphics processing cluster coupled with the memory interface. The graphics processing cluster includes a plurality of processing resources. A processing resource of the plurality of processing resources includes a source crossbar communicatively coupled with a register file, the source crossbar to reorder data elements of a source operand and a format conversion pipeline to convert a plurality of input data elements specified by the source operand from a first format of a plurality of datatype formats to a second format of the plurality of datatype formats, the plurality of datatype formats including integer and floating-point formats.
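This abstract pairs a source crossbar that reorders source elements with a down-conversion pipeline. A plausible reading of the title is that reordering inputs up front lets each narrowed result be written straight into its packed position, so no crossbar is needed on the result side. The Python sketch below models that idea under stated assumptions: FP32 to FP16 as the down-conversion, two FP16 values packed per 32-bit word, and hypothetical helpers `source_crossbar` and `convert_and_pack`.

```python
import struct
from typing import List, Sequence


def source_crossbar(elements: Sequence[float], order: Sequence[int]) -> List[float]:
    """Hypothetical source crossbar: lane i reads source element order[i]."""
    return [elements[j] for j in order]


def f32_to_f16_bits(x: float) -> int:
    """Down-convert to IEEE 754 binary16 using struct's 'e' format code."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def convert_and_pack(lanes: Sequence[float]) -> List[int]:
    """Convert each lane to FP16 and pack two results per 32-bit word.

    Lane 2k fills the low half and lane 2k+1 the high half of word k, so every
    result is written in place with no reordering after conversion.
    """
    if len(lanes) % 2:
        raise ValueError("need an even number of lanes")
    halves = [f32_to_f16_bits(x) for x in lanes]
    return [halves[i] | (halves[i + 1] << 16) for i in range(0, len(halves), 2)]


lanes = source_crossbar([1.0, 2.0, 3.0, 4.0], [0, 2, 1, 3])
print(lanes)                                           # [1.0, 3.0, 2.0, 4.0]
print([hex(w) for w in convert_and_pack(lanes)])       # ['0x42003c00', '0x44004000']
```

All reordering happens before conversion, while the data is still at full width; the packing step itself is a fixed lane-to-half-word mapping.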

5.

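Published invention application
SUPPORTING 8-BIT FLOATING POINT FORMAT OPERANDS IN A COMPUTING ARCHITECTURE (status: under examination, published)

Publication number: EP4485181A2

Publication date: 2025-01-01

Application number: EP24205439.3

Filing date: 2022-01-26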

Applicant: INTEL Corporation

Inventors: Mellempudi, Naveen; Maiyuran, Subramaniam; George, Varghese; Fu, Fangwen; Mu, Shuai; Pal, Supratim; Xiong, Wei

IPC: G06F9/38

Abstract: An apparatus comprises decode circuitry to decode a single matrix instruction having fields to indicate an opcode and locations of a first source matrix including a first plurality of 8-bit floating point data elements encoded in a first 8-bit floating point format, a second source matrix including a second plurality of 8-bit floating point data elements encoded in a second 8-bit floating point format, and a third source matrix including a plurality of 32-bit floating point data elements. The apparatus further comprises execution circuitry, responsive to the single matrix instruction, to generate a plurality of products based on the first plurality of 8-bit floating point data elements of the first source matrix and the second plurality of 8-bit floating point data elements of the second source matrix, and accumulate each product of the plurality of products with a corresponding 32-bit floating point data element of the third source matrix to generate a corresponding 32-bit floating point result data element of a result matrix.
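This abstract spells out the operation: multiply two FP8 source matrices, which may use different FP8 formats, and accumulate each product into the matching FP32 element of a third matrix. A reference model of that arithmetic in Python might look like the sketch below. It assumes E4M3 for the first matrix and E5M2 for the second and rounds to FP32 after every multiply-add; the abstract states neither the formats nor the internal rounding behavior, and all names are hypothetical.

```python
import struct
from typing import List, Sequence


def fp8_to_float(byte: int, fmt: str) -> float:
    """Decode one FP8 byte in an assumed OCP-style 'E4M3' or 'E5M2' encoding."""
    if fmt not in ("E4M3", "E5M2"):
        raise ValueError(f"unknown FP8 format {fmt!r}")
    exp_bits, man_bits, bias = (4, 3, 7) if fmt == "E4M3" else (5, 2, 15)
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    if (fmt == "E5M2" and exp == 31) or (fmt == "E4M3" and exp == 15 and man == 7):
        raise ValueError("Inf/NaN inputs are not handled in this sketch")
    sign = -1.0 if byte & 0x80 else 1.0
    if exp == 0:  # subnormal: no implicit leading 1
        return sign * man * 2.0 ** (1 - bias - man_bits)
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)


def round_to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fp8_matrix_mac(a: Sequence[Sequence[int]], b: Sequence[Sequence[int]],
                   c: Sequence[Sequence[float]],
                   fmt_a: str = "E4M3", fmt_b: str = "E5M2") -> List[List[float]]:
    """result[i][j] = c[i][j] + sum over p of a[i][p] * b[p][j], accumulated in FP32."""
    rows, inner, cols = len(a), len(b), len(b[0])
    result = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            acc = round_to_f32(c[i][j])
            for p in range(inner):
                product = fp8_to_float(a[i][p], fmt_a) * fp8_to_float(b[p][j], fmt_b)
                acc = round_to_f32(acc + product)  # FP32 rounding per step (assumed)
            out_row.append(acc)
        result.append(out_row)
    return result


# A in E4M3: 0x38 = 1.0, 0x40 = 2.0; B in E5M2: 0x3C = 1.0, 0x40 = 2.0
print(fp8_matrix_mac([[0x38, 0x40]], [[0x3C], [0x40]], [[0.5]]))  # [[5.5]]
```

An E4M3 significand has 4 bits and an E5M2 significand has 3, so each product needs at most 7 significant bits and is exact in both binary64 and binary32; in this model only the FP32 accumulation introduces rounding.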
