专利检索 ap:("Intel Corporation") AND inv:"Naveen Mellempudi" 第 2 页

11.

发明授权
Scaling half-precision floating point tensors for training deep neural networks 有权

公开(公告)号：US11468303B2

公开(公告)日：2022-10-11

申请号：US16526376

申请日：2019-07-30

申请人： Intel Corporation

发明人： Naveen Mellempudi , Dipankar Das

IPC分类号： G06T1/20 , G06F5/01 , G06N3/063 , G06F7/487 , G06F7/544 , G06N3/04 , G06N3/08

摘要： A graphics processor is described that includes a single instruction, multiple thread (SIMT) architecture including hardware multithreading. The multiprocessor can execute parallel threads of instructions associated with a command stream, where the multiprocessor includes a set of functional units to execute at least one of the parallel threads of the instructions. The set of functional units can include a mixed precision tensor processor to perform tensor computations to generate loss data. The loss data is stored as a floating-point data type and scaled by a scaling factor to enable a data distribution of a gradient tensor generated based on the loss data to be represented by a 16-bit floating point data type.

12.

发明授权
Scaling half-precision floating point tensors for training deep neural networks 有权

公开(公告)号：US12106210B2

公开(公告)日：2024-10-01

申请号：US18456272

申请日：2023-08-25

申请人： Intel Corporation

发明人： Naveen Mellempudi , Dipankar Das

IPC分类号： G06N3/044 , G06F5/01 , G06F7/487 , G06F7/544 , G06N3/045 , G06N3/063 , G06N3/084 , G06T1/20

CPC分类号： G06N3/063 , G06F5/012 , G06F7/487 , G06F7/5443 , G06N3/044 , G06N3/045 , G06N3/084 , G06T1/20

摘要： One embodiment provides for a machine-learning accelerator device a multiprocessor to execute parallel threads of an instruction stream, the multiprocessor including a compute unit, the compute unit including a set of functional units, each functional unit to execute at least one of the parallel threads of the instruction stream. The compute unit includes compute logic configured to execute a single instruction to scale an input tensor associated with a layer of a neural network according to a scale factor, the input tensor stored in a floating-point data type, the compute logic to scale the input tensor to enable a data distribution of data of the input tensor to be represented by a 16-bit floating point data type.

13.

发明授权
Scaling half-precision floating point tensors for training deep neural networks 有权

公开(公告)号：US11823034B2

公开(公告)日：2023-11-21

申请号：US17960947

申请日：2022-10-06

申请人： Intel Corporation

发明人： Naveen Mellempudi , Dipankar Das

IPC分类号： G06N3/063 , G06T1/20 , G06F7/487 , G06F7/544 , G06F5/01 , G06N3/084 , G06N3/044 , G06N3/045

CPC分类号： G06N3/063 , G06F5/012 , G06F7/487 , G06F7/5443 , G06N3/044 , G06N3/045 , G06N3/084 , G06T1/20

摘要： A graphics processor is described that includes a single instruction, multiple thread (SIMT) architecture including hardware multithreading. The multiprocessor can execute parallel threads of instructions associated with a command stream, where the multiprocessor includes a set of functional units to execute at least one of the parallel threads of the instructions. The set of functional units can include a mixed precision tensor processor to perform tensor computations to generate loss data. The loss data is stored as a first floating-point data type and scaled by a scaling factor to enable a data distribution of a gradient tensor generated based on the loss data to be represented by a second floating point data type.

14.

发明授权
Incremental precision networks using residual inference and fine-grain quantization 有权

公开(公告)号：US11556772B2

公开(公告)日：2023-01-17

申请号：US15869515

申请日：2018-01-12

申请人： Intel Corporation

发明人： Abhisek Kundu , Naveen Mellempudi , Dheevatsa Mudigere , Dipankar Das

IPC分类号： G06N3/08 , G06N5/04 , G06N3/04 , G06T15/00 , G06F9/46 , G06N3/063 , G06T17/20 , G06T15/80 , G06T17/10 , G06T15/04 , G06V10/94

摘要： One embodiment provides for a computing device comprising a parallel processor compute unit to perform a set of parallel integer compute operations; a ternarization unit including a weight ternarization circuit and an activation quantization circuit; wherein the weight ternarization circuit is to convert a weight tensor from a floating-point representation to a ternary representation including a ternary weight and a scale factor; wherein the activation quantization circuit is to convert an activation tensor from a floating-point representation to an integer representation; and wherein the parallel processor compute unit includes one or more circuits to perform the set of parallel integer compute operations on the ternary representation of the weight tensor and the integer representation of the activation tensor.

15.

发明申请
SUPPORTING 8-BIT FLOATING POINT FORMAT OPERANDS IN A COMPUTING ARCHITECTURE 有权

公开(公告)号：US20220318013A1

公开(公告)日：2022-10-06

申请号：US17212588

申请日：2021-03-25

申请人： Intel Corporation

发明人： Naveen Mellempudi , Subramaniam Maiyuran , Varghese George , Fangwen Fu , Shuai Mu , Supratim Pal , Wei Xiong

IPC分类号： G06F9/30 , G06F9/38 , G06F9/48

摘要： An apparatus to facilitate supporting 8-bit floating point format operands in a computing architecture is disclosed. The apparatus includes a processor comprising: a decoder to decode an instruction fetched for execution into a decoded instruction, wherein the decoded instruction is a matrix instruction that operates on 8-bit floating point operands to cause the processor to perform a parallel dot product operation; a controller to schedule the decoded instruction and provide input data for the 8-bit floating point operands in accordance with an 8-bit floating data format indicated by the decoded instruction; and systolic dot product circuitry to execute the decoded instruction using systolic layers, each systolic layer comprises one or more sets of interconnected multipliers, shifters, and adder, each set of multipliers, shifters, and adders to generate a dot product of the 8-bit floating point operands.

16.

发明申请
INSTRUCTIONS TO CONVERT FROM FP16 TO BF8 有权

公开(公告)号：US20220206743A1

公开(公告)日：2022-06-30

申请号：US17134358

申请日：2020-12-26

申请人： Intel Corporation

发明人： Alexander Heinecke , Naveen Mellempudi , Robert Valentine , Mark Charney , Christopher Hughes , Evangelos Georganas , Zeev Sperber , Amit Gradstein , Simon Rubanovich

IPC分类号： G06F5/00

摘要： Techniques for converting FP16 to BF8 using bias are described. An exemplary embodiment utilizes decoder circuitry to decode a single instruction, the single instruction to include one or more fields to identify a first source operand, one or more fields to identify a second source operand, one or more fields to identify a source/destination operand, and one or more fields for an opcode, wherein the opcode is to indicate that execution circuitry is to convert packed half-precision data from the identified first and second sources to packed bfloat8 data using bias terms from the identified source/destination operand and store the packed bfloat8 data into corresponding data element positions of the identified source/destination operand; and execution circuitry to execute the decoded instruction according to the opcode to convert packed half-precision data from the identified first and second sources to packed bfloat8 data using bias terms from the identified source/destination operand and store the packed bfloat8 data into corresponding data element positions of the identified source/destination operand.

17.

发明授权
Dynamic precision management for integer deep learning primitives 有权

公开(公告)号：US10825127B2

公开(公告)日：2020-11-03

申请号：US16853405

申请日：2020-04-20

申请人： Intel Corporation

发明人： Naveen Mellempudi , Dheevatsa Mudigere , Dipankar Das , Srinivas Sridharan

IPC分类号： G06T1/20 , G06N3/08 , G06N3/04 , G06F7/544 , G06F17/15 , G06F5/01 , G06F7/523 , G06F17/16 , G06N3/063 , G06F7/501

摘要： One embodiment provides for a graphics processing unit to perform computations associated with a neural network, the graphics processing unit comprising compute unit including a hardware logic unit having dynamic precision fixed-point logic, the compute unit to receive a set of dynamic fixed-point tensors, compute, via the dynamic precision fixed-point logic, a right-shift value using an absolute maximum value within the set of dynamic fixed-point tensors and a dynamic range of the set of dynamic fixed-point tensors, right-shift data values within the set of dynamic fixed-point tensors based on the right-shift value, increment a shared exponent associated with the set of dynamic fixed-point tensors based on the right-shift value, perform a compute operation on the set of dynamic fixed-point tensors, and generate an output tensor via the compute operation on the set of dynamic fixed-point tensors.

18.

发明申请
HARDWARE APPARATUSES AND METHODS RELATING TO ELEMENTAL REGISTER ACCESSES 审中-公开

公开(公告)号：US20190138305A1

公开(公告)日：2019-05-09

申请号：US16003555

申请日：2018-06-08

申请人： INTEL CORPORATION

发明人： Victor Lee , Ugonna Echeruo , George Chrysos , Naveen Mellempudi

IPC分类号： G06F9/30

CPC分类号： G06F9/30036

摘要： Methods and apparatuses relating to a vector instruction with a register operand with an elemental offset are described. In one embodiment, a hardware processor includes a decode unit to decode a vector instruction with a register operand with an elemental offset to access a first number of elements in a register specified by the register operand, wherein the first number is a total number of elements in the register minus the elemental offset, access a second number of elements in a next logical register, wherein the second number is the elemental offset, and combine the first number of elements and the second number of elements as a data vector, and an execution unit to execute the vector instruction on the data vector.

19.

发明公开
SUPPORTING 8-BIT FLOATING POINT FORMAT OPERANDS IN A COMPUTING ARCHITECTURE 审中-公开

公开(公告)号：US20240256274A1

公开(公告)日：2024-08-01

申请号：US18618648

申请日：2024-03-27

申请人： Intel Corporation

发明人： Naveen Mellempudi , Subramaniam Maiyuran , Varghese George , Fangwen Fu , Shuai Mu , Supratim Pal , Wei Xiong

IPC分类号： G06F9/30 , G06F9/38 , G06F9/48 , G06F17/16 , G06N20/00

CPC分类号： G06F9/30014 , G06F9/3818 , G06F9/4843 , G06F17/16 , G06N20/00

摘要： An apparatus to facilitate supporting 8-bit floating point format operands in a computing architecture is disclosed. The apparatus includes a processor comprising: a decoder to decode an instruction fetched for execution into a decoded instruction, wherein the decoded instruction is a matrix instruction that operates on 8-bit floating point operands to cause the processor to perform a parallel dot product operation; a controller to schedule the decoded instruction and provide input data for the 8-bit floating point operands in accordance with an 8-bit floating data format indicated by the decoded instruction; and systolic dot product circuitry to execute the decoded instruction using systolic layers, each systolic layer comprises one or more sets of interconnected multipliers, shifters, and adder, each set of multipliers, shifters, and adders to generate a dot product of the 8-bit floating point operands.

20.

发明授权
Apparatuses, methods, and systems for 8-bit floating-point matrix dot product instructions 有权

公开(公告)号：US12020028B2

公开(公告)日：2024-06-25

申请号：US17134373

申请日：2020-12-26

申请人： Intel Corporation

发明人： Naveen Mellempudi , Alexander F. Heinecke , Robert Valentine , Mark J. Charney , Christopher J. Hughes , Evangelos Georganas , Zeev Sperber , Amit Gradstein , Simon Rubanovich

IPC分类号： G06F9/30 , G06F7/499 , G06F9/38

CPC分类号： G06F9/30036 , G06F7/49915 , G06F9/30196 , G06F9/3887

摘要： Systems, methods, and apparatuses relating to 8-bit floating-point matrix dot product instructions are described. A processor embodiment includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of a destination matrix having single-precision elements, a first source matrix, and a second source matrix, the source matrices having elements that each comprise a quadruple of 8-bit floating-point values, the opcode to indicate execution circuitry is to cause, for each element of the first source matrix and corresponding element of the second source matrix, a conversion of the 8-bit floating-point values to single-precision values, a multiplication of different pairs of converted single-precision values to generate plurality of results, and an accumulation of the results with previous contents of a corresponding element of the destination matrix, decode circuitry to decode the fetched instruction, and the execution circuitry to respond to the decoded instruction as specified by the opcode.

搜索结果

国家/区域

专利有效性

申请日

公布(公告)日

申请人

申请人所在国/区域

发明人

IPC

IPC部

IPC大类

IPC小类

IPC大组

IPC小组

外观分类