Patent search ap:("INTEL CORPORATION") AND inv:"Naveen Mellempudi" Page 3

21.

Invention patent (granted)
Incremental precision networks using residual inference and fine-grain quantization (status: in force)

Publication No.: US12198055B2

Publication date: 2025-01-14

Application No.: US18532795

Application date: 2023-12-07

Applicant: Intel Corporation

Inventor: Abhisek Kundu, Naveen Mellempudi, Dheevatsa Mudigere, Dipankar Das

IPC: G06N3/08, G06F9/46, G06N3/044, G06N3/045, G06N3/063, G06N3/084, G06N5/04, G06T15/00, G06T15/04, G06T15/80, G06T17/10, G06T17/20, G06V10/94

Abstract: One embodiment provides for a computer-readable medium storing instructions that cause one or more processors to perform operations comprising determining a per-layer scale factor to apply to tensor data associated with layers of a neural network model and converting the tensor data to converted tensor data. The tensor data may be converted from a floating point datatype to a second datatype that is an 8-bit datatype. The instructions further cause the one or more processors to generate an output tensor based on the converted tensor data and the per-layer scale factor.
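As a rough illustration of the per-layer scaling idea in this abstract, the sketch below converts one layer's floating-point values to signed 8-bit integers using a single scale factor, then rescales them to produce an output. The symmetric max-abs scale, the signed integer target type, and the function names `quantize_layer` and `dequantize_layer` are assumptions made for illustration; the abstract does not specify them.

```python
def quantize_layer(values, num_bits=8):
    """Quantize one layer's values with a single (per-layer) scale factor.

    Assumption: symmetric scaling based on the largest absolute value.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for an 8-bit signed type
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp into the representable range.
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_layer(quantized, scale):
    """Rebuild approximate floating-point values from the 8-bit data."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_layer(weights)
restored = dequantize_layer(q, scale)
```

Each restored value differs from the original by at most about half of `scale`, which is the usual cost of rounding to the nearest 8-bit level.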

22.

Invention application (published)
SUPPORTING 8-BIT FLOATING POINT FORMAT OPERANDS IN A COMPUTING ARCHITECTURE (status: under examination, published)

Publication No.: US20240256274A1

Publication date: 2024-08-01

Application No.: US18618648

Application date: 2024-03-27

Applicant: Intel Corporation

Inventor: Naveen Mellempudi, Subramaniam Maiyuran, Varghese George, Fangwen Fu, Shuai Mu, Supratim Pal, Wei Xiong

IPC: G06F9/30, G06F9/38, G06F9/48, G06F17/16, G06N20/00

CPC classification number: G06F9/30014, G06F9/3818, G06F9/4843, G06F17/16, G06N20/00

Abstract: An apparatus to facilitate supporting 8-bit floating point format operands in a computing architecture is disclosed. The apparatus includes a processor comprising: a decoder to decode an instruction fetched for execution into a decoded instruction, wherein the decoded instruction is a matrix instruction that operates on 8-bit floating point operands to cause the processor to perform a parallel dot product operation; a controller to schedule the decoded instruction and provide input data for the 8-bit floating point operands in accordance with an 8-bit floating data format indicated by the decoded instruction; and systolic dot product circuitry to execute the decoded instruction using systolic layers, wherein each systolic layer comprises one or more sets of interconnected multipliers, shifters, and adders, each set of multipliers, shifters, and adders to generate a dot product of the 8-bit floating point operands.

23.

Invention patent (granted)
Apparatuses, methods, and systems for 8-bit floating-point matrix dot product instructions (status: in force)

Publication No.: US12020028B2

Publication date: 2024-06-25

Application No.: US17134373

Application date: 2020-12-26

Applicant: Intel Corporation

Inventor: Naveen Mellempudi, Alexander F. Heinecke, Robert Valentine, Mark J. Charney, Christopher J. Hughes, Evangelos Georganas, Zeev Sperber, Amit Gradstein, Simon Rubanovich

IPC: G06F9/30, G06F7/499, G06F9/38

CPC classification number: G06F9/30036, G06F7/49915, G06F9/30196, G06F9/3887

Abstract: Systems, methods, and apparatuses relating to 8-bit floating-point matrix dot product instructions are described. A processor embodiment includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of a destination matrix having single-precision elements, a first source matrix, and a second source matrix, the source matrices having elements that each comprise a quadruple of 8-bit floating-point values, the opcode to indicate execution circuitry is to cause, for each element of the first source matrix and corresponding element of the second source matrix, a conversion of the 8-bit floating-point values to single-precision values, a multiplication of different pairs of converted single-precision values to generate a plurality of results, and an accumulation of the results with previous contents of a corresponding element of the destination matrix, decode circuitry to decode the fetched instruction, and the execution circuitry to respond to the decoded instruction as specified by the opcode.
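The per-element step described in this abstract (convert a quadruple of 8-bit values, multiply them pairwise, and accumulate into the destination element) might be sketched as follows. The abstract does not name the 8-bit layout; this sketch assumes E5M2 (1 sign bit, 5 exponent bits, 2 fraction bits), and the helper names `decode_e5m2` and `dot_accumulate` are hypothetical. Python floats are double precision, so this only approximates single-precision accumulation.

```python
import struct


def decode_e5m2(byte):
    """Decode one FP8 value, assuming the E5M2 layout.

    E5M2 has the same sign and exponent fields as IEEE half precision, so
    the byte can be placed in the upper half of a 16-bit pattern and read
    back as FP16.
    """
    return struct.unpack('<e', bytes([0x00, byte]))[0]


def dot_accumulate(dest, a_quad, b_quad):
    """Add the dot product of two FP8 quadruples to the accumulator `dest`."""
    for a, b in zip(a_quad, b_quad):
        dest += decode_e5m2(a) * decode_e5m2(b)
    return dest


a_quad = [0x3C, 0x40, 0x38, 0x3E]  # 1.0, 2.0, 0.5, 1.5
b_quad = [0x40, 0x40, 0x40, 0xC0]  # 2.0, 2.0, 2.0, -2.0
result = dot_accumulate(1.0, a_quad, b_quad)  # 1.0 + (2 + 4 + 1 - 3)
```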

24.

Invention application (published)
8-BIT FLOATING POINT FUSED MULTIPLY INSTRUCTIONS (status: under examination, published)

Publication No.: US20240045688A1

Publication date: 2024-02-08

Application No.: US17958369

Application date: 2022-10-01

Applicant: Intel Corporation

Inventor: Alexander Heinecke, Menachem Adelman, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber

IPC: G06F9/30, G06F7/487

CPC classification number: G06F9/3016, G06F7/4876, G06F9/3001

Abstract: Techniques for performing FP8 FMA in response to an instruction are described. In some examples, an instruction has fields for an opcode, an identification of location of a packed data source/destination operand (a first source), an identification of a location of a second packed data source operand, an identification of a location of a third packed data source operand, and an identification of location of a packed data source/destination operand, wherein the opcode is to indicate operand ordering and that execution circuitry is to, per data element position, perform a FP8 value fused multiply-accumulate operation using the first, second, and third source operands and store a result in a corresponding data element position of the source/destination operand, wherein the FP8 value has an 8-bit floating point format that comprises one bit for a sign, at least 4 bits for an exponent, and at least two bits for a fraction.

25.

Invention application (published)
INSTRUCTIONS TO CONVERT FROM FP8 (status: under examination, published)

Publication No.: US20240045686A1

Publication date: 2024-02-08

Application No.: US17958382

Application date: 2022-10-01

Applicant: Intel Corporation

Inventor: Alexander Heinecke, Menachem Adelman, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber

IPC: G06F9/30

CPC classification number: G06F9/30145, G06F9/30025

Abstract: Techniques for converting FP8 data elements to FP16 or FP32 data elements using a single instruction are described. An example apparatus includes decoder circuitry to decode a single instruction, the single instruction to indicate that execution circuitry is to convert packed FP8 data from the identified source to packed half-precision floating-point data or single-precision floating point data and store the packed half-precision floating-point data or single-precision floating point data into corresponding data element positions of the identified destination operand.

26.

Invention application (published)
APPARATUSES, METHODS, AND SYSTEMS FOR INSTRUCTIONS FOR STRUCTURED-SPARSE TILE MATRIX FMA (status: under examination, published)

Publication No.: US20240045685A1

Publication date: 2024-02-08

Application No.: US17958381

Application date: 2022-10-01

Applicant: Intel Corporation

Inventor: Menachem Adelman, Amit Gradstein, Alexander Heinecke, Christopher Hughes, Naveen Mellempudi, Shahar Mizrahi, Dana Rip, Simon Rubanovich, Uri Sherman, Guy Boudoukh, Evangelos Georganas, Nilesh Jain, Barukh Ziv

IPC: G06F9/30

CPC classification number: G06F9/30145, G06F9/30025, G06F9/3001

Abstract: Systems, methods, and apparatuses relating to sparsity-based FMA are described. In some examples, an instance of a single FMA instruction has one or more fields for an opcode, one or more fields to identify a source/destination matrix operand, one or more fields to identify a first plurality of source matrix operands, and one or more fields to identify a second plurality of matrix operands, wherein the opcode is to indicate that execution circuitry is to select a proper subset of FP8 data elements from the first plurality of source matrix operands based on sparsity controls from a first matrix operand of the second plurality of matrix operands and perform an FMA.

27.

Invention application (published)
8-BIT FLOATING POINT COMPARISON INSTRUCTIONS (status: under examination, published)

Publication No.: US20240045681A1

Publication date: 2024-02-08

Application No.: US17958367

Application date: 2022-10-01

Applicant: Intel Corporation

Inventor: Alexander Heinecke, Menachem Adelman, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber

IPC: G06F9/30

CPC classification number: G06F9/30145, G06F9/30036, G06F9/30094

Abstract: Techniques for comparing FP8 data elements are described. An exemplary FP8 comparison instruction includes fields for an opcode, an identification of a location of a first packed data source operand, and an identification of a location of a second packed data source operand, wherein the opcode is to indicate that execution circuitry is to perform, for a particular data element position of the packed data source operands, a comparison of a data element at that position, and update a flags register based on the comparison.

28.

Invention patent (granted)
Scaling half-precision floating point tensors for training deep neural networks (status: in force)

Publication No.: US11507815B2

Publication date: 2022-11-22

Application No.: US17742138

Application date: 2022-05-11

Applicant: Intel Corporation

Inventor: Naveen Mellempudi, Dipankar Das

IPC: G06T1/20, G06F5/01, G06N3/063, G06F7/487, G06F7/544, G06N3/04, G06N3/08

Abstract: A graphics processor is described that includes a single instruction, multiple thread (SIMT) architecture including hardware multithreading. The multiprocessor can execute parallel threads of instructions associated with a command stream, where the multiprocessor includes a set of functional units to execute at least one of the parallel threads of the instructions. The set of functional units can include a mixed precision tensor processor to perform tensor computations. The functional units can also include circuitry to analyze statistics for output values of the tensor computations, determine a target format to convert the output values, the target format determined based on the statistics for the output values and a precision associated with a second layer of the neural network, and convert the output values to the target format.

29.

Invention application
INSTRUCTIONS TO CONVERT FROM FP16 TO BF8 (status: in force)

Publication No.: US20220206805A1

Publication date: 2022-06-30

Application No.: US17134353

Application date: 2020-12-26

Applicant: Intel Corporation

Inventor: Alexander Heinecke, Naveen Mellempudi, Robert Valentine, Mark Charney, Christopher Hughes, Evangelos Georganas, Zeev Sperber, Amit Gradstein, Simon Rubanovich

IPC: G06F9/30, G06F7/499, H03M7/24

Abstract: Techniques for converting FP16 data elements to BF8 data elements using a single instruction are described. An exemplary apparatus includes decoder circuitry to decode a single instruction, the single instruction to include one or more fields to identify a source operand, one or more fields to identify a destination operand, and one or more fields for an opcode, the opcode to indicate that execution circuitry is to convert packed half-precision floating-point data from the identified source to packed bfloat8 data and store the packed bfloat8 data into corresponding data element positions of the identified destination operand; and execution circuitry to execute the decoded instruction according to the opcode to convert packed half-precision floating-point data from the identified source to packed bfloat8 data and store the packed bfloat8 data into corresponding data element positions.
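A scalar version of the conversion this abstract describes could look like the sketch below. It assumes that BF8 means the E5M2 layout (so a BF8 value is the upper byte of an FP16 bit pattern) and that the discarded bits are rounded to nearest, ties to even. Neither detail is stated in the abstract, and the function name `fp16_to_bf8` is hypothetical.

```python
import struct


def fp16_to_bf8(x):
    """Convert a value representable in FP16 to an 8-bit E5M2 pattern.

    Assumptions: BF8 is E5M2, rounding is round-to-nearest-even.
    """
    bits = struct.unpack('<H', struct.pack('<e', x))[0]
    # NaN: exponent all ones and a non-zero fraction. Force a fraction bit
    # in the upper byte so the result stays a NaN.
    if (bits & 0x7C00) == 0x7C00 and (bits & 0x03FF):
        return (bits >> 8) | 0x02
    upper = bits >> 8    # sign, 5 exponent bits, top 2 fraction bits
    lower = bits & 0xFF  # the 8 fraction bits that are dropped
    # Round up when past the halfway point, or exactly halfway and odd.
    if lower > 0x80 or (lower == 0x80 and (upper & 1)):
        upper += 1       # a carry into the exponent rounds up correctly
    return upper & 0xFF


packed_fp16 = [1.0, -2.0, 1.125, 1.375]
packed_bf8 = [fp16_to_bf8(v) for v in packed_fp16]
```

Here 1.125 lies exactly halfway between the two nearest E5M2 values (1.0 and 1.25) and rounds to 1.0, while 1.375 lies halfway between 1.25 and 1.5 and rounds to 1.5.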

30.

Invention application
HARDWARE APPARATUSES AND METHODS RELATING TO ELEMENTAL REGISTER ACCESSES (status: in force)

Publication No.: US20160188334A1

Publication date: 2016-06-30

Application No.: US14582784

Application date: 2014-12-24

Applicant: Intel Corporation

Inventor: Victor Lee, Ugonna Echeruo, George Chrysos, Naveen Mellempudi

IPC: G06F9/30

CPC classification number: G06F9/30036

Abstract: Methods and apparatuses relating to a vector instruction with a register operand with an elemental offset are described. In one embodiment, a hardware processor includes a decode unit to decode a vector instruction with a register operand with an elemental offset to access a first number of elements in a register specified by the register operand, wherein the first number is a total number of elements in the register minus the elemental offset, access a second number of elements in a next logical register, wherein the second number is the elemental offset, and combine the first number of elements and the second number of elements as a data vector, and an execution unit to execute the vector instruction on the data vector.

