Patent search ap:("Intel Corporation") AND inv:"Aravind Kalaiah" Page 1

1.

Granted invention patent
Winograd algorithm on a matrix processing architecture (Status: In force)

Publication (announcement) number: US10482155B2

Publication (announcement) date: 2019-11-19

Application number: US15395542

Application date: 2016-12-30

Applicant: Intel Corporation

Inventor: Tony L. Werner, Aravind Kalaiah

IPC: G06F17/16, G06F15/80, G06F17/14, G06F17/15

Abstract: In one embodiment, a matrix operation may be performed, wherein the matrix operation comprises a matrix multiplication operation on a plurality of matrix operands. Matrix data may be received from a multi-dimensional memory, wherein the matrix data is associated with the plurality of matrix operands. The plurality of matrix operands may be extracted from the matrix data, wherein the plurality of matrix operands comprises a first matrix operand and a second matrix operand. A first transform may be performed on the first matrix operand to obtain a transformed matrix operand, wherein performing matrix multiplication using the transformed matrix operand is faster than performing matrix multiplication using the first matrix operand. Matrix multiplication may be performed on the transformed matrix operand to obtain a partial result. A second transform may be performed on the partial result to obtain a result of the matrix multiplication operation.
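
The transform, multiply, inverse-transform flow in this abstract can be illustrated with the well-known one-dimensional Winograd minimal filtering case F(2,3). This is a sketch only: the abstract does not say which Winograd variant or transform matrices the patent uses, and the function names below are hypothetical.

```python
# Illustrative sketch, not the patented method: Winograd F(2,3) in 1D.
# Two outputs of a 3-tap filter are produced with 4 multiplications
# instead of the 6 needed by direct evaluation.

def transform_filter(g):
    """First transform, applied to the filter operand."""
    g0, g1, g2 = g
    return [g0, (g0 + g1 + g2) / 2, (g0 - g1 + g2) / 2, g2]


def transform_input(d):
    """Transform applied to the 4-element data tile."""
    d0, d1, d2, d3 = d
    return [d0 - d2, d1 + d2, d2 - d1, d1 - d3]


def winograd_f23(d, g):
    u = transform_filter(g)
    v = transform_input(d)
    # Partial result: 4 element-wise multiplications in the transformed domain.
    m = [ui * vi for ui, vi in zip(u, v)]
    # Second transform maps the partial result back to the 2 outputs.
    return [m[0] + m[1] + m[2], m[1] - m[2] - m[3]]


def direct_f23(d, g):
    """Reference: direct evaluation of the same two outputs (6 multiplies)."""
    return [d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
            d[1] * g[0] + d[2] * g[1] + d[3] * g[2]]
```

Calling `winograd_f23(d, g)` and `direct_f23(d, g)` on the same inputs should give matching results, up to floating-point rounding introduced by the division in the filter transform.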

2.

Granted invention patent
Distributed matrix multiplication for neural networks (Status: In force)

Publication (announcement) number: US10169296B2

Publication (announcement) date: 2019-01-01

Application number: US15395527

Application date: 2016-12-30

Applicant: Intel Corporation

Inventor: Vijay Anand R. Korthikanti, Carey K. Kloss, Aravind Kalaiah, Amir Khosrowshahi

IPC: G06F17/16, G06N3/08

Abstract: In one embodiment, a matrix operation associated with a plurality of input matrices may be performed. The plurality of input matrices may be partitioned into a plurality of input partitions, wherein the plurality of input matrices is partitioned based on a number of available processing elements. The plurality of input partitions may be distributed among a plurality of processing elements, wherein each input partition is distributed to a particular processing element of the plurality of processing elements. A plurality of partial matrix operations may be performed using the plurality of processing elements, and partial matrix data may be transmitted between the plurality of processing elements while performing the plurality of partial matrix operations. A result of the matrix operation may be determined based on the plurality of partial matrix operations.
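
The partition, distribute, exchange, and combine steps described here can be sketched as a software simulation. The ring-style exchange below is an assumption chosen for illustration; the abstract does not state how partial matrix data moves between processing elements, and all names are hypothetical.

```python
# Illustrative simulation, not the patented method: each simulated processing
# element (PE) owns a row block of A and a row block of B, and B blocks are
# passed around a ring while partial products accumulate.

def matmul(a, b):
    """Plain product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def split_rows(m, parts):
    """Split the rows of m into `parts` equal blocks (assumes divisibility)."""
    step = len(m) // parts
    return [m[p * step:(p + 1) * step] for p in range(parts)]


def distributed_matmul(a, b, num_pe):
    step = len(b) // num_pe            # inner-dimension slice per B block
    a_blocks = split_rows(a, num_pe)   # PE p owns a_blocks[p]
    b_blocks = split_rows(b, num_pe)   # PE p initially holds b_blocks[p]
    held = list(range(num_pe))         # index of the B block each PE holds
    partial = [[[0] * len(b[0]) for _ in blk] for blk in a_blocks]
    for _ in range(num_pe):
        for p in range(num_pe):
            j = held[p]
            # Columns of the local A block that pair with B block j.
            a_slice = [row[j * step:(j + 1) * step] for row in a_blocks[p]]
            prod = matmul(a_slice, b_blocks[j])
            partial[p] = [[x + y for x, y in zip(r1, r2)]
                          for r1, r2 in zip(partial[p], prod)]
        # Simulated transmission: every PE receives its neighbour's B block.
        held = held[1:] + held[:1]
    # Concatenate the per-PE row blocks into the full result.
    return [row for blk in partial for row in blk]
```

The sketch assumes that the row count of `a` and the row count of `b` are both divisible by `num_pe`; a real partitioner would need to handle uneven splits.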

3.

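Published invention application
DEEP LEARNING HARDWARE (Status: Under examination, published)

Publication (announcement) number: US20230222331A1

Publication (announcement) date: 2023-07-13

Application number: US18184651

Application date: 2023-03-15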

Applicant: Intel Corporation

Inventor: Horace H. Lau, Prashant Arora, Olivia K. Wu, Tony L. Werner, Carey K. Kloss, Amir Khosrowshahi, Andrew Yang, Aravind Kalaiah, Vijay Anand R. Korthikanti

IPC: G06N3/063, G06F17/16, G06N3/08, G06N3/04

CPC classification number: G06N3/063, G06F17/16, G06N3/04, G06N3/08

Abstract: A network of matrix processing units (MPUs) is provided on a device, where each MPU is connected to at least one other MPU in the network, and each MPU is to perform matrix multiplication operations. Computer memory stores tensor data and a master control central processing unit (MCC) is provided on the device to receive an instruction from a host device, where the instruction includes one or more tensor operands based on the tensor data. The MCC invokes a set of operations on one or more of the MPUs based on the instruction, where the set of operations includes operations on the tensor operands. A result is generated from the set of operations, the result embodied as a tensor value.

4.

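Invention application
DEEP LEARNING HARDWARE (Status: In force)

Publication (announcement) number: US20220245438A1

Publication (announcement) date: 2022-08-04

Application number: US17728175

Application date: 2022-04-25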

Applicant: Intel Corporation

Inventor: Horace H. Lau, Prashant Arora, Olivia K. Wu, Tony L. Werner, Carey K. Kloss, Amir Khosrowshahi, Andrew Yang, Aravind Kalaiah, Vijay Anand R. Korthikanti

IPC: G06N3/063, G06F17/16, G06N3/04, G06N3/08

Abstract: A network of matrix processing units (MPUs) is provided on a device, where each MPU is connected to at least one other MPU in the network, and each MPU is to perform matrix multiplication operations. Computer memory stores tensor data and a master control central processing unit (MCC) is provided on the device to receive an instruction from a host device, where the instruction includes one or more tensor operands based on the tensor data. The MCC invokes a set of operations on one or more of the MPUs based on the instruction, where the set of operations includes operations on the tensor operands. A result is generated from the set of operations, the result embodied as a tensor value.

5.

Invention application
DIMENSION SHUFFLING USING MATRIX PROCESSORS (Status: Under examination, published)

Publication (announcement) number: US20180189227A1

Publication (announcement) date: 2018-07-05

Application number: US15395906

Application date: 2016-12-30

Applicant: Intel Corporation

Inventor: Vijay Anand R. Korthikanti, Aravind Kalaiah, Tony L. Werner, Amir Khosrowshahi

IPC: G06F15/173, G06N3/04, G11C5/05

CPC classification number: G06F15/17343, G06F9/30032, G06F9/30036, G06F9/30043, G06N3/0436, G11C5/05

Abstract: In one embodiment, a matrix operation may be performed to reorder a plurality of dimensions of an input matrix stored in two-dimensional memory. Data associated with the input matrix may be accessed using one or more strided memory operations, wherein the one or more strided memory operations are configured to access the two-dimensional memory at a plurality of locations that are separated by a particular interval. The data accessed using the one or more strided memory operations may be stored in a result matrix, wherein the data accessed using each strided memory operation is stored in the result matrix in non-transpose form or transpose form.
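
A strided memory operation reads elements separated by a fixed interval. The sketch below shows how such reads can reorder the dimensions of a matrix. It is a simplification: a flat Python list stands in for the two-dimensional memory of the abstract, and the function names are hypothetical.

```python
# Illustrative sketch, not the patented method: reordering a row-major
# matrix using strided reads over a flat buffer.

def strided_read(memory, start, stride, count):
    """Read `count` elements beginning at `start`, `stride` apart."""
    return [memory[start + i * stride] for i in range(count)]


def transpose_via_strides(memory, rows, cols):
    """Return the transpose of a rows x cols row-major matrix, also row-major.

    A read with stride `cols` starting at offset c gathers column c of the
    input; storing each gathered column as a row yields the transpose.
    """
    result = []
    for c in range(cols):
        result.extend(strided_read(memory, c, cols, rows))
    return result
```

For a 2 x 3 matrix stored as `[1, 2, 3, 4, 5, 6]`, `transpose_via_strides(memory, 2, 3)` gathers the columns `[1, 4]`, `[2, 5]`, and `[3, 6]` in turn.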

6.

Invention application
DISTRIBUTED CONVOLUTION FOR NEURAL NETWORKS (Status: In force)

Publication (announcement) number: US20220121954A1

Publication (announcement) date: 2022-04-21

Application number: US17564098

Application date: 2021-12-28

Applicant: Intel Corporation

Inventor: Vijay Anand R. Korthikanti, Aravind Kalaiah, Tony L. Werner, Carey K. Kloss, Amir Khosrowshahi

IPC: G06N3/08, G06F17/16, G06F17/15, G06N3/063, G06N3/04

Abstract: In one embodiment, a matrix operation may be performed using a plurality of input matrices, wherein the matrix operation is associated with one or more convolution operations. The plurality of input matrices may be partitioned into a plurality of input partitions, wherein the plurality of input matrices is partitioned based on a number of available processing elements. The plurality of input partitions may be distributed among a plurality of processing elements, wherein each input partition is distributed to a particular processing element of the plurality of processing elements. A plurality of partial matrix operations may be performed using the plurality of processing elements, and partial matrix data may be transmitted between the plurality of processing elements while performing the plurality of partial matrix operations. A result of the matrix operation may be determined based on the plurality of partial matrix operations.
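
One way to picture partitioned convolution with data exchanged between processing elements is a one-dimensional example with boundary ("halo") exchange. This is an assumption made for illustration; the abstract does not specify the partitioning scheme or what data is transmitted, and the names below are hypothetical.

```python
# Illustrative simulation, not the patented method: a 1D "valid" sliding
# dot product split across simulated processing elements (PEs).

def conv1d_valid(x, w):
    """Sliding dot product of x with w, with no padding."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k))
            for i in range(len(x) - k + 1)]


def distributed_conv1d(x, w, num_pe):
    """Assumes len(x) is divisible by num_pe and each chunk has >= len(w) - 1 items."""
    k = len(w)
    step = len(x) // num_pe
    chunks = [x[p * step:(p + 1) * step] for p in range(num_pe)]
    out = []
    for p in range(num_pe):
        # Simulated transmission: PE p receives the first k - 1 elements of
        # its right neighbour's chunk so it can finish its boundary outputs.
        halo = chunks[p + 1][:k - 1] if p + 1 < num_pe else []
        out.extend(conv1d_valid(chunks[p] + halo, w))
    return out
```

The concatenated per-PE outputs should equal `conv1d_valid(x, w)` computed on the unpartitioned input.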

7.

Granted invention patent
Programmable matrix processing engine (Status: In force)

Publication (announcement) number: US10896039B2

Publication (announcement) date: 2021-01-19

Application number: US16264483

Application date: 2019-01-31

Applicant: Intel Corporation

Inventor: Tony L. Werner, Aravind Kalaiah, Vijay Korthikanti, Horace Lau

IPC: G06F9/30, G06N3/063, G06N3/08

Abstract: In one embodiment, a matrix operation may be performed on one or more matrix operands. For example, matrix data may be received from a multi-dimensional memory, wherein the matrix data is associated with the one or more matrix operands. The one or more matrix operands may be extracted from the matrix data. A matrix routine associated with the matrix operation may be identified. The matrix routine may be executed on a matrix processor using the one or more matrix operands. A result of the matrix operation may be obtained based on the matrix routine executed by the matrix processor.
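
The identify-then-execute flow in this abstract resembles a routine lookup table. The sketch below models it in software under that assumption; the routine names, the `ROUTINES` table, and `execute` are all hypothetical and are not taken from the patent.

```python
# Illustrative sketch, not the patented method: a table of named matrix
# routines that a simulated matrix processor looks up and runs.

def transpose(m):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def multiply(a, b):
    """Plain product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


# Hypothetical routine memory: operation name -> routine.
ROUTINES = {"transpose": transpose, "multiply": multiply}


def execute(operation, *operands):
    """Identify the routine for `operation` and run it on the operands."""
    routine = ROUTINES[operation]
    return routine(*operands)
```

New operations could be supported by adding entries to the table rather than changing `execute`, which is the sense in which such an engine is programmable in this sketch.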

8.

Invention application
DISTRIBUTED CONVOLUTION FOR NEURAL NETWORKS (Status: Under examination, published)

Publication (announcement) number: US20180189652A1

Publication (announcement) date: 2018-07-05

Application number: US15395675

Application date: 2016-12-30

Applicant: Intel Corporation

Inventor: Vijay Anand R. Korthikanti, Aravind Kalaiah, Tony L. Werner, Carey K. Kloss, Amir Khosrowshahi

IPC: G06N3/08, G06F17/16, G06N3/04

CPC classification number: G06N3/084, G06F17/153, G06F17/16, G06N3/0454, G06N3/063

Abstract: In one embodiment, a matrix operation may be performed using a plurality of input matrices, wherein the matrix operation is associated with one or more convolution operations. The plurality of input matrices may be partitioned into a plurality of input partitions, wherein the plurality of input matrices is partitioned based on a number of available processing elements. The plurality of input partitions may be distributed among a plurality of processing elements, wherein each input partition is distributed to a particular processing element of the plurality of processing elements. A plurality of partial matrix operations may be performed using the plurality of processing elements, and partial matrix data may be transmitted between the plurality of processing elements while performing the plurality of partial matrix operations. A result of the matrix operation may be determined based on the plurality of partial matrix operations.

9.

Invention application
WINOGRAD ALGORITHM ON A MATRIX PROCESSING ARCHITECTURE (Status: Under examination, published)

Publication (announcement) number: US20180189237A1

Publication (announcement) date: 2018-07-05

Application number: US15395542

Application date: 2016-12-30

Applicant: Intel Corporation

Inventor: Tony L. Werner, Aravind Kalaiah

IPC: G06F17/16, G06F15/80

CPC classification number: G06F17/16, G06F15/80, G06F17/144, G06F17/153

Abstract: In one embodiment, a matrix operation may be performed, wherein the matrix operation comprises a matrix multiplication operation on a plurality of matrix operands. Matrix data may be received from a multi-dimensional memory, wherein the matrix data is associated with the plurality of matrix operands. The plurality of matrix operands may be extracted from the matrix data, wherein the plurality of matrix operands comprises a first matrix operand and a second matrix operand. A first transform may be performed on the first matrix operand to obtain a transformed matrix operand, wherein performing matrix multiplication using the transformed matrix operand is faster than performing matrix multiplication using the first matrix operand. Matrix multiplication may be performed on the transformed matrix operand to obtain a partial result. A second transform may be performed on the partial result to obtain a result of the matrix multiplication operation.

10.

Invention application
PIPELINED CONVOLUTIONAL OPERATIONS FOR PROCESSING CLUSTERS (Status: In force)

Publication (announcement) number: US20170097884A1

Publication (announcement) date: 2017-04-06

Application number: US14874784

Application date: 2015-10-05

Applicant: Intel Corporation

Inventor: Tony Werner, Aravind Kalaiah, Andrew Yang, Carey Kloss, Horace Lau, Naveen Gandham Rao, Amir Khosrowshahi

IPC: G06F12/02, G06F9/30

CPC classification number: G06F12/023, G06F15/76, G06F2212/251, G06T1/20

Abstract: Described herein are one or more integrated circuits (ICs) comprising controller circuitry to receive a command to execute an operation for data inputs stored in an external memory or a local memory, and convert the operation into a set of matrix operations to operate on sub-portions of the data inputs. The IC(s) further comprise at least one processing circuitry to execute the set of matrix operations, the processing circuitry to include ALUs, a local memory external to the ALUs and accessible by the ALUs, and processing control circuitry to create at least one matrix operand in the local memory (from the data inputs of the operation) comprising at least one of a scalar, a vector, or a 2D matrix, and provide memory handles corresponding to each of the matrix operands to one of the ALUs to access the respective matrix operands when executing a matrix operation.
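
Converting an operation into matrix operations on sub-portions of its inputs can be illustrated with the common im2col technique, which turns a 2D convolution into a matrix-vector product over image patches. The abstract does not name im2col, so this is a stand-in chosen for illustration, and the function names are hypothetical.

```python
# Illustrative sketch, not the patented method: 2D "valid" convolution
# (cross-correlation form) expressed as a product over flattened patches.

def im2col(image, kh, kw):
    """Flatten every kh x kw patch of the image into one row."""
    h, w = len(image), len(image[0])
    return [[image[r + i][c + j] for i in range(kh) for j in range(kw)]
            for r in range(h - kh + 1) for c in range(w - kw + 1)]


def conv2d_as_matmul(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    patches = im2col(image, kh, kw)                 # sub-portions of the input
    flat_kernel = [v for row in kernel for v in row]
    # Matrix-vector product: one dot product per patch.
    flat = [sum(p * k for p, k in zip(patch, flat_kernel)) for patch in patches]
    out_w = len(image[0]) - kw + 1
    # Reshape the flat result back into output rows.
    return [flat[i:i + out_w] for i in range(0, len(flat), out_w)]
```

Each row of the patch matrix corresponds to one output element, so the patch rows can be processed independently, for example by separate ALUs.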
