PIPELINED CONVOLUTIONAL OPERATIONS FOR PROCESSING CLUSTERS

    Publication Number: US20170097884A1

    Publication Date: 2017-04-06

    Application Number: US14874784

    Filing Date: 2015-10-05

    CPC classification number: G06F12/023 G06F15/76 G06F2212/251 G06T1/20

    Abstract: Described herein are one or more integrated circuits (ICs) comprising controller circuitry to receive a command to execute an operation for data inputs stored in an external memory or a local memory, and to convert the operation into a set of matrix operations that operate on sub-portions of the data inputs. The IC(s) further comprise at least one processing circuitry to execute the set of matrix operations. The processing circuitry includes ALUs, a local memory external to the ALUs and accessible by the ALUs, and processing control circuitry to create at least one matrix operand in the local memory (from the data inputs of the operation) comprising at least one of a scalar, a vector, or a 2D matrix, and to provide memory handles corresponding to each of the matrix operands to one of the ALUs for accessing the respective matrix operands when executing a matrix operation.
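
    The lowering the abstract describes, turning one convolution command into matrix operations over sub-portions of the input, can be illustrated with the standard im2col technique. The sketch below is a plain-Python illustration of that general idea, not the patented circuitry; the function names are hypothetical.

        import numpy as np

        def im2col(x, kh, kw):
            """Unroll each kh-by-kw patch of a 2-D input into one matrix row."""
            h, w = x.shape
            oh, ow = h - kh + 1, w - kw + 1
            cols = np.empty((oh * ow, kh * kw))
            for i in range(oh):
                for j in range(ow):
                    cols[i * ow + j] = x[i:i + kh, j:j + kw].ravel()
            return cols, (oh, ow)

        def conv2d_as_matmul(x, k):
            """Valid 2-D convolution (correlation) expressed as one matrix product."""
            cols, (oh, ow) = im2col(x, *k.shape)
            return (cols @ k.ravel()).reshape(oh, ow)

        x = np.arange(25, dtype=float).reshape(5, 5)
        k = np.ones((3, 3)) / 9.0          # 3x3 mean filter
        print(conv2d_as_matmul(x, k))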

    Artificial neural network training using flexible floating point tensors

    Publication Number: US12205035B2

    Publication Date: 2025-01-21

    Application Number: US16004243

    Filing Date: 2018-06-08

    Abstract: The present disclosure is directed to systems and methods for training neural networks using a tensor that includes a plurality of FP16 values and a plurality of bits that define an exponent shared by some or all of the FP16 values included in the tensor. The FP16 values may include IEEE 754 format 16-bit floating point values, and the tensor may include a plurality of bits defining the shared exponent. The tensor may include a shared exponent and FP16 values that include a variable bit-length mantissa and a variable bit-length exponent that may be dynamically set by processor circuitry. Alternatively, the tensor may include a shared exponent and FP16 values that include: a variable bit-length mantissa; a variable bit-length exponent that may be dynamically set by processor circuitry; and a shared exponent switch set by the processor circuitry to selectively combine the FP16 value exponent with the shared exponent.
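
    The shared-exponent layout is essentially block floating point: one exponent is factored out of the whole tensor and each element keeps only an FP16 value. The sketch below shows that encoding with numpy under assumed scaling choices; the actual bit packing and exponent-selection rule in the patent may differ.

        import numpy as np

        def to_shared_exponent(t):
            """Encode a tensor as FP16 values plus one shared exponent.

            The shared exponent is taken from the largest magnitude so
            every scaled element fits comfortably in FP16 (an assumption,
            not the patented selection rule).
            """
            m = float(np.max(np.abs(t)))
            shared = int(np.floor(np.log2(m))) if m > 0 else 0
            return (t / 2.0 ** shared).astype(np.float16), shared

        def from_shared_exponent(fp16_vals, shared):
            """Reconstruct an approximate FP32 tensor from the encoding."""
            return fp16_vals.astype(np.float32) * 2.0 ** shared

        t = np.array([1.5e-3, -2.0e-2, 3.2e-2, 7.0e-4], dtype=np.float32)
        enc, e = to_shared_exponent(t)
        print(e, from_shared_exponent(enc, e))  # small quantization error vs. t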

    ARTIFICIAL NEURAL NETWORK TRAINING USING FLEXIBLE FLOATING POINT TENSORS

    Publication Number: US20240028905A1

    Publication Date: 2024-01-25

    Application Number: US18478554

    Filing Date: 2023-09-29

    CPC classification number: G06N3/084 G06N3/063 G06N3/045 G06F9/3013

    Abstract: The present disclosure is directed to systems and methods for training neural networks using a tensor that includes a plurality of FP16 values and a plurality of bits that define an exponent shared by some or all of the FP16 values included in the tensor. The FP16 values may include IEEE 754 format 16-bit floating point values, and the tensor may include a plurality of bits defining the shared exponent. The tensor may include a shared exponent and FP16 values that include a variable bit-length mantissa and a variable bit-length exponent that may be dynamically set by processor circuitry. Alternatively, the tensor may include a shared exponent and FP16 values that include: a variable bit-length mantissa; a variable bit-length exponent that may be dynamically set by processor circuitry; and a shared exponent switch set by the processor circuitry to selectively combine the FP16 value exponent with the shared exponent.

    Apparatus and method for coherent, accelerated conversion between data representations

    Publication Number: US10761757B2

    Publication Date: 2020-09-01

    Application Number: US16024812

    Filing Date: 2018-06-30

    Abstract: An apparatus and method for converting tensor data. For example, one embodiment of a method comprises: fetching source tensor blocks of a source tensor data structure, each source tensor block comprising a plurality of source tensor data elements having a first numeric representation, wherein the source tensor data structure comprises a predefined structural arrangement of source tensor blocks; converting the one or more source tensor blocks into one or more destination tensor blocks comprising a plurality of destination tensor data elements having a second numeric representation different from the first numeric representation, wherein sets of one or more source tensor blocks are converted to one or more corresponding destination tensor blocks in a specified order based on the first and second numeric representations; and storing each individual destination tensor block in a designated memory region to maintain coherency with the predefined structural arrangement of the source tensor blocks.
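
    The block-wise conversion flow can be modeled in a few lines: walk the source blocks in order, convert each block's elements to the destination representation, and write the result into the region that mirrors the source layout. The block size, dtypes, and function name below are illustrative assumptions, not the claimed hardware.

        import numpy as np

        def convert_tensor_blocks(src, block=64, dst_dtype=np.float16):
            """Convert a tensor block-by-block to a new numeric representation,
            preserving the source's structural arrangement in the destination."""
            flat = src.ravel()
            dst = np.empty(flat.size, dtype=dst_dtype)
            for start in range(0, flat.size, block):      # specified order
                dst[start:start + block] = flat[start:start + block].astype(dst_dtype)
            return dst.reshape(src.shape)

        src = np.random.rand(4, 128).astype(np.float32)   # FP32 -> FP16
        dst = convert_tensor_blocks(src)
        print(dst.dtype, dst.shape, float(np.max(np.abs(src - dst))))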

    Pipelined convolutional operations for processing clusters

    Publication Number: US09886377B2

    Publication Date: 2018-02-06

    Application Number: US14874784

    Filing Date: 2015-10-05

    CPC classification number: G06F12/023 G06F15/76 G06F2212/251 G06T1/20

    Abstract: Described herein are one or more integrated circuits (ICs) comprising controller circuitry to receive a command to execute an operation for data inputs stored in an external memory or a local memory, and to convert the operation into a set of matrix operations that operate on sub-portions of the data inputs. The IC(s) further comprise at least one processing circuitry to execute the set of matrix operations. The processing circuitry includes ALUs, a local memory external to the ALUs and accessible by the ALUs, and processing control circuitry to create at least one matrix operand in the local memory (from the data inputs of the operation) comprising at least one of a scalar, a vector, or a 2D matrix, and to provide memory handles corresponding to each of the matrix operands to one of the ALUs for accessing the respective matrix operands when executing a matrix operation.

    Apparatuses and methods to accelerate matrix multiplication

    Publication Number: US12254061B2

    Publication Date: 2025-03-18

    Application Number: US17256195

    Filing Date: 2018-09-27

    Abstract: Methods and apparatuses relating to performing vector multiplication are described. Hardware accelerators to perform vector multiplication are also described. In one embodiment, a combined fixed-point and floating-point vector multiplication circuit includes at least one switch to change the circuit between a first mode and a second mode. In the first mode, each multiplier of a set of multipliers is to multiply mantissas from a same element position of a first floating-point vector and a second floating-point vector to produce a corresponding product, shift the corresponding products with a set of shift registers based on a maximum exponent of exponents for the corresponding products determined by a maximum exponent determiner to produce shifted products, perform a numeric conversion operation on the shifted products with a set of numeric conversion circuits based on sign bits from the same element position of the first floating-point vector and the second floating-point vector to produce signed representations of the shifted products, add the signed representations of the shifted products with a set of adders to produce a single product, and normalize the single product with a normalization circuit based on the maximum exponent into a single floating-point resultant. In the second mode, each multiplier of the set of multipliers is to multiply values from a same element position of a first integer vector and a second integer vector to produce a corresponding product, and add each corresponding product with the set of adders to produce a single integer resultant.
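
    The first (floating-point) mode amounts to a dot product computed by multiplying mantissas, aligning every product to the maximum product exponent, summing with one adder chain, and normalizing once at the end. The Python below is a behavioral model of that dataflow using math.frexp; signs ride along in the mantissas here rather than through dedicated sign-conversion circuits, and the function names are assumptions.

        import math

        def float_dot_max_exp(a, b):
            """Dot product via mantissa multiply, max-exponent alignment,
            accumulate, then a single normalization (behavioral sketch)."""
            # frexp splits each product p into (m, e) with p == m * 2**e.
            parts = [math.frexp(x * y) for x, y in zip(a, b)]
            max_e = max(e for _, e in parts)
            # Align: scale each mantissa down by its distance from max_e,
            # mimicking the shift registers, then add everything once.
            acc = sum(m * 2.0 ** (e - max_e) for m, e in parts)
            return acc * 2.0 ** max_e          # normalize by max exponent

        def int_dot(a, b):
            """Second mode: plain integer multiply-accumulate on the same adders."""
            return sum(x * y for x, y in zip(a, b))

        a, b = [1.5, -2.25, 0.875], [4.0, 0.5, -8.0]
        print(float_dot_max_exp(a, b))         # -2.125, matches sum(x*y)
        print(int_dot([1, 2, 3], [4, 5, 6]))   # 32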

    Software assisted power management

    Publication Number: US11567555B2

    Publication Date: 2023-01-31

    Application Number: US16557657

    Filing Date: 2019-08-30

    Abstract: Embodiments include an apparatus comprising an execution unit coupled to a memory, a microcode controller, and a hardware controller. The microcode controller is to identify a global power and performance hint in an instruction stream that includes first and second instruction phases to be executed in parallel, identify a first local hint based on synchronization dependence in the first instruction phase, and use the first local hint to balance power consumption between the execution unit and the memory during parallel executions of the first and second instruction phases. The hardware controller is to use the global hint to determine an appropriate voltage level of a compute voltage and a frequency of a compute clock signal for the execution unit during the parallel executions of the first and second instruction phases. The first local hint includes a processing rate for the first instruction phase or an indication of that processing rate.
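
    The split of roles can be modeled abstractly: the hardware controller maps the global hint to a voltage/frequency operating point for the whole parallel region, while the microcode controller derives a per-phase processing rate from the local hint. Everything below (the operating-point table, hint names, and rate formula) is invented purely for illustration.

        # Global hint -> (compute voltage in volts, compute clock in GHz).
        OPERATING_POINTS = {
            "memory_bound":  (0.70, 1.2),   # slow compute, leave power for memory
            "balanced":      (0.85, 1.8),
            "compute_bound": (1.00, 2.4),   # boost the compute clock
        }

        def apply_global_hint(hint):
            """Hardware-controller role: pick V/f for the parallel region."""
            return OPERATING_POINTS[hint]

        def apply_local_hint(issue_rate, dependency_stalls):
            """Microcode-controller role: scale back a phase's processing rate
            when synchronization dependences would leave the unit waiting."""
            return issue_rate / (1 + dependency_stalls)

        v, f = apply_global_hint("memory_bound")
        rate = apply_local_hint(issue_rate=4.0, dependency_stalls=3)
        print(f"V={v} V, f={f} GHz, phase issue rate={rate} ops/cycle")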
