Patent search: ap:("Intel Corporation") AND inv:"Choong Ng", page 1

1.

Granted invention patent
Multicast network and memory transfer optimizations for neural network hardware acceleration (in force)

Publication number: US11704548B2

Publication date: 2023-07-18

Application number: US17444752

Filing date: 2021-08-10

Applicant: Intel Corporation

Inventors: Jeremy Bruestle, Choong Ng

IPC classification: G06N3/063, G06F12/06, G06F9/345, H04L49/15, G06N3/04, H04L15/00

CPC classification: G06N3/063, G06F9/345, G06F12/06, G06N3/04, H04L49/1507, H04L15/00

Abstract: In one embodiment, a system to deterministically transfer partitions of contiguous computer readable data in constant time includes a computer readable memory and a modulo address generator. The computer readable memory is organized into D banks, to contain contiguous data including a plurality of data elements of size M which are constituent data elements of a vector with N data elements, the data elements to start at an offset address O. The modulo address generator is to generate the addresses of the data elements of a vector with i data elements stored in the computer readable memory, the modulo address generator including at least one forward permutation to permute data elements with addresses of the form O+M*i where 0

2.

Granted invention patent
Multicast network and memory transfer optimizations for neural network hardware acceleration (in force)

Publication number: US11120329B2

Publication date: 2021-09-14

Application number: US15588569

Filing date: 2017-05-05

Applicant: Intel Corporation

Inventors: Jeremy Bruestle, Choong Ng

IPC classification: G06N3/063, G06F12/06, G06F9/345, H04L12/933, G06N3/04, H04L15/00

Abstract: Neural network specific hardware acceleration optimizations are disclosed, including an optimized multicast network and an optimized DRAM transfer unit to perform in constant or linear time. The multicast network is a set of switch nodes organized into layers and configured to operate as a Beneš network. Configuration data may be accessed by all switch nodes in the network. Each layer is configured to perform a Beneš network transformation of the previous layer within a computer instruction. Since the computer instructions are pipelined, the entire network of switch nodes may be configured in constant or linear time. Similarly, a DRAM transfer unit configured to access memory in strides organizes memory into banks indexed by prime or relatively prime number amounts. The index value is selected so as not to cause memory address collisions. Upon receiving a memory specification, the DRAM transfer unit may calculate out strides thereby accessing an entire tile of a tensor in constant or linear time.
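
Note on the abstract above: the collision-avoidance idea can be illustrated with a minimal sketch. It assumes a simple model in which an element at address offset + stride * i lives in bank (address mod num_banks). The function names and this mapping are hypothetical illustrations, not taken from the patent. Under that model, a run of num_banks strided accesses touches every bank exactly once whenever the stride and the bank count are relatively prime.

```python
from math import gcd


def banks_hit(offset: int, stride: int, num_banks: int) -> list[int]:
    """Bank index touched by each of num_banks consecutive strided accesses.

    Assumes the hypothetical mapping bank = address % num_banks.
    """
    return [(offset + stride * i) % num_banks for i in range(num_banks)]


def is_collision_free(offset: int, stride: int, num_banks: int) -> bool:
    """True when no two of the num_banks accesses land in the same bank."""
    hits = banks_hit(offset, stride, num_banks)
    return len(set(hits)) == num_banks


if __name__ == "__main__":
    # 7 banks (prime) with stride 4: every bank is used exactly once.
    print(banks_hit(0, 4, 7), is_collision_free(0, 4, 7))
    # 8 banks with stride 4: gcd is 4, so only two banks are ever used.
    print(banks_hit(0, 4, 8), is_collision_free(0, 4, 8))
    # The collision-free condition coincides with gcd(stride, num_banks) == 1.
    print(gcd(4, 7) == 1, gcd(4, 8) == 1)
```

In this model a prime bank count is a convenient choice because it is relatively prime to every stride that is not a multiple of it.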

3.

Granted invention patent
Apparatus for hardware accelerated machine learning (in force)

Publication number: US11790267B2

Publication date: 2023-10-17

Application number: US17070009

Filing date: 2020-10-14

Applicant: Intel Corporation

Inventors: Jeremy Bruestle, Choong Ng

IPC classification: G06N20/00, G06F9/46, G06F7/48, G06F5/01, G06F7/58

CPC classification: G06N20/00, G06F5/01, G06F7/48, G06F7/582, G06F9/46, G06F2207/4824

Abstract: An architecture and associated techniques of an apparatus for hardware accelerated machine learning are disclosed. The architecture features multiple memory banks storing tensor data. The tensor data may be concurrently fetched by a number of execution units working in parallel. Each operational unit supports an instruction set specific to certain primitive operations for machine learning. An instruction decoder is employed to decode a machine learning instruction and reveal one or more of the primitive operations to be performed by the execution units, as well as the memory addresses of the operands of the primitive operations as stored in the memory banks. The primitive operations, upon being performed or executed by the execution units, may generate some output that can be saved into the memory banks. The fetching of the operands and the saving of the output may involve permutation and duplication of the data elements involved.

4.

Granted invention patent
Apparatus for hardware accelerated machine learning (in force)

Publication number: US10817802B2

Publication date: 2020-10-27

Application number: US15588558

Filing date: 2017-05-05

Applicant: Intel Corporation

Inventors: Jeremy Bruestle, Choong Ng

IPC classification: G06F12/00, G06N20/00, G06F9/46, G06F7/48, G06F5/01, G06F7/58

Abstract: An architecture and associated techniques of an apparatus for hardware accelerated machine learning are disclosed. The architecture features multiple memory banks storing tensor data. The tensor data may be concurrently fetched by a number of execution units working in parallel. Each operational unit supports an instruction set specific to certain primitive operations for machine learning. An instruction decoder is employed to decode a machine learning instruction and reveal one or more of the primitive operations to be performed by the execution units, as well as the memory addresses of the operands of the primitive operations as stored in the memory banks. The primitive operations, upon being performed or executed by the execution units, may generate some output that can be saved into the memory banks. The fetching of the operands and the saving of the output may involve permutation and duplication of the data elements involved.

5.

Granted invention patent
Preprocessing tensor operations for optimal compilation (in force)

Publication number: US10592213B2

Publication date: 2020-03-17

Application number: US15787494

Filing date: 2017-10-18

Applicant: Intel Corporation

Inventors: Jeremy Bruestle, Choong Ng

IPC classification: G06F8/30, G06N20/00, G06F17/16, G06N3/063, G06N3/04, G06F8/41, G06F8/40, G06F9/455, G06F17/12

Abstract: Techniques to preprocess tensor operations prior to code generation to optimize compilation are disclosed. A computer readable representation of a linear algebra or tensor operation is received. A code transformation software component performs transformations including output reduction and fraction removal. The result is a set of linear equations of a single variable with integer coefficients. Such a set lends itself to more efficient code generation during compilation by a code generation software component. Use cases disclosed include targeting a machine learning hardware accelerator, receiving code in the form of an intermediate language generated by a cross-compiler with multiple front ends supporting multiple programming languages, and cloud deployment and execution scenarios.

6.

Granted invention patent
Hardware accelerated machine learning (in force)

Publication number: US11170294B2

Publication date: 2021-11-09

Application number: US15399714

Filing date: 2017-01-05

Applicant: Intel Corporation

Inventors: Jeremy Bruestle, Choong Ng

IPC classification: G06N3/08, G06N20/00, G06F5/01, G06F7/02, G06N3/063

Abstract: A machine learning hardware accelerator architecture and associated techniques are disclosed. The architecture features multiple memory banks of very wide SRAM that may be concurrently accessed by a large number of parallel operational units. Each operational unit supports an instruction set specific to machine learning, including optimizations for performing tensor operations and convolutions. Optimized addressing, an optimized shift reader and variations on a multicast network that permutes and copies data and associates with an operational unit that supports those operations are also disclosed.

7.

Published patent application
Apparatus For Hardware Accelerated Machine Learning (in force)

Publication number: US20210049508A1

Publication date: 2021-02-18

Application number: US17070009

Filing date: 2020-10-14

Applicant: Intel Corporation

Inventors: Jeremy Bruestle, Choong Ng

IPC classification: G06N20/00, G06F9/46, G06F7/48, G06F5/01, G06F7/58

Abstract: An architecture and associated techniques of an apparatus for hardware accelerated machine learning are disclosed. The architecture features multiple memory banks storing tensor data. The tensor data may be concurrently fetched by a number of execution units working in parallel. Each operational unit supports an instruction set specific to certain primitive operations for machine learning. An instruction decoder is employed to decode a machine learning instruction and reveal one or more of the primitive operations to be performed by the execution units, as well as the memory addresses of the operands of the primitive operations as stored in the memory banks. The primitive operations, upon being performed or executed by the execution units, may generate some output that can be saved into the memory banks. The fetching of the operands and the saving of the output may involve permutation and duplication of the data elements involved.

8.

Published patent application
MULTICAST NETWORK AND MEMORY TRANSFER OPTIMIZATIONS FOR NEURAL NETWORK HARDWARE ACCELERATION (in force)

Publication number: US20210374512A1

Publication date: 2021-12-02

Application number: US17444752

Filing date: 2021-08-10

Applicant: Intel Corporation

Inventors: Jeremy Bruestle, Choong Ng

IPC classification: G06N3/063, G06F12/06, G06F9/345, H04L12/933, G06N3/04

Abstract: In one embodiment, a system to deterministically transfer partitions of contiguous computer readable data in constant time includes a computer readable memory and a modulo address generator. The computer readable memory is organized into D banks, to contain contiguous data including a plurality of data elements of size M which are constituent data elements of a vector with N data elements, the data elements to start at an offset address O. The modulo address generator is to generate the addresses of the data elements of a vector with i data elements stored in the computer readable memory, the modulo address generator including at least one forward permutation to permute data elements with addresses of the form O+M*i where 0
