Patent search: ap:("Advanced Micro Devices, Inc." OR "ATI Technologies ULC") AND inv:"Allen Rush" (page 1)

1.

Patent application
INTEGRATED VIDEO CODEC AND INFERENCE ENGINE [Pending, published]

Publication No.: US20190028752A1

Publication date: 2019-01-24

Application No.: US15657613

Filing date: 2017-07-24

Applicants: Advanced Micro Devices, Inc.; ATI Technologies ULC

Inventors: Lei Zhang, Sateesh Lagudu, Allen Rush, Razvan Dan-Dobre

IPC classification: H04N21/4143, G06F3/14, H04N7/14, H04N7/15

Abstract: Systems, apparatuses, and methods for integrating a video codec with an inference engine are disclosed. A system is configured to implement an inference engine and a video codec while sharing at least a portion of its processing elements between the inference engine and the video codec. By sharing processing elements when combining the inference engine and the video codec, the silicon area of the combination is reduced. In one embodiment, the portion of processing elements which are shared include a motion prediction/motion estimation/MACs engine with a plurality of multiplier-accumulator (MAC) units, an internal memory, and peripherals. The peripherals include a memory interface, a direct memory access (DMA) engine, and a microprocessor. The system is configured to perform a context switch to reprogram the processing elements to switch between operating modes. The context switch can occur at a frame boundary or at a sub-frame boundary.
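The context-switching idea in this abstract can be illustrated with a toy model. The sketch below is not taken from the patent: the names `Mode`, `SharedEngine`, `context_switch`, and `run` are hypothetical, and each scheduled work item simply stands in for a frame or sub-frame boundary at which the shared processing elements may be reprogrammed.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    VIDEO_CODEC = "video_codec"
    INFERENCE = "inference"


@dataclass
class SharedEngine:
    """Hypothetical stand-in for the shared MAC engine, memory, and peripherals."""
    mode: Mode = Mode.VIDEO_CODEC
    context_switches: int = 0
    log: list = field(default_factory=list)

    def context_switch(self, new_mode: Mode) -> None:
        # Reprogram the shared elements only when the operating mode changes.
        if new_mode is not self.mode:
            self.mode = new_mode
            self.context_switches += 1

    def run(self, work_items):
        # The gap between two work items models a frame or sub-frame
        # boundary, which is the only point where a switch happens here.
        for mode, label in work_items:
            self.context_switch(mode)
            self.log.append((self.mode.value, label))
        return self.log


engine = SharedEngine()
schedule = [
    (Mode.VIDEO_CODEC, "frame 0"),
    (Mode.INFERENCE, "frame 0"),
    (Mode.VIDEO_CODEC, "frame 1"),
    (Mode.VIDEO_CODEC, "frame 2"),
    (Mode.INFERENCE, "frame 2"),
]
engine.run(schedule)
print(engine.context_switches)
```

Consecutive work items in the same mode reuse the current configuration, so only changes of mode count as context switches in this model.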

2.

Patent application
MEMORY BANDWIDTH REDUCTION TECHNIQUES FOR LOW POWER CONVOLUTIONAL NEURAL NETWORK INFERENCE APPLICATIONS [In force]

Publication No.: US20220129752A1

Publication date: 2022-04-28

Application No.: US17571045

Filing date: 2022-01-07

Applicants: Advanced Micro Devices, Inc.; ATI Technologies ULC

Inventors: Sateesh Lagudu, Lei Zhang, Allen Rush

IPC classification: G06N3/08, G06N3/063, G06N3/04, G06F1/3296

Abstract: Systems, apparatuses, and methods for implementing memory bandwidth reduction techniques for low power convolutional neural network inference applications are disclosed. A system includes at least a processing unit and an external memory coupled to the processing unit. The system detects a request to perform a convolution operation on input data from a plurality of channels. Responsive to detecting the request, the system partitions the input data from the plurality of channels into 3D blocks so as to minimize the external memory bandwidth utilization for the convolution operation being performed. Next, the system loads a selected 3D block from external memory into internal memory and then generates convolution output data for the selected 3D block for one or more features. Then, for each feature, the system adds convolution output data together across channels prior to writing the convolution output data to the external memory.
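The blocking scheme described in this abstract can be sketched in a few lines. The example below is an illustration under stated assumptions, not the patented method: it uses a 1x1 convolution so that blocks need no overlapping borders, the function names and block sizes are hypothetical, and plain Python lists stand in for external and internal memory. It shows partial sums being accumulated across channel blocks internally, so that each output value is written out once.

```python
def conv1x1_reference(inp, weights):
    """Direct 1x1 convolution: out[f][y][x] = sum_c weights[f][c] * inp[c][y][x]."""
    C, H, W = len(inp), len(inp[0]), len(inp[0][0])
    return [[[sum(weights[f][c] * inp[c][y][x] for c in range(C))
              for x in range(W)] for y in range(H)] for f in range(len(weights))]


def conv1x1_blocked(inp, weights, c_blk, h_blk, w_blk):
    """Process the input in 3D blocks of (channels x rows x columns)."""
    C, H, W = len(inp), len(inp[0]), len(inp[0][0])
    F = len(weights)
    out = [[[0] * W for _ in range(H)] for _ in range(F)]  # "external" memory
    external_writes = 0
    for y0 in range(0, H, h_blk):
        for x0 in range(0, W, w_blk):
            ys = range(y0, min(y0 + h_blk, H))
            xs = range(x0, min(x0 + w_blk, W))
            # "Internal" accumulator for this spatial tile, per feature.
            acc = {(f, y, x): 0 for f in range(F) for y in ys for x in xs}
            for c0 in range(0, C, c_blk):
                # Visit one 3D block: a chunk of channels for this tile.
                for c in range(c0, min(c0 + c_blk, C)):
                    for f in range(F):
                        for y in ys:
                            for x in xs:
                                acc[(f, y, x)] += weights[f][c] * inp[c][y][x]
            # Write each output once, after summing across all channels.
            for (f, y, x), value in acc.items():
                out[f][y][x] = value
                external_writes += 1
    return out, external_writes


inp = [[[c + 4 * y + x for x in range(4)] for y in range(4)] for c in range(4)]
weights = [[1, 2, 3, 4], [1, 0, -1, 0]]
blocked, writes = conv1x1_blocked(inp, weights, c_blk=2, h_blk=3, w_blk=2)
print(blocked == conv1x1_reference(inp, weights), writes)
```

If the partial sum for every channel block were written out instead, the same tile would be written once per channel block, which is the extra traffic the cross-channel accumulation avoids in this model.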

3.

Patent application
MACHINE LEARNING INFERENCE ENGINE SCALABILITY [Pending, published]

Publication No.: US20190325305A1

Publication date: 2019-10-24

Application No.: US16117302

Filing date: 2018-08-30

Applicants: Advanced Micro Devices, Inc.; ATI Technologies ULC

Inventors: Lei Zhang, Sateesh Lagudu, Allen Rush

IPC classification: G06N3/08, G06N3/04

Abstract: Systems, apparatuses, and methods for adaptively mapping a machine learning model to a multi-core inference accelerator engine are disclosed. A computing system includes a multi-core inference accelerator engine with multiple inference cores coupled to a memory subsystem. The system also includes a control unit which determines how to adaptively map a machine learning model to the multi-core inference accelerator engine. In one implementation, the control unit selects a mapping scheme which minimizes the memory bandwidth utilization of the multi-core inference accelerator engine. In one implementation, this mapping scheme involves having one inference core of the multi-core inference accelerator engine fetch given data and broadcast the given data to other inference cores of the inference accelerator engine. Each inference core fetches second data unique to the respective inference core. The inference cores then perform computations on the first and second data in order to implement the machine learning model.
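The bandwidth effect of the broadcast mapping can be shown by counting memory fetches. The sketch below is only an illustration: it assumes the shared data is an input vector and the per-core data is a row of weights (the abstract does not say which is which), and the function names are hypothetical.

```python
def run_with_broadcast(shared_input, per_core_weights):
    """One core fetches the shared data once and broadcasts it to the others."""
    fetches = len(shared_input)          # single fetch of the shared data
    broadcast = list(shared_input)       # copy delivered to every core
    outputs = []
    for weights in per_core_weights:
        fetches += len(weights)          # each core fetches only its own data
        outputs.append(sum(w * x for w, x in zip(weights, broadcast)))
    return outputs, fetches


def run_without_broadcast(shared_input, per_core_weights):
    """Baseline: every core fetches both the shared data and its own data."""
    fetches = 0
    outputs = []
    for weights in per_core_weights:
        fetches += len(shared_input) + len(weights)
        outputs.append(sum(w * x for w, x in zip(weights, shared_input)))
    return outputs, fetches


shared_input = [1, 2, 3, 4]
per_core_weights = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 1, 1, 1], [2, 0, 2, 0]]
print(run_with_broadcast(shared_input, per_core_weights))
print(run_without_broadcast(shared_input, per_core_weights))
```

Both functions compute the same outputs; with four cores the broadcast version issues 20 element fetches in this model against 32 for the baseline.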

4.

Granted patent
Machine learning inference engine scalability [In force]

Publication No.: US11948073B2

Publication date: 2024-04-02

Application No.: US16117302

Filing date: 2018-08-30

Applicants: Advanced Micro Devices, Inc.; ATI Technologies ULC

Inventors: Lei Zhang, Sateesh Lagudu, Allen Rush

IPC classification: G06N3/08, G06N3/04

CPC classification: G06N3/08, G06N3/04

Abstract: Systems, apparatuses, and methods for adaptively mapping a machine learning model to a multi-core inference accelerator engine are disclosed. A computing system includes a multi-core inference accelerator engine with multiple inference cores coupled to a memory subsystem. The system also includes a control unit which determines how to adaptively map a machine learning model to the multi-core inference accelerator engine. In one implementation, the control unit selects a mapping scheme which minimizes the memory bandwidth utilization of the multi-core inference accelerator engine. In one implementation, this mapping scheme involves having one inference core of the multi-core inference accelerator engine fetch given data and broadcast the given data to other inference cores of the inference accelerator engine. Each inference core fetches second data unique to the respective inference core. The inference cores then perform computations on the first and second data in order to implement the machine learning model.

5.

Granted patent
Memory bandwidth reduction techniques for low power convolutional neural network inference applications [In force]

Publication No.: US11227214B2

Publication date: 2022-01-18

Application No.: US15812336

Filing date: 2017-11-14

Applicants: Advanced Micro Devices, Inc.; ATI Technologies ULC

Inventors: Sateesh Lagudu, Lei Zhang, Allen Rush

IPC classification: G06N3/08, G06F1/3296, G06N3/04, G06N3/063

Abstract: Systems, apparatuses, and methods for implementing memory bandwidth reduction techniques for low power convolutional neural network inference applications are disclosed. A system includes at least a processing unit and an external memory coupled to the processing unit. The system detects a request to perform a convolution operation on input data from a plurality of channels. Responsive to detecting the request, the system partitions the input data from the plurality of channels into 3D blocks so as to minimize the external memory bandwidth utilization for the convolution operation being performed. Next, the system loads a selected 3D block from external memory into internal memory and then generates convolution output data for the selected 3D block for one or more features. Then, for each feature, the system adds convolution output data together across channels prior to writing the convolution output data to the external memory.

6.

Granted patent
Integrated video codec and inference engine [In force]

Publication No.: US10582250B2

Publication date: 2020-03-03

Application No.: US15657613

Filing date: 2017-07-24

Applicants: Advanced Micro Devices, Inc.; ATI Technologies ULC

Inventors: Lei Zhang, Sateesh Lagudu, Allen Rush, Razvan Dan-Dobre

IPC classification: H04N21/4143, G06F3/14, H04N7/14, H04N7/15, H04N19/172, H04N19/90, H04N21/418, H04N19/42, G06T9/00

Abstract: Systems, apparatuses, and methods for integrating a video codec with an inference engine are disclosed. A system is configured to implement an inference engine and a video codec while sharing at least a portion of its processing elements between the inference engine and the video codec. By sharing processing elements when combining the inference engine and the video codec, the silicon area of the combination is reduced. In one embodiment, the portion of processing elements which are shared include a motion prediction/motion estimation/MACs engine with a plurality of multiplier-accumulator (MAC) units, an internal memory, and peripherals. The peripherals include a memory interface, a direct memory access (DMA) engine, and a microprocessor. The system is configured to perform a context switch to reprogram the processing elements to switch between operating modes. The context switch can occur at a frame boundary or at a sub-frame boundary.

7.

Patent application
MEMORY BANDWIDTH REDUCTION TECHNIQUES FOR LOW POWER CONVOLUTIONAL NEURAL NETWORK INFERENCE APPLICATIONS [Pending, published]

Publication No.: US20190147332A1

Publication date: 2019-05-16

Application No.: US15812336

Filing date: 2017-11-14

Applicants: Advanced Micro Devices, Inc.; ATI Technologies ULC

Inventors: Sateesh Lagudu, Lei Zhang, Allen Rush

IPC classification: G06N3/08, G06F1/32

Abstract: Systems, apparatuses, and methods for implementing memory bandwidth reduction techniques for low power convolutional neural network inference applications are disclosed. A system includes at least a processing unit and an external memory coupled to the processing unit. The system detects a request to perform a convolution operation on input data from a plurality of channels. Responsive to detecting the request, the system partitions the input data from the plurality of channels into 3D blocks so as to minimize the external memory bandwidth utilization for the convolution operation being performed. Next, the system loads a selected 3D block from external memory into internal memory and then generates convolution output data for the selected 3D block for one or more features. Then, for each feature, the system adds convolution output data together across channels prior to writing the convolution output data to the external memory.

8.

Granted patent
Stream processor with low power parallel matrix multiply pipeline [In force]

Publication No.: US12067401B2

Publication date: 2024-08-20

Application No.: US15855637

Filing date: 2017-12-27

Applicant: Advanced Micro Devices, Inc.

Inventors: Jiasheng Chen, Yunxiao Zou, Michael J. Mantor, Allen Rush

IPC classification: G06F9/38, G06F7/544, G06F9/30, G06F17/16

CPC classification: G06F9/3867, G06F7/5443, G06F9/3001, G06F9/30036, G06F9/30101, G06F17/16

Abstract: Systems, apparatuses, and methods for implementing a low power parallel matrix multiply pipeline are disclosed. In one embodiment, a system includes at least first and second vector register files coupled to a matrix multiply pipeline. The matrix multiply pipeline comprises a plurality of dot product units. The dot product units are configured to calculate dot or outer products for first and second sets of operands retrieved from the first vector register file. The results of the dot or outer product operations are written back to the second vector register file. The second vector register file provides the results from the previous dot or outer product operations as inputs to subsequent dot or outer product operations. The dot product units receive the results from previous phases of the matrix multiply operation and accumulate these previous dot or outer product results with the current dot or outer product results.
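The phase-by-phase accumulation described in this abstract corresponds to the standard identity that a matrix product is a sum of outer products. The sketch below is a generic software illustration, not the hardware design: the function names are hypothetical, and the `acc` list merely stands in for the second vector register file that feeds earlier results back into later phases.

```python
def outer(col, row):
    """Outer product of a column vector and a row vector."""
    return [[a * b for b in row] for a in col]


def matmul_by_phases(A, B):
    """Compute A @ B as a running sum of outer products, one per phase."""
    n, k, m = len(A), len(B), len(B[0])
    acc = [[0] * m for _ in range(n)]  # stands in for the accumulator registers
    for phase in range(k):
        col = [A[i][phase] for i in range(n)]   # column `phase` of A
        partial = outer(col, B[phase])          # row `phase` of B
        # Accumulate the previous phases' result with the current product.
        acc = [[acc[i][j] + partial[i][j] for j in range(m)] for i in range(n)]
    return acc


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_by_phases(A, B))
```

Keeping the running sum in a local accumulator means intermediate results never leave the pipeline between phases, which is the property the abstract attributes to the second vector register file.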

9.

Patent application
STREAM PROCESSOR WITH LOW POWER PARALLEL MATRIX MULTIPLY PIPELINE [Pending, published]

Publication No.: US20190171448A1

Publication date: 2019-06-06

Application No.: US15855637

Filing date: 2017-12-27

Applicant: Advanced Micro Devices, Inc.

Inventors: Jiasheng Chen, Yunxiao Zou, Michael J. Mantor, Allen Rush

IPC classification: G06F9/30, G06F7/544, G06F17/16, G06F9/38

Abstract: Systems, apparatuses, and methods for implementing a low power parallel matrix multiply pipeline are disclosed. In one embodiment, a system includes at least first and second vector register files coupled to a matrix multiply pipeline. The matrix multiply pipeline comprises a plurality of dot product units. The dot product units are configured to calculate dot or outer products for first and second sets of operands retrieved from the first vector register file. The results of the dot or outer product operations are written back to the second vector register file. The second vector register file provides the results from the previous dot or outer product operations as inputs to subsequent dot or outer product operations. The dot product units receive the results from previous phases of the matrix multiply operation and accumulate these previous dot or outer product results with the current dot or outer product results.
