-
11.
Publication No.: US11227214B2
Publication date: 2022-01-18
Application No.: US15812336
Filing date: 2017-11-14
Applicant: Advanced Micro Devices, Inc.; ATI Technologies ULC
Inventor: Sateesh Lagudu, Lei Zhang, Allen Rush
IPC: G06N3/08, G06F1/3296, G06N3/04, G06N3/063
Abstract: Systems, apparatuses, and methods for implementing memory bandwidth reduction techniques for low power convolutional neural network inference applications are disclosed. A system includes at least a processing unit and an external memory coupled to the processing unit. The system detects a request to perform a convolution operation on input data from a plurality of channels. Responsive to detecting the request, the system partitions the input data from the plurality of channels into 3D blocks so as to minimize the external memory bandwidth utilization for the convolution operation being performed. Next, the system loads a selected 3D block from external memory into internal memory and then generates convolution output data for the selected 3D block for one or more features. Then, for each feature, the system adds convolution output data together across channels prior to writing the convolution output data to the external memory.
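As a rough illustration of the tiling idea in this abstract, here is a minimal NumPy sketch that walks the input one 3D (channel × height × width) block at a time and sums the per-channel partial results in an internal accumulator before writing a single result per feature back to a stand-in for external memory. The block sizes, array shapes, and function name are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def tiled_conv2d(inputs, weights, block_c, block_h, block_w):
    """inputs: (C, H, W) input channels; weights: (F, C, K, K) features.
    Walks the input one 3D block at a time, summing the per-channel
    partial products in an internal accumulator so that only one result
    per feature and tile is written back to (stand-in) external memory."""
    C, H, W = inputs.shape
    F, _, K, _ = weights.shape
    out_h, out_w = H - K + 1, W - K + 1          # valid (no-padding) output
    external_out = np.zeros((F, out_h, out_w))   # stand-in for external memory

    for c0 in range(0, C, block_c):              # channel slab of the 3D block
        for y0 in range(0, out_h, block_h):      # spatial tile rows
            for x0 in range(0, out_w, block_w):  # spatial tile columns
                c1 = min(c0 + block_c, C)
                y1 = min(y0 + block_h, out_h)
                x1 = min(x0 + block_w, out_w)
                # Load one 3D block (plus its K-1 halo) into "internal" memory.
                blk = inputs[c0:c1, y0:y1 + K - 1, x0:x1 + K - 1]
                for f in range(F):
                    acc = np.zeros((y1 - y0, x1 - x0))  # internal accumulator
                    for c in range(c0, c1):             # add across channels...
                        for ky in range(K):
                            for kx in range(K):
                                acc += weights[f, c, ky, kx] * blk[
                                    c - c0, ky:ky + y1 - y0, kx:kx + x1 - x0]
                    # ...before the single write-back per feature and tile.
                    external_out[f, y0:y1, x0:x1] += acc
    return external_out
```

The result matches a plain valid convolution; the point of the tiling is that each input block is read from external memory once, and each channel's partial sum never leaves internal memory.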
-
12.
Publication No.: US10582250B2
Publication date: 2020-03-03
Application No.: US15657613
Filing date: 2017-07-24
Applicant: Advanced Micro Devices, Inc.; ATI Technologies ULC
Inventor: Lei Zhang, Sateesh Lagudu, Allen Rush, Razvan Dan-Dobre
IPC: H04N21/4143, G06F3/14, H04N7/14, H04N7/15, H04N19/172, H04N19/90, H04N21/418, H04N19/42, G06T9/00
Abstract: Systems, apparatuses, and methods for integrating a video codec with an inference engine are disclosed. A system is configured to implement an inference engine and a video codec while sharing at least a portion of its processing elements between the inference engine and the video codec. By sharing processing elements when combining the inference engine and the video codec, the silicon area of the combination is reduced. In one embodiment, the portion of processing elements which are shared include a motion prediction/motion estimation/MACs engine with a plurality of multiplier-accumulator (MAC) units, an internal memory, and peripherals. The peripherals include a memory interface, a direct memory access (DMA) engine, and a microprocessor. The system is configured to perform a context switch to reprogram the processing elements to switch between operating modes. The context switch can occur at a frame boundary or at a sub-frame boundary.
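A minimal scheduling sketch of the shared-engine idea described above, assuming a single engine object that is reprogrammed between a codec mode and an inference mode at frame boundaries; the class, mode names, and context_switch behavior are assumptions for illustration, not the patent's interface.

```python
from dataclasses import dataclass

@dataclass
class SharedMacEngine:
    mode: str = "idle"   # current operating mode: "codec" or "inference"

    def context_switch(self, new_mode: str) -> None:
        if new_mode != self.mode:
            # Reprogram the shared MAC array, internal memory map, and DMA
            # descriptors for the new workload (modeled here as a mode flag).
            self.mode = new_mode

    def run(self, task: str) -> None:
        print(f"[{self.mode}] {task}")

def schedule(engine: SharedMacEngine, frames) -> None:
    """Alternate codec and inference work on one shared engine,
    context-switching only at frame boundaries."""
    for frame in frames:
        engine.context_switch("codec")
        engine.run(f"decode {frame}")
        engine.context_switch("inference")
        engine.run(f"run inference on {frame}")

schedule(SharedMacEngine(), ["frame0", "frame1"])
```

The same pattern extends to sub-frame boundaries by switching inside the per-frame loop instead of once per frame.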
-
13.
Publication No.: US20190147332A1
Publication date: 2019-05-16
Application No.: US15812336
Filing date: 2017-11-14
Applicant: Advanced Micro Devices, Inc.; ATI Technologies ULC
Inventor: Sateesh Lagudu, Lei Zhang, Allen Rush
Abstract: Systems, apparatuses, and methods for implementing memory bandwidth reduction techniques for low power convolutional neural network inference applications are disclosed. A system includes at least a processing unit and an external memory coupled to the processing unit. The system detects a request to perform a convolution operation on input data from a plurality of channels. Responsive to detecting the request, the system partitions the input data from the plurality of channels into 3D blocks so as to minimize the external memory bandwidth utilization for the convolution operation being performed. Next, the system loads a selected 3D block from external memory into internal memory and then generates convolution output data for the selected 3D block for one or more features. Then, for each feature, the system adds convolution output data together across channels prior to writing the convolution output data to the external memory.
-
14.
Publication No.: US11200060B1
Publication date: 2021-12-14
Application No.: US17132002
Filing date: 2020-12-23
Applicant: Advanced Micro Devices, Inc.
Inventor: Sateesh Lagudu, Arun Vaidyanathan Ananthanarayan, Michael Mantor, Allen H. Rush
Abstract: An array processor includes processor element arrays (PEAs) distributed in rows and columns. The PEAs are configured to perform operations on parameter values. A first sequencer receives a first direct memory access (DMA) instruction that includes a request to read data from at least one address in memory. A texture address (TA) engine requests the data from the memory based on the at least one address and a texture data (TD) engine provides the data to the PEAs. The PEAs provide first synchronization signals to the TD engine to indicate availability of registers for receiving the data. The TD engine provides second synchronization signals to the first sequencer in response to receiving acknowledgments that the PEAs have consumed the data.
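A toy software model of the handshake described in this abstract, with simple counters standing in for the hardware synchronization signals; the class names, register counts, and memory stand-in are assumptions for illustration.

```python
from collections import deque

class PEA:
    """Processor element array with a small register file for incoming data."""
    def __init__(self, free_regs=2):
        self.free_regs = free_regs
        self.pending = deque()

    def can_accept(self):            # models the first synchronization signal
        return self.free_regs > 0

    def receive(self, item):         # TD engine writes into a free register
        self.free_regs -= 1
        self.pending.append(item)

    def consume(self):               # PEA uses the data, freeing the register
        self.free_regs += 1
        return self.pending.popleft()

def dma_read(addresses, peas):
    """Sequencer-issued DMA read: the TA stage fetches each address, the TD
    stage delivers to PEAs only when they report free registers, and the call
    returns once every delivery is consumed (the second synchronization)."""
    memory = {a: f"data@{hex(a)}" for a in addresses}   # stand-in for DRAM
    acks = 0
    for addr in addresses:
        data = memory[addr]                  # TA engine: address -> data
        for pea in peas:
            if not pea.can_accept():         # back-pressure from the PEA
                pea.consume()                # PEA drains older data first
                acks += 1
            pea.receive(data)                # TD engine: deliver the data
    for pea in peas:
        while pea.pending:                   # PEAs consume remaining data
            pea.consume()
            acks += 1
    return acks                              # all acks seen: instruction done

print(dma_read([0x100, 0x104, 0x108], [PEA(), PEA()]))  # 6 acknowledgments
```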
-
15.
Publication No.: US20200089550A1
Publication date: 2020-03-19
Application No.: US16171451
Filing date: 2018-10-26
Applicant: Advanced Micro Devices, Inc.; ATI Technologies ULC
Abstract: Systems, apparatuses, and methods for implementing a broadcast read response protocol are disclosed. A computing system includes a plurality of processing engines coupled to a memory subsystem. A first processing engine executes a read and broadcast response command, wherein the read and broadcast response command targets first data at a first address in the memory subsystem. One or more other processing engines execute a wait command to wait to receive the first data requested by the first processing engine. After receiving the first data from the memory subsystem, the plurality of processing engines process the first data as part of completing a first operation. In one implementation, the first operation is implementing a given layer of a machine learning model. In one implementation, the given layer is a convolutional layer of a neural network.
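A minimal threaded sketch of the read-and-broadcast idea: one engine performs the single memory read and broadcasts the response, while the other engines block on a wait primitive until it arrives. The slot abstraction, names, and memory model are assumptions for illustration, not the patent's command set.

```python
import threading

class BroadcastSlot:
    """One shared response slot: a single read fans out to all waiters."""
    def __init__(self):
        self._ready = threading.Event()
        self._data = None

    def read_and_broadcast(self, memory, address):
        self._data = memory[address]   # one memory-subsystem read...
        self._ready.set()              # ...broadcast to every waiting engine
        return self._data

    def wait(self):
        self._ready.wait()             # wait command: block until broadcast
        return self._data

memory = {0x1000: [1.0, 2.0, 3.0]}     # stand-in for the memory subsystem
slot = BroadcastSlot()
results = []

def other_engine(eid):
    data = slot.wait()                 # engines 1..3 wait for the broadcast
    results.append((eid, data))

workers = [threading.Thread(target=other_engine, args=(i,)) for i in (1, 2, 3)]
for w in workers:
    w.start()
slot.read_and_broadcast(memory, 0x1000)  # engine 0 issues the single read
for w in workers:
    w.join()
print(results)  # every engine got the same data from one memory read
```

The benefit mirrors the abstract: N consumers of the same layer input cost one memory read instead of N.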
-
16.
Publication No.: US10284861B2
Publication date: 2019-05-07
Application No.: US15414466
Filing date: 2017-01-24
Applicant: Advanced Micro Devices, Inc.
Inventor: Mahalakshmi Thikkireddy, Sateesh Lagudu
IPC: H04N19/176, H04N19/136, H04N19/182, H04N19/423, H04N19/625, H04N19/80
Abstract: A first memory stores values of blocks of pixels representative of a digital image, a second memory stores partial values of destination pixels in a thumbnail image, and a third memory stores compressed images and thumbnail images. A processor retrieves values of a block of pixels from the first memory. The processor also concurrently compresses the values to generate a compressed image and modifies a partial value of a destination pixel based on values of pixels in portions of the block that overlap a scaling window for the destination pixel. The processor stores the modified partial value in the second memory and stores the compressed image and the thumbnail image in the third memory.
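A single-pass software sketch of the concurrent compress-and-downscale idea, using a trivial stand-in for block compression and a box-filter scaling window; both are assumptions for illustration, not the actual codec or scaler in the patent.

```python
import numpy as np

def compress_block(tile):
    # Trivial stand-in for real block compression (e.g., DCT + quantization).
    return tile.astype(np.int16) // 8

def compress_with_thumbnail(image, block=8, scale=4):
    """One pass over the source: each block is compressed while its pixels
    are accumulated into the partial values of the thumbnail pixels whose
    scaling windows they overlap; partials are finalized at the end."""
    H, W = image.shape
    assert H % scale == 0 and W % scale == 0   # for brevity, assume even tiling
    partial = np.zeros((H // scale, W // scale))   # the "second memory"
    compressed = []                                # goes to the "third memory"
    for by in range(0, H, block):
        for bx in range(0, W, block):
            tile = image[by:by + block, bx:bx + block]  # "first memory" read
            compressed.append(compress_block(tile))     # compression path
            for y in range(tile.shape[0]):              # thumbnail path
                for x in range(tile.shape[1]):
                    partial[(by + y) // scale, (bx + x) // scale] += tile[y, x]
    thumbnail = partial / (scale * scale)   # finalize each destination pixel
    return compressed, thumbnail

img = np.arange(64 * 64, dtype=np.float64).reshape(64, 64)
blocks, thumb = compress_with_thumbnail(img)
print(len(blocks), thumb.shape)   # 64 compressed blocks, (16, 16) thumbnail
```

Reading each source block exactly once and carrying only the thumbnail partials between blocks is what lets both outputs be produced in a single pass over the image.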