CNN SEAMLESS TILE PROCESSING FOR LOW-POWER INFERENCE ACCELERATOR

    Publication Number: US20240112297A1

    Publication Date: 2024-04-04

    Application Number: US17957689

    Filing Date: 2022-09-30

    CPC classification number: G06T1/60

    Abstract: Methods and devices are provided for processing image data on a sub-frame portion basis using layers of a convolutional neural network. The processing device comprises memory and a processor. The processor is configured to determine, for an input tile of an image, a receptive field via backward propagation and to determine a size of the input tile based on the receptive field and an amount of local memory allocated to store data for the input tile. The processor determines whether the amount of local memory allocated is sufficient to store the data of the input tile and the padded data for the receptive field.
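
    A minimal Python sketch of the tile-sizing step, assuming plain convolution layers described by (kernel, stride) pairs; the layer list, memory budget, and function names are illustrative assumptions, not values or interfaces from the patent. It back-propagates an output-tile size through the layers to obtain the receptive field, then grows the output tile until the padded input tile no longer fits in the local memory budget.

        def receptive_field(output_tile, layers):
            """Back-propagate an output tile size through conv layers to get
            the input tile size (receptive field) it depends on."""
            size = output_tile
            for kernel, stride in reversed(layers):
                size = (size - 1) * stride + kernel
            return size

        def pick_tile_size(layers, local_mem_bytes, channels, bytes_per_elem=1):
            """Largest square output tile whose padded input tile fits in local memory."""
            best = 0
            for out_tile in range(1, 257):
                in_tile = receptive_field(out_tile, layers)
                needed = in_tile * in_tile * channels * bytes_per_elem
                if needed <= local_mem_bytes:
                    best = out_tile
                else:
                    break
            return best

        if __name__ == "__main__":
            # Three 3x3 convolutions, the middle one with stride 2 (illustrative).
            layers = [(3, 1), (3, 2), (3, 1)]
            print("input tile for 8x8 output:", receptive_field(8, layers))
            print("max output tile in 64 KiB:", pick_tile_size(layers, 64 * 1024, channels=16))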

    PUSHED PREFETCHING IN A MEMORY HIERARCHY

    Publication Number: US20240111678A1

    Publication Date: 2024-04-04

    Application Number: US17958120

    Filing Date: 2022-09-30

    CPC classification number: G06F12/0862 G06F12/0811

    Abstract: Systems and methods for pushed prefetching include: multiple core complexes, each core complex having multiple cores and multiple caches, the multiple caches configured in a memory hierarchy with multiple levels; an interconnect device coupling the core complexes to each other and coupling the core complexes to shared memory, the shared memory at a lower level of the memory hierarchy than the multiple caches; and a push-based prefetcher having logic to: monitor memory traffic between caches of a first level of the memory hierarchy and the shared memory; and based on the monitoring, initiate a prefetch of data to a cache of the first level of the memory hierarchy.
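
    A toy Python sketch in the spirit of the abstract, assuming a simple next-line stride heuristic; the class, the cache representation, and the constants are illustrative assumptions rather than the patent's design. The prefetcher monitors requests flowing from the first-level caches toward shared memory and pushes the following cache line into the requesting core's cache before it is demanded.

        CACHE_LINE = 64

        class PushPrefetcher:
            def __init__(self, caches):
                self.caches = caches      # core_id -> set of resident line addresses
                self.last_addr = {}       # core_id -> last observed miss address

            def observe(self, core_id, addr):
                """Called on each demand request that goes from a first-level
                cache down to shared memory."""
                line = addr // CACHE_LINE * CACHE_LINE
                prev = self.last_addr.get(core_id)
                self.last_addr[core_id] = line
                if prev is not None and line - prev == CACHE_LINE:
                    # Next-line stride detected: push the following line
                    # into the core's cache ahead of demand.
                    self.push(core_id, line + CACHE_LINE)

            def push(self, core_id, line):
                self.caches[core_id].add(line)

        if __name__ == "__main__":
            caches = {0: set()}
            pf = PushPrefetcher(caches)
            for addr in (0x1000, 0x1040, 0x1080):
                pf.observe(0, addr)
            print(sorted(hex(a) for a in caches[0]))  # lines pushed ahead of demand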

    Data Reuse Cache

    Publication Number: US20240111674A1

    Publication Date: 2024-04-04

    Application Number: US17955618

    Filing Date: 2022-09-29

    CPC classification number: G06F12/0811 G06F12/0875 G06F12/0884

    Abstract: Data reuse cache techniques are described. In one example, a load instruction is generated by an execution unit of a processor unit. In response to the load instruction, data is loaded by a load-store unit for processing by the execution unit and is also stored to a data reuse cache communicatively coupled between the load-store unit and the execution unit. Upon receipt of a subsequent load instruction for the data from the execution unit, the data is loaded from the data reuse cache for processing by the execution unit.
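
    A minimal Python sketch of a data reuse cache sitting between the execution unit and the load-store unit; the LRU policy, capacity, and class names are illustrative assumptions, since the abstract does not specify them. The first load goes through the load-store unit and is copied into the reuse cache; a repeated load for the same address is served from the reuse cache.

        from collections import OrderedDict

        class LoadStoreUnit:
            def __init__(self, memory):
                self.memory = memory
            def load(self, addr):
                return self.memory[addr]

        class DataReuseCache:
            def __init__(self, lsu, capacity=8):
                self.lsu = lsu
                self.capacity = capacity
                self.entries = OrderedDict()   # addr -> value, kept in LRU order

            def load(self, addr):
                if addr in self.entries:       # reuse hit: skip the load-store unit
                    self.entries.move_to_end(addr)
                    return self.entries[addr]
                value = self.lsu.load(addr)    # miss: fetch via the load-store unit
                self.entries[addr] = value     # and keep a copy for later reuse
                if len(self.entries) > self.capacity:
                    self.entries.popitem(last=False)
                return value

        if __name__ == "__main__":
            memory = {0x10: 42}
            cache = DataReuseCache(LoadStoreUnit(memory))
            print(cache.load(0x10))   # first load goes through the load-store unit
            print(cache.load(0x10))   # repeated load is served from the reuse cache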

    SPECULATIVE DRAM REQUEST ENABLING AND DISABLING

    Publication Number: US20240111420A1

    Publication Date: 2024-04-04

    Application Number: US17956417

    Filing Date: 2022-09-29

    CPC classification number: G06F3/0611 G06F3/0653 G06F3/0673

    Abstract: Methods, devices, and systems are provided for retrieving information based on cache miss prediction. It is predicted, based on a history of cache misses at a private cache, that a cache lookup for the information will miss a shared victim cache. A speculative memory request is enabled based on the prediction that the cache lookup for the information will miss the shared victim cache. The information is fetched based on the enabled speculative memory request.
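
    A small Python sketch of the enable/disable decision, assuming a 2-bit saturating counter trained on shared-victim-cache outcomes; the predictor, thresholds, and stub classes are illustrative assumptions, not the patent's mechanism. When a miss is predicted, the DRAM request starts speculatively in parallel with the victim-cache lookup and is cancelled if the lookup hits.

        class MissPredictor:
            """2-bit saturating counter over recent shared-victim-cache outcomes."""
            def __init__(self):
                self.counter = 0              # 0..3; higher means "likely to miss"
            def predict_miss(self):
                return self.counter >= 2
            def update(self, missed):
                self.counter = min(3, self.counter + 1) if missed else max(0, self.counter - 1)

        class VictimCache:
            def __init__(self, lines):
                self.lines = lines            # addr -> value
            def lookup(self, addr):
                return addr in self.lines
            def read(self, addr):
                return self.lines[addr]

        class Dram:
            def __init__(self, mem):
                self.mem = mem
                self.inflight = set()
            def start_fetch(self, addr):
                self.inflight.add(addr)       # request issued (speculative or demand)
            def cancel_fetch(self, addr):
                self.inflight.discard(addr)
            def wait_fetch(self, addr):
                self.inflight.discard(addr)
                return self.mem[addr]

        def handle_private_miss(addr, victim_cache, dram, predictor):
            """On a private-cache miss, optionally start the DRAM request
            speculatively, in parallel with the victim-cache lookup."""
            speculative = predictor.predict_miss()
            if speculative:
                dram.start_fetch(addr)        # speculative request enabled
            hit = victim_cache.lookup(addr)
            predictor.update(missed=not hit)
            if hit:
                if speculative:
                    dram.cancel_fetch(addr)   # lookup hit after all; drop the speculation
                return victim_cache.read(addr)
            if not speculative:
                dram.start_fetch(addr)        # predictor said hit, but it missed
            return dram.wait_fetch(addr)

        if __name__ == "__main__":
            dram = Dram({a: a * 2 for a in range(0, 0x200, 0x40)})
            victim = VictimCache({0x40: 0x80})
            pred = MissPredictor()
            for addr in (0x00, 0x80, 0xC0, 0x40):
                print(hex(addr), handle_private_miss(addr, victim, dram, pred))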

    Machine learning inference engine scalability

    Publication Number: US11948073B2

    Publication Date: 2024-04-02

    Application Number: US16117302

    Filing Date: 2018-08-30

    CPC classification number: G06N3/08 G06N3/04

    Abstract: Systems, apparatuses, and methods for adaptively mapping a machine learning model to a multi-core inference accelerator engine are disclosed. A computing system includes a multi-core inference accelerator engine with multiple inference cores coupled to a memory subsystem. The system also includes a control unit which determines how to adaptively map a machine learning model to the multi-core inference accelerator engine. In one implementation, the control unit selects a mapping scheme which minimizes the memory bandwidth utilization of the multi-core inference accelerator engine. In one implementation, this mapping scheme involves having one inference core of the multi-core inference accelerator engine fetch first data and broadcast the first data to the other inference cores of the inference accelerator engine. Each inference core also fetches second data unique to the respective inference core. The inference cores then perform computations on the first and second data in order to implement the machine learning model.
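
    A back-of-the-envelope Python sketch of the mapping choice, assuming a cost model in which on-chip broadcast traffic is free and only external memory traffic counts; the scheme names and data sizes are illustrative assumptions. It compares replicating the shared data to every core against having one core fetch it once and broadcast it.

        def external_traffic(num_cores, shared_bytes, per_core_bytes, broadcast):
            """Bytes fetched from external memory for one layer.

            broadcast=True : one core fetches the shared data once and broadcasts
                             it on-chip; every core still fetches its private data.
            broadcast=False: every core fetches both shared and private data itself.
            """
            shared_fetches = 1 if broadcast else num_cores
            return shared_fetches * shared_bytes + num_cores * per_core_bytes

        def pick_mapping(num_cores, shared_bytes, per_core_bytes):
            schemes = {
                "broadcast": external_traffic(num_cores, shared_bytes, per_core_bytes, True),
                "replicate": external_traffic(num_cores, shared_bytes, per_core_bytes, False),
            }
            return min(schemes, key=schemes.get), schemes

        if __name__ == "__main__":
            # 8 cores sharing 2 MiB of weights, each with 256 KiB of private activations.
            best, costs = pick_mapping(8, 2 << 20, 256 << 10)
            print(best, costs)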

    Gang scheduling for low-latency task synchronization

    Publication Number: US11948000B2

    Publication Date: 2024-04-02

    Application Number: US17219365

    Filing Date: 2021-03-31

    CPC classification number: G06F9/4881 G06F9/544 G06T1/20

    Abstract: Systems, apparatuses, and methods for performing command buffer gang submission are disclosed. A system includes at least first and second processors and a memory. The first processor (e.g., CPU) generates a command buffer and stores the command buffer in the memory. A mechanism is implemented in which the granularity of work provided to the second processor (e.g., GPU) is increased, which, in turn, increases the opportunities for parallel work. In gang submission mode, the user-mode driver (UMD) specifies a set of multiple queues and the command buffers to execute on those queues, and that work is guaranteed to execute as a single unit from the point of view of the GPU operating system scheduler. Using gang submission, synchronization between command buffers executing on multiple queues in the same submit is safe. This opens up optimization opportunities both for application use (explicit gang submission) and for internal driver use (implicit gang submission).
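
    A minimal Python sketch of gang submission as a data structure, assuming a toy scheduler that dispatches all command buffers of a gang as one unit; the class names and dispatch order are illustrative assumptions, not the driver's actual interfaces.

        from dataclasses import dataclass, field

        @dataclass
        class GangSubmission:
            # queue id -> list of command buffers to run together as one unit
            work: dict = field(default_factory=dict)

            def add(self, queue_id, command_buffer):
                self.work.setdefault(queue_id, []).append(command_buffer)

        class GpuScheduler:
            def __init__(self):
                self.pending = []

            def submit_gang(self, gang):
                # The whole gang is enqueued atomically: all of its queues'
                # command buffers are scheduled in the same window, or none are.
                self.pending.append(gang)

            def run(self):
                for gang in self.pending:
                    for queue_id, buffers in gang.work.items():
                        for cb in buffers:
                            print(f"queue {queue_id}: executing {cb}")
                self.pending.clear()

        if __name__ == "__main__":
            gang = GangSubmission()
            gang.add(0, "compute_cb")   # e.g. a compute command buffer
            gang.add(1, "copy_cb")      # and a copy command buffer it synchronizes with
            sched = GpuScheduler()
            sched.submit_gang(gang)
            sched.run()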
