-
Publication number: US20240112297A1
Publication date: 2024-04-04
Application number: US17957689
Application date: 2022-09-30
Applicant: Advanced Micro Devices, Inc., ATI Technologies ULC
Inventor: Tung Chuen Kwong, Ying Liu, Akila Subramaniam
IPC: G06T1/60
CPC classification number: G06T1/60
Abstract: Methods and devices are provided for processing image data on a sub-frame portion basis using layers of a convolutional neural network. The processing device comprises memory and a processor. The processor is configured to determine, for an input tile of an image, a receptive field via backward propagation and determine a size of the input tile based on the receptive field and an amount of local memory allocated to store data for the input tile. The processor also determines whether the amount of local memory allocated is sufficient to store the data of the input tile and the padded data for the receptive field.
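The tiling logic this abstract describes can be pictured with a small sketch. The Python below is a hypothetical illustration, not the patent's implementation: it backward-propagates an output tile size through a stack of convolution layers to obtain its receptive field, then checks the padded input tile against an assumed local-memory budget. The layer list, byte sizes, and function names are all invented.

```python
# Illustrative sketch (not the patent's implementation): backward-propagating an
# output tile through convolution layers to find its receptive field, then
# checking whether the padded input tile fits a local-memory budget.

def receptive_field(out_tile, layers):
    """layers: list of (kernel, stride) tuples, ordered input -> output."""
    size = out_tile
    for kernel, stride in reversed(layers):
        # Each layer grows the region of input pixels the output tile depends on.
        size = (size - 1) * stride + kernel
    return size

def tile_fits(out_tile, layers, bytes_per_pixel, local_mem_bytes):
    field = receptive_field(out_tile, layers)
    needed = field * field * bytes_per_pixel  # input tile plus padding
    return needed <= local_mem_bytes

if __name__ == "__main__":
    convs = [(3, 1), (3, 2), (3, 1)]           # hypothetical 3-layer network
    print(receptive_field(8, convs))            # receptive field of an 8x8 output tile
    print(tile_fits(8, convs, bytes_per_pixel=2, local_mem_bytes=64 * 1024))
```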
-
Publication number: US20240111688A1
Publication date: 2024-04-04
Application number: US17957742
Application date: 2022-09-30
Applicant: Advanced Micro Devices, Inc., ATI Technologies ULC
Inventor: Omar Fakhri Ahmed, Norman Vernon Douglas Stewart, Mihir Shaileshbhai Doctor, Jason Todd Arbaugh, Milind Baburao Kamble, Philip Ng, Xiaojian Liu
IPC: G06F12/109
CPC classification number: G06F12/109, G06F2212/657
Abstract: A technique for servicing a memory request is disclosed. The technique includes obtaining permissions associated with a source and a destination specified by the memory request, obtaining a first set of address translations for the memory request, and executing operations for a first request, using the first set of address translations.
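As a rough illustration of the flow this abstract describes, the Python sketch below checks read/write permissions for the source and destination of a copy-style request and resolves a first set of address translations before handing the request off. The page-table layout, permission strings, and function names are assumptions, not the patent's interface.

```python
# Illustrative sketch (assumed names): checking source/destination permissions
# and translating virtual addresses before servicing a copy-style memory request.

PAGE = 4096

def translate(page_table, vaddr):
    entry = page_table.get(vaddr // PAGE)
    if entry is None:
        raise MemoryError(f"no translation for {hex(vaddr)}")
    return entry["frame"] * PAGE + vaddr % PAGE, entry["perms"]

def service_request(page_table, src, dst, length):
    src_pa, src_perms = translate(page_table, src)
    dst_pa, dst_perms = translate(page_table, dst)
    if "r" not in src_perms or "w" not in dst_perms:
        raise PermissionError("request violates source/destination permissions")
    return (src_pa, dst_pa, length)   # hand off to the copy engine

page_table = {0x1: {"frame": 0x80, "perms": "r"},
              0x2: {"frame": 0x81, "perms": "rw"}}
print(service_request(page_table, 0x1010, 0x2020, 256))
```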
-
Publication number: US20240111678A1
Publication date: 2024-04-04
Application number: US17958120
Application date: 2022-09-30
Applicant: Advanced Micro Devices, Inc.
Inventor: Jagadish B. Kotra, John Kalamatianos, Paul Moyer, Gabriel H. Loh
IPC: G06F12/0862, G06F12/0811
CPC classification number: G06F12/0862, G06F12/0811
Abstract: Systems and methods for pushed prefetching include: multiple core complexes, each core complex having multiple cores and multiple caches, the multiple caches configured in a memory hierarchy with multiple levels; an interconnect device coupling the core complexes to each other and coupling the core complexes to shared memory, the shared memory at a lower level of the memory hierarchy than the multiple caches; and a push-based prefetcher having logic to: monitor memory traffic between caches of a first level of the memory hierarchy and the shared memory; and based on the monitoring, initiate a prefetch of data to a cache of the first level of the memory hierarchy.
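The push-based prefetcher behavior can be pictured with a minimal sketch: the Python below watches request addresses per core as they cross the interconnect, detects a repeated stride, and emits addresses to push toward that core's first-level cache. The stride policy, prefetch degree, and class names are illustrative assumptions only.

```python
# Illustrative sketch (assumed behavior): a push-based prefetcher that watches
# traffic between first-level caches and shared memory, detects a stride, and
# pushes the next lines toward the requesting core's cache.

from collections import defaultdict

class PushPrefetcher:
    def __init__(self, degree=2):
        self.last_addr = defaultdict(lambda: None)
        self.last_stride = defaultdict(lambda: None)
        self.degree = degree

    def observe_miss(self, core, addr):
        """Called for each request seen on the interconnect; returns lines to push."""
        pushes = []
        prev = self.last_addr[core]
        if prev is not None:
            stride = addr - prev
            if stride != 0 and stride == self.last_stride[core]:
                pushes = [addr + stride * i for i in range(1, self.degree + 1)]
            self.last_stride[core] = stride
        self.last_addr[core] = addr
        return pushes

pf = PushPrefetcher()
for a in (0x1000, 0x1040, 0x1080):
    print([hex(x) for x in pf.observe_miss(core=0, addr=a)])
```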
-
Publication number: US20240111676A1
Publication date: 2024-04-04
Application number: US17957358
Application date: 2022-09-30
Applicant: Advanced Micro Devices, Inc.
Inventor: John Kalamatianos, Marko Scrbak, Gabriel H. Loh, Akhil Arunkumar
IPC: G06F12/0862
CPC classification number: G06F12/0862, G06F2212/6028
Abstract: A disclosed computing device includes at least one prefetcher and a processing device communicatively coupled to the prefetcher. The processing device is configured to detect a throttling instruction that indicates a start of a throttling region. The computing device is further configured to prevent the prefetcher from being trained on one or more memory instructions included in the throttling region in response to the throttling instruction. Various other apparatuses, systems, and methods are also disclosed.
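A minimal sketch of the throttling idea follows, with hypothetical THROTTLE_START/THROTTLE_END markers standing in for whatever instruction encoding the patent actually uses: loads executed inside the region simply bypass prefetcher training.

```python
# Illustrative sketch (assumed instruction names): skipping prefetcher training
# for memory instructions that fall inside a throttling region delimited by
# hypothetical THROTTLE_START / THROTTLE_END markers.

class StridePrefetcher:
    def __init__(self):
        self.history = []
    def train(self, addr):
        self.history.append(addr)

class Core:
    def __init__(self, prefetcher):
        self.prefetcher = prefetcher
        self.throttled = False

    def execute(self, instr, addr=None):
        if instr == "THROTTLE_START":
            self.throttled = True
        elif instr == "THROTTLE_END":
            self.throttled = False
        elif instr == "LOAD":
            if not self.throttled:
                self.prefetcher.train(addr)   # only train outside the region
            return addr                        # stand-in for the loaded value

pf = StridePrefetcher()
core = Core(pf)
core.execute("LOAD", 0x100)
core.execute("THROTTLE_START")
core.execute("LOAD", 0x9000)   # irregular access inside the region, not trained on
core.execute("THROTTLE_END")
core.execute("LOAD", 0x140)
print([hex(a) for a in pf.history])
```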
-
Publication number: US20240111674A1
Publication date: 2024-04-04
Application number: US17955618
Application date: 2022-09-29
Applicant: Advanced Micro Devices, Inc.
Inventor: Alok Garg, Neil N Marketkar, Matthew T. Sobel
IPC: G06F12/0811, G06F12/0875, G06F12/0884
CPC classification number: G06F12/0811, G06F12/0875, G06F12/0884
Abstract: Data reuse cache techniques are described. In one example, a load instruction is generated by an execution unit of a processor unit. In response to the load instruction, data is loaded by a load-store unit for processing by the execution unit and is also stored to a data reuse cache communicatively coupled between the load-store unit and the execution unit. Upon receipt of a subsequent load instruction for the data from the execution unit, the data is loaded from the data reuse cache for processing by the execution unit.
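The reuse path can be sketched as a small cache interposed between a load-store unit and its consumer: the first load fills the reuse cache, and a repeated load to the same address is served without going back to the load-store unit. The class names, capacity, and FIFO eviction policy below are assumptions for illustration.

```python
# Illustrative sketch (assumed structure): a small data reuse cache sitting
# between the load-store unit and the execution unit; the first load fills it,
# and a repeated load to the same address is served from it directly.

class LoadStoreUnit:
    def __init__(self, memory):
        self.memory = memory
        self.loads_serviced = 0
    def load(self, addr):
        self.loads_serviced += 1
        return self.memory[addr]

class DataReuseCache:
    def __init__(self, lsu, capacity=16):
        self.lsu = lsu
        self.capacity = capacity
        self.entries = {}
    def load(self, addr):
        if addr in self.entries:              # reuse hit: skip the load-store unit
            return self.entries[addr]
        value = self.lsu.load(addr)
        if len(self.entries) >= self.capacity:
            self.entries.pop(next(iter(self.entries)))   # simple FIFO eviction
        self.entries[addr] = value
        return value

lsu = LoadStoreUnit({0x40: 7})
drc = DataReuseCache(lsu)
print(drc.load(0x40), drc.load(0x40), lsu.loads_serviced)   # 7 7 1
```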
-
Publication number: US20240111420A1
Publication date: 2024-04-04
Application number: US17956417
Application date: 2022-09-29
Applicant: Advanced Micro Devices, Inc.
Inventor: Jagadish B. Kotra, John Kalamatianos
IPC: G06F3/06
CPC classification number: G06F3/0611, G06F3/0653, G06F3/0673
Abstract: Methods, devices, and systems for retrieving information based on cache miss prediction. It is predicted, based on a history of cache misses at a private cache, that a cache lookup for the information will miss a shared victim cache. A speculative memory request is enabled based on the prediction that the cache lookup for the information will miss the shared victim cache. The information is fetched based on the enabled speculative memory request.
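One way to picture the prediction step is a short history of recent victim-cache outcomes: when enough recent lookups missed, the next lookup launches a speculative memory fetch in parallel. The window size, threshold, and function names in this Python sketch are invented, not taken from the patent.

```python
# Illustrative sketch (assumed policy): predicting, from recent miss history,
# whether a lookup will also miss the shared victim cache, and issuing a
# speculative memory request in parallel when a miss is predicted.

from collections import deque

class VictimMissPredictor:
    def __init__(self, window=8, threshold=0.75):
        self.history = deque(maxlen=window)   # True = victim cache missed
        self.threshold = threshold
    def predict_miss(self):
        if not self.history:
            return False
        return sum(self.history) / len(self.history) >= self.threshold
    def update(self, missed):
        self.history.append(missed)

def lookup(addr, victim_cache, predictor, memory):
    speculative = predictor.predict_miss()
    if speculative:
        data = memory[addr]                   # speculative fetch starts early
    hit = addr in victim_cache
    predictor.update(not hit)
    if hit:
        return victim_cache[addr]
    return data if speculative else memory[addr]

pred = VictimMissPredictor()
mem = {a: a * 2 for a in range(0, 0x100, 8)}
for a in (0x08, 0x10, 0x18, 0x20):
    print(lookup(a, victim_cache={}, predictor=pred, memory=mem))
```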
-
Publication number: US11948073B2
Publication date: 2024-04-02
Application number: US16117302
Application date: 2018-08-30
Applicant: Advanced Micro Devices, Inc., ATI Technologies ULC
Inventor: Lei Zhang, Sateesh Lagudu, Allen Rush
Abstract: Systems, apparatuses, and methods for adaptively mapping a machine learning model to a multi-core inference accelerator engine are disclosed. A computing system includes a multi-core inference accelerator engine with multiple inference cores coupled to a memory subsystem. The system also includes a control unit which determines how to adaptively map a machine learning model to the multi-core inference accelerator engine. In one implementation, the control unit selects a mapping scheme which minimizes the memory bandwidth utilization of the multi-core inference accelerator engine. In one implementation, this mapping scheme involves having one inference core of the multi-core inference accelerator engine fetch given data and broadcast the given data to other inference cores of the inference accelerator engine. Each inference core fetches second data unique to the respective inference core. The inference cores then perform computations on the given data and the second data in order to implement the machine learning model.
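The bandwidth argument behind the broadcast mapping can be shown with simple arithmetic: shared data is read from memory once instead of once per core. The cost model below is a deliberately simplified assumption, not the control unit's actual selection logic.

```python
# Illustrative sketch (assumed cost model): choosing a mapping in which one
# inference core fetches shared data once and broadcasts it on-chip, so memory
# bandwidth scales with the unique per-core data rather than with core count.

def bandwidth_all_fetch(shared_bytes, unique_bytes, num_cores):
    # Every core pulls both the shared and its unique data from memory.
    return num_cores * (shared_bytes + unique_bytes)

def bandwidth_broadcast(shared_bytes, unique_bytes, num_cores):
    # One core fetches the shared data and broadcasts it on-chip;
    # each core still fetches its own unique data.
    return shared_bytes + num_cores * unique_bytes

def pick_mapping(shared_bytes, unique_bytes, num_cores):
    a = bandwidth_all_fetch(shared_bytes, unique_bytes, num_cores)
    b = bandwidth_broadcast(shared_bytes, unique_bytes, num_cores)
    return ("broadcast", b) if b < a else ("all-fetch", a)

print(pick_mapping(shared_bytes=4 << 20, unique_bytes=256 << 10, num_cores=8))
```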
-
Publication number: US11948000B2
Publication date: 2024-04-02
Application number: US17219365
Application date: 2021-03-31
Applicant: Advanced Micro Devices, Inc.
Inventor: Mitchell Howard Singer, Derrick Trevor Owens
CPC classification number: G06F9/4881, G06F9/544, G06T1/20
Abstract: Systems, apparatuses, and methods for performing command buffer gang submission are disclosed. A system includes at least first and second processors and a memory. The first processor (e.g., CPU) generates a command buffer and stores the command buffer in the memory. A mechanism is implemented where a granularity of work provided to the second processor (e.g., GPU) is increased which, in turn, increases the opportunities for parallel work. In gang submission mode, the user-mode driver (UMD) specifies a set of multiple queues and command buffers to execute on those multiple queues, and that work is guaranteed to execute as a single unit from the GPU operating system scheduler point of view. Using gang submission, synchronization between command buffers executing on multiple queues in the same submit is safe. This opens up optimization opportunities for application use (explicit gang submission) and for internal driver use (implicit gang submission).
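A rough sketch of the gang-submission shape follows, with an invented API rather than the driver's UMD interface: command buffers for several queues are collected into one object and handed to the scheduler as a single unit, which is what makes cross-queue synchronization within the submission safe.

```python
# Illustrative sketch (assumed API shape, not the driver's interface): grouping
# command buffers destined for multiple queues into one gang submission so the
# scheduler treats them as a single schedulable unit.

from dataclasses import dataclass, field

@dataclass
class GangSubmission:
    # queue id -> list of command buffers to run on that queue
    work: dict = field(default_factory=dict)

    def add(self, queue_id, command_buffer):
        self.work.setdefault(queue_id, []).append(command_buffer)
        return self

    def submit(self, scheduler):
        # All queues' command buffers are handed over together, so cross-queue
        # synchronization inside this submission is safe.
        scheduler.schedule_as_unit(self.work)

class FakeScheduler:
    def schedule_as_unit(self, work):
        print("scheduling as one unit:", {q: len(cbs) for q, cbs in work.items()})

gang = GangSubmission().add("gfx", b"draw-cmds").add("compute", b"dispatch-cmds")
gang.submit(FakeScheduler())
```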
-
Publication number: US11947833B2
Publication date: 2024-04-02
Application number: US17845922
Application date: 2022-06-21
Applicant: Advanced Micro Devices, Inc.
Inventor: Anwar Kashem, Craig Daniel Eaton, Pouya Najafi Ashtiani, Tsun Ho Liu
IPC: G06F3/06, G06F18/214
CPC classification number: G06F3/0656, G06F3/0683, G06F18/214, G06F3/0604
Abstract: A method and apparatus for training data in a computer system includes reading data stored in a first memory address in a memory and writing it to a buffer. Training data is generated for transmission to the first memory address. The data is transmitted to the first memory address. Information relating to the training data is read from the first memory address and the stored data is read from the buffer and written to the memory area where the training data was transmitted.
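The save/train/restore sequence reads naturally as four steps, sketched below over a plain bytearray; the function name, training pattern, and region are illustrative assumptions, not the apparatus described in the claims.

```python
# Illustrative sketch (assumed flow): preserving the contents of a memory
# region in a buffer, running a training pattern through that region, and then
# restoring the original data once the training results have been read back.

def train_region(memory, start, pattern):
    length = len(pattern)
    saved = memory[start:start + length]            # 1. copy live data to a buffer
    memory[start:start + length] = pattern          # 2. write the training pattern
    observed = bytes(memory[start:start + length])  # 3. read back training results
    memory[start:start + length] = saved            # 4. restore the original data
    return observed

mem = bytearray(range(64))
result = train_region(mem, start=16, pattern=bytes([0xAA, 0x55] * 8))
print(result.hex(), mem[16:32] == bytearray(range(16, 32)))
```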
-
Publication number: US11947487B2
Publication date: 2024-04-02
Application number: US17852306
Application date: 2022-06-28
Applicant: Advanced Micro Devices, Inc.
Inventor: Johnathan Robert Alsop, Karthik Ramu Sangaiah, Anthony T. Gutierrez
IPC: G06F15/82
CPC classification number: G06F15/825
Abstract: Methods and systems are disclosed for performing dataflow execution by an accelerated processing unit (APU). Techniques disclosed include decoding information from one or more dataflow instructions. The decoded information is associated with dataflow execution of a computational task. Techniques disclosed further include configuring, based on the decoded information, dataflow circuitry, and, then, executing the dataflow execution of the computational task using the dataflow circuitry.
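Dataflow execution can be pictured as decoding instructions into a graph of operator nodes, "configuring" that graph, and then letting values flow through it; the instruction tuple format and node evaluation below are invented purely to illustrate the decode/configure/execute split, not the APU's circuitry.

```python
# Illustrative sketch (invented instruction format): decoding dataflow-style
# instructions into a graph description, "configuring" circuitry as a set of
# connected operator nodes, and then streaming inputs through the graph.

import operator

OPS = {"add": operator.add, "mul": operator.mul}

def decode(instrs):
    """Each hypothetical instruction: (node_name, op, input_a, input_b)."""
    return [{"name": n, "op": OPS[op], "in": (a, b)} for n, op, a, b in instrs]

def configure(decoded):
    # In hardware this would set up routing between functional units; here the
    # configuration is just the ordered list of nodes to evaluate.
    return decoded

def execute(graph, inputs):
    values = dict(inputs)
    for node in graph:          # data flows node to node, no program counter
        a, b = (values[x] for x in node["in"])
        values[node["name"]] = node["op"](a, b)
    return values

graph = configure(decode([("t0", "mul", "x", "y"), ("t1", "add", "t0", "z")]))
print(execute(graph, {"x": 3, "y": 4, "z": 5}))   # t1 == 17
```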