-
Publication No.: US11586555B2
Publication date: 2023-02-21
Application No.: US17231957
Application date: 2021-04-15
Applicant: Advanced Micro Devices, Inc.
Inventor: Alexander D. Breslow , John Kalamatianos
IPC: G06F12/0895 , H03M7/30
Abstract: Systems, apparatuses, and methods for implementing flexible dictionary sharing techniques for caches are disclosed. A set-associative cache includes a dictionary for each data array set. When a cache line is to be allocated in the cache, a cache controller determines to which set a base index of the cache line address maps. Then, a selector unit determines which dictionary of a group of dictionaries stored by those sets neighboring this set would achieve the most compression for the cache line. This dictionary is then selected to compress the cache line. An offset is added to the base index of the cache line to generate a full index in order to map the cache line to the set corresponding to this chosen dictionary. The compressed cache line is stored in this set with the chosen dictionary, and the offset is stored in the corresponding tag array entry.
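The selection step described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration and not the patented design: the names `compressed_size` and `choose_dictionary`, the 4-byte word size, the 1-byte dictionary index, and the rule that the neighboring sets are the `group_size` sets starting at the base index.

```python
def compressed_size(line, dictionary, word_bytes=4, index_bytes=1):
    """Bytes needed if every word found in the dictionary is replaced by a short index."""
    size = 0
    for i in range(0, len(line), word_bytes):
        word = line[i:i + word_bytes]
        # A dictionary hit costs only an index; a miss stores the word uncompressed.
        size += index_bytes if word in dictionary else len(word)
    return size


def choose_dictionary(base_index, line, set_dictionaries, group_size=4):
    """Return (offset, full_index, size) for the neighboring set whose dictionary compresses best."""
    num_sets = len(set_dictionaries)
    best = None
    for offset in range(group_size):
        # Hypothetical neighbor rule: full index = base index + offset, wrapping around.
        full_index = (base_index + offset) % num_sets
        size = compressed_size(line, set_dictionaries[full_index])
        if best is None or size < best[2]:
            best = (offset, full_index, size)
    return best


# Four sets, each with its own (tiny) dictionary of 4-byte words.
dictionaries = [{b"\x00\x00\x00\x00"}, {b"\xde\xad\xbe\xef"}, set(), set()]
cache_line = b"\xde\xad\xbe\xef" * 4
offset, full_index, size = choose_dictionary(0, cache_line, dictionaries)
```

In this sketch the returned offset is the value that would be kept in the tag array entry, so a later lookup can reconstruct the full index from the base index.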
-
Publication No.: US20230046477A1
Publication date: 2023-02-16
Application No.: US17545108
Application date: 2021-12-08
Applicant: Advanced Micro Devices, Inc.
Inventor: Ramon Mangaser , Karthik Gopalakrishnan , Andy Huei Chu , Pradeep Jayaraman
Abstract: A data transmission system includes a first circuit, a second circuit, and a reference voltage generation circuit. The first circuit includes a transmitter powered by a first power supply voltage and having an input for receiving a data output signal, and an output. The second circuit includes a receiver powered by a second power supply voltage and having a first input coupled to the output of the transmitter, a second input for receiving a reference voltage, and an output for providing a data input signal. The reference voltage generation circuit forms the reference voltage by mixing a first signal generated by the first circuit based on the first power supply voltage and a second signal generated by the second circuit based on the second power supply voltage.
-
Publication No.: US11579876B2
Publication date: 2023-02-14
Application No.: US17008006
Application date: 2020-08-31
Applicant: ADVANCED MICRO DEVICES, INC. , ATI TECHNOLOGIES ULC
Inventor: Anirudh R. Acharya , Alexander Fuad Ashkar , Ashkan Hosseinzadeh Namin
Abstract: A method of save-restore operations includes monitoring, by a power controller of a parallel processor (such as a graphics processing unit), of a register bus for one or more register write signals. The power controller determines that a register write signal is addressed to a state register that is designated to be saved prior to changing a power state of the parallel processor from a first state to a second state having a lower level of energy usage. The power controller instructs a copy of data corresponding to the state register to be written to a local memory module of the parallel processor. Subsequently, the parallel processor receives a power state change signal and writes state register data saved at the local memory module to an off-chip memory prior to changing the power state of the parallel processor.
-
Publication No.: US11573765B2
Publication date: 2023-02-07
Application No.: US16219154
Application date: 2018-12-13
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Milind N. Nemlekar , Prerit Dak
Abstract: A processing unit implements a convolutional neural network (CNN) by fusing at least a portion of a convolution phase of the CNN with at least a portion of a batch normalization phase. The processing unit convolves two input matrices representing inputs and weights of a portion of the CNN to generate an output matrix. The processing unit performs the convolution via a series of multiplication operations, with each multiplication operation generating a corresponding submatrix (or “tile”) of the output matrix at an output register of the processing unit. While an output submatrix is stored at the output register, the processing unit performs a reduction phase and an update phase of the batch normalization phase for the CNN. The processing unit thus fuses at least a portion of the batch normalization phase of the CNN with a portion of the convolution.
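A rough sketch of the fusion idea follows, under stated assumptions: the convolution is expressed as a plain matrix product, the "reduction phase" is taken to mean accumulating per-column sums and sums of squares, and the "update phase" is taken to mean deriving the mean and variance from them. The function names and the tile size are hypothetical, and the abstract does not confirm these interpretations.

```python
def matmul_tile(a, b, rows, cols):
    """Compute one output tile: the selected rows and columns of the product a @ b."""
    inner = len(b)
    return [[sum(a[r][k] * b[k][c] for k in range(inner)) for c in cols] for r in rows]


def fused_conv_batchnorm_stats(a, b, tile=2):
    """Accumulate batch-norm statistics tile by tile while each tile is still 'in the register'."""
    n_rows, n_cols = len(a), len(b[0])
    total = [0.0] * n_cols
    total_sq = [0.0] * n_cols
    for r0 in range(0, n_rows, tile):
        for c0 in range(0, n_cols, tile):
            rows = range(r0, min(r0 + tile, n_rows))
            cols = range(c0, min(c0 + tile, n_cols))
            out = matmul_tile(a, b, rows, cols)  # stands in for the output register
            # Assumed reduction phase: fold the tile into running sums before it is replaced.
            for row in out:
                for j, value in zip(cols, row):
                    total[j] += value
                    total_sq[j] += value * value
    # Assumed update phase: turn the running sums into per-column mean and variance.
    mean = [s / n_rows for s in total]
    var = [sq / n_rows - m * m for sq, m in zip(total_sq, mean)]
    return mean, var


inputs = [[1, 2], [3, 4]]
weights = [[1, 0], [0, 1]]
mean, var = fused_conv_batchnorm_stats(inputs, weights)
```

The point of the sketch is only that the statistics are gathered without a second pass over the full output matrix.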
-
Publication No.: US20230036191A1
Publication date: 2023-02-02
Application No.: US17390479
Application date: 2021-07-30
Applicant: Advanced Micro Devices, Inc. , ATI Technologies ULC
Inventor: Alexander J. Branover , Christopher T. Weaver , Benjamin Tsien , Indrani Paul , Mihir Shaileshbhai Doctor , Thomas J. Gibney , John P. Petry , Dennis Au , Oswin Hall
IPC: G06F1/3234 , G06F1/3209
Abstract: A disclosed technique includes transmitting data in a first buffer associated with a first display pipe to a first display associated with the first display pipe; transmitting data in a second buffer associated with a second display pipe to the first display; requesting wake-up of a memory; and refilling one or both of the first buffer and the second buffer from the memory.
-
Publication No.: US20230033583A1
Publication date: 2023-02-02
Application No.: US17389925
Application date: 2021-07-30
Applicant: Advanced Micro Devices, Inc.
Inventor: XiaoJing Ma , Ling-Ling Wang , Jin Xu , ZengRong Huang , Lina Ma , Wei Shao , LingFei Shi
Abstract: Systems, apparatuses, and methods for implementing a primary input/output (PIO) queue for host and guest operating systems (OS's) are disclosed. A system includes a PIO queue, one or more compute units, and a control unit. The PIO queue is able to store work commands for multiple different types of OS's, including host and guest OS's. The control unit is able to dispatch multiple work commands from multiple OS's to execute concurrently on the compute unit(s). This allows for execution of work commands by different OS's without the processing device(s) having to incur the latency of a world switch.
-
Publication No.: US20230032375A1
Publication date: 2023-02-02
Application No.: US17390293
Application date: 2021-07-30
Applicant: Advanced Micro Devices, Inc.
Inventor: Eric Busta , Michael L. Golden , Sean M. O'Mullan , James Wingfield , Keith A. Kasprak , Russell Schreiber , Michael Estlick
Abstract: An integrated circuit includes one or more processing units that execute instructions that employ a register file. Control logic creates a pre-startup register free list, prior to normal operation of at least one of the processing units, that includes a list of registers devoid of defective registers. In some implementations, no column and row repair information is provided to register file repair logic. In certain examples, the register file is configured as a repair-less register file. During normal operation of the one or more processing units, the integrated circuit employs the pre-startup register free list to select registers in a register file for the executing instructions. Associated methods are also presented.
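The free-list idea can be sketched in a few lines. This is an illustrative model only: `build_free_list`, `allocate`, and the set of defective register numbers are hypothetical names and inputs, and the abstract does not say how defects are detected or how the list is stored in hardware.

```python
from collections import deque


def build_free_list(num_registers, defective):
    """Before normal operation: list every physical register that is not marked defective."""
    return deque(r for r in range(num_registers) if r not in defective)


def allocate(free_list):
    """During normal operation: hand out the next usable register, or None if none remain."""
    return free_list.popleft() if free_list else None


# Assumed test result: registers 2 and 5 failed a pre-startup check.
free_list = build_free_list(8, defective={2, 5})
first = allocate(free_list)
```

Because defective registers never enter the list, allocation needs no repair or remapping step in this model.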
-
Publication No.: US20230031595A1
Publication date: 2023-02-02
Application No.: US17961613
Application date: 2022-10-07
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: JAMES R. MAGRO , KEDARNATH BALAKRISHNAN , BRENDAN T. MANGAN
IPC: G06F13/16 , G06F9/30 , G06F12/02 , G06F12/1009
Abstract: A memory controller includes a memory channel controller that uses multiple groups of command queue and arbiter pairs. Each arbiter is coupled to a respective command queue to select memory access commands from each command queue according to predetermined criteria. Each arbiter selects from among the memory access requests in each command queue independently based on the predetermined criteria and sends selected memory access requests to a selector that serves as a second level arbiter which sends the request to a memory subchannel.
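A minimal software model of the two-level arbitration follows. The abstract leaves the "predetermined criteria" unspecified, so this sketch assumes oldest-first selection inside each command queue and round-robin selection at the second level; the class names `CommandQueueArbiter` and `Selector` are hypothetical.

```python
from collections import deque


class CommandQueueArbiter:
    """First level: one arbiter paired with one command queue."""

    def __init__(self):
        self.queue = deque()

    def push(self, request):
        self.queue.append(request)

    def select(self):
        # Assumed criterion: the oldest request in this queue wins.
        return self.queue.popleft() if self.queue else None


class Selector:
    """Second level: picks among the first-level arbiters (round-robin is an assumption)."""

    def __init__(self, arbiters):
        self.arbiters = arbiters
        self.next_index = 0

    def select(self):
        n = len(self.arbiters)
        for step in range(n):
            i = (self.next_index + step) % n
            request = self.arbiters[i].select()
            if request is not None:
                # Start the next round after the arbiter that was just served.
                self.next_index = (i + 1) % n
                return request
        return None


a, b = CommandQueueArbiter(), CommandQueueArbiter()
a.push("A0")
a.push("A1")
b.push("B0")
selector = Selector([a, b])
issued = [selector.select() for _ in range(4)]
```

In the abstract each arbiter sends its choice to the selector; here the selector pulls from the arbiters instead, which keeps the model short without changing the order of issue.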
-
339.
Publication No.: US20230030679A1
Publication date: 2023-02-02
Application No.: US17386115
Application date: 2021-07-27
Applicant: Advanced Micro Devices, Inc.
Inventor: Jagadish B. Kotra , John Kalamatianos , Gagandeep Panwar
IPC: G06F12/02 , G06F12/0817 , G06F12/06 , G06F9/30
Abstract: A technical solution to the technical problem of how to improve dispatch throughput for memory-centric commands bypasses address checking for certain memory-centric commands. Implementations include using an Address Check Bypass (ACB) bit to specify whether address checking should be performed for a memory-centric command. ACB bit values are specified in memory-centric instructions, automatically specified by a process, such as a compiler, or by host hardware, such as dispatch hardware, based upon whether a memory-centric command explicitly references memory. Implementations include bypassing, i.e., not performing, address checking for memory-centric commands that do not access memory and also for memory-centric commands that do access memory, but that have the same physical address as a prior memory-centric command that explicitly accessed memory to ensure that any data in caches was flushed to memory and/or invalidated.
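The bypass rule in this abstract can be sketched as follows. This is an assumed software model, not the disclosed hardware: `assign_acb_bits` is a hypothetical helper, a command is represented as a `(name, physical_address)` pair, and `None` stands for a command that does not access memory.

```python
def assign_acb_bits(commands):
    """Return one ACB bit per command; True means address checking is bypassed."""
    already_checked = set()
    bits = []
    for _name, address in commands:
        if address is None:
            # The command does not access memory, so there is nothing to check.
            bits.append(True)
        elif address in already_checked:
            # A prior command to the same physical address already went through checking.
            bits.append(True)
        else:
            # First explicit access to this address: perform the check.
            already_checked.add(address)
            bits.append(False)
    return bits


commands = [("load", 0x100), ("add", None), ("store", 0x100), ("load", 0x200)]
acb_bits = assign_acb_bits(commands)
```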
-
340.
Publication No.: US11567554B2
Publication date: 2023-01-31
Application No.: US15837918
Application date: 2017-12-11
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Jay Fleischman , Michael Estlick , Michael Christopher Sedmak , Erik Swanson , Sneha V. Desai
Abstract: A pipeline includes a first portion configured to process a first subset of bits of an instruction and a second portion configured to process a second subset of the bits of the instruction. A first clock mesh is configured to provide a first clock signal to the first portion of the pipeline. A second clock mesh is configured to provide a second clock signal to the second portion of the pipeline. The first and second clock meshes selectively provide the first and second clock signals based on characteristics of in-flight instructions that have been dispatched to the pipeline but not yet retired. In some cases, a physical register file is configured to store values of bits representative of instructions. Only the first subset is stored in the physical register file in response to the value of the zero high bit indicating that the second subset is equal to zero.