-
511.
Publication Number: US20210165606A1
Publication Date: 2021-06-03
Application Number: US16701794
Application Date: 2019-12-03
Applicant: Advanced Micro Devices, Inc.
Inventor: Sergey Blagodurov
Abstract: An apparatus and method manage packet transfer between a memory fabric, whose physical layer interface has a higher data rate than the physical layer interface of another device, and local memory. Incoming packets are received from the memory fabric physical layer interface, and at least some of the packets contain different instruction types. The apparatus and method determine the packet type of each incoming packet and, when the incoming packet contains an atomic request, prioritize its transfer over other incoming packet types to memory access logic that accesses local memory within the apparatus.
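The prioritization the abstract describes can be sketched as a two-level ingress queue in which packets carrying atomic requests are always dispatched ahead of other packet types, while arrival order is preserved within each class. This is a minimal illustrative model; the packet-type names and the queue structure are assumptions, not details from the patent.

```python
import heapq
from itertools import count

# Hypothetical packet types; the abstract only distinguishes packets
# that contain an atomic request from all other packet types.
ATOMIC, READ, WRITE = "atomic", "read", "write"

class IngressQueue:
    """Order incoming fabric packets so atomic requests reach the
    local-memory access logic first (FIFO within each priority class)."""
    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker preserving arrival order

    def receive(self, pkt_type, payload):
        # Priority 0 for atomics, 1 for every other packet type.
        prio = 0 if pkt_type == ATOMIC else 1
        heapq.heappush(self._heap, (prio, next(self._seq), pkt_type, payload))

    def dispatch(self):
        _, _, pkt_type, payload = heapq.heappop(self._heap)
        return pkt_type, payload

q = IngressQueue()
q.receive(READ, "r0")
q.receive(ATOMIC, "a0")
q.receive(WRITE, "w0")
q.receive(ATOMIC, "a1")
order = [q.dispatch()[1] for _ in range(4)]
print(order)  # atomics drain first, then the rest in arrival order
```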
-
512.
Publication Number: US11023241B2
Publication Date: 2021-06-01
Application Number: US16106515
Application Date: 2018-08-21
Applicant: Advanced Micro Devices, Inc.
Inventor: Andrej Kocev, Jay Fleischman, Kai Troester, Johnny C. Chu, Tim J. Wilkens, Neil Marketkar, Michael W. Long
Abstract: Systems and methods selectively bypass address-generation hardware in processor instruction pipelines. In an embodiment, a processor includes an address-generation stage and an address-generation-bypass-determination unit (ABDU). The ABDU receives a load/store instruction. If an effective address for the load/store instruction is not known at the ABDU, the ABDU routes the load/store instruction via the address-generation stage of the processor. If, however, the effective address of the load/store instruction is known at the ABDU, the ABDU routes the load/store instruction to bypass the address-generation stage of the processor.
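The ABDU's routing decision can be sketched as a simple dispatch function: if the effective address is already known, the load/store skips the address-generation stage; otherwise it flows through address generation as usual. The instruction representation and the base-plus-offset computation are illustrative assumptions.

```python
def route_load_store(instr, known_effective_addresses):
    """Hypothetical ABDU decision: if the effective address of a
    load/store is already known, bypass the address-generation stage."""
    if instr["id"] in known_effective_addresses:
        # Bypass path: attach the known address and skip address generation.
        return {"path": "bypass", "ea": known_effective_addresses[instr["id"]]}
    # Otherwise route via the address-generation stage, which
    # computes the effective address (base + offset here).
    ea = instr["base"] + instr["offset"]
    return {"path": "agu", "ea": ea}

known = {"ld1": 0x1000}
r1 = route_load_store({"id": "ld1", "base": 0, "offset": 0}, known)
r2 = route_load_store({"id": "ld2", "base": 0x2000, "offset": 8}, known)
print(r1["path"], r2["path"])
```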
-
513.
Publication Number: US20210158855A1
Publication Date: 2021-05-27
Application Number: US16692714
Application Date: 2019-11-22
Applicant: Advanced Micro Devices, Inc.
Inventor: Arijit Banerjee, Russell Schreiber, Kyle Whittle
IPC: G11C11/4091, G11C11/419, G11C11/4074, G11C7/10
Abstract: A read path for reading data from a memory includes a sense amplifier having data (SAT) and data complement (SAC) output nodes and a latch. The latch includes an input tri-state inverter including first and second PMOS transistors connected between VDD and an intermediate node, and first and second NMOS transistors connected between VSS and the intermediate node. A gate connection of the first PMOS and NMOS transistors is connected to the SAT node; a gate connection of the second PMOS transistor is connected to a sense amplifier enable complement input; and a gate connection of the second NMOS transistor is connected to a sense amplifier enable input. The latch also includes an output driver with an input connected to the intermediate node and an output connected to a data output node. The latch thus has two gate delays between the SAT node and the data output node.
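The latch's behavior can be modeled at the Boolean level: with the sense amplifier enabled, the series PMOS pull-up and NMOS pull-down paths make the tri-state inverter drive the complement of SAT onto the intermediate node, and the output driver inverts it back, so data out follows SAT after two gate delays; with the enable deasserted, both paths are off and the node holds. This is an illustrative logic abstraction, not a circuit-accurate simulation.

```python
def tristate_inverter(sat, sae, prev_node):
    """Boolean model of the input tri-state inverter: two series PMOS
    pull-ups (gates: SAT and the enable complement) and two series
    NMOS pull-downs (gates: enable and SAT)."""
    sae_c = 1 - sae
    pull_up = (sat == 0) and (sae_c == 0)    # both PMOS conduct -> node = 1
    pull_down = (sae == 1) and (sat == 1)    # both NMOS conduct -> node = 0
    if pull_up:
        return 1
    if pull_down:
        return 0
    return prev_node  # high impedance: the latch holds its state

def read_path(sat, sae, prev_node):
    node = tristate_inverter(sat, sae, prev_node)   # first gate delay
    out = 1 - node                                  # output driver: second delay
    return node, out

# Enabled: the data output follows SAT after two inversions.
n, out = read_path(sat=1, sae=1, prev_node=0)
# Disabled: the intermediate node (and hence the output) holds.
n2, out2 = read_path(sat=0, sae=0, prev_node=n)
print(out, out2)
```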
-
514.
Publication Number: US20210158599A1
Publication Date: 2021-05-27
Application Number: US16698624
Application Date: 2019-11-27
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Skyler J. SALEH, Ruijin WU
Abstract: An apparatus includes a command buffer configured to temporarily store commands. The apparatus also includes processing units disposed at a substrate. The processing units are configured to access a plurality of copies of a command from the command buffer. The processing units include first processing units (such as fixed function hardware blocks) that perform geometry operations indicated by the command on a set of primitives; the geometry operations are performed concurrently by the first processing units. The processing units also include second processing units (such as shaders) that process mutually exclusive sets of pixels generated by rasterizing the set of primitives. The apparatus also includes a cache to temporarily store the pixels after shading by the shaders. The processing units stop or interrupt processing commands in response to detecting a synchronization point and resume processing commands in response to all the processing units completing the commands issued before the synchronization point.
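The synchronization-point behavior can be sketched with a barrier: each processing unit stops when it reaches the sync point and resumes only after every unit has completed the commands issued before it. The command names and unit count are illustrative assumptions.

```python
import threading

# Hypothetical sketch: four processing units run the same command
# stream; a "SYNC" command is the synchronization point.
NUM_UNITS = 4
sync_point = threading.Barrier(NUM_UNITS)
completed = []
lock = threading.Lock()

def processing_unit(uid, commands):
    for cmd in commands:
        if cmd == "SYNC":
            sync_point.wait()   # stop until every unit reaches the sync point
        else:
            with lock:
                completed.append((uid, cmd))

threads = [
    threading.Thread(target=processing_unit, args=(u, ["geom", "SYNC", "shade"]))
    for u in range(NUM_UNITS)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every "geom" command (before the sync point) finishes before any "shade".
last_geom = max(i for i, (_, c) in enumerate(completed) if c == "geom")
first_shade = min(i for i, (_, c) in enumerate(completed) if c == "shade")
print(last_geom < first_shade)
```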
-
515.
Publication Number: US20210158222A1
Publication Date: 2021-05-27
Application Number: US16694926
Application Date: 2019-11-25
Applicant: Advanced Micro Devices, Inc.
Inventor: Nicholas Malaya
Abstract: Methods, devices, and systems for emulating a compute kernel with an artificial neural network (ANN). The compute kernel is executed on a processor, and it is determined whether the compute kernel is a hotspot kernel. If the compute kernel is a hotspot kernel, it is emulated with an ANN, and the ANN is substituted for the compute kernel.
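The detect-then-substitute flow can be sketched as a runtime that counts kernel invocations, flags a hotspot past a threshold, and swaps in a learned surrogate. Everything here is a stand-in assumption: the threshold, the linear probe-and-fit "training", and the surrogate itself (a real ANN and its training are omitted for brevity).

```python
# Hypothetical hotspot threshold; the patent does not specify one.
HOTSPOT_THRESHOLD = 100

class KernelRuntime:
    def __init__(self, kernel, surrogate_factory):
        self.kernel = kernel
        self.surrogate_factory = surrogate_factory
        self.calls = 0
        self.surrogate = None

    def run(self, x):
        self.calls += 1
        if self.surrogate is None and self.calls > HOTSPOT_THRESHOLD:
            # The kernel is "hot": build the emulator and substitute it.
            self.surrogate = self.surrogate_factory(self.kernel)
        fn = self.surrogate or self.kernel
        return fn(x)

def expensive_kernel(x):
    return 3 * x + 1

def make_surrogate(kernel):
    # Fit the stand-in from two probe points (exact for a linear kernel;
    # a real ANN would be trained on sampled inputs/outputs instead).
    y0, y1 = kernel(0), kernel(1)
    return lambda x: y0 + (y1 - y0) * x

rt = KernelRuntime(expensive_kernel, make_surrogate)
results = [rt.run(i) for i in range(150)]
print(rt.surrogate is not None, results[:3])
```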
-
516.
Publication Number: US20210157590A1
Publication Date: 2021-05-27
Application Number: US16698808
Application Date: 2019-11-27
Applicant: Advanced Micro Devices, Inc.
Inventor: John M. King, Matthew T. Sobel
IPC: G06F9/30, G06F12/0811, G06F9/54
Abstract: A technique for performing store-to-load forwarding is provided. The technique includes determining a virtual address for data to be loaded for the load instruction, identifying a matching store instruction from one or more store instruction memories by comparing a virtual-address-based comparison value for the load instruction to one or more virtual-address-based comparison values of one or more store instructions, determining a physical address for the load instruction, and validating the load instruction based on a comparison between the physical address of the load instruction and a physical address of the matching store instruction.
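The flow the abstract describes can be sketched in two phases: a fast match on a virtual-address-based comparison value, followed by validation against the physical address once it is available. The particular comparison value (low-order address bits) and the store-queue layout are illustrative assumptions.

```python
def va_compare_value(va):
    # Illustrative comparison value: a few low-order virtual-address bits.
    return va & 0xFFF

def find_matching_store(load_va, store_queue):
    """Match the load against store instructions by comparison value."""
    load_key = va_compare_value(load_va)
    for store in reversed(store_queue):   # youngest matching store wins
        if va_compare_value(store["va"]) == load_key:
            return store
    return None

def validate(load_pa, store):
    # The VA-based match is only a prediction; the physical addresses
    # must agree before the forwarded data is kept.
    return store is not None and store["pa"] == load_pa

stores = [
    {"va": 0x7000_1008, "pa": 0x0002_1008, "data": 0xAA},
    {"va": 0x7000_2010, "pa": 0x0002_2010, "data": 0xBB},
]
m = find_matching_store(0x8000_1008, stores)   # aliases in the low bits
ok = validate(0x0002_1008, m)
print(m["data"], ok)
```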
-
517.
Publication Number: US20210157485A1
Publication Date: 2021-05-27
Application Number: US17029158
Application Date: 2020-09-23
Applicant: Advanced Micro Devices, Inc.
Inventor: Matthew Tomei, Shomit N. Das, David A. Wood
IPC: G06F3/06, G06F12/0802
Abstract: Systems, methods, and devices for performing pattern-based cache block compression and decompression. An uncompressed cache block is input to the compressor. Byte values are identified within the uncompressed cache block. A cache block pattern is searched for in a set of cache block patterns based on the byte values. A compressed cache block is output based on the byte values and the cache block pattern. A compressed cache block is input to the decompressor. A cache block pattern is identified based on metadata of the cache block. The cache block pattern is applied to a byte dictionary of the cache block. An uncompressed cache block is output based on the cache block pattern and the byte dictionary. A subset of cache block patterns is determined from a training cache trace based on a set of compressed sizes and a target number of patterns for each size.
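The byte-dictionary idea can be sketched as follows: the compressed form is the block's distinct byte values (the dictionary) plus a pattern mapping each position to a dictionary index, and decompression applies the pattern to the dictionary. This is a minimal sketch; real hardware would match against a fixed, trained set of patterns rather than deriving the pattern from the block, and the encoding here is not size-optimized.

```python
def compress(block):
    """Reduce a cache block to (byte dictionary, position pattern)."""
    dictionary = []
    pattern = []
    for b in block:
        if b not in dictionary:
            dictionary.append(b)
        pattern.append(dictionary.index(b))
    return bytes(dictionary), tuple(pattern)

def decompress(dictionary, pattern):
    # Apply the pattern to the byte dictionary to rebuild the block.
    return bytes(dictionary[i] for i in pattern)

block = bytes([0, 0, 0, 7, 0, 0, 0, 7])   # a repetitive, pointer-like block
dictionary, pattern = compress(block)
restored = decompress(dictionary, pattern)
print(len(dictionary), pattern, restored == block)
```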
-
518.
Publication Number: US11016763B2
Publication Date: 2021-05-25
Application Number: US16297358
Application Date: 2019-03-08
Applicant: Advanced Micro Devices, Inc.
Inventor: Jagadish B. Kotra, John Kalamatianos
IPC: G06F9/38, G06F9/22, G06F12/0875, G06F9/30
Abstract: Systems, apparatuses, and methods for compacting multiple groups of micro-operations into individual cache lines of a micro-operation cache are disclosed. A processor includes at least a decode unit and a micro-operation cache. When a new group of micro-operations is decoded and ready to be written to the micro-operation cache, the micro-operation cache determines which set is targeted by the new group of micro-operations. If there is a way in this set that can store the new group without evicting any existing group already stored in the way, then the new group is stored into the way with the existing group(s) of micro-operations. Metadata is then updated to indicate that the new group of micro-operations has been written to the way. Additionally, the micro-operation cache manages eviction and replacement policy at the granularity of micro-operation groups rather than at the granularity of cache lines.
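The compaction policy can be sketched as follows: a way holds a fixed number of micro-op slots, and a new group is co-located with resident groups whenever some way in the target set has enough free slots, so nothing is evicted. The set geometry (two ways, eight slots) is an illustrative assumption.

```python
WAYS_PER_SET, SLOTS_PER_WAY = 2, 8   # hypothetical geometry

class UopCacheSet:
    def __init__(self):
        # Each way holds a list of groups; each group is a list of micro-ops.
        self.ways = [[] for _ in range(WAYS_PER_SET)]

    def _used(self, way):
        return sum(len(group) for group in way)

    def insert_group(self, group):
        for way in self.ways:
            if self._used(way) + len(group) <= SLOTS_PER_WAY:
                way.append(group)   # compact: share the way, evict nothing
                return True
        return False                # caller falls back to group-granularity eviction

s = UopCacheSet()
ok1 = s.insert_group(["uop"] * 5)
ok2 = s.insert_group(["uop"] * 3)   # fits beside the first group in way 0
ok3 = s.insert_group(["uop"] * 6)   # way 0 is full, so it lands in way 1
print(ok1, ok2, ok3, [len(w) for w in s.ways])
```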
-
519.
Publication Number: US11016555B2
Publication Date: 2021-05-25
Application Number: US16052266
Application Date: 2018-08-01
Applicant: Advanced Micro Devices, Inc.
Inventor: I-Ming Lin
IPC: G06F1/3203, G06F1/3293, G06F1/3234, G06F1/20, G06F1/26, G06F3/0484
Abstract: An apparatus and a method for controlling power consumption associated with a computing device having first and second processors configured to perform different types of operations includes providing a user interface that allows, during normal operation of the computing device, at least one of: (i) a user selection of desired performance levels of the first and second processors relative to one another, such that higher desired performance levels of one processor correspond to lower desired performance levels of the other processor, and (ii) a user selection of a desired performance level of the first processor and a user selection of a desired performance level of the second processor, the two user selections being made independently of one another. The apparatus and method control, during normal operation of the computing device, performance levels of the processors in response to the one or more user selections of the desired performance levels.
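The two selection styles the abstract describes can be sketched as two mapping functions: a coupled control where raising one processor's desired level lowers the other's, and a pair of independent selections. The level scale and processor names are illustrative assumptions.

```python
MAX_LEVEL = 10   # hypothetical performance-level scale

def coupled_levels(slider):
    """One control trades performance between the two processors:
    0 fully favors processor B, MAX_LEVEL fully favors processor A."""
    if not 0 <= slider <= MAX_LEVEL:
        raise ValueError("slider out of range")
    return {"proc_a": slider, "proc_b": MAX_LEVEL - slider}

def independent_levels(level_a, level_b):
    """Second style: the two desired levels are selected independently."""
    return {"proc_a": level_a, "proc_b": level_b}

coupled = coupled_levels(7)          # favors processor A
separate = independent_levels(9, 9)  # both high, chosen independently
print(coupled, separate)
```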
-
520.
Publication Number: US20210149672A1
Publication Date: 2021-05-20
Application Number: US17125730
Application Date: 2020-12-17
Applicant: Advanced Micro Devices, Inc.
Inventor: John Kalamatianos, Jagadish B. Kotra
IPC: G06F9/38, G06F12/0897, G06F12/0875, G06F9/30
Abstract: Systems, apparatuses, and methods for virtualizing a micro-operation cache are disclosed. A processor includes at least a micro-operation cache, a conventional cache subsystem, a decode unit, and control logic. The decode unit decodes instructions into micro-operations which are then stored in the micro-operation cache. The micro-operation cache has limited capacity for storing micro-operations. When new micro-operations are decoded from pending instructions, existing micro-operations are evicted from the micro-operation cache to make room for the new micro-operations. Rather than being discarded, micro-operations evicted from the micro-operation cache are stored in the conventional cache subsystem. This prevents the original instruction from having to be decoded again on subsequent executions. When the control logic determines that micro-operations for one or more fetched instructions are stored in either the micro-operation cache or the conventional cache subsystem, the control logic causes the decode unit to transition to a reduced-power state.
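The virtualization scheme can be sketched as a two-level lookup: groups evicted from the small micro-op cache are spilled into the conventional cache instead of being discarded, and a fetch that hits in either structure leaves the decoder in its reduced-power state (modeled here as a simple flag). The capacities and the LRU policy are illustrative assumptions.

```python
from collections import OrderedDict

class VirtualizedUopCache:
    def __init__(self, capacity=2):
        self.uop_cache = OrderedDict()   # small micro-op cache, LRU order
        self.conventional = {}           # conventional cache subsystem
        self.capacity = capacity
        self.decoder_active = False      # False models the reduced-power state

    def _decode(self, pc):
        self.decoder_active = True       # only a double miss powers up the decoder
        return f"uops({pc})"

    def fetch(self, pc):
        self.decoder_active = False
        if pc in self.uop_cache:
            self.uop_cache.move_to_end(pc)
            return self.uop_cache[pc]
        # Miss in the uop cache: check the conventional cache before decoding.
        uops = self.conventional.pop(pc, None)
        if uops is None:
            uops = self._decode(pc)
        if len(self.uop_cache) >= self.capacity:
            old_pc, old_uops = self.uop_cache.popitem(last=False)
            self.conventional[old_pc] = old_uops   # spill, don't discard
        self.uop_cache[pc] = uops
        return uops

c = VirtualizedUopCache()
c.fetch(0x10); c.fetch(0x20); c.fetch(0x30)   # 0x10 spilled to the conventional cache
r = c.fetch(0x10)                             # hit in the conventional cache
print(r, c.decoder_active)                    # no re-decode needed
```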
-