APPARATUS AND METHODS FOR MANAGING PACKET TRANSFER ACROSS A MEMORY FABRIC PHYSICAL LAYER INTERFACE

    公开(公告)号:US20210165606A1

    公开(公告)日:2021-06-03

    申请号:US16701794

    申请日:2019-12-03

    Abstract: An apparatus and method for managing packet transfer between a memory fabric having a physical layer interface higher data rate than a data rate of a physical layer interface of another device, receives incoming packets from the memory fabric physical layer interface wherein at least some of the packets include different instruction types. The apparatus and method determine a packet type of the incoming packet received from the memory fabric physical layer interface and when the determined incoming packet type is of a type containing an atomic request, the method and apparatus prioritizes transfer of the incoming packet with the atomic request over other packet types of incoming packets, to memory access logic that accesses local memory within an apparatus.

    Static Random Access Memory Read Path with Latch

    公开(公告)号:US20210158855A1

    公开(公告)日:2021-05-27

    申请号:US16692714

    申请日:2019-11-22

    Abstract: A read path for reading data from a memory includes a sense amplifier having data (SAT) and data complement (SAC) output nodes and a latch. The latch includes an input tri-state inverter including first and second PMOS transistors connected between VDD and an intermediate node, and first and second NMOS transistors connected between VSS and the intermediate node. A gate connection of the first PMOS and NMOS transistors is connected to the SAT node; a gate connection of the second PMOS transistor is connected to a sense amplifier enable complement input; and a gate connection of the second NMOS transistor is connected to a sense amplifier enable input. The latch also includes an output driver with an input connected to the intermediate node and an output connected to a data output node. The latch thus has two gate delays between the SAT node and the data output node.

    DATA FLOW IN A DISTRIBUTED GRAPHICS PROCESSING UNIT ARCHITECTURE

    公开(公告)号:US20210158599A1

    公开(公告)日:2021-05-27

    申请号:US16698624

    申请日:2019-11-27

    Abstract: An apparatus includes a command buffer configured to temporarily store commands. The apparatus also includes processing units disposed at a substrate. The processing units are configured to access a plurality of copies of a command from the command buffer. The processing units include first processing units (such as fixed function hardware blocks) to perform geometry operations indicated by the command on a set of primitives. The geometry operations are performed concurrently by the first processing units. The processing units also include second processing units (such as shaders) to process mutually exclusive sets of pixels generated by rasterizing the set of primitives. The apparatus also includes a cache to temporarily store the pixels after shading by the shaders. The processing units stop or interrupt processing commands in response to detecting a synchronization point and resume processing the commands in response to all the processing units completing commands before synchronization point.

    TECHNIQUES FOR PERFORMING STORE-TO-LOAD FORWARDING

    公开(公告)号:US20210157590A1

    公开(公告)日:2021-05-27

    申请号:US16698808

    申请日:2019-11-27

    Abstract: A technique for performing store-to-load forwarding is provided. The technique includes determining a virtual address for data to be loaded for the load instruction, identifying a matching store instruction from one or more store instruction memories by comparing a virtual-address-based comparison value for the load instruction to one or more virtual-address-based comparison values of one or more store instructions, determining a physical address for the load instruction, and validating the load instruction based on a comparison between the physical address of the load instruction and a physical address of the matching store instruction.

    PATTERN-BASED CACHE BLOCK COMPRESSION

    公开(公告)号:US20210157485A1

    公开(公告)日:2021-05-27

    申请号:US17029158

    申请日:2020-09-23

    Abstract: Systems, methods, and devices for performing pattern-based cache block compression and decompression. An uncompressed cache block is input to the compressor. Byte values are identified within the uncompressed cache block. A cache block pattern is searched for in a set of cache block patterns based on the byte values. A compressed cache block is output based on the byte values and the cache block pattern. A compressed cache block is input to the decompressor. A cache block pattern is identified based on metadata of the cache block. The cache block pattern is applied to a byte dictionary of the cache block. An uncompressed cache block is output based on the cache block pattern and the byte dictionary. A subset of cache block patterns is determined from a training cache trace based on a set of compressed sizes and a target number of patterns for each size.

    Implementing a micro-operation cache with compaction

    公开(公告)号:US11016763B2

    公开(公告)日:2021-05-25

    申请号:US16297358

    申请日:2019-03-08

    Abstract: Systems, apparatuses, and methods for compacting multiple groups of micro-operations into individual cache lines of a micro-operation cache are disclosed. A processor includes at least a decode unit and a micro-operation cache. When a new group of micro-operations is decoded and ready to be written to the micro-operation cache, the micro-operation cache determines which set is targeted by the new group of micro-operations. If there is a way in this set that can store the new group without evicting any existing group already stored in the way, then the new group is stored into the way with the existing group(s) of micro-operations. Metadata is then updated to indicate that the new group of micro-operations has been written to the way. Additionally, the micro-operation cache manages eviction and replacement policy at the granularity of micro-operation groups rather than at the granularity of cache lines.

    Control of performance levels of different types of processors via a user interface

    公开(公告)号:US11016555B2

    公开(公告)日:2021-05-25

    申请号:US16052266

    申请日:2018-08-01

    Inventor: I-Ming Lin

    Abstract: An apparatus and a method for controlling power consumption associated with a computing device having first and second processors configured to perform different types of operations includes providing a user interface that allows, during normal operation of the computing device, at least one of: (i) a user selection of desired performance levels of the first and second processors relative to one another, such that higher desired performance levels of one processor correspond to lower desired performance levels of the other processor, and (ii) a user selection of a desired performance level of the first processor and a user selection of a desired performance level of the second processor, the two user selections being made independently of one another. The apparatus and method control, during normal operation of the computing device, performance levels of the processors in response to the one or more user selections of the desired performance levels.

    METHOD AND APPARATUS FOR VIRTUALIZING THE MICRO-OP CACHE

    公开(公告)号:US20210149672A1

    公开(公告)日:2021-05-20

    申请号:US17125730

    申请日:2020-12-17

    Abstract: Systems, apparatuses, and methods for virtualizing a micro-operation cache are disclosed. A processor includes at least a micro-operation cache, a conventional cache subsystem, a decode unit, and control logic. The decode unit decodes instructions into micro-operations which are then stored in the micro-operation cache. The micro-operation cache has limited capacity for storing micro-operations. When new micro-operations are decoded from pending instructions, existing micro-operations are evicted from the micro-operation cache to make room for the new micro-operations. Rather than being discarded, micro-operations evicted from the micro-operation cache are stored in the conventional cache subsystem. This prevents the original instruction from having to be decoded again on subsequent executions. When the control logic determines that micro-operations for one or more fetched instructions are stored in either the micro-operation cache or the conventional cache subsystem, the control logic causes the decode unit to transition to a reduced-power state.

Patent Agency Ranking