Implementing a micro-operation cache with compaction

    Publication Number: US11016763B2

    Publication Date: 2021-05-25

    Application Number: US16297358

    Filing Date: 2019-03-08

    Abstract: Systems, apparatuses, and methods for compacting multiple groups of micro-operations into individual cache lines of a micro-operation cache are disclosed. A processor includes at least a decode unit and a micro-operation cache. When a new group of micro-operations is decoded and ready to be written to the micro-operation cache, the micro-operation cache determines which set is targeted by the new group of micro-operations. If there is a way in this set that can store the new group without evicting any existing group already stored in the way, then the new group is stored into the way with the existing group(s) of micro-operations. Metadata is then updated to indicate that the new group of micro-operations has been written to the way. Additionally, the micro-operation cache manages eviction and replacement policy at the granularity of micro-operation groups rather than at the granularity of cache lines.
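
    The compaction check itself is simple enough to sketch. The C++ model below is an illustration only: the geometry (64 sets, 4 ways, 8 micro-op slots per way), the set-index hash, and all identifiers are assumptions, not details from the patent.

```cpp
// Minimal sketch of compaction on the micro-op cache write path.
// Geometry and index hash are assumptions; the patent does not specify them.
#include <array>
#include <cstdint>
#include <iostream>
#include <vector>

constexpr int kSets = 64, kWays = 4, kWaySlots = 8;

struct UopGroup {
    uint64_t start_pc;  // fetch address the group was decoded from
    int num_uops;       // size of the group in micro-ops
};

struct Way {
    std::vector<UopGroup> groups;  // compacted groups sharing one line
    int used = 0;                  // uop slots consumed in this way
};

struct UopCache {
    std::array<std::array<Way, kWays>, kSets> sets{};

    // Store a freshly decoded group without evicting anyone if possible:
    // pick any way in the target set with enough free uop slots.
    bool insert(const UopGroup& g) {
        auto& set = sets[(g.start_pc >> 6) % kSets];  // assumed set index
        for (auto& way : set) {
            if (way.used + g.num_uops <= kWaySlots) {
                way.groups.push_back(g);  // compact beside existing groups
                way.used += g.num_uops;   // metadata update: slots in use
                return true;
            }
        }
        return false;  // no fit: would need a group-granularity eviction
    }
};

int main() {
    UopCache cache;
    std::cout << cache.insert({0x1000, 3}) << '\n';  // 1: stored in a way
    std::cout << cache.insert({0x2000, 4}) << '\n';  // 1: compacted alongside
}
```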

    METHOD AND APPARATUS FOR VIRTUALIZING THE MICRO-OP CACHE

    Publication Number: US20210149672A1

    Publication Date: 2021-05-20

    Application Number: US17125730

    Filing Date: 2020-12-17

    Abstract: Systems, apparatuses, and methods for virtualizing a micro-operation cache are disclosed. A processor includes at least a micro-operation cache, a conventional cache subsystem, a decode unit, and control logic. The decode unit decodes instructions into micro-operations which are then stored in the micro-operation cache. The micro-operation cache has limited capacity for storing micro-operations. When new micro-operations are decoded from pending instructions, existing micro-operations are evicted from the micro-operation cache to make room for the new micro-operations. Rather than being discarded, micro-operations evicted from the micro-operation cache are stored in the conventional cache subsystem. This prevents the original instruction from having to be decoded again on subsequent executions. When the control logic determines that micro-operations for one or more fetched instructions are stored in either the micro-operation cache or the conventional cache subsystem, the control logic causes the decode unit to transition to a reduced-power state.
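
    A rough simulation of the spill-on-evict idea follows. The FIFO replacement policy, the two-entry capacity, and the `Frontend` structure are invented for the sketch; the abstract only specifies that evicted micro-ops go to the conventional cache subsystem and that the decode unit enters a reduced-power state when micro-ops are found in either structure.

```cpp
// Sketch of a virtualized micro-op cache: evicted uop groups spill into the
// conventional cache instead of being discarded, and the decoder is gated
// while lookups keep hitting. Capacities and policy are assumptions.
#include <cstdint>
#include <deque>
#include <iostream>
#include <unordered_map>
#include <vector>

using Uops = std::vector<uint32_t>;  // stand-in for a decoded uop group

struct Frontend {
    std::unordered_map<uint64_t, Uops> uop_cache;   // small dedicated structure
    std::unordered_map<uint64_t, Uops> conv_cache;  // conventional cache subsystem
    std::deque<uint64_t> fifo;                      // assumed FIFO replacement
    static constexpr size_t kUopCap = 2;            // assumed tiny capacity
    bool decoder_on = true;

    void fill(uint64_t pc, Uops uops) {
        if (uop_cache.size() == kUopCap) {          // evict to make room,
            uint64_t victim = fifo.front();
            fifo.pop_front();
            conv_cache[victim] = std::move(uop_cache[victim]);  // but spill the
            uop_cache.erase(victim);                // uops instead of discarding
        }
        uop_cache[pc] = std::move(uops);
        fifo.push_back(pc);
    }

    const Uops* fetch(uint64_t pc) {
        if (auto it = uop_cache.find(pc); it != uop_cache.end()) {
            decoder_on = false;   // uops on hand: decode unit can power down
            return &it->second;
        }
        if (auto it = conv_cache.find(pc); it != conv_cache.end()) {
            decoder_on = false;   // hit on the virtualized (spilled) copy
            return &it->second;
        }
        decoder_on = true;        // true miss: wake the decoder and re-decode
        return nullptr;
    }
};

int main() {
    Frontend fe;
    fe.fill(0x10, {1, 2});
    fe.fill(0x20, {3});
    fe.fill(0x30, {4, 5});        // evicts pc 0x10 into the conventional cache
    std::cout << (fe.fetch(0x10) != nullptr)   // 1: served without re-decoding
              << ' ' << fe.decoder_on << '\n'; // 0: decoder stays powered down
}
```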

    LATENCY HIDING FOR CACHES

    Publication Number: US20210141740A1

    Publication Date: 2021-05-13

    Application Number: US16683142

    Filing Date: 2019-11-13

    Abstract: A technique for accessing a memory having a high latency portion and a low latency portion is provided. The technique includes detecting a promotion trigger to promote data from the high latency portion to the low latency portion; in response to the promotion trigger, copying cache lines associated with the trigger from the high latency portion to the low latency portion; and, in response to a read request, providing data from either or both portions based on a state associated with the data in each portion.
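
    The promote-then-serve flow can be sketched with a toy state machine. The three states and the map-based storage below are assumptions; the patent's actual states and triggers are not given in the abstract.

```cpp
// Illustrative two-tier memory: a line lives in the high-latency portion,
// is copied down on a promotion trigger, and reads are steered by its state.
#include <cstdint>
#include <iostream>
#include <unordered_map>

enum class State { HighOnly, Copying, Both };

struct TwoTierMemory {
    std::unordered_map<uint64_t, uint64_t> high, low;  // line -> data
    std::unordered_map<uint64_t, State> state;

    void promote(uint64_t line) {      // promotion trigger fired for this line
        state[line] = State::Copying;
        low[line] = high[line];        // copy high -> low latency portion
        state[line] = State::Both;
    }

    uint64_t read(uint64_t line) {
        switch (state[line]) {
            case State::Both:    return low[line];   // fast path
            case State::Copying:                     // copy in flight: either
            case State::HighOnly:                    // copy is valid, serve slow
            default:             return high[line];
        }
    }
};

int main() {
    TwoTierMemory m;
    m.high[42] = 7;
    m.state[42] = State::HighOnly;
    std::cout << m.read(42) << '\n';  // served from the high-latency portion
    m.promote(42);                    // e.g., triggered by an access pattern
    std::cout << m.read(42) << '\n';  // now served from the low-latency portion
}
```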

    SEMI-SORTING COMPRESSION WITH ENCODING AND DECODING TABLES

    Publication Number: US20210050864A1

    Publication Date: 2021-02-18

    Application Number: US16542872

    Filing Date: 2019-08-16

    Abstract: A data processing platform, method, and program product perform compression and decompression of a set of data items. Suffix data and a prefix are selected for each respective data item in the set of data items based on data content of the respective data item. The set of data items is sorted based on the prefixes. The prefixes are encoded by querying multiple encoding tables to create a code word containing compressed information representing values of all prefixes for the set of data items. The code word and suffix data for each of the data items are stored in memory. The code word is decompressed to recover the prefixes. The recovered prefixes are paired with their respective suffix data.
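
    A toy instance of the scheme, assuming 8-bit items split into a 4-bit prefix and 4-bit suffix: the block is sorted by prefix, and the nondecreasing prefix sequence is ranked into a single integer code word using binomial coefficients standing in for the encoding/decoding tables. The split point and the ranking construction are illustrative choices, not the patent's.

```cpp
// Toy semi-sorting compression: sort by prefix, encode the sorted prefix
// multiset as one code word, keep suffixes verbatim, then decode losslessly.
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

uint64_t C(int n, int k) {                 // encoding/decoding table entry
    if (k < 0 || k > n) return 0;
    uint64_t r = 1;
    for (int i = 1; i <= k; ++i) r = r * (n - k + i) / i;
    return r;
}

// Encode: stable-sort bytes by 4-bit prefix, then rank the nondecreasing
// prefix sequence (colex rank of the shifted combination) into one word.
uint64_t encode(std::vector<uint8_t>& block) {
    std::stable_sort(block.begin(), block.end(),
                     [](uint8_t a, uint8_t b) { return (a >> 4) < (b >> 4); });
    uint64_t word = 0;
    for (size_t i = 0; i < block.size(); ++i)
        word += C((block[i] >> 4) + (int)i, (int)i + 1);  // table lookup
    return word;                           // suffixes stay stored as-is
}

// Decode: recover the sorted prefixes greedily from the code word and
// re-attach them to the stored suffixes (which kept the sorted order).
void decode(uint64_t word, std::vector<uint8_t>& suffixes) {
    for (int i = (int)suffixes.size() - 1; i >= 0; --i) {
        int q = i;                         // largest q with C(q, i+1) <= word
        while (C(q + 1, i + 1) <= word) ++q;
        word -= C(q, i + 1);
        suffixes[i] = uint8_t(((q - i) << 4) | (suffixes[i] & 0xF));
    }
}

int main() {
    std::vector<uint8_t> block = {0x3A, 0x12, 0x3B, 0x05};
    uint64_t word = encode(block);         // block is now prefix-sorted
    std::vector<uint8_t> out = block;
    for (auto& b : out) b &= 0xF;          // drop prefixes, keep suffixes
    decode(word, out);                     // recover prefixes from the word
    for (auto b : out) std::cout << std::hex << int(b) << ' ';  // 5 12 3a 3b
}
```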

    SPECULATIVE INSTRUCTION WAKEUP TO TOLERATE DRAINING DELAY OF MEMORY ORDERING VIOLATION CHECK BUFFERS

    Publication Number: US20200319889A1

    Publication Date: 2020-10-08

    Application Number: US16671097

    Filing Date: 2019-10-31

    Abstract: A technique for speculatively executing load-dependent instructions includes detecting that a memory ordering consistency queue is full for a completed load instruction. The technique also includes storing data loaded by the completed load instruction into a storage location that holds such data while the memory ordering consistency queue is full, and speculatively executing instructions that are dependent on the completed load instruction. In response to a slot becoming available in the memory ordering consistency queue, the load instruction is replayed. In response to receiving loaded data for the replayed load instruction, the technique tests for a data mis-speculation by comparing the loaded data for the replayed load instruction with the data that the completed load instruction stored in the storage location.
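
    The park-and-replay handshake reduces to a few steps, sketched below with an invented `LoadCheck` structure; the single parked value and the console output are simplifications of what would be per-entry pipeline state.

```cpp
// Schematic sketch: park the load's data when the ordering queue is full,
// let dependents run speculatively, and verify the data on replay.
#include <cstdint>
#include <iostream>
#include <optional>

struct LoadCheck {
    std::optional<uint64_t> parked;    // side storage for the loaded data
    bool queue_full = true;            // memory ordering consistency queue

    // Load completes but the ordering queue has no free slot: park the data
    // so dependent instructions can execute speculatively against it.
    void complete_load(uint64_t data) {
        if (queue_full) parked = data;
    }

    // A slot frees up: the load is replayed; compare the replayed data with
    // the parked data to detect a data mis-speculation.
    bool replay(uint64_t replayed_data) {
        queue_full = false;
        bool ok = parked && *parked == replayed_data;
        std::cout << (ok ? "speculation confirmed\n"
                         : "mis-speculation: flush dependents\n");
        parked.reset();
        return ok;
    }
};

int main() {
    LoadCheck lc;
    lc.complete_load(0xABCD);   // queue full: park data, wake dependents early
    lc.replay(0xABCD);          // replay returns the same data: confirmed
}
```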

    IMPLEMENTING A MICRO-OPERATION CACHE WITH COMPACTION

    Publication Number: US20200285466A1

    Publication Date: 2020-09-10

    Application Number: US16297358

    Filing Date: 2019-03-08

    Abstract: Systems, apparatuses, and methods for compacting multiple groups of micro-operations into individual cache lines of a micro-operation cache are disclosed. A processor includes at least a decode unit and a micro-operation cache. When a new group of micro-operations is decoded and ready to be written to the micro-operation cache, the micro-operation cache determines which set is targeted by the new group of micro-operations. If there is a way in this set that can store the new group without evicting any existing group already stored in the way, then the new group is stored into the way with the existing group(s) of micro-operations. Metadata is then updated to indicate that the new group of micro-operations has been written to the way. Additionally, the micro-operation cache manages eviction and replacement policy at the granularity of micro-operation groups rather than at the granularity of cache lines.
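
    This is the pre-grant publication of US11016763B2 listed at the top of this page (same application number, US16297358). As a complement to the write-path sketch given there, the fragment below illustrates the other half of the abstract: replacement handled at micro-op group granularity, evicting individual groups rather than the whole line. The LRU policy and slot counts are assumptions.

```cpp
// Group-granularity replacement: evict whole uop groups, oldest first,
// until the incoming group fits; the line's other groups stay resident.
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

struct Group { uint64_t pc; int uops; uint64_t last_use; };

struct WayLine {
    std::vector<Group> groups;   // several groups compacted into one line
    int capacity = 8;            // assumed uop slots per line
    int used = 0;

    void make_room(int needed) {
        while (capacity - used < needed && !groups.empty()) {
            auto lru = std::min_element(
                groups.begin(), groups.end(),
                [](const Group& a, const Group& b) { return a.last_use < b.last_use; });
            std::cout << "evict group @pc=0x" << std::hex << lru->pc << '\n';
            used -= lru->uops;
            groups.erase(lru);   // eviction at micro-op group granularity
        }
    }
};

int main() {
    WayLine way;
    way.groups = {{0x100, 4, 1}, {0x200, 3, 5}};
    way.used = 7;
    way.make_room(3);            // evicts only the pc=0x100 group
    std::cout << std::dec << way.used << " uop slots still in use\n";  // 3
}
```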

    CONTROL FLOW GUIDED LOCK ADDRESS PREFETCH AND FILTERING

    Publication Number: US20200151100A1

    Publication Date: 2020-05-14

    Application Number: US16190111

    Filing Date: 2018-11-13

    Abstract: A method of prefetching target data includes, in response to detecting a lock-prefixed instruction for execution in a processor, determining a predicted target memory location for the lock-prefixed instruction based on control flow information associating the lock-prefixed instruction with the predicted target memory location. Target data is prefetched from the predicted target memory location to a cache coupled with the processor, and after completion of the prefetching, the lock-prefixed instruction is executed in the processor using the prefetched target data.
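
    The predictor can be pictured as a table keyed by the lock instruction's PC combined with a control-flow signature. The hash mix, table organization, and train/predict hooks below are invented for illustration.

```cpp
// Hypothetical lock-address predictor: learn (PC, control-flow history) ->
// target address at retire, and prefetch the target at decode time.
#include <cstdint>
#include <iostream>
#include <unordered_map>

struct LockPrefetcher {
    std::unordered_map<uint64_t, uint64_t> target_table;  // key -> target addr

    static uint64_t key(uint64_t pc, uint64_t cf_history) {
        return pc ^ (cf_history * 0x9E3779B97F4A7C15ull);  // simple hash mix
    }

    // On detecting a lock-prefixed instruction in the front end: if this
    // path has been seen before, prefetch its predicted target line.
    void on_lock_decoded(uint64_t pc, uint64_t cf_history) {
        auto it = target_table.find(key(pc, cf_history));
        if (it != target_table.end())
            std::cout << "prefetch 0x" << std::hex << it->second << '\n';
    }

    // After the lock executes, train the table with the actual target.
    void on_lock_retired(uint64_t pc, uint64_t cf_history, uint64_t addr) {
        target_table[key(pc, cf_history)] = addr;
    }
};

int main() {
    LockPrefetcher p;
    p.on_lock_retired(0x401000, 0b1011, 0x7fffe000);  // learn path -> address
    p.on_lock_decoded(0x401000, 0b1011);              // same path: prefetch
}
```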

    FEEDBACK GUIDED SPLIT WORKGROUP DISPATCH FOR GPUS

    Publication Number: US20190332420A1

    Publication Date: 2019-10-31

    Application Number: US15965231

    Filing Date: 2018-04-27

    Abstract: Systems, apparatuses, and methods for performing split-workgroup dispatch to multiple compute units are disclosed. A system includes at least a plurality of compute units, control logic, and a dispatch unit. The control logic monitors resource contention among the plurality of compute units and calculates a load-rating for each compute unit based on the resource contention. The dispatch unit receives workgroups for dispatch and determines how to dispatch workgroups to the plurality of compute units based on the calculated load-ratings. If a workgroup is unable to fit in a single compute unit based on the currently available resources of the compute units, the dispatch unit divides the workgroup into its individual wavefronts and dispatches wavefronts of the workgroup to different compute units. The dispatch unit determines how to dispatch the wavefronts to specific ones of the compute units based on the calculated load-ratings.
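
    A scheduler-level sketch: each compute unit carries a contention-derived load rating (lower assumed better here); a workgroup goes whole to the best-rated CU that can hold it, and is otherwise split into wavefronts spread across CUs in rating order. The rating scale and slot accounting are made up.

```cpp
// Feedback-guided split dispatch: try whole-workgroup placement first,
// otherwise split into wavefronts across CUs by ascending load rating.
#include <algorithm>
#include <iostream>
#include <vector>

struct CU { int id; int free_slots; double load_rating; };  // lower = better

void dispatch(std::vector<CU>& cus, int wavefronts) {
    // Prefer a single CU that can take the whole workgroup.
    auto best = std::min_element(cus.begin(), cus.end(),
        [](const CU& a, const CU& b) { return a.load_rating < b.load_rating; });
    if (best->free_slots >= wavefronts) {
        std::cout << "whole workgroup -> CU" << best->id << '\n';
        best->free_slots -= wavefronts;
        return;
    }
    // Otherwise split: hand wavefronts to CUs in ascending load-rating order.
    std::sort(cus.begin(), cus.end(),
              [](const CU& a, const CU& b) { return a.load_rating < b.load_rating; });
    for (auto& cu : cus) {
        while (wavefronts > 0 && cu.free_slots > 0) {
            std::cout << "wavefront -> CU" << cu.id << '\n';
            --cu.free_slots;
            --wavefronts;
        }
    }
}

int main() {
    std::vector<CU> cus = {{0, 2, 0.9}, {1, 3, 0.2}, {2, 2, 0.5}};
    dispatch(cus, 6);   // no CU fits 6 wavefronts: split across CU1, CU2, CU0
}
```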

    PER-INSTRUCTION ENERGY DEBUGGING USING INSTRUCTION SAMPLING HARDWARE

    Publication Number: US20190286209A1

    Publication Date: 2019-09-19

    Application Number: US15923153

    Filing Date: 2018-03-16

    Abstract: A processor utilizes instruction based sampling to generate sampling data sampled on a per instruction basis during execution of an instruction. The sampling data indicates what processor hardware was used due to the execution of the instruction. Software receives the sampling data and generates an estimate of energy used by the instruction based on the sampling data. The sampling data may include microarchitectural events and the energy estimate utilizes a base energy amount corresponding to the instruction executed along with energy amounts corresponding to the microarchitectural events in the sampling data. The sampling data may include switching events associated with hardware blocks that switched due to execution of the instruction and the energy estimate for the instruction is based on the switching events and capacitance estimates associated with the hardware blocks.
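
    The estimate described reduces to an additive model, sketched below with made-up constants: a base cost per sampled opcode, a cost per reported microarchitectural event, and a C*V^2 term per switching event (the 1/2 factor of the usual dynamic-energy formula is omitted for simplicity).

```cpp
// Per-instruction energy estimate from an instruction-based-sampling record.
// All energy constants and the sample encoding are assumptions.
#include <iostream>
#include <map>
#include <string>
#include <vector>

struct Sample {                              // one IBS record (assumed shape)
    std::string opcode;
    std::vector<std::string> uarch_events;   // e.g. cache miss, TLB walk
    std::vector<double> switched_cap_pf;     // capacitance of toggled blocks (pF)
};

double estimate_energy_pj(const Sample& s, double vdd = 0.9) {
    static const std::map<std::string, double> base_pj =
        {{"add", 1.0}, {"load", 4.0}, {"fma", 3.0}};   // assumed base costs
    static const std::map<std::string, double> event_pj =
        {{"l1_miss", 20.0}, {"tlb_walk", 35.0}};       // assumed event costs
    double e = base_pj.count(s.opcode) ? base_pj.at(s.opcode) : 2.0;
    for (const auto& ev : s.uarch_events)              // microarchitectural
        if (event_pj.count(ev)) e += event_pj.at(ev);  // event contributions
    for (double c : s.switched_cap_pf)
        e += c * vdd * vdd;                  // simplified E = C * V^2 per toggle
    return e;
}

int main() {
    Sample s{"load", {"l1_miss"}, {1.5, 0.7}};
    std::cout << estimate_energy_pj(s) << " pJ\n";  // 4 + 20 + 2.2 * 0.81
}
```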
