-
Publication No.: US11016763B2
Publication Date: 2021-05-25
Application No.: US16297358
Filing Date: 2019-03-08
Applicant: Advanced Micro Devices, Inc.
Inventor: Jagadish B. Kotra , John Kalamatianos
IPC: G06F9/38 , G06F9/22 , G06F12/0875 , G06F9/30
Abstract: Systems, apparatuses, and methods for compacting multiple groups of micro-operations into individual cache lines of a micro-operation cache are disclosed. A processor includes at least a decode unit and a micro-operation cache. When a new group of micro-operations is decoded and ready to be written to the micro-operation cache, the micro-operation cache determines which set is targeted by the new group of micro-operations. If there is a way in this set that can store the new group without evicting any existing group already stored in the way, then the new group is stored into the way with the existing group(s) of micro-operations. Metadata is then updated to indicate that the new group of micro-operations has been written to the way. Additionally, the micro-operation cache manages eviction and replacement policy at the granularity of micro-operation groups rather than at the granularity of cache lines.
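The compaction scheme in the abstract above can be sketched as a small model: a set-associative micro-op cache where each way (cache line) can hold several micro-op groups, a new group is co-located in a way that still has room, and eviction happens at group granularity. `LINE_SLOTS`, `NUM_SETS`, `NUM_WAYS`, and the FIFO group-eviction policy are illustrative assumptions, not details taken from the patent.

```python
LINE_SLOTS = 8   # micro-op slots per cache line (way); illustrative
NUM_SETS = 4
NUM_WAYS = 2

class UopCache:
    def __init__(self):
        # Each way holds a list of (tag, uops) groups plus a free-slot count
        # (the "metadata" tracking which groups occupy the line).
        self.sets = [[{"groups": [], "free": LINE_SLOTS}
                      for _ in range(NUM_WAYS)] for _ in range(NUM_SETS)]

    def write_group(self, pc, uops):
        s = self.sets[pc % NUM_SETS]
        # First try to co-locate with existing groups in a way that has room,
        # so no existing group needs to be evicted.
        for way in s:
            if way["free"] >= len(uops):
                way["groups"].append((pc, list(uops)))
                way["free"] -= len(uops)   # metadata update
                return True
        # Otherwise evict at *group* granularity: drop the oldest group
        # from the fullest way until the new group fits.
        victim = min(s, key=lambda w: w["free"])
        while victim["groups"] and victim["free"] < len(uops):
            _, old = victim["groups"].pop(0)
            victim["free"] += len(old)
        if victim["free"] >= len(uops):
            victim["groups"].append((pc, list(uops)))
            victim["free"] -= len(uops)
            return True
        return False

    def lookup(self, pc):
        for way in self.sets[pc % NUM_SETS]:
            for tag, uops in way["groups"]:
                if tag == pc:
                    return uops
        return None
```

In this sketch, two small groups mapping to the same set share one way rather than occupying a line each, which is the space saving the abstract describes.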
-
Publication No.: US20210149672A1
Publication Date: 2021-05-20
Application No.: US17125730
Filing Date: 2020-12-17
Applicant: Advanced Micro Devices, Inc.
Inventor: John Kalamatianos , Jagadish B. Kotra
IPC: G06F9/38 , G06F12/0897 , G06F12/0875 , G06F9/30
Abstract: Systems, apparatuses, and methods for virtualizing a micro-operation cache are disclosed. A processor includes at least a micro-operation cache, a conventional cache subsystem, a decode unit, and control logic. The decode unit decodes instructions into micro-operations which are then stored in the micro-operation cache. The micro-operation cache has limited capacity for storing micro-operations. When new micro-operations are decoded from pending instructions, existing micro-operations are evicted from the micro-operation cache to make room for the new micro-operations. Rather than being discarded, micro-operations evicted from the micro-operation cache are stored in the conventional cache subsystem. This prevents the original instruction from having to be decoded again on subsequent executions. When the control logic determines that micro-operations for one or more fetched instructions are stored in either the micro-operation cache or the conventional cache subsystem, the control logic causes the decode unit to transition to a reduced-power state.
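The virtualization idea above can be sketched as follows: groups evicted from the small micro-op cache spill into a backing (conventional) cache instead of being discarded, so a later fetch can reuse them without re-decoding, and the decode unit only wakes on a miss in both structures. The capacities, FIFO replacement, and the decode counter standing in for decoder power state are illustrative assumptions.

```python
class VirtualizedUopCache:
    def __init__(self, uop_capacity=2):
        self.uop_cache = {}          # pc -> uops (small, fast structure)
        self.backing_cache = {}      # pc -> uops (conventional cache subsystem)
        self.capacity = uop_capacity
        self.order = []              # FIFO replacement order for the sketch
        self.decodes = 0             # counts decode-unit activations

    def _decode(self, pc):
        self.decodes += 1            # the decoder had to wake up
        return [f"uop_{pc}_{i}" for i in range(2)]   # stand-in decode

    def fetch(self, pc):
        if pc in self.uop_cache:                 # hit: decoder stays asleep
            return self.uop_cache[pc]
        if pc in self.backing_cache:             # hit in the backing store:
            uops = self.backing_cache.pop(pc)    # still no re-decode needed
        else:
            uops = self._decode(pc)              # true miss: wake the decoder
        # Install in the uop cache, spilling an evicted group downward
        # rather than discarding it.
        if len(self.uop_cache) >= self.capacity:
            victim = self.order.pop(0)
            self.backing_cache[victim] = self.uop_cache.pop(victim)
        self.uop_cache[pc] = uops
        self.order.append(pc)
        return uops
```

The key observable effect is that refetching an evicted group does not increment the decode count, which is what lets the decode unit stay in a reduced-power state.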
-
Publication No.: US20210141740A1
Publication Date: 2021-05-13
Application No.: US16683142
Filing Date: 2019-11-13
Applicant: Advanced Micro Devices, Inc.
Inventor: John Kalamatianos , Apostolos Kokolis , Shrikanth Ganapathy
IPC: G06F12/126 , G06F12/1027 , G06F12/0804
Abstract: A technique for accessing a memory having a high latency portion and a low latency portion is provided. The technique includes detecting a promotion trigger to promote data from the high latency portion to the low latency portion, in response to the promotion trigger, copying cache lines associated with the promotion trigger from the high latency portion to the low latency portion, and in response to a read request, providing data from either or both of the high latency portion or the low latency portion, based on a state associated with data in the high latency portion and the low latency portion.
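The promotion flow above can be sketched as a two-portion memory: a trigger copies the associated cache lines from the high-latency portion into the low-latency portion, and reads are then served from whichever portion the per-line state marks as holding the promoted copy. The trigger policy shown (promote the triggering line plus its neighbors) and the dictionary-based model are illustrative assumptions.

```python
class TwoLevelMemory:
    def __init__(self, high):
        self.high = dict(high)    # high-latency portion: line -> data
        self.low = {}             # low-latency portion
        self.state = {}           # per-line state: "promoted" once copied

    def on_promotion_trigger(self, line, span=2):
        # Copy the triggering line and its neighbors into the low-latency
        # portion (an assumed spatial policy, for illustration only).
        for l in range(line, line + span):
            if l in self.high:
                self.low[l] = self.high[l]
                self.state[l] = "promoted"

    def read(self, line):
        # Serve from the low-latency copy when the state says it is valid;
        # otherwise fall back to the high-latency portion.
        if self.state.get(line) == "promoted":
            return self.low[line], "low"
        return self.high[line], "high"
```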
-
Publication No.: US20210050864A1
Publication Date: 2021-02-18
Application No.: US16542872
Filing Date: 2019-08-16
Applicant: Advanced Micro Devices, Inc.
Inventor: Alexander D. Breslow , Nuwan Jayasena , John Kalamatianos
Abstract: A data processing platform, method, and program product perform compression and decompression of a set of data items. Suffix data and a prefix are selected for each respective data item in the set of data items based on data content of the respective data item. The set of data items is sorted based on the prefixes. The prefixes are encoded by querying multiple encoding tables to create a code word containing compressed information representing values of all prefixes for the set of data items. The code word and suffix data for each of the data items are stored in memory. The code word is decompressed to recover the prefixes. The recovered prefixes are paired with their respective suffix data.
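The prefix/suffix scheme above can be sketched with integers: each item is split into a prefix (high bits) and suffix (low bits), items are sorted by prefix, the sorted prefixes are packed into a single code word, and decompression recovers each prefix and re-pairs it with its suffix. The fixed 8-bit split and the delta packing below are simplifying stand-ins for the patent's content-based selection and multiple encoding tables.

```python
SUFFIX_BITS = 8

def compress(items):
    # Split each item, then sort by prefix as the abstract describes.
    pairs = sorted((v >> SUFFIX_BITS, v & ((1 << SUFFIX_BITS) - 1))
                   for v in items)
    # Pack all prefixes into one code word as sorted deltas
    # (non-negative because the pairs are sorted).
    code_word, prev = 0, 0
    for prefix, _ in pairs:
        delta = prefix - prev
        code_word = (code_word << 8) | delta   # assumes each delta fits a byte
        prev = prefix
    suffixes = [s for _, s in pairs]
    return code_word, suffixes

def decompress(code_word, suffixes):
    # Unpack the deltas, rebuild the prefixes, and re-pair with suffixes.
    n = len(suffixes)
    deltas = [(code_word >> (8 * (n - 1 - i))) & 0xFF for i in range(n)]
    items, prefix = [], 0
    for delta, suffix in zip(deltas, suffixes):
        prefix += delta
        items.append((prefix << SUFFIX_BITS) | suffix)
    return items
```

The round trip returns the items in sorted order, since sorting by prefix is part of the compression step.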
-
Publication No.: US10884940B2
Publication Date: 2021-01-05
Application No.: US16230618
Filing Date: 2018-12-21
Applicant: Advanced Micro Devices, Inc.
Inventor: John Kalamatianos , Shrikanth Ganapathy , Shomit Das , Matthew Tomei
IPC: G06F12/0893 , G06F11/07
Abstract: A method of operating a cache in a computing device includes, in response to receiving a memory access request at the cache, determining compressibility of data specified by the request, selecting in the cache a destination portion for storing the data based on the compressibility of the data and a persistent fault history of the destination portion, and storing a compressed copy of the data in a non-faulted subportion of the destination portion, wherein the persistent fault history indicates that the non-faulted subportion excludes any persistent faults.
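The placement decision above can be sketched as: on a write, measure how small the data compresses, then pick a cache portion whose non-faulted bytes (per the persistent fault history) can hold the compressed copy. Using `zlib` as the compressor and byte-granularity fault maps are illustrative stand-ins for the hardware mechanism.

```python
import zlib

class FaultTolerantCache:
    def __init__(self, portion_size=64):
        self.portion_size = portion_size
        # Persistent fault history: portion index -> set of faulted byte
        # offsets (portion 0 has a faulted half; portion 1 is clean).
        self.fault_map = {0: set(range(32, 64)), 1: set()}
        self.lines = {}

    def store(self, addr, data: bytes):
        compressed = zlib.compress(data)   # determine compressibility
        for portion, faults in self.fault_map.items():
            # The fault history tells us how large the non-faulted
            # subportion is; the compressed copy must fit inside it.
            good_bytes = self.portion_size - len(faults)
            if len(compressed) <= good_bytes:
                self.lines[addr] = (portion, compressed)
                return portion
        return None   # no portion can hold even the compressed copy

    def load(self, addr):
        _portion, compressed = self.lines[addr]
        return zlib.decompress(compressed)
```

Highly compressible data can thus still use a partially faulted portion, because the compressed copy fits entirely in the non-faulted subportion.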
-
Publication No.: US20200319889A1
Publication Date: 2020-10-08
Application No.: US16671097
Filing Date: 2019-10-31
Applicant: Advanced Micro Devices, Inc.
Inventor: John Kalamatianos , Susumu Mashimo , Krishnan V. Ramani , Scott Thomas Bingham
Abstract: A technique for speculatively executing load-dependent instructions includes detecting that a memory ordering consistency queue is full for a completed load instruction. The technique also includes storing data loaded by the completed load instruction into a storage location for storing data when the memory ordering consistency queue is full. The technique further includes speculatively executing instructions that are dependent on the completed load instruction. The technique also includes, in response to a slot becoming available in the memory ordering consistency queue, replaying the load instruction. The technique further includes, in response to receiving loaded data for the replayed load instruction, testing for a data mis-speculation by comparing the loaded data for the replayed load instruction with the data loaded by the completed load instruction that is stored in the storage location.
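The replay-and-compare check above can be sketched as a small model: when the memory ordering queue is full, the completed load's data is parked in a side buffer and dependents run speculatively; once a slot frees up, the load is replayed and its fresh data is compared against the parked copy to detect mis-speculation. All structure names here are illustrative assumptions.

```python
class SpeculativeLoadUnit:
    def __init__(self, moq_slots=1):
        self.moq_free = moq_slots
        self.side_buffer = {}     # load id -> data captured at completion

    def complete_load(self, load_id, data):
        if self.moq_free == 0:
            # Queue full: park the data and let dependents run speculatively.
            self.side_buffer[load_id] = data
            return "speculative"
        self.moq_free -= 1
        return "queued"

    def replay(self, load_id, fresh_data):
        # Called when a queue slot becomes available and the load replays:
        # compare the replayed data against the parked copy.
        parked = self.side_buffer.pop(load_id)
        if fresh_data != parked:
            return "mis-speculation"   # dependents must be squashed and re-run
        return "ok"
```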
-
Publication No.: US20200285466A1
Publication Date: 2020-09-10
Application No.: US16297358
Filing Date: 2019-03-08
Applicant: Advanced Micro Devices, Inc.
Inventor: Jagadish B. Kotra , John Kalamatianos
IPC: G06F9/22 , G06F9/30 , G06F12/0875
Abstract: Systems, apparatuses, and methods for compacting multiple groups of micro-operations into individual cache lines of a micro-operation cache are disclosed. A processor includes at least a decode unit and a micro-operation cache. When a new group of micro-operations is decoded and ready to be written to the micro-operation cache, the micro-operation cache determines which set is targeted by the new group of micro-operations. If there is a way in this set that can store the new group without evicting any existing group already stored in the way, then the new group is stored into the way with the existing group(s) of micro-operations. Metadata is then updated to indicate that the new group of micro-operations has been written to the way. Additionally, the micro-operation cache manages eviction and replacement policy at the granularity of micro-operation groups rather than at the granularity of cache lines.
-
Publication No.: US20200151100A1
Publication Date: 2020-05-14
Application No.: US16190111
Filing Date: 2018-11-13
Applicant: Advanced Micro Devices, Inc.
Inventor: Susumu Mashimo , John Kalamatianos
IPC: G06F12/0862
Abstract: A method of prefetching target data includes, in response to detecting a lock-prefixed instruction for execution in a processor, determining a predicted target memory location for the lock-prefixed instruction based on control flow information associating the lock-prefixed instruction with the predicted target memory location. Target data is prefetched from the predicted target memory location to a cache coupled with the processor, and after completion of the prefetching, the lock-prefixed instruction is executed in the processor using the prefetched target data.
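The flow above can be sketched as: a table indexed by control-flow information predicts the target address of a lock-prefixed (atomic) instruction, the predicted line is prefetched into the cache before execution, and the atomic then executes against warm data. The predictor table and the set-based cache model are illustrative assumptions.

```python
class LockPrefetcher:
    def __init__(self):
        self.target_table = {}    # control-flow context (pc) -> predicted target
        self.cache = set()        # addresses currently resident in the cache

    def train(self, pc, target_addr):
        # Associate the lock-prefixed instruction with its observed target.
        self.target_table[pc] = target_addr

    def on_detect_lock_instruction(self, pc):
        # Prefetch the predicted target before the atomic executes.
        addr = self.target_table.get(pc)
        if addr is not None:
            self.cache.add(addr)   # prefetch completes before execution
        return addr

    def execute_atomic(self, addr):
        hit = addr in self.cache
        self.cache.add(addr)
        return "hit" if hit else "miss"
```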
-
Publication No.: US20190332420A1
Publication Date: 2019-10-31
Application No.: US15965231
Filing Date: 2018-04-27
Applicant: Advanced Micro Devices, Inc.
Inventor: Yash Sanjeev Ukidave , John Kalamatianos , Bradford Michael Beckmann
Abstract: Systems, apparatuses, and methods for performing split-workgroup dispatch to multiple compute units are disclosed. A system includes at least a plurality of compute units, control logic, and a dispatch unit. The control logic monitors resource contention among the plurality of compute units and calculates a load-rating for each compute unit based on the resource contention. The dispatch unit receives workgroups for dispatch and determines how to dispatch workgroups to the plurality of compute units based on the calculated load-ratings. If a workgroup is unable to fit in a single compute unit based on the currently available resources of the compute units, the dispatch unit divides the workgroup into its individual wavefronts and dispatches wavefronts of the workgroup to different compute units. The dispatch unit determines how to dispatch the wavefronts to specific ones of the compute units based on the calculated load-ratings.
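The dispatch policy above can be sketched as: try to place the whole workgroup on a single compute unit; if none has room, split it into wavefronts and spread them across units in least-loaded-first order. Representing the load rating simply as free wavefront slots per unit is an illustrative assumption.

```python
def dispatch(workgroup_wavefronts, free_slots):
    """free_slots: per-compute-unit free wavefront slots (stand-in rating)."""
    n = len(workgroup_wavefronts)
    # Whole-workgroup placement if any single unit has enough room.
    for cu, free in enumerate(free_slots):
        if free >= n:
            return {cu: list(workgroup_wavefronts)}
    # Otherwise split the workgroup into its individual wavefronts and
    # assign them to units by load rating (here: most free slots first).
    placement = {}
    order = sorted(range(len(free_slots)), key=lambda cu: -free_slots[cu])
    remaining = list(workgroup_wavefronts)
    for cu in order:
        take, remaining = remaining[:free_slots[cu]], remaining[free_slots[cu]:]
        if take:
            placement[cu] = take
        if not remaining:
            return placement
    return None   # not enough aggregate capacity across all units
```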
-
Publication No.: US20190286209A1
Publication Date: 2019-09-19
Application No.: US15923153
Filing Date: 2018-03-16
Applicant: Advanced Micro Devices, Inc.
Inventor: Shijia Wei , Joseph L. Greathouse , John Kalamatianos
IPC: G06F1/32
Abstract: A processor utilizes instruction based sampling to generate sampling data sampled on a per instruction basis during execution of an instruction. The sampling data indicates what processor hardware was used due to the execution of the instruction. Software receives the sampling data and generates an estimate of energy used by the instruction based on the sampling data. The sampling data may include microarchitectural events and the energy estimate utilizes a base energy amount corresponding to the instruction executed along with energy amounts corresponding to the microarchitectural events in the sampling data. The sampling data may include switching events associated with hardware blocks that switched due to execution of the instruction and the energy estimate for the instruction is based on the switching events and capacitance estimates associated with the hardware blocks.
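The estimate described above can be sketched as a sum: a base energy for the sampled instruction, plus per-event costs for the microarchitectural events the sample reports, plus switching energy derived from block capacitance estimates (dynamic energy roughly C·V² per switch). All constants and table entries below are illustrative assumptions, not values from the patent.

```python
BASE_ENERGY_PJ = {"add": 1.0, "load": 5.0}            # per-opcode base cost
EVENT_ENERGY_PJ = {"l1_miss": 10.0, "tlb_miss": 8.0}  # per-event cost
BLOCK_CAPACITANCE_F = {"alu": 2e-12, "agu": 1e-12}    # capacitance estimates
VDD = 1.0                                             # supply voltage (V)

def estimate_energy_pj(sample):
    # Base energy for the instruction that was sampled.
    e = BASE_ENERGY_PJ[sample["opcode"]]
    # Add energy for each reported microarchitectural event.
    for ev in sample["events"]:
        e += EVENT_ENERGY_PJ[ev]
    # Add dynamic switching energy C*V^2 for each block that switched,
    # converted from joules to picojoules.
    for block in sample["switched_blocks"]:
        e += BLOCK_CAPACITANCE_F[block] * VDD ** 2 * 1e12
    return e
```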
-