-
公开(公告)号:US20220091986A1
公开(公告)日:2022-03-24
申请号:US17029976
申请日:2020-09-23
Applicant: Advanced Micro Devices, Inc.
Inventor: Jagadish B. Kotra , John Kalamatianos
IPC: G06F12/0875
Abstract: Systems, apparatuses, and methods for efficiently processing memory requests are disclosed. A computing system includes at least one processing unit coupled to a memory. Circuitry in the processing unit determines a memory request becomes a long-latency request based on detecting a translation lookaside buffer (TLB) miss, a branch misprediction, a memory dependence misprediction, or a precise exception has occurred. The circuitry marks the memory request as a long-latency request such as storing an indication of a long-latency request in an instruction tag of the memory request. The circuitry uses weighted criteria for scheduling out-of-order issue and servicing of memory requests. However, the indication of a long-latency request is not combined with other criteria in a weighted sum. Rather, the indication of the long-latency request is a separate value. The circuitry prioritizes memory requests marked as long-latency requests over memory requests not marked as long-latency requests.
-
132.
公开(公告)号:US11113065B2
公开(公告)日:2021-09-07
申请号:US16671097
申请日:2019-10-31
Applicant: Advanced Micro Devices, Inc.
Inventor: John Kalamatianos , Susumu Mashimo , Krishnan V. Ramani , Scott Thomas Bingham
Abstract: A technique for speculatively executing load-dependent instructions includes detecting that a memory ordering consistency queue is full for a completed load instruction. The technique also includes storing data loaded by the completed load instruction into a storage location for storing data when the memory ordering consistency queue is full. The technique further includes speculatively executing instructions that are dependent on the completed load instruction. The technique also includes in response to a slot becoming available in the memory ordering consistency queue, replaying the load instruction. The technique further includes in response to receiving loaded data for the replayed load instruction, testing for a data mis-speculation by comparing the loaded data for the replayed load instruction with the data loaded by the completed load instruction that is stored in the storage location.
-
公开(公告)号:US10990393B1
公开(公告)日:2021-04-27
申请号:US16658474
申请日:2019-10-21
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: John Kalamatianos , Krishnan V. Ramani , Susumu Mashimo
Abstract: Address-based filtering for load/store speculation includes maintaining a filtering table including table entries associated with ranges of addresses; in response to receiving an ordering check triggering transaction, querying the filtering table using a target address of the ordering check triggering transaction to determine if an instruction dependent upon the ordering check triggering transaction has previously been generated a physical address; and in response to determining that the filtering table lacks an indication that the instruction dependent upon the ordering check triggering transaction has previously been generated a physical address, bypassing a lookup operation in an ordering violation memory structure to determine whether the instruction dependent upon the ordering check triggering transaction is currently in-flight.
-
公开(公告)号:US10983915B2
公开(公告)日:2021-04-20
申请号:US16544468
申请日:2019-08-19
Applicant: Advanced Micro Devices, Inc.
Inventor: Alexander D. Breslow , John Kalamatianos
IPC: G06F12/0895 , H03M7/30
Abstract: Systems, apparatuses, and methods for implementing flexible dictionary sharing techniques for caches are disclosed. A set-associative cache includes a dictionary for each data array set. When a cache line is to be allocated in the cache, a cache controller determines to which set a base index of the cache line address maps. Then, a selector unit determines which dictionary of a group of dictionaries stored by those sets neighboring this set would achieve the most compression for the cache line. This dictionary is then selected to compress the cache line. An offset is added to the base index of the cache line to generate a full index in order to map the cache line to the set corresponding to this chosen dictionary. The compressed cache line is stored in this set with the chosen dictionary, and the offset is stored in the corresponding tag array entry.
-
公开(公告)号:US20210089324A1
公开(公告)日:2021-03-25
申请号:US16913146
申请日:2020-06-26
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Greg Sadowski , John Kalamatianos , Shomit N. Das
IPC: G06F9/38
Abstract: An asynchronous pipeline includes a first stage and one or more second stages. A controller provides control signals to the first stage to indicate a modification to an operating speed of the first stage. The modification is determined based on a comparison of a completion status of the first stage to one or more completion statuses of the one or more second stages. In some cases, the controller provides control signals indicating modifications to an operating voltage applied to the first stage and a drive strength of a buffer in the first stage. Modules can be used to determine the completion statuses of the first stage and the one or more second stages based on the monitored output signals generated by the stages, output signals from replica critical paths associated with the stages, or a lookup table that indicates estimated completion times.
-
公开(公告)号:US10671535B2
公开(公告)日:2020-06-02
申请号:US13944148
申请日:2013-07-17
Applicant: Advanced Micro Devices, Inc.
Inventor: John Kalamatianos , Paul Keltcher , Marius Evers , Chitresh Narasimhaiah
IPC: G06F12/0862 , G06F12/06
Abstract: A prefetcher maintains the state of stored prefetch information, such as a prefetch confidence level, when a prefetch would cross a memory page boundary. The maintained prefetch information can be used both to identify whether the stride pattern for a particular sequence of demand requests persists after the memory page boundary has been crossed, and to continue to issue prefetch requests according to the identified pattern. The prefetcher therefore does not have re-identify a stride pattern each time a page boundary is crossed by a sequence of demand requests, thereby improving the efficiency and accuracy of the prefetcher.
-
公开(公告)号:US20200150966A1
公开(公告)日:2020-05-14
申请号:US16725203
申请日:2019-12-23
Applicant: Advanced Micro Devices, Inc.
Inventor: Varun Agrawal , John Kalamatianos , Adithya Yalavarti , Jingjie Qian
IPC: G06F9/38
Abstract: An electronic device handles accesses of a branch prediction functional block when executing instructions in program code. The electronic device includes a processor having the branch prediction functional block that provides branch prediction information for control transfer instructions (CTIs) in the program code and a minimum predictor use (MPU) functional block. The MPU functional block determines, based on a record associated with a given fetch group of instructions, that a specified number of subsequent fetch groups of instructions that were previously determined to include no CTIs or conditional CTIs that were not taken are to be fetched for execution in sequence following the given fetch group. The MPU functional block then, when each of the specified number of the subsequent fetch groups is fetched and prepared for execution, prevents corresponding accesses of the branch prediction functional block for acquiring branch prediction information for instructions in that subsequent fetch group.
-
公开(公告)号:US20200081771A1
公开(公告)日:2020-03-12
申请号:US16123489
申请日:2018-09-06
Applicant: Advanced Micro Devices, Inc.
Inventor: John Kalamatianos , Shrikanth Ganapathy
IPC: G06F11/10 , G06F12/0804 , G06F12/0815
Abstract: A computing device having a cache memory that is configured in a write-back mode is described. A cache controller in the cache memory acquires, from a record of bit errors that are present in each of a plurality of portions of the cache memory, a number of bit errors in a portion of the cache memory. The cache controller detects a coherency state of data stored in the portion of the cache memory. Based on the coherency state and the number of bit errors, the cache controller selects an error protection from among a plurality of error protections. The cache controller uses the selected error protection to protect the data stored in the portion of the cache memory from errors.
-
公开(公告)号:US20200073845A1
公开(公告)日:2020-03-05
申请号:US16118172
申请日:2018-08-30
Applicant: Advanced Micro Devices, Inc.
Inventor: Shomit N. Das , Matthew Tomei , Shrikanth Ganapathy , John Kalamatianos
Abstract: Systems, apparatuses, and methods for reliably transmitting data over voltage scaled links are disclosed. A computing system includes at least first and second devices connected via a link. In one implementation, if a data block can be compressed to less than or equal to half the original size of the data block, then the data block is compressed and sent on the link in a single clock cycle rather than two clock cycles. If the data block cannot be compressed to half the original size, but if the data block can be compressed enough to include error correction code (ECC) bits without exceeding the original size, then ECC bits are added to the compressed block which is sent on the link at a reduced voltage. The ECC bits help to correct for any errors that are generated as a result of operating the link at the reduced voltage.
-
公开(公告)号:US20180217844A1
公开(公告)日:2018-08-02
申请号:US15417555
申请日:2017-01-27
Applicant: Advanced Micro Devices, Inc. , ATI Technologies ULC
Inventor: John Kalamatianos , Greg Sadowski , Syed Zohaib M. Gilani
IPC: G06F9/38 , G06F9/30 , G06F15/80 , G06F12/0875
CPC classification number: G06F9/3836 , G06F9/3001 , G06F9/30036 , G06F9/3824 , G06F9/383 , G06F9/384 , G06F9/3851 , G06F9/3871 , G06F9/3877 , G06F9/3887 , G06F12/0875 , G06F15/8007 , G06F2212/452
Abstract: A method and apparatus of asynchronous scheduling in a graphics device includes sending one or more instructions from an instruction scheduler to one or more instruction first-in/first-out (FIFO) devices. An instruction in the one or more FIFO devices is selected for execution by a single-instruction/multiple-data (SIMD) pipeline unit. It is determined whether all operands for the selected instruction are available for execution of the instruction, and if all the operands are available, the selected instruction is executed on the SIMD pipeline unit. The self-timed arithmetic pipeline unit (SIMD pipeline unit) is effectively encapsulated in a synchronous, (e.g., clocked by global clock), scheduler and register file environment.
-
-
-
-
-
-
-
-
-