METHOD AND APPARATUS FOR REDUCING THE LATENCY OF LONG LATENCY MEMORY REQUESTS

    Publication No.: US20220091986A1

    Publication Date: 2022-03-24

    Application No.: US17029976

    Application Date: 2020-09-23

    Abstract: Systems, apparatuses, and methods for efficiently processing memory requests are disclosed. A computing system includes at least one processing unit coupled to a memory. Circuitry in the processing unit determines that a memory request has become a long-latency request based on detecting that a translation lookaside buffer (TLB) miss, a branch misprediction, a memory dependence misprediction, or a precise exception has occurred. The circuitry marks the memory request as a long-latency request, such as by storing an indication of a long-latency request in an instruction tag of the memory request. The circuitry uses weighted criteria for scheduling out-of-order issue and servicing of memory requests. However, the indication of a long-latency request is not combined with other criteria in a weighted sum. Rather, the indication of the long-latency request is a separate value. The circuitry prioritizes memory requests marked as long-latency requests over memory requests not marked as long-latency requests.
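
    As a behavioral illustration only, the following Python sketch shows how a scheduler can keep the long-latency mark outside the weighted sum while still letting it dominate selection. The MemRequest fields, the weights, and the pick_next helper are assumptions made for the sketch, not details from the patent.

        from dataclasses import dataclass

        @dataclass
        class MemRequest:
            tag: int
            age: int             # cycles spent waiting in the queue
            row_hit: bool        # access would hit an already-open DRAM row
            long_latency: bool   # set on TLB miss, branch or memory
                                 # dependence misprediction, or precise exception

        def weighted_score(req, w_age=1.0, w_row_hit=4.0):
            # Weighted sum over the ordinary criteria only; the
            # long-latency mark is deliberately excluded from this sum.
            return w_age * req.age + (w_row_hit if req.row_hit else 0.0)

        def pick_next(queue):
            # The long-latency flag is a separate value that dominates the
            # comparison: any marked request wins over any unmarked one,
            # and the weighted sum only orders requests within each group.
            return max(queue, key=lambda r: (r.long_latency, weighted_score(r)))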

    Speculative instruction wakeup to tolerate draining delay of memory ordering violation check buffers

    Publication No.: US11113065B2

    Publication Date: 2021-09-07

    Application No.: US16671097

    Application Date: 2019-10-31

    Abstract: A technique for speculatively executing load-dependent instructions includes detecting that a memory ordering consistency queue is full for a completed load instruction. The technique also includes storing data loaded by the completed load instruction into a storage location for storing data when the memory ordering consistency queue is full. The technique further includes speculatively executing instructions that are dependent on the completed load instruction. The technique also includes, in response to a slot becoming available in the memory ordering consistency queue, replaying the load instruction. The technique further includes, in response to receiving loaded data for the replayed load instruction, testing for a data mis-speculation by comparing the loaded data for the replayed load instruction with the data loaded by the completed load instruction that is stored in the storage location.
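
    A minimal Python sketch of the flow the abstract describes, assuming a fixed-capacity queue and hypothetical execute_dependents, schedule_replay, and flush_dependents callbacks supplied by the pipeline:

        from collections import deque

        MOQ_CAPACITY = 4
        moq = deque()    # memory ordering consistency queue (load tags)
        stash = {}       # load tag -> data saved aside while the queue was full

        def complete_load(tag, data, execute_dependents, schedule_replay):
            if len(moq) < MOQ_CAPACITY:
                moq.append(tag)        # normal path: the queue has room
                return
            # Queue full: save the loaded data and let dependent
            # instructions run speculatively instead of stalling until
            # the queue drains.
            stash[tag] = data
            execute_dependents(tag, data)
            schedule_replay(tag)       # replay the load once a slot frees

        def replay_done(tag, replayed_data, flush_dependents):
            # Mis-speculation test: compare the replayed load's data with
            # the data stashed when the original load completed.
            if stash.pop(tag) != replayed_data:
                flush_dependents(tag)  # dependents consumed stale data
            moq.append(tag)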

    Address-based filtering for load/store speculation

    Publication No.: US10990393B1

    Publication Date: 2021-04-27

    Application No.: US16658474

    Application Date: 2019-10-21

    Abstract: Address-based filtering for load/store speculation includes maintaining a filtering table including table entries associated with ranges of addresses; in response to receiving an ordering check triggering transaction, querying the filtering table using a target address of the ordering check triggering transaction to determine whether an instruction dependent upon the ordering check triggering transaction has previously generated a physical address; and in response to determining that the filtering table lacks an indication that the instruction dependent upon the ordering check triggering transaction has previously generated a physical address, bypassing a lookup operation in an ordering violation memory structure to determine whether the instruction dependent upon the ordering check triggering transaction is currently in-flight.
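
    The filtering step can be sketched as a small table keyed by address range; the 4 KiB range granularity and the lookup_violation_structure callback below are assumptions for illustration:

        RANGE_SHIFT = 12   # assumed granularity: one entry per 4 KiB range

        # address range -> True once a dependent instruction has generated
        # a physical address within that range
        filter_table = {}

        def on_trigger(target_addr, lookup_violation_structure):
            rng = target_addr >> RANGE_SHIFT
            if not filter_table.get(rng):
                # The table has no indication for this range, so the lookup
                # in the ordering-violation memory structure is bypassed.
                return None
            return lookup_violation_structure(target_addr)

        def on_dependent_address(phys_addr):
            filter_table[phys_addr >> RANGE_SHIFT] = True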

    Flexible dictionary sharing for compressed caches

    Publication No.: US10983915B2

    Publication Date: 2021-04-20

    Application No.: US16544468

    Application Date: 2019-08-19

    Abstract: Systems, apparatuses, and methods for implementing flexible dictionary sharing techniques for caches are disclosed. A set-associative cache includes a dictionary for each data array set. When a cache line is to be allocated in the cache, a cache controller determines to which set a base index of the cache line address maps. Then, a selector unit determines which dictionary of a group of dictionaries stored by those sets neighboring this set would achieve the most compression for the cache line. This dictionary is then selected to compress the cache line. An offset is added to the base index of the cache line to generate a full index in order to map the cache line to the set corresponding to this chosen dictionary. The compressed cache line is stored in this set with the chosen dictionary, and the offset is stored in the corresponding tag array entry.
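
    A simplified Python sketch of the dictionary selection, assuming a four-set sharing neighborhood and a toy cost model (1 byte per 4-byte word found in the dictionary, 5 bytes per word that is not); the constants and names are illustrative, not taken from the patent:

        NUM_SETS = 64
        NEIGHBOR_OFFSETS = (0, 1, 2, 3)   # assumed sharing neighborhood

        def compressed_size(line, dictionary):
            words = [line[i:i + 4] for i in range(0, len(line), 4)]
            return sum(1 if w in dictionary else 5 for w in words)

        def place_line(base_index, line, dictionaries):
            # Try the dictionary of each neighboring set and keep the
            # offset whose dictionary compresses this cache line the most.
            offset = min(NEIGHBOR_OFFSETS,
                         key=lambda off: compressed_size(
                             line, dictionaries[(base_index + off) % NUM_SETS]))
            full_index = (base_index + offset) % NUM_SETS
            return full_index, offset   # offset is kept in the tag array entry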

    CONTROLLING THE OPERATING SPEED OF STAGES OF AN ASYNCHRONOUS PIPELINE

    Publication No.: US20210089324A1

    Publication Date: 2021-03-25

    Application No.: US16913146

    Application Date: 2020-06-26

    Abstract: An asynchronous pipeline includes a first stage and one or more second stages. A controller provides control signals to the first stage to indicate a modification to an operating speed of the first stage. The modification is determined based on a comparison of a completion status of the first stage to one or more completion statuses of the one or more second stages. In some cases, the controller provides control signals indicating modifications to an operating voltage applied to the first stage and a drive strength of a buffer in the first stage. Modules can be used to determine the completion statuses of the first stage and the one or more second stages based on the monitored output signals generated by the stages, output signals from replica critical paths associated with the stages, or a lookup table that indicates estimated completion times.
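
    A behavioral sketch of the comparison the controller performs, assuming completion statuses are expressed as fractions of work finished; the control-signal encoding is invented for the example:

        def control_first_stage(first_done, second_done_list):
            # Speed the first stage up if it lags the second stages,
            # slow it down (saving power) if it runs ahead of them.
            lagging = sum(s > first_done for s in second_done_list)
            leading = sum(s < first_done for s in second_done_list)
            if lagging > leading:
                return {"voltage": "raise", "buffer_drive": "raise"}
            if leading > lagging:
                return {"voltage": "lower", "buffer_drive": "lower"}
            return {"voltage": "hold", "buffer_drive": "hold"}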

    Stride prefetching across memory pages

    Publication No.: US10671535B2

    Publication Date: 2020-06-02

    Application No.: US13944148

    Application Date: 2013-07-17

    Abstract: A prefetcher maintains the state of stored prefetch information, such as a prefetch confidence level, when a prefetch would cross a memory page boundary. The maintained prefetch information can be used both to identify whether the stride pattern for a particular sequence of demand requests persists after the memory page boundary has been crossed, and to continue to issue prefetch requests according to the identified pattern. The prefetcher therefore does not have to re-identify a stride pattern each time a page boundary is crossed by a sequence of demand requests, thereby improving the efficiency and accuracy of the prefetcher.
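
    One way to model the page-boundary behavior is to skip retraining when an apparent stride break coincides with a page crossing; the thresholds and table shape below are illustrative assumptions:

        PAGE_SIZE = 4096

        class StrideEntry:
            def __init__(self):
                self.last_addr = None
                self.stride = 0
                self.confidence = 0

            def access(self, addr):
                if self.last_addr is not None:
                    if addr - self.last_addr == self.stride:
                        self.confidence = min(self.confidence + 1, 3)
                    elif addr // PAGE_SIZE == self.last_addr // PAGE_SIZE:
                        # The pattern genuinely broke inside the page: retrain.
                        self.stride = addr - self.last_addr
                        self.confidence = 0
                    # Otherwise a page boundary was crossed: keep the stride
                    # and confidence so the pattern can be confirmed in the
                    # new page without retraining from scratch.
                self.last_addr = addr
                # Keep issuing prefetches per the identified pattern.
                return [addr + self.stride] if self.confidence >= 2 else []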

    Controlling Accesses to a Branch Prediction Unit for Sequences of Fetch Groups

    Publication No.: US20200150966A1

    Publication Date: 2020-05-14

    Application No.: US16725203

    Application Date: 2019-12-23

    Abstract: An electronic device handles accesses of a branch prediction functional block when executing instructions in program code. The electronic device includes a processor having the branch prediction functional block that provides branch prediction information for control transfer instructions (CTIs) in the program code and a minimum predictor use (MPU) functional block. The MPU functional block determines, based on a record associated with a given fetch group of instructions, that a specified number of subsequent fetch groups of instructions that were previously determined to include no CTIs or conditional CTIs that were not taken are to be fetched for execution in sequence following the given fetch group. The MPU functional block then, when each of the specified number of the subsequent fetch groups is fetched and prepared for execution, prevents corresponding accesses of the branch prediction functional block for acquiring branch prediction information for instructions in that subsequent fetch group.
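
    A minimal sketch of the gating logic, assuming the record maps a fetch-group address to a count of CTI-free successor groups and that access_predictor stands in for the branch prediction functional block:

        # fetch-group address -> number of following fetch groups previously
        # seen to contain no CTIs, or only not-taken conditional CTIs
        mpu_record = {}

        class MPU:
            def __init__(self):
                self.skip_left = 0

            def fetch(self, group_addr, access_predictor):
                if self.skip_left > 0:
                    self.skip_left -= 1
                    return None   # branch predictor access prevented
                self.skip_left = mpu_record.get(group_addr, 0)
                return access_predictor(group_addr)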

    Bit Error Protection in Cache Memories

    Publication No.: US20200081771A1

    Publication Date: 2020-03-12

    Application No.: US16123489

    Application Date: 2018-09-06

    Abstract: A computing device having a cache memory that is configured in a write-back mode is described. A cache controller in the cache memory acquires, from a record of bit errors that are present in each of a plurality of portions of the cache memory, a number of bit errors in a portion of the cache memory. The cache controller detects a coherency state of data stored in the portion of the cache memory. Based on the coherency state and the number of bit errors, the cache controller selects an error protection from among a plurality of error protections. The cache controller uses the selected error protection to protect the data stored in the portion of the cache memory from errors.
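
    The selection can be sketched as a small decision function; the specific protection levels and the MOESI-state test are assumptions for illustration, not the patent's mapping:

        def select_protection(bit_errors, coherency_state):
            # Dirty data is the only up-to-date copy in the system, so for
            # a given number of known-bad bits it gets stronger protection
            # than clean data, which can be refetched if an error is found.
            dirty = coherency_state in ("M", "O")   # assumed MOESI encoding
            if bit_errors == 0:
                return "parity" if dirty else "none"
            if bit_errors == 1:
                return "SECDED" if dirty else "parity"
            return "multi-bit ECC" if dirty else "SECDED"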

    RELIABLE VOLTAGE SCALED LINKS FOR COMPRESSED DATA

    Publication No.: US20200073845A1

    Publication Date: 2020-03-05

    Application No.: US16118172

    Application Date: 2018-08-30

    Abstract: Systems, apparatuses, and methods for reliably transmitting data over voltage scaled links are disclosed. A computing system includes at least first and second devices connected via a link. In one implementation, if a data block can be compressed to less than or equal to half the original size of the data block, then the data block is compressed and sent on the link in a single clock cycle rather than two clock cycles. If the data block cannot be compressed to half the original size, but if the data block can be compressed enough to include error correction code (ECC) bits without exceeding the original size, then ECC bits are added to the compressed block which is sent on the link at a reduced voltage. The ECC bits help to correct for any errors that are generated as a result of operating the link at the reduced voltage.
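
    A behavioral sketch of the sender-side decision, with ECC_BYTES, the ecc stub, and the link.send signature invented for the example:

        ECC_BYTES = 4   # assumed size of the error-correction field

        def ecc(data):
            # Stand-in check bits; a real design would use a SECDED code.
            return bytes([sum(data) & 0xFF] * ECC_BYTES)

        def send_block(block, compress, link):
            original = len(block)
            packed = compress(block)
            if 2 * len(packed) <= original:
                # Half size or less: one clock cycle instead of two.
                link.send(packed, cycles=1, voltage="nominal")
            elif len(packed) + ECC_BYTES <= original:
                # Compression freed room for ECC bits, which correct the
                # errors the reduced link voltage may introduce.
                link.send(packed + ecc(packed), cycles=2, voltage="reduced")
            else:
                link.send(block, cycles=2, voltage="nominal")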
