Register renaming in block-based instruction set architecture

    Publication number: US09946549B2

    Publication date: 2018-04-17

    Application number: US14639085

    Filing date: 2015-03-04

    CPC classification number: G06F9/384 G06F9/30145 G06F9/3836

    Abstract: An apparatus for mapping an architectural register to a physical register can include a memory and control circuitry. The memory can be configured to store an intra-core register rename map and an inter-core register rename map. The intra-core register rename map can be configured to map the architectural register to the physical register of a first core of a multi-core processor. The inter-core register rename map can be configured to relate the architectural register to an identification of the first core in response to determining that the physical register is a location of a most recent write to the architectural register that has been executed by the first core, is executing on the first core, or is expected to execute on the first core, the most recent write according to program order. The control circuitry can be configured to maintain the intra-core register rename map and the inter-core register rename map.
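
    Illustrative sketch (not taken from the patent): one way to picture the two maps described in this abstract. The class name `RenameMaps`, its methods, and the assumption that `record_write` is called in program order are all hypothetical.

```python
from typing import Dict, Optional, Tuple


class RenameMaps:
    """Toy model of an intra-core and an inter-core register rename map."""

    def __init__(self) -> None:
        # Intra-core map: core id -> {architectural register -> physical register}.
        self.intra: Dict[int, Dict[str, int]] = {}
        # Inter-core map: architectural register -> id of the core holding
        # the most recent write (in program order).
        self.inter: Dict[str, int] = {}

    def record_write(self, core_id: int, arch_reg: str, phys_reg: int) -> None:
        # Assumption: callers invoke this in program order, so the latest
        # call always represents the most recent write.
        self.intra.setdefault(core_id, {})[arch_reg] = phys_reg
        self.inter[arch_reg] = core_id

    def lookup(self, arch_reg: str) -> Optional[Tuple[int, int]]:
        # Return (core id, physical register) of the most recent write, if any.
        core_id = self.inter.get(arch_reg)
        if core_id is None:
            return None
        return core_id, self.intra[core_id][arch_reg]


maps = RenameMaps()
maps.record_write(core_id=0, arch_reg="r3", phys_reg=17)
maps.record_write(core_id=1, arch_reg="r3", phys_reg=42)  # younger write on core 1
print(maps.lookup("r3"))
```

    A reader on any core first consults the inter-core map to find the owning core, then that core's intra-core map to find the physical register.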

    PROVIDING MEMORY DEPENDENCE PREDICTION IN BLOCK-ATOMIC DATAFLOW ARCHITECTURES

    Publication number: US20180081686A1

    Publication date: 2018-03-22

    Application number: US15269254

    Filing date: 2016-09-19

    Abstract: Providing memory dependence prediction in block-atomic dataflow architectures is disclosed. In one aspect, a memory dependence prediction circuit is provided. The memory dependence prediction circuit comprises a predictor table configured to store multiple predictor table entries, each comprising a store instruction identifier, a block reach set, and a load set. Using this data, the memory dependence prediction circuit determines, upon a fetch of an instruction block by an execution pipeline, whether the instruction block contains store instructions that reach dependent load instructions. If so, the store instructions are marked as having dependent load instructions to wake. In some aspects, the memory dependence prediction circuit is configured to determine whether the instruction block contains dependent load instructions reached by store instructions. If so, the memory dependence prediction circuit delays execution of the dependent load instructions.

    Providing coherent merging of committed store queue entries in unordered store queues of block-based computer processors

    Publication number: US09792211B2

    Publication date: 2017-10-17

    Application number: US14863577

    Filing date: 2015-09-24

    Abstract: Providing coherent merging of committed store queue entries in unordered store queues of block-based computer processors is disclosed. In one aspect, a block-based computer processor provides a merging logic circuit communicatively coupled to an unordered store queue and cache memory. The merging logic circuit is configured to select a first store queue entry in the unordered store queue, and read its memory address, an age indicator, and a data value. The age indicator and the data value are stored in merged data bytes within a merged data buffer. The merging logic circuit then locates a remaining store queue entry having a memory address identical to the first selected store queue entry, and reads its age indicator and data value. Based on the age indicator and one or more age indicators of the merged data bytes within the merged data buffer, the data value is merged into the merged data buffer.
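
    Illustrative sketch (not taken from the patent): age-based merging of same-address entries from an unordered store queue. The `StoreEntry` layout, the per-byte `data` encoding, and the convention that a larger `age` means a younger store are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class StoreEntry:
    address: int
    age: int               # assumption: larger value = younger (more recent) store
    data: Dict[int, int]   # assumed encoding: byte offset -> byte value written


def merge_entries(queue: List[StoreEntry], address: int) -> Dict[int, Tuple[int, int]]:
    """Merge all entries for `address` into per-byte (age, value) pairs."""
    merged: Dict[int, Tuple[int, int]] = {}
    for entry in queue:  # queue order is arbitrary; only ages decide the winner
        if entry.address != address:
            continue
        for offset, value in entry.data.items():
            current = merged.get(offset)
            # Keep the byte only if no byte is buffered yet or this store is younger.
            if current is None or entry.age > current[0]:
                merged[offset] = (entry.age, value)
    return merged


queue = [
    StoreEntry(0x40, age=2, data={0: 0xAA, 1: 0xBB}),
    StoreEntry(0x80, age=9, data={0: 0xFF}),           # different address, ignored
    StoreEntry(0x40, age=5, data={1: 0xCC}),           # younger write to byte 1
    StoreEntry(0x40, age=1, data={0: 0x11, 2: 0x22}),  # older write, loses byte 0
]
merged = merge_entries(queue, 0x40)
```

    Because each merged byte carries its own age, the result does not depend on the order in which the queue is scanned.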

    Providing predictive instruction dispatch throttling to prevent resource overflows in out-of-order processor (OOP)-based devices

    Publication number: US10929139B2

    Publication date: 2021-02-23

    Application number: US16143883

    Filing date: 2018-09-27

    Abstract: Providing predictive instruction dispatch throttling to prevent resource overflow in out-of-order processor (OOP)-based devices is disclosed. An OOP-based device includes a system resource that may be consumed or otherwise occupied by instructions, as well as an execution pipeline comprising a decode stage and a dispatch stage. The OOP further maintains a running count and a resource usage threshold. Upon receiving an instruction block, the decode stage extracts a proxy value that indicates an approximate predicted count of instructions within the instruction block that will consume a system resource. The decode stage then increments the running count by the proxy value. The dispatch stage compares the running count to the resource usage threshold before dispatching any younger instruction blocks. If the running count exceeds the resource usage threshold, the dispatch stage blocks dispatching of younger instruction blocks until the running count no longer exceeds the resource usage threshold.
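
    Illustrative sketch (not taken from the patent): the running-count check described in this abstract. The class `DispatchThrottle` and its method names are hypothetical, and `on_release` is an assumption, since the abstract does not say how the running count decreases.

```python
class DispatchThrottle:
    """Toy model of predictive dispatch throttling with a running count."""

    def __init__(self, resource_usage_threshold: int) -> None:
        self.threshold = resource_usage_threshold
        self.running_count = 0

    def on_decode(self, proxy_value: int) -> None:
        # The decode stage adds the block's proxy value (an approximate
        # predicted count of resource-consuming instructions).
        self.running_count += proxy_value

    def may_dispatch_younger(self) -> bool:
        # The dispatch stage blocks younger instruction blocks while the
        # running count exceeds the threshold.
        return self.running_count <= self.threshold

    def on_release(self, proxy_value: int) -> None:
        # Assumption: the count is reduced when a block frees its resources.
        self.running_count = max(0, self.running_count - proxy_value)


throttle = DispatchThrottle(resource_usage_threshold=8)
throttle.on_decode(5)
first_ok = throttle.may_dispatch_younger()    # count 5, not above 8
throttle.on_decode(4)
second_ok = throttle.may_dispatch_younger()   # count 9, above 8
throttle.on_release(5)
third_ok = throttle.may_dispatch_younger()    # count 4, dispatch resumes
```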

    Deadlock free resource management in block based computing architectures

    Publication number: US10783011B2

    Publication date: 2020-09-22

    Application number: US15712121

    Filing date: 2017-09-21

    Abstract: Systems and methods are directed to efficient management of processor resources, particularly General Purpose Registers (GPRs), for example to minimize pipeline flushes and prevent deadlocks by counting GPRs instead of allocating them to specific blocks of code. Blocks of code are allowed to execute if the Free GPRs count is adequate. The method contemplates counting the number of Register Writers in blocks of code that are in the process of executing and will write to GPRs, and counting the GPRs that are available, instead of merely allocating them to dedicated use by a block of code or by an instruction in a block of code. Because blocks do not run if there are not enough GPRs available for the block, deadlocks and pipeline flushes due to lack of resources can be minimized.
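
    Illustrative sketch (not taken from the patent): admitting a block only when the free GPR count covers its register writers. `GprCounter`, `try_start`, and `finish` are hypothetical names, and returning registers when a block finishes is a simplifying assumption.

```python
class GprCounter:
    """Toy model: count free GPRs instead of reserving specific registers."""

    def __init__(self, total_gprs: int) -> None:
        self.free_gprs = total_gprs

    def try_start(self, register_writers: int) -> bool:
        # A block may execute only if enough GPRs are free for every
        # register writer it contains; otherwise it waits.
        if register_writers > self.free_gprs:
            return False
        self.free_gprs -= register_writers
        return True

    def finish(self, register_writers: int) -> None:
        # Assumption: the counted GPRs return to the pool when the block finishes.
        self.free_gprs += register_writers


pool = GprCounter(total_gprs=8)
started_a = pool.try_start(5)   # 3 GPRs remain free
started_b = pool.try_start(4)   # refused: only 3 free
pool.finish(5)                  # block A finishes, 8 free again
retried_b = pool.try_start(4)   # now admitted
```

    A refused block never holds a partial set of registers, which is why it cannot deadlock against another block in this model.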

    PROVIDING VARIABLE INTERPRETATION OF USEFULNESS INDICATORS FOR MEMORY TABLES IN PROCESSOR-BASED SYSTEMS

    Publication number: US20190079772A1

    Publication date: 2019-03-14

    Application number: US15701926

    Filing date: 2017-09-12

    Abstract: Providing variable interpretation of usefulness indicators for memory tables in processor-based systems is disclosed. In one aspect, a memory system comprises a memory table providing multiple memory table entries, each including a usefulness indicator. A memory controller of the memory system comprises a global polarity indicator representing how the usefulness indicator for each memory table entry is interpreted and updated by the memory controller. If the global polarity indicator is set, the memory controller interprets a value of each usefulness indicator as directly corresponding to the usefulness of the corresponding memory table entry. Conversely, if the global polarity indicator is not set, the polarity is reversed such that the memory controller interprets the usefulness indicator value as inversely corresponding to the usefulness of the corresponding memory table entry. In this manner, the interpretation and updating of usefulness indicators by the memory controller can be varied using the global polarity indicator.
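
    Illustrative sketch (not taken from the patent): a global polarity flag that changes how per-entry usefulness values are read and updated. The saturating counter width (`max_value`) and the reading of "inversely corresponding" as `max_value - raw` are assumptions made for this example.

```python
from typing import List


class MemoryTable:
    """Toy model of usefulness indicators interpreted via a global polarity."""

    def __init__(self, size: int, max_value: int = 3) -> None:
        self.max_value = max_value          # assumed saturating counter limit
        self.indicators: List[int] = [0] * size
        self.polarity_set = True            # global polarity indicator

    def usefulness(self, index: int) -> int:
        raw = self.indicators[index]
        # Polarity set: the raw value is the usefulness.
        # Polarity clear: the raw value is read inversely.
        return raw if self.polarity_set else self.max_value - raw

    def mark_useful(self, index: int) -> None:
        # The update direction follows the current polarity as well.
        raw = self.indicators[index]
        if self.polarity_set:
            self.indicators[index] = min(self.max_value, raw + 1)
        else:
            self.indicators[index] = max(0, raw - 1)

    def flip_polarity(self) -> None:
        # One flag flip reinterprets every entry without touching the table.
        self.polarity_set = not self.polarity_set


table = MemoryTable(size=2)
table.mark_useful(0)
table.mark_useful(0)
before = [table.usefulness(0), table.usefulness(1)]
table.flip_polarity()
after = [table.usefulness(0), table.usefulness(1)]
```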

    CACHING INSTRUCTION BLOCK HEADER DATA IN BLOCK ARCHITECTURE PROCESSOR-BASED SYSTEMS

    Publication number: US20190065060A1

    Publication date: 2019-02-28

    Application number: US15688191

    Filing date: 2017-08-28

    Abstract: Caching instruction block header data in block architecture processor-based systems is disclosed. In one aspect, a computer processor device, based on a block architecture, provides an instruction block header cache dedicated to caching instruction block header data. Upon a subsequent fetch of an instruction block, cached instruction block header data may be retrieved from the instruction block header cache (if present) and used to optimize processing of the instruction block. In some aspects, the instruction block header data may include a microarchitectural block header (MBH) generated upon the first decoding of the instruction block by an MBH generation circuit. The MBH may contain static or dynamic information about the instructions within the instruction block. As non-limiting examples, the information may include data relating to register reads and writes, load and store operations, branch information, predicate information, special instructions, and/or serial execution preferences.
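
    Illustrative sketch (not taken from the patent): a cache that generates a block header on first decode and reuses it on later fetches. `BlockHeaderCache`, `summarize`, the address-keyed lookup, and the mnemonic prefixes `ld`/`st` are hypothetical choices for this example.

```python
from typing import Callable, Dict, List


class BlockHeaderCache:
    """Toy model of a cache holding per-block header data (an MBH)."""

    def __init__(self, generate_mbh: Callable[[List[str]], dict]) -> None:
        self._generate_mbh = generate_mbh
        self._entries: Dict[int, dict] = {}   # assumed key: block address
        self.hits = 0
        self.misses = 0

    def fetch(self, block_address: int, instructions: List[str]) -> dict:
        header = self._entries.get(block_address)
        if header is not None:
            self.hits += 1                    # reuse previously generated header
            return header
        self.misses += 1                      # first decode: generate and cache
        header = self._generate_mbh(instructions)
        self._entries[block_address] = header
        return header


def summarize(instructions: List[str]) -> dict:
    # Hypothetical header content: counts of load and store operations.
    return {
        "loads": sum(1 for i in instructions if i.startswith("ld")),
        "stores": sum(1 for i in instructions if i.startswith("st")),
    }


cache = BlockHeaderCache(summarize)
block = ["ld r1", "add r2", "st r3"]
first = cache.fetch(0x1000, block)
second = cache.fetch(0x1000, block)
```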
