-
Publication number: US09946549B2
Publication date: 2018-04-17
Application number: US14639085
Filing date: 2015-03-04
Applicant: QUALCOMM Incorporated
Inventor: Gregory Michael Wright
CPC classification number: G06F9/384 , G06F9/30145 , G06F9/3836
Abstract: An apparatus for mapping an architectural register to a physical register can include a memory and control circuitry. The memory can be configured to store an intra-core register rename map and an inter-core register rename map. The intra-core register rename map can be configured to map the architectural register to the physical register of a first core of a multi-core processor. The inter-core register rename map can be configured to relate the architectural register to an identification of the first core in response to determining that the physical register is a location of a most recent write to the architectural register that has been executed by the first core, is executing on the first core, or is expected to execute on the first core, the most recent write according to program order. The control circuitry can be configured to maintain the intra-core register rename map and the inter-core register rename map.
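As a rough illustration of the two-level lookup this abstract describes, the following Python sketch models both rename maps as dictionaries. The names `RenameMaps`, `record_write`, and `lookup` are hypothetical, and the sketch assumes that writes are recorded in program order; it is a software analogy, not the claimed circuitry.

```python
class RenameMaps:
    """Toy model of intra-core and inter-core register rename maps."""

    def __init__(self, num_cores):
        # Intra-core maps: one per core, architectural register -> physical register.
        self.intra = [dict() for _ in range(num_cores)]
        # Inter-core map: architectural register -> id of the core that holds
        # the most recent write (ASSUMPTION: calls arrive in program order).
        self.inter = {}

    def record_write(self, core, arch_reg, phys_reg):
        """Record that `core` writes `arch_reg` into its physical register `phys_reg`."""
        self.intra[core][arch_reg] = phys_reg
        self.inter[arch_reg] = core

    def lookup(self, arch_reg):
        """Return (core, physical register) of the most recent write, or None."""
        core = self.inter.get(arch_reg)
        if core is None:
            return None
        return core, self.intra[core][arch_reg]


maps = RenameMaps(num_cores=2)
maps.record_write(core=0, arch_reg="r1", phys_reg=5)
maps.record_write(core=1, arch_reg="r1", phys_reg=9)
latest = maps.lookup("r1")  # (1, 9): core 1 holds the most recent write
```

A reader on any core first consults the inter-core map to find the owning core, then that core's intra-core map to find the physical register.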
-
Publication number: US20180081686A1
Publication date: 2018-03-22
Application number: US15269254
Filing date: 2016-09-19
Applicant: QUALCOMM Incorporated
Inventor: Chen-Han Ho , Gregory Michael Wright
Abstract: Providing memory dependence prediction in block-atomic dataflow architectures is disclosed. In one aspect, a memory dependence prediction circuit is provided. The memory dependence prediction circuit comprises a predictor table configured to store multiple predictor table entries, each comprising a store instruction identifier, a block reach set, and a load set. Using this data, the memory dependence prediction circuit determines, upon a fetch of an instruction block by an execution pipeline, whether the instruction block contains store instructions that reach dependent load instructions. If so, the store instructions are marked as having dependent load instructions to wake. In some aspects, the memory dependence prediction circuit is configured to determine whether the instruction block contains dependent load instructions reached by store instructions. If so, the memory dependence prediction circuit delays execution of the dependent load instructions.
-
Publication number: US09824012B2
Publication date: 2017-11-21
Application number: US14863577
Filing date: 2015-09-24
Applicant: QUALCOMM Incorporated
Inventor: Gregory Michael Wright
IPC: G06F12/00 , G06F13/00 , G06F13/28 , G06F12/0831 , G06F12/128 , G06F12/123 , G06F12/0855 , G06F9/38
CPC classification number: G06F12/0833 , G06F9/3834 , G06F9/3855 , G06F12/0855 , G06F12/12 , G06F12/123 , G06F12/128 , G06F2212/621 , G06F2212/69 , G06F2212/70
Abstract: Providing coherent merging of committed store queue entries in unordered store queues of block-based computer processors is disclosed. In one aspect, a block-based computer processor provides a merging logic circuit communicatively coupled to an unordered store queue and cache memory. The merging logic circuit is configured to select a first store queue entry in the unordered store queue, and read its memory address, an age indicator, and a data value. The age indicator and the data value are stored in merged data bytes within a merged data buffer. The merging logic circuit then locates a remaining store queue entry having a memory address identical to the first selected store queue entry, and reads its age indicator and data value. Based on the age indicator and one or more age indicators of the merged data bytes within the merged data buffer, the data value is merged into the merged data buffer.
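The age-based merge in this abstract can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the `offset` field, the fixed `line_size`, and the convention that a larger age indicator means a younger store are all hypothetical choices made for illustration, and `StoreEntry` and `merge_entries` are invented names.

```python
from dataclasses import dataclass


@dataclass
class StoreEntry:
    addr: int     # memory address shared by entries that can be merged
    age: int      # age indicator (ASSUMPTION: larger value = younger store)
    offset: int   # ASSUMPTION: index of the first byte written at that address
    data: bytes   # data value


def merge_entries(queue, line_size=8):
    """Merge every entry that shares the first entry's address.

    Returns a list of per-byte (age, value) pairs, or None for bytes that
    no store has written. The queue may be in any order.
    """
    first = queue[0]
    merged = [None] * line_size  # merged data buffer with per-byte ages
    for entry in queue:
        if entry.addr != first.addr:
            continue  # different address: leave for a later merge pass
        for i, value in enumerate(entry.data):
            pos = entry.offset + i
            current = merged[pos]
            # Keep the byte only if it comes from a younger store.
            if current is None or entry.age > current[0]:
                merged[pos] = (entry.age, value)
    return merged


queue = [
    StoreEntry(addr=0x100, age=2, offset=0, data=b"\xaa\xbb"),
    StoreEntry(addr=0x100, age=1, offset=1, data=b"\x11\x22"),
    StoreEntry(addr=0x200, age=3, offset=0, data=b"\xff"),
]
merged = merge_entries(queue)
```

Because each merged byte carries its own age, the result does not depend on the order in which entries are visited, which is what lets the queue stay unordered.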
-
Publication number: US09792211B2
Publication date: 2017-10-17
Application number: US14863577
Filing date: 2015-09-24
Applicant: QUALCOMM Incorporated
Inventor: Gregory Michael Wright
IPC: G06F12/00 , G06F13/00 , G06F13/28 , G06F12/0831 , G06F12/128 , G06F12/123 , G06F12/0855 , G06F9/38
Abstract: Providing coherent merging of committed store queue entries in unordered store queues of block-based computer processors is disclosed. In one aspect, a block-based computer processor provides a merging logic circuit communicatively coupled to an unordered store queue and cache memory. The merging logic circuit is configured to select a first store queue entry in the unordered store queue, and read its memory address, an age indicator, and a data value. The age indicator and the data value are stored in merged data bytes within a merged data buffer. The merging logic circuit then locates a remaining store queue entry having a memory address identical to the first selected store queue entry, and reads its age indicator and data value. Based on the age indicator and one or more age indicators of the merged data bytes within the merged data buffer, the data value is merged into the merged data buffer.
-
Publication number: US20170083313A1
Publication date: 2017-03-23
Application number: US14861201
Filing date: 2015-09-22
Applicant: QUALCOMM Incorporated
Inventor: Karthikeyan Sankaralingam , Gregory Michael Wright
CPC classification number: G06F15/7867 , G06F9/30181 , G06F9/3836 , G06F9/3897 , G06F9/4494 , G06F15/7892 , G06F15/825
Abstract: Configuring coarse-grained reconfigurable arrays (CGRAs) for dataflow instruction block execution in block-based dataflow instruction set architectures (ISAs) is disclosed. In one aspect, a CGRA configuration circuit is provided, comprising a CGRA having an array of tiles, each of which provides a functional unit and a switch. An instruction decoding circuit of the CGRA configuration circuit maps a dataflow instruction within a dataflow instruction block to one of the tiles of the CGRA. The instruction decoding circuit decodes the dataflow instruction, and generates a function control configuration for the functional unit of the mapped tile to provide the functionality of the dataflow instruction. The instruction decoding circuit further generates switch control configurations for switches along a path of tiles within the CGRA so that an output of the functional unit of the mapped tile is routed to each tile corresponding to consumer instructions of the dataflow instruction.
-
Publication number: US11269640B2
Publication date: 2022-03-08
Application number: US15431763
Filing date: 2017-02-13
Applicant: QUALCOMM Incorporated
Inventor: Gregory Michael Wright
IPC: G06F9/38 , G06F12/0813 , G06F12/0864 , G06F12/1009 , G06F12/0842 , G06F12/02 , G06F12/06 , G06F9/30
Abstract: The disclosure relates to processing in-flight blocks in a processor pipeline according to an expected execution mode to reduce synchronization delays that could otherwise arise due to transitions among processor modes with varying privilege levels (e.g., user mode, supervisor mode, hypervisor mode, etc.). More particularly, a program counter associated with an instruction block to be fetched may be translated to one or more execute permissions associated with the instruction block and the instruction block may be associated with a speculative execution mode based at least in part on the one or more execute permissions. Accordingly, the instruction block may be processed relative to the speculative execution mode while in-flight within the processor pipeline.
-
Publication number: US10929139B2
Publication date: 2021-02-23
Application number: US16143883
Filing date: 2018-09-27
Applicant: QUALCOMM Incorporated
IPC: G06F9/38
Abstract: Providing predictive instruction dispatch throttling to prevent resource overflow in out-of-order processor (OOP)-based devices is disclosed. An OOP-based device includes a system resource that may be consumed or otherwise occupied by instructions, as well as an execution pipeline comprising a decode stage and a dispatch stage. The OOP further maintains a running count and a resource usage threshold. Upon receiving an instruction block, the decode stage extracts a proxy value that indicates an approximate predicted count of instructions within the instruction block that will consume a system resource. The decode stage then increments the running count by the proxy value. The dispatch stage compares the running count to the resource usage threshold before dispatching any younger instruction blocks. If the running count exceeds the resource usage threshold, the dispatch stage blocks dispatching of younger instruction blocks until the running count no longer exceeds the resource usage threshold.
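The running-count check in this abstract can be modeled with a few lines of Python. The class name `DispatchThrottle` and its methods are hypothetical. The abstract does not say how the running count decreases, so the `release` method below is an assumption (here, the count drops when a block gives its resources back).

```python
class DispatchThrottle:
    """Toy model of predictive dispatch throttling."""

    def __init__(self, resource_usage_threshold):
        self.threshold = resource_usage_threshold
        self.running_count = 0

    def decode(self, proxy_value):
        """Decode stage: add the block's proxy value to the running count."""
        self.running_count += proxy_value

    def can_dispatch_younger(self):
        """Dispatch stage: younger blocks are held while the count exceeds the threshold."""
        return self.running_count <= self.threshold

    def release(self, proxy_value):
        """ASSUMPTION: a finished block returns its predicted resource usage."""
        self.running_count -= proxy_value


throttle = DispatchThrottle(resource_usage_threshold=8)
throttle.decode(5)                       # running count 5
ok_first = throttle.can_dispatch_younger()    # True: 5 does not exceed 8
throttle.decode(4)                       # running count 9
ok_second = throttle.can_dispatch_younger()   # False: 9 exceeds 8
throttle.release(5)                      # running count 4
ok_third = throttle.can_dispatch_younger()    # True again
```

The proxy value is only an approximate prediction, so the check trades some precision for a decision that can be made at decode time.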
-
Publication number: US10783011B2
Publication date: 2020-09-22
Application number: US15712121
Filing date: 2017-09-21
Applicant: QUALCOMM Incorporated
Inventor: Vignyan Reddy Kothinti Naresh , Gregory Michael Wright
IPC: G06F9/52
Abstract: Systems and methods are directed to efficient management of processor resources, particularly General Purpose Registers (GPRs), for example to minimize pipeline flushes and prevent deadlocks by counting GPRs instead of allocating them to specific blocks of code. Blocks of code are allowed to execute if the Free GPRs count is adequate. The method contemplates counting the number of Register Writers that will write to GPRs in blocks of code which are in the process of executing, and counting the GPRs which are available instead of merely allocating them to dedicated use by a block of code, or an instruction in a block of code. Because blocks do not run if there are not enough GPRs available for the block, deadlocks and pipeline flushes due to lack of resources can be minimized.
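The counting scheme in this abstract amounts to an admission check against a free-register counter. The Python sketch below illustrates that idea only; `GprCounter`, `try_start`, and `finish` are hypothetical names, and returning the registers when a block finishes is an assumption about when the count is restored.

```python
class GprCounter:
    """Toy model: admit a block only if enough GPRs are free for its writers."""

    def __init__(self, total_gprs):
        self.free = total_gprs  # count of free GPRs, not a list of specific registers

    def try_start(self, register_writers):
        """Admit a block with `register_writers` GPR-writing instructions, if possible."""
        if register_writers > self.free:
            return False  # block waits; nothing was reserved, so nothing to flush
        self.free -= register_writers
        return True

    def finish(self, register_writers):
        """ASSUMPTION: a completed block returns the GPRs it counted against."""
        self.free += register_writers


gprs = GprCounter(total_gprs=4)
started_a = gprs.try_start(3)   # True, one GPR left
started_b = gprs.try_start(2)   # False, block B must wait
gprs.finish(3)                  # block A completes
started_b = gprs.try_start(2)   # True on retry
```

Since a block either gets its full count up front or does not start, no block can stall halfway through while holding registers another block needs.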
-
Publication number: US20190079772A1
Publication date: 2019-03-14
Application number: US15701926
Filing date: 2017-09-12
Applicant: QUALCOMM Incorporated
Inventor: Anil Krishna , Yongseok Yi , Eric Rotenberg , Vignyan Reddy Kothinti Naresh , Gregory Michael Wright
Abstract: Providing variable interpretation of usefulness indicators for memory tables in processor-based systems is disclosed. In one aspect, a memory system comprises a memory table providing multiple memory table entries, each including a usefulness indicator. A memory controller of the memory system comprises a global polarity indicator representing how the usefulness indicator for each memory table entry is interpreted and updated by the memory controller. If the global polarity indicator is set, the memory controller interprets a value of each usefulness indicator as directly corresponding to the usefulness of the corresponding memory table entry. Conversely, if the global polarity indicator is not set, the polarity is reversed such that the memory controller interprets the usefulness indicator value as inversely corresponding to the usefulness of the corresponding memory table entry. In this manner, the interpretation and updating of usefulness indicators by the memory controller can be varied using the global polarity indicator.
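The polarity-dependent interpretation in this abstract can be sketched in Python as below. The sketch assumes a one-bit usefulness indicator per entry, which the abstract does not specify, and the names `MemoryTable`, `is_useful`, and `mark` are hypothetical.

```python
class MemoryTable:
    """Toy model of usefulness bits read through a global polarity indicator."""

    def __init__(self, size):
        self.bits = [0] * size      # ASSUMPTION: 1-bit usefulness indicator per entry
        self.polarity_set = True    # global polarity indicator

    def is_useful(self, index):
        # Polarity set: bit 1 means useful. Polarity not set: bit 0 means useful.
        return bool(self.bits[index]) == self.polarity_set

    def mark(self, index, useful):
        # Store the bit value that encodes `useful` under the current polarity.
        self.bits[index] = int(useful == self.polarity_set)


table = MemoryTable(size=2)
table.mark(0, useful=True)
before = (table.is_useful(0), table.is_useful(1))   # (True, False)
table.polarity_set = False                          # reverse the interpretation
after = (table.is_useful(0), table.is_useful(1))    # (False, True)
```

Changing the single global indicator reverses how every stored bit is read without touching the table entries themselves.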
-
Publication number: US20190065060A1
Publication date: 2019-02-28
Application number: US15688191
Filing date: 2017-08-28
Applicant: QUALCOMM Incorporated
Inventor: Anil Krishna , Gregory Michael Wright , Yongseok Yi , Matthew Gilbert , Vignyan Reddy Kothinti Naresh
IPC: G06F3/06 , G06F12/02 , G06F12/0802
Abstract: Caching instruction block header data in block architecture processor-based systems is disclosed. In one aspect, a computer processor device, based on a block architecture, provides an instruction block header cache dedicated to caching instruction block header data. Upon a subsequent fetch of an instruction block, cached instruction block header data may be retrieved from the instruction block header cache (if present) and used to optimize processing of the instruction block. In some aspects, the instruction block header data may include a microarchitectural block header (MBH) generated upon the first decoding of the instruction block by an MBH generation circuit. The MBH may contain static or dynamic information about the instructions within the instruction block. As non-limiting examples, the information may include data relating to register reads and writes, load and store operations, branch information, predicate information, special instructions, and/or serial execution preferences.
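The generate-once, reuse-on-refetch behavior in this abstract resembles a keyed cache. The Python sketch below is only an analogy: `HeaderCache`, `fetch`, and `count_memory_ops` are hypothetical names, instruction blocks are modeled as lists of opcode strings, and the header contents shown (load and store counts) are just one of the kinds of information the abstract lists.

```python
class HeaderCache:
    """Toy model of a cache holding microarchitectural block headers (MBHs)."""

    def __init__(self, generate_mbh):
        # `generate_mbh` is a hypothetical stand-in for the MBH generation circuit.
        self.generate_mbh = generate_mbh
        self.cache = {}
        self.hits = 0

    def fetch(self, block_addr, block):
        """Return the MBH for a block, generating it on the first decode only."""
        mbh = self.cache.get(block_addr)
        if mbh is None:
            mbh = self.generate_mbh(block)   # first decode: build the header
            self.cache[block_addr] = mbh
        else:
            self.hits += 1                   # later fetch: reuse cached header
        return mbh


def count_memory_ops(block):
    """Example header content (ASSUMPTION): counts of loads and stores in the block."""
    return {"loads": block.count("load"), "stores": block.count("store")}


cache = HeaderCache(count_memory_ops)
block = ["load", "add", "store", "load"]
first = cache.fetch(0x4000, block)    # generated on first decode
second = cache.fetch(0x4000, block)   # served from the header cache
```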