摘要:
A reduced-power memory (such as for a cache memory system of a processor or a microprocessor) provides per-sector ground control to advantageously reduce power consumption. Selective power control of a plurality of sectors comprised in the reduced-power memory is responsive to a subset of address bits for accessing the memory. The selective power control individually powers-up a selected one of the sectors in response to an access, and then powers-down the selected sector when the access is complete. The power-up is via a decrease in ground potential from a retention level to an access level. Time needed to vary the ground potential is optionally masked by providing address information used by the selective power control in advance of providing other address information. For example, in a cache, a tag access is overlapped with power-up of a selected sector, thus masking latency of powering up the selected sector.
摘要:
An instruction stream having variable length instructions with embedded constants (e.g. immediate values and displacements) is translated into a stream of operations and a corresponding stream of bit fields, enabling advantageous compact storage of the embedded constants. The operations and the compact constants are optionally stored in entries in a trace cache and/or processed by execution pipelines. The compact constants are optionally formulated as a small constant field, a pointer, or both. The pointer of a particular one of the operations optionally references one of the bit fields within a window of the operations associated with the particular operation. A full-sized constant is constructed from one or more contiguous ones of the bit fields, starting with the referenced bit field, by unpacking and uncompressing information from the contiguous bit fields. An operation optionally includes a plurality of small constant fields and pointers to specify a respective plurality of constants.
摘要:
An Adaptive Computing Ensemble (ACE) includes a plurality of flexible computation units as well as an execution controller to allocate the units to Computing Ensembles (CEs) and to assign threads to the CEs. The units may be any combination of ACE-enabled units, including instruction fetch and decode units, integer execution and pipeline control units, floating-point execution units, segmentation units, special-purpose units, reconfigurable units, and memory units. Some of the units may be replicated, e.g. there may be a plurality of integer execution and pipeline control units. Some of the units may be present in a plurality of implementations, varying by performance, power usage, or both. The execution controller dynamically alters the allocation of units to threads in response to changing performance and power consumption observed behaviors and requirements. The execution controller also dynamically alters performance and power characteristics of the ACE-enabled units, according to the observed behaviors and requirements.
摘要:
Various aspects provide for implementing a cache coherence protocol. A system comprises at least one processing component and a centralized controller. The at least one processing component comprises a cache controller. The cache controller is configured to manage a cache memory associated with a processor. The centralized controller is configured to communicate with the cache controller based on a power state of the processor.
摘要:
This invention includes a circuit for tracking memory operations with trace-based execution. Each trace includes a sequence of operations that includes zero or more of the memory operations. The memory operations being executed form a set of active memory operations that have a predefined program order among them and corresponding ordering constraints. At least some of the active memory operations access the memory in an execution order that is different from the program order. Checkpoint entries are associated with each trace. Each entry refers to a checkpoint location. Memory operation ordering entries correspond to each one of the active memory operations. Violations of the ordering constraints result in overwriting the checkpoint locations associated with the selected trace as well as the checkpoint locations associated with traces that are younger than the selected trace.
摘要:
A virtual core management system including a physical core and a first virtual core including a collection of logical states associated with execution of a first program. The first virtual core is mapped to the physical core. The virtual core management system further includes a second virtual core including a collection of logical states associated with execution of a second program, and a virtual core management component configured to unmap the first virtual core from the physical core and map the second virtual core to the physical core in response to the virtual core management component detecting that the physical core is idle.
摘要:
An instruction processing circuit includes a decoder circuit, a basic block builder circuit, a multi-block builder circuit, first and second predictor circuits, and a sequencer circuit, where the sequencer circuit is operable, in a first environment, to cause the first predictor circuit to generate a prediction for a particular conditional branch op concurrently with the second predictor circuit generating a prediction for another particular conditional branch op, where the sequencer circuit is also operable, in a second environment, to cause the first predictor circuit to generate a prediction for the particular conditional branch op sequentially with the second predictor circuit generating a prediction for the another particular conditional branch operation.
摘要:
An instruction processing circuit for a processor includes a decoder circuit, a cache circuit, a sequencer circuit operable to select a next sequence of operations, and an operations fetch circuit operable to convey the next sequence of operations to an execution circuit, receive an indication that a sequencing action of the sequencer circuit is sequencing ahead of the execution circuit, and switch, based on the indication, a source of the operations fetch circuit between the cache circuit and the decoder circuit.
摘要:
An instruction processing unit includes a trace builder circuit operable to (i) receive at least a portion of a first type of sequence of operations and to generate, based thereon, a second type of sequence of operations, where the portion includes at most one control transfer instruction that, when present, ends the portion, (ii) receive sets of at least two sequences of operations and to generate, based thereon, a plurality of third type of sequences of operations, where a sequence of operations of the third type includes one or more interior control transfer instructions and is generated from the sequence of operations of the second type and another sequence of operations of the third type, and (iii) retrieve the sequence of operations of the second type and the another sequence of operations of the third type from a cache circuit.
摘要:
A method and apparatus for optimizing a sequence of operations adapted for execution by a processor is disclosed to include locating an operation, if any, that is next within the sequence of operations and setting a current operation to be that operation. The current operation is processed as follows: a) de-activating, if not already de-activated, a consumed indicator associated with the current operation; and b) when the current operation is of the producer type, then activating, if not already activated, a producer indicator associated with the current operation, and locating a first set of operations, if any, that i) are earlier in the sequence of operations than the current operation, ii) have their associated producer indicator activated, and iii) have their associated consumed indicator de-activated, and then de-activating the producer indicator associated with each operation in the first set. When the current operation is of the consumer type, then locating a second set of operations, if any, that are earlier in the sequence of operations than the current operation, and then activating, if not already activated, the consumed indicator associated with each operation in the second set.