摘要:
Methods and apparatus to operate various logic blocks of an integrated circuit (IC) at independent voltages are described. In one embodiment, supply of power to one or more domains in an IC is adjusted based on an indication that power consumption by components of the corresponding domain is to be modified. Other embodiments are also described.
摘要:
Methods and apparatus for partitioning a microprocessor pipeline to support pipelined branch prediction and instruction fetching of multiple execution threads. A thread selection stage selects a thread from a plurality of execution threads. In one embodiment, storage in a branch prediction output queue is pre-allocated to a portion of the thread in one branch prediction stage in order to prevent stalling of subsequent stages in the branch prediction pipeline. In another embodiment, an instruction fetch stage fetches instructions at a fetch address corresponding to a portion of the selected thread. Another instruction fetch stage stores the instruction data in an instruction fetch output queue if enough storage is available. Otherwise, instruction fetch stages corresponding to the selected thread are invalidated and refetched to avoid stalling preceding stages in the instruction fetch pipeline, which may be fetching instructions of another thread.
摘要:
Methods and apparatus for partitioning a microprocessor pipeline to support pipelined branch prediction and instruction fetching of multiple execution threads. A thread selection stage selects a thread from a plurality of execution threads. In one embodiment, storage in a branch prediction output queue is pre-allocated to a portion of the thread in one branch prediction stage in order to prevent stalling of subsequent stages in the branch prediction pipeline. In another embodiment, an instruction fetch stage fetches instructions at a fetch address corresponding to a portion of the selected thread. Another instruction fetch stage stores the instruction data in an instruction fetch output queue if enough storage is available. Otherwise, instruction fetch stages corresponding to the selected thread are invalidated and refetched to avoid stalling preceding stages in the instruction fetch pipeline, which may be fetching instructions of another thread.
摘要:
According to one embodiment, a method is disclosed. The method includes receiving a value at a vector length (VL) tracker and establishing a VL for subsequent micro-operations (μops) that are to be executed corresponding to the value.
摘要:
Apparatuses and methods for dead instruction identification are disclosed. In one embodiment, an apparatus includes an instruction buffer and a dead instruction identifier. The instruction buffer is to store an instruction stream having a single entry point and a single exit point. The dead instruction identifier is to identify dead instructions based on a forward pass through the instruction stream.
摘要:
A method is disclosed. The method includes scheduling a load operation at least twice the size of a maximum access supported by a memory device, dividing the load operation into a plurality of separate load operation segments having a size equivalent to the maximum access supported by the memory device, and performing each of the plurality of load operation segments. A further method is disclosed where a temporary register is used to minimize the number of memory accesses to support unaligned accesses.
摘要:
In one embodiment, the present invention includes a method for executing an operation on low order portions of first and second source operands using a first execution stack of a processor and executing the operation on high order portions of the first and second source operands using a second execution stack of the processor, where the operation in the second execution stack is staggered by one or more cycles from the operation in the first execution stack. Other embodiments are described and claimed.
摘要:
Embodiments of the present invention provide a method and system for staging the data output from an addressable memory location as a plurality of fields. In embodiments, each field of a data item that is stored at an address may be output during a different clock cycle. In further embodiments, the most time critical field may be output first.
摘要:
A system and method of managing processor instructions provides enhanced performance. The system and method provide for decoding a first instruction into a plurality of operations with a decoder. A first copy of the operations is passed from the decoder to a build engine associated with a trace cache. The system and method further provide for passing a second copy of the operation from the decoder directly to a back end allocation module such that the operations bypass the build engine and the allocation module is in a decoder reading state.
摘要:
Rather than steering one macroinstruction at a time to decode logic in a processor, multiple macroinstructions may be steered at any given time. In one embodiment, a pointer calculation unit generates a pointer that assists in determining a stream of one or more macroinstructions that may be steered to decode logic in the processor.