摘要:
A processor of an aspect includes a decode unit to decode a user-level suspend thread instruction that is to indicate a first alternate state. The processor also includes an execution unit coupled with the decode unit. The execution unit is to perform the instruction at a user privilege level. The execution unit in response to the instruction, is to: (a) suspend execution of a user-level thread, from which the instruction is to have been received; (b) transition a logical processor, on which the user-level thread was to have been running, to the indicated first alternate state; and (c) resume the execution of the user-level thread, when the logical processor is in the indicated first alternate state, with a latency that is to be less than half a latency that execution of a thread can be resumed when the logical processor is in a halt processor power state.
摘要:
Systems, methods, and apparatuses relating to instructions to reset software thread runtime property histories in a hardware processor are described. In one embodiment, a hardware processor includes a hardware guide scheduler comprising a plurality of software thread runtime property histories; a decoder to decode a single instruction into a decoded single instruction, the single instruction having a field that identifies a model-specific register; and an execution circuit to execute the decoded single instruction to check that an enable bit of the model-specific register is set, and when the enable bit is set, to reset the plurality of software thread runtime property histories of the hardware guide scheduler.
摘要:
Systems, methods, and apparatuses for autonomous functional testing of a processor are described. In one example, a processor includes a plurality of processor cores that are each coupled to a respective power management agent circuit; a cache shared by the plurality of processor cores; and a control register, that when set, causes: a save of a state of a first processor core of the plurality of processor cores to storage, a transfer of control of the first processor core to a power management agent circuit of the first processor core, isolation of the first processor core from the other of the plurality of processor cores by the power management agent circuit, performance of one or more functional tests from the cache on the first processor core caused by the power management agent circuit to generate a test result, removal of the isolation of the first processor core from the other of the plurality of processor cores by the power management agent circuit, and a transfer of the control by the power management agent circuit back to the first processor core.
摘要:
Disclosed embodiments relate to the usage of a branch prediction unit in service of performance sensitive microcode flows. In one example, a processor includes a branch prediction unit (BPU) and a pipeline including a fetch stage to fetch an instruction specifying an opcode, an operand, and a loop condition based on the operand, wherein the BPU is to generate a hint reflecting a predicted result of the loop condition, a decode stage to generate either a first or a second micro-operation flow as per the hint, the pipeline to begin executing the generated micro-operation flow; a read stage to read the operand and resolve the loop condition; and execution circuitry to continue the generated micro-operation flow if the prediction was correct, and, otherwise, to flush the pipeline, update the prediction, and switch from the generated micro-operation flow to the other of the first and second micro-operation flows.
摘要:
A processor includes a decode unit to decode a memory copy instruction that indicates a start of a source memory operand, a start of a destination memory operand, and an initial amount of data to be copied from the source memory operand to the destination memory operand. An execution unit, in response to the memory copy instruction, is to copy a first portion of data from the source memory operand to the destination memory operand before an interruption. A descending copy direction is to be used when the source and destination memory operands overlap. In response to the interruption, when the descending copy direction is used, the execution unit is to store a remaining amount of data to be copied, but is not to indicate a different start of the source memory operand, and is not to indicate a different start of the destination memory operand.
摘要:
Embodiments of an invention related to compacted context state management are disclosed. In one embodiment, a processor includes instruction hardware and state management logic. The instruction hardware is to receive a first save instruction and a second save instruction. The state management logic is to, in response to the first save instruction, save context state in an un-compacted format in a first save area. The state management logic is also to, in response to the second save instruction, save a compaction mask and context state in a compacted format in a second save area and set a compacted-save indicator in the second save area. The state management logic is also to, in response to a single restore instruction, determine, based on the compacted-save indicator, whether to restore context from the un-compacted format in the first save area or from the compacted format in the second save area.
摘要:
A method and apparatus for flushing and restoring core memory content to and from, respectively, external memory are described. In one embodiment, the apparatus is an integrated circuit comprising a plurality of processor cores, the plurality of process cores including one core having a first memory operable to store data of the one core, the one core to store data from the first memory to a second memory located externally to the processor in response to receipt of a first indication that the one core is to transition from a first low power idle state to a second low power idle state and receipt of a second indication generated externally from the one core indicating that the one core is to store the data from the first memory to the second memory, locations in the second memory at which the data is stored being accessible by the one core and inaccessible by other processor cores in the IC; and a power management controller coupled to the plurality of cores and located outside the plurality of cores.
摘要:
A processor includes a core with locally-gated circuitry, a decode unit, a local power gate (LPG) coupled to the locally-gated circuitry, and an execution unit. The decode unit includes logic to decode a store broadcast instruction of a specified width. The LPG includes logic to selectively provide power to the locally-gated circuitry, activate power to a first portion of the locally-gated circuitry for execution of full cache-line memory operations, and deactivate power to a second portion of the locally-gated circuitry the locally-gated circuitry. The execution unit includes logic to execute, by the first portion of the locally-gated circuitry for execution of full cache-line memory operations, the store broadcast instruction, the store broadcast instruction to store data of the specified width to storage of the processor.
摘要:
In one embodiment, the present invention includes an instruction decoder that can receive an incoming instruction and a path select signal and decode the incoming instruction into a first instruction code or a second instruction code responsive to the path select signal. The two different instruction codes, both representing the same incoming instruction may be used by an execution unit to perform an operation optimized for different data lengths. Other embodiments are described and claimed.
摘要:
In one embodiment, an apparatus includes: an instruction fetch circuit to fetch instructions; a decode circuit coupled to the instruction fetch circuit to decode the fetched instructions into micro-operations (pops); a scheduler coupled to the decode circuit to schedule the pops for execution; and an execution circuit coupled to the scheduler, the execution circuit comprising a plurality of execution ports to execute the pops. The scheduler may be configured to: schedule at least some pops of a first type for redundant execution on symmetric execution ports of the plurality of execution ports; and schedule pops of a second type for non-redundant execution on a single execution port of the plurality of execution ports. Other embodiments are described and claimed.