摘要:
Apparatuses and methods to perform gather instructions are presented. In one embodiment, an apparatus comprises a gather logic module which includes a gather logic unit to identify locality of data elements in response to a gather instruction. The apparatus includes memory comprising a plurality of memory rows including a memory row associated with the gather instruction. The apparatus further includes memory structure to store data element addresses accessed in response to the gather instruction.
摘要:
A method is described that includes reading a first read mask from a first register. The method also includes reading a first vector operand from a second register or memory location. The method also includes applying the read mask against the first vector operand to produce a set of elements for operation. The method also includes performing an operation of the set elements. The method also includes creating an output vector by producing multiple instances of the operation's result. The method also includes reading a first write mask from a third register, the first write mask being different than the first read mask. The method also includes applying the write mask against the output vector to create a resultant vector. The method also includes writing the resultant vector to a destination register.
摘要:
A system, techniques and apparatus are described for decoding an instruction in an a variable-length instruction set. An instruction encoding is described, in which legacy, present, and future instruction set extensions are supported, and increased functionality is provided, without expanding the code size and, in some cases, reducing the code size.
摘要:
A technique to enable efficient instruction fusion within a computer system. In one embodiment, a processor logic delays the processing of a second instruction for a threshold amount of time if a first instruction within an instruction queue is fusible with the second instruction.
摘要:
A trace management architecture to enable the reuse of uops within one or more repeated traces. More particularly, embodiments of the invention relate to a technique to prevent multiple accesses to various functional units within a trace management architecture by reusing traces or sequences of traces that are repeated during a period of operation of the microprocessor, avoiding performance gaps due to multiple trace cache accesses and increasing the rate at which uops can be executed within a processor.
摘要:
Methods and apparatuses for distributing architectural state information in a processor across multiple pipeline stages are described. An architectural value of a register is represented by a historical value added to an update value which is maintained in a non-final pipeline stage. When an instruction requires the architectural value, a calculation is made and that value is inserted into the pipeline for processing. Recovery of both pre- and post-execution architectural state information is made possible by storing both the update value and the operation to take place on that value for each decoded instruction.
摘要:
A method for merging binary translated basic blocks of instructions. The method is for use in a computer system having in a memory a first set of instructions including blocks of instructions, and a translator for translating instructions executable on a source instruction set architecture into instructions executable on a target instruction set architecture. The method includes a first step of determining, by the translator, an order of execution from a first block of instructions to a second block of instructions. A second step of the method includes generating, by the translator, a hyperblock of instructions representing the first and second block of instructions translated and placed adjacent in a memory location in the order of execution.
摘要:
The power consumed within an integrated circuit (IC) is reduced without substantial impact on its performance for typical applications by throttling the performance of particular functional units within the IC. Artificial worst-case power consumption is reduced by throttling down the activity levels of long-duration sequences of high-power operations. The recent utilization levels of particular functional units within an IC are monitored--for example, by computing each functional unit's average duty cycle over its recent operating history. If this activity level is greater than a threshold, then the functional unit is operated in a reduced-power mode. The threshold value is set large enough to allow short bursts of high utilization to occur without impacting performance. The invention allows an integrated circuit to dynamically make the tradeoff between high-speed operation and low-power operation, by throttling back performance of localized functional units when their utilization exceeds a sustainable level. Additionally, this dynamic power/speed tradeoff can be optimized across multiple functional units within an IC or among multiple ICs within a system. Additionally, this dynamic power/speed tradeoff can be altered by providing software control over throttling parameters.
摘要:
In one embodiment, a processor includes a current protection controller to: receive instruction width information and instruction type information associated with one or more instructions stored in an instruction queue prior to execution of the one or more instructions by an execution circuit; determine a power license level for the core based on the corresponding instruction width information and the instruction type information; generate a request for a license for the core corresponding to the power license level; and communicate the request to a power controller when the one or more instructions are non-speculative, and defer communication of the request when at least one of the one or more instructions is speculative. Other embodiments are described and claimed.
摘要:
A method of an aspect includes receiving a packed data operation mask shift instruction. The packed data operation mask shift instruction indicates a source having a packed data operation mask, indicates a shift count number of bits, and indicates a destination. The method further includes storing a result in the destination in response to the packed data operation mask shift instruction. The result includes a sequence of bits of the packed data operation mask that have been shifted by the shift count number of bits. Other methods, apparatus, systems, and instructions are disclosed.