Abstract:
The specification discloses a method and apparatus for determining the length of variable-length instructions that appear sequentially in an instruction stream without differentiation. The apparatus may be used to facilitate parallel processing of such variable-length instructions by a computer system. The apparatus includes: a circuit for providing a boundary marker for each instruction to indicate a boundary between that instruction and another instruction in the instruction stream, a circuit for processing instructions in sequence, a circuit for determining an actual boundary of a first instruction as it is processed, a circuit for comparing the boundary marker and the actual boundary of the first instruction to determine whether they match, a circuit for updating the boundary marker of the first instruction to the actual boundary of the first instruction when the boundary value and the actual boundary of the first instruction do not match, and a circuit for indicating a boundary between the first instruction and a next instruction from the stream of instructions based on the boundary marker of the first instruction.
Abstract:
A method, system, and apparatus may initialize a fixed plurality of page table entries for a fixed plurality of pages in memory, each page having a first size, wherein a linear address for each page table entry corresponds to a physical address and the fixed plurality of pages are aligned. A bit in each of the page table entries for the aligned pages may be set to indicate whether or not the fixed plurality of pages is to be treated as one combined page having a second page size larger than the first page size. Other embodiments are described and claimed.
Abstract:
Two latches store the state of a data signal at a transition of a clock signal. Comparison logic compares the outputs of the two latches and produces a signal to indicate whether the outputs are equal or unequal. Systems using the latches and comparison logic are described and claimed.
Abstract:
A method and apparatus for changing the configuration of a multi-core processor is disclosed. In one embodiment, a throttle module (or throttle logic) may determine the amount of parallelism present in the currently-executing program, and change the execution of the threads of that program on the various cores. If the amount of parallelism is high, then the processor may be configured to run a larger amount of threads on cores configured to consume less power. If the amount of parallelism is low, then the processor may be configured to run a smaller amount of threads on cores configured for greater scalar performance.
Abstract:
Methods and apparatus to reduce aging effect on memory are described. In one embodiment, a modified version of data is stored in a portion of a storage unit during a first time period.
Abstract:
A system for delivering power to a device in a specified voltage range is disclosed. The system includes a power delivery network, characterized by a response function, to deliver power to the device. A current computation unit stores values representing a sequence of current amplitudes drawn by the device on successive clock cycles, and provides them to a current to voltage computation unit. The current to voltage computation unit filters the current amplitudes according to coefficients derived from the response function to provide an estimate of the voltage seen by the device. Operation of the device is adjusted if the estimated voltage falls outside the specified range.
Abstract:
Disclosed are an apparatus, system, and method for implementing predicated instructions using micro-operations. A micro-code engine receives an instruction, decomposes the instruction, and generates a plurality of micro-operations to implement the instruction. Each of the decomposed micro-operations indicates a single destination register. For predicated instructions, the decomposed micro-operations include “conditional move” micro-operations to select between two potential output values. Except in the case that one of the potential output values is a constant, the decomposed micro-operations for a predicated instruction also include an append instruction that saves the incoming value of a destination register in a temporary variable. For at least one embodiment, the qualifying predicate for a predicated instruction is appended to the incoming value stored in the temporary register.
Abstract:
The present invention relates to the design of highly reliable high performance microprocessors, and more specifically to designs that use cache memory protection schemes such as, for example, a 1-hot plus valid bit scheme and a 2-hot vector cache scheme. These protection schemes protect the 1-hot vectors used in the tag array in the cache and are designed to provide hardware savings, operate at higher speeds and be simple to implement. In accordance with an embodiment of the present invention, a tag array memory including an input conversion circuit to receive a 1-hot vector and to convert the 1-hot vector to a 2-hot vector. The tag array memory also including a memory array coupled to the input conversion circuit, the memory array to store the 2-hot vector; and an output conversion circuit coupled to the memory array, the output conversion circuit to receive the 2-hot vector and to convert the 2-hot vector back to the 1-hot vector.
Abstract:
The present invention relates to the design of highly reliable high performance microprocessors, and more specifically to designs that use cache memory protection schemes such as, for example, a 1-hot plus valid bit scheme and a 2-hot vector cache scheme. These protection schemes protect the 1-hot vectors used in the tag array in the cache and are designed to provide hardware savings, operate at higher speeds and be simple to implement. In accordance with an embodiment of the present invention, a tag array memory circuit includes a plurality of memory bit circuits coupled together to form an n-bit memory cell, and a valid bit circuit coupled to the n-bit memory cell, the valid bit circuit being configured to be accessed simultaneously with the plurality of memory bit circuits.
Abstract:
An apparatus includes a clock to produce pulses and an electronic hardware structure having a plurality of rows and one or more ports. Each row is adapted to record a separate latency vector written through one of the ports. The latency vector recorded therein is responsive to the clock. A method of dispatching instructions in a processor includes updating a plurality of expected latencies to a portion of rows of a register latency table, and decreasing the expected latencies remaining in other of the rows in response to a clock pulse. The rows of the portion correspond to particular registers.