摘要:
An instruction decode unit is described including circuitry coupled to receive an instruction. The instruction identifies multiple operands, one of which is a destination operand. The circuitry responds to the instruction by producing: (i) operand codes specifying the operands, wherein the operand codes are produced in the order in which the operands are identified within the instruction, and (ii) a destination operand signal identifying the destination operand. In one embodiment, the decode unit responds to the instruction by producing the operand codes, operand address information, control signals, and the destination operand signal. A processor including the instruction decode unit is also described, as is a computer system including the processor. The instruction may include operand information which identifies the operands. The instruction may also include destination operand information which indicates which of the operands is the destination operand. The circuitry may produce the destination operand signal dependent upon the destination operand information. The instruction may be a member of an instruction set including instructions having a variable number of bytes. In one particular example, the instruction may be an x86 instruction including operand information which identifies two operands. The instruction may include a direction bit, and the value of the direction bit may indicate which of the two operands is the destination operand. In this case, the circuitry may produce the destination operand signal dependent upon the value of the direction bit.
摘要:
A technique for constraint management and validation for template-based device designs is disclosed. The technique includes generating a template-level representation of an electronic device design based on a transistor-level representation of the electronic device design. The template-level representation includes one or more hierarchies of templates. Each template represents a corresponding portion of the electronic device design. The technique further includes determining constraint declarations associated with the electronic device design and verifying whether there is a functional equivalence between the template-level representation to a register-transfer-level (RTL) representation of the electronic device design. The technique additionally includes verifying whether the constraint declarations are valid and verifying the electronic device design responsive to verifying the functional equivalence and verifying the constraint declarations.
摘要:
A microprocessor includes an expanded set of registers in addition to the architected set of registers specified by the microprocessor architecture employed by the microprocessor. The expanded set of registers are memory-mapped within the context of the program being executed. Upon a context switch, the microprocessor saves the state of the expanded registers to the corresponding memory locations. An application program may make use of the expanded registers by assigning the most-often used operands in the program to the set of memory locations corresponding to the expanded registers. The application programmer may than code instructions which access these operands with register identifiers corresponding to the expanded registers. In one embodiment, the microprocessor implements a portion of the expanded registers instead of the entire set of expanded registers. The implemented portion of the expanded registers are accessed as register accesses, while the unimplemented portion are converted to memory accesses. The decode unit within the microprocessor may be configured to convert instructions which are coded to access the unimplemented expanded registers into memory operations to access the corresponding memory location.
摘要:
A test configuration is provided which allows a plurality of variable delay units within a delay chain of a microprocessor clock generator to be compared with respect to one another. During normal operation, a set of multiplexers interposed within the delay chain are configured such that the plurality of variable delay units are electrically coupled in series with respect to one another. An external command signal may be provided to the microprocessor to initiate a test operation in which the variable delay units are tested for possible defects. During the test operation, a control unit selects the multiplexers such that the four delay units are electrically separated from one another. A common test signal is then driven through two or more of the variable delay units simultaneously, and a compare circuit coupled to the output of each variable delay unit determines whether a transition in the common pulse signal propagated through each variable delay unit at essentially the same time. If no manufacturing defects are present, the four outputs of the variable delay units should be virtually indistinguishable from one another. The results of the compare operation may be driven on external pins of the microprocessor or may be processed internally within the microprocessor. Similar tests may be conducted throughout the entire operating range of the variable delay units.
摘要:
A highspeed, intelligent, distributed control memory system is comprised of an array of modular, cascadable, integrated circuit devices, hereinafter referred to as "memory elements." Each memory element is further comprised of storage means, programmable on board processing ("distributed control") means and means for interfacing with both the host system and the other memory elements in the array utilizing a single shared bus. Each memory element of the array is capable of transferring (reading or writing) data between adjacent memory elements once per clock cycle. In addition, each memory element is capable of broadcasting data to all memory elements of the array once per clock cycle. This ability to asynchronously transfer data between the memory elements at the clock rate, using the distributed control, facilitates unburdening host system hardware and software from tasks more efficiently performed by the distributed control. As a result, the memory itself can, for example, perform such tasks as sorting and searching, even across memory element boundaries, in a manner which conserves, is faster and more efficient then using, host system resources.
摘要:
A memory controller includes a power mode sensitive reordering device coupled to receive a power mode indication. The memory controller includes a selectable high and low power mode. An indication of which of the high and low power modes is selected is coupled to the power mode sensitive reordering device as the power mode indication. In the low power mode, memory transactions are reordered to minimize power consumption in memory devices controlled by the memory controller.
摘要:
An apparatus comprises a buffer comprising a plurality of entries, a plurality of age vectors, and a control circuit coupled to the buffer. Each of the age vectors corresponds to one or more of the entries. Responsive to data being provided to the buffer to be written to at least a first entry, the control circuit is configured to generate a first age vector. The first age vector corresponds to the first entry, and is indicative of which of the plurality of entries contain data that is older than the data being written to the first entry. The control circuit is configured to select an entry for reading responsive to the plurality of age vectors. The selected entry has an attribute used to select the selected entry, and other entries indicated as storing older data in the age vector corresponding to the selected entry do not have the attribute.
摘要:
A system and method for transferring data using an early response signal to indicate subsequent transmission of data after a fixed latency, wherein the signal and data are transferred from a first clock domain to a second clock domain using a clock skipping technique. In one embodiment, an early response signal is transmitted by a first device k clock pulses prior to transmission of the data. The receiving device, which is operating at a higher clock rate, receives the early response signal and delays the signal by the number of skipped pulses which will occur in the second clock domain before the occurrence of the kth valid pulse. The second device employs a skip pattern generator to generate a signal indicative of this number of skipped pulses and provides the number to a delay circuit which delays the early response signal for an this number of clock pulses. The delayed early response signal is then output to the appropriate logic to indicate the latency of the subsequent data transfer.
摘要:
A microprocessor assigns a data transaction type to each instruction. The data transaction type is based upon the encoding of the instruction, and indicates an access mode for memory operations corresponding to the instruction. The access mode may, for example, specify caching and prefetching characteristics for the memory operation. The access mode for each data transaction type is selected to enhance the speed of access by the microprocessor to the data, or to enhance the overall cache and prefetching efficiency of the microprocessor by inhibiting caching and/or prefetching for those memory operations. Instead of relying on data memory access patterns and overall program behavior to determine caching and prefetching operations, these operations are determined on an instruction-by-instruction basis. Additionally, the data transaction types assigned to different instruction encodings may be revealed to program developers. Program developers may use the instruction encodings (and instruction encodings which are assigned to a nil data transaction type causing a default access mode) to optimize use of processor resources during program execution.
摘要:
A cache employs one or more prefetch ways for storing prefetch cache lines and one or more ways for storing accessed cache lines. Prefetch cache lines are stored into the prefetch way, while cache lines fetched in response to cache misses for requests initiated by a microprocessor connected to the cache are stored into the non-prefetch ways. Accessed cache lines are thereby maintained within the cache separately from prefetch cache lines. When a prefetch cache line is presented to the cache for storage, the prefetch cache line may displace another prefetch cache line but does not displace an accessed cache line. A cache hit in either the prefetch way or the non-prefetch ways causes the cache line to be delivered to the requesting microprocessor in a cache hit fashion. The cache is further configured to move prefetch cache lines from the prefetch way to the non-prefetch way if the prefetch cache lines are requested (i.e. they become accessed cache lines). Instruction cache lines may be moved immediately upon access, while data cache line accesses may be counted and a number of accesses greater than a predetermined threshold value may occur prior to moving the data cache line from the prefetch way to the non-prefetch way. Additionally, movement of an accessed cache line from the prefetch way to the non-prefetch way may be delayed until the accessed cache line is to be replaced by a prefetch cache line.