Abstract:
Disclosed are a multi-die processor apparatus and system. Processor logic to execute one or more instructions is allocated among two or more face-to-face stacked dice. The processor includes a conductive interface between the stacked dice to facilitate die-to-die communication.
Abstract:
For use in a processor having a result bus of insufficient width to convey all results of a given multiple-result instruction concurrently, a system for, and method of, writing back the results of the multiple-result instruction. In one embodiment, the system includes: (1) multi-result node creation circuitry that creates a multi-result node having at least first and second results for the multiple-result instruction and (2) node transmission circuitry, coupled to the multi-result node creation circuitry, that transmits the first and second results of said multi-result node sequentially over the result bus.
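The sequential write-back described above can be illustrated with a minimal behavioural sketch. It assumes a multi-result node is simply an ordered list of (destination, value) pairs and that the result bus carries `bus_width` results per cycle. The function name, the register labels, and the `bus_width` parameter are hypothetical and do not come from the abstract.

```python
from typing import Iterator, List, Tuple

Result = Tuple[str, int]  # (destination register, value); assumed representation


def write_back(node: List[Result], bus_width: int = 1) -> Iterator[List[Result]]:
    """Yield one bus cycle at a time, each carrying at most bus_width results."""
    if bus_width < 1:
        raise ValueError("bus_width must be at least 1")
    for start in range(0, len(node), bus_width):
        yield node[start:start + bus_width]


# Hypothetical two-result instruction, e.g. a divide producing quotient and remainder.
node = [("r1", 7), ("r2", 3)]
cycles = list(write_back(node, bus_width=1))
# With a one-result-wide bus, the two results occupy two consecutive cycles.
```

On a bus wide enough for both results the same node would complete in a single cycle, which is the case the abstract rules out.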
Abstract:
A CAM system (2) stores a plurality of data sets in a plurality of pairs of CAM cells (4) and RAM cells (6). The portion of a particular data set stored in one of the RAM cells is accessed by inputting a tag to the CAM cells that matches the portion of the data set stored in the CAM cell associated with the particular RAM cell. The CAM system incorporates a novel two-stage matchline re-coding scheme to improve performance. Each of a plurality of first stage circuits (10) receives a plurality of matchline signals from a plurality of CAM sets and a plurality of data inputs from the corresponding RAM sets. Each output of the first stage circuits is further processed by a second stage circuit (12) which generates the final data output. The CAM system avoids the use of self-timed control signals and sense amplifiers.
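As a rough behavioural illustration of the two-stage read path, the sketch below models each first stage as gating RAM words with their matchlines and each second stage as combining the first-stage outputs. The abstract does not specify the circuit, so the AND-OR combination, the assumption that at most one entry matches, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Group = Tuple[Sequence[int], Sequence[int]]  # (CAM tags, RAM words) for one group


def first_stage(matchlines: Sequence[bool], words: Sequence[int]) -> int:
    """Gate each RAM word with its matchline and OR the survivors together."""
    out = 0
    for hit, word in zip(matchlines, words):
        if hit:
            out |= word
    return out


def second_stage(partials: Sequence[int]) -> int:
    """Combine the first-stage outputs into the final data output."""
    out = 0
    for partial in partials:
        out |= partial
    return out


def cam_read(tag: int, groups: List[Group]) -> int:
    """Return the word paired with the matching tag, or 0 if nothing matches."""
    partials = []
    for tags, words in groups:
        matchlines = [stored == tag for stored in tags]
        partials.append(first_stage(matchlines, words))
    return second_stage(partials)


groups = [([0x1, 0x2], [0xAA, 0xBB]), ([0x3, 0x4], [0xCC, 0xDD])]
word = cam_read(0x3, groups)
```

Because a miss and a stored zero are indistinguishable in this model, a fuller sketch would also return a hit flag.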
Abstract:
A method, apparatus, and system are provided for a multi-threaded virtual state mechanism. According to one embodiment, active thread state of a first active thread is received using a virtual state mechanism, and virtual thread state is generated in accordance with the active thread state of the first active thread, and the virtual thread state corresponding to the first active thread is forwarded to state update logic.
Abstract:
Fusing micro-operations (uops) together. Intra-instruction fusing can increase cache memory storage efficiency and computer instruction processing bandwidth within a microprocessor without incurring significant computer system cost. Uops are fused, stored in cache memory, un-fused, executed in parallel, and retired in order to optimize cost and performance.
Abstract:
The present invention is directed to a system and method for implementing a re-ordered instruction cache. In one embodiment, groups or “packets” of instructions with specific packet sizes are formed. Each of the packets includes two or more positions. The two or more positions are defined such that they support one or more different types of instructions. Each of the positions is also correlated to a subset of the specialized execution units of the processor. Given a specific packet size and definitions for each of the positions, the instructions are re-ordered according to instruction type and loaded into the instruction cache in the new order.
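One way to picture the re-ordering is a greedy packer that fills each packet position with the earliest remaining instruction of a type that position supports. This is only a sketch under stated assumptions: the abstract does not give the packing policy, data dependences are ignored, and the instruction names, type labels, and the `pack` function are hypothetical.

```python
from typing import List, Optional, Sequence, Set, Tuple

Instr = Tuple[str, str]  # (mnemonic, instruction type); assumed representation


def pack(instrs: Sequence[Instr],
         positions: Sequence[Set[str]]) -> List[List[Optional[Instr]]]:
    """Group instructions into packets; None marks a position left empty."""
    remaining = list(instrs)
    packets: List[List[Optional[Instr]]] = []
    while remaining:
        packet: List[Optional[Instr]] = []
        for allowed in positions:
            # Earliest remaining instruction whose type this position supports.
            pick = next((i for i in remaining if i[1] in allowed), None)
            if pick is not None:
                remaining.remove(pick)
            packet.append(pick)
        if all(slot is None for slot in packet):
            raise ValueError("no position supports %r" % (remaining[0],))
        packets.append(packet)
    return packets


# Two positions: the first feeds ALU units only, the second memory or ALU units.
positions = [{"alu"}, {"mem", "alu"}]
program = [("ld", "mem"), ("add", "alu"), ("st", "mem"), ("sub", "alu")]
packets = pack(program, positions)
```

Here the program order ld, add, st, sub becomes the packets (add, ld) and (sub, st), so each instruction sits in a position tied to an execution unit that can run it.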
Abstract:
A data processor (40) keeps track of misses to a cache (71) so that multiple misses within the same cache line can be merged or folded at reload time. A load/store unit (60) includes a completed store queue (61) for presenting store requests to the cache (71) in order. If a store request misses in the cache (71), the completed store queue (61) requests the cache line from a lower-level memory system (90) and thereafter inactivates the store request. When a reload cache line is received, the completed store queue (61) compares the reload address to all entries. If at least one address matches the reload address, one entry's data is merged with the cache line prior to storage in the cache (71). Other matching entries become active and are allowed to reaccess the cache (71). A miss queue (80) coupled between the load/store unit (60) and the lower-level memory system (90) implements reload folding to improve efficiency.
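The reload-time behaviour of the completed store queue can be sketched as follows. The model assumes one-byte stores, a 32-byte cache line, and that the oldest matching entry is the one merged. None of these details, nor the class and function names, are given in the abstract.

```python
from dataclasses import dataclass
from typing import List

LINE_SIZE = 32  # bytes per cache line; an assumed value


@dataclass
class StoreEntry:
    addr: int             # byte address of the store
    value: int            # one byte of store data (simplification)
    active: bool = False  # inactive while waiting for the reload
    done: bool = False    # set once the data has been merged


def line_addr(addr: int) -> int:
    """Address of the cache line containing addr."""
    return addr & ~(LINE_SIZE - 1)


def on_reload(queue: List[StoreEntry], reload_addr: int, line: bytearray) -> bytearray:
    """Merge one matching store into the reload line; reactivate the others."""
    merged = False
    for entry in queue:
        if entry.done or line_addr(entry.addr) != reload_addr:
            continue
        if not merged:
            line[entry.addr - reload_addr] = entry.value  # fold data into the line
            entry.done = True
            merged = True
        else:
            entry.active = True  # allowed to reaccess the cache
    return line


queue = [StoreEntry(0x104, 0xAB), StoreEntry(0x108, 0xCD), StoreEntry(0x200, 0xEF)]
line = on_reload(queue, 0x100, bytearray(LINE_SIZE))
```

After the reload of line 0x100, the first store is complete, the second is active again and will now hit in the cache, and the store to 0x200 keeps waiting for its own line.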
Abstract:
Method, apparatus and system embodiments provide support for multiple SoEMT software threads on multiple SMT logical thread contexts. A thread translation table maintains physical-to-virtual thread translation information in order to provide such information to structures within a processor that utilize virtual thread information. By associating a thread translation table with such structures, a processor that supports simultaneous multithreading (SMT) may be easily retrofitted to support switch-on-event multithreading on the SMT logical processors.
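A thread translation table of this kind can be pictured as a small array indexed by physical (SMT) thread context that records which virtual (software) thread currently occupies it. The sketch below is an illustrative assumption, not the disclosed structure; the class and method names are hypothetical.

```python
from typing import List, Optional


class ThreadTranslationTable:
    """Maps each physical SMT thread context to its current virtual thread."""

    def __init__(self, num_physical: int) -> None:
        self._virtual: List[Optional[int]] = [None] * num_physical

    def switch_in(self, physical: int, virtual: int) -> None:
        """Record that a virtual thread now runs on a physical context."""
        self._virtual[physical] = virtual

    def virtual_of(self, physical: int) -> Optional[int]:
        """Physical-to-virtual lookup for structures keyed by virtual thread."""
        return self._virtual[physical]


# Two SMT contexts shared by four software threads (0..3).
table = ThreadTranslationTable(num_physical=2)
table.switch_in(0, 0)
table.switch_in(1, 1)
table.switch_in(0, 2)  # a switch-on-event replaces thread 0 with thread 2
current = [table.virtual_of(p) for p in range(2)]
```

A structure that tags its entries by virtual thread would consult `virtual_of` on each access, so a thread switch only needs to update the table.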
Abstract:
A stack pointer update technique in which the stack pointer is updated without executing micro-operations to add or subtract a stack pointer value. The stack pointer update technique is also described to reset the stack pointer to a predetermined value without executing micro-operations to add or subtract a stack pointer value.