摘要:
A method and apparatus that generates a simplified, localized version (“a local stall”) of a global stall to improve the performance of a pipelined microprocessor. The local stall is generated when a data-dependency hazard is detected for a local consumer. Utilizing circuitry used in the pipelined microprocessor's data-forwarding circuitry, the local stall is generated with a relatively minor increase in circuitry. The local stall is generated much sooner than the global stall, arriving much sooner in a local pipeline. The local pipeline utilizes the local stall to override the global stall, when appropriate, and to ensure that correct data is read for a local consumer and to operate more efficiently than a standard pipeline without a local stall.
摘要:
A processing unit of the invention has multiple instruction pipelines for processing multi-threaded instructions. Each thread may have an urgency associated with its program instructions. The processing unit has a thread switch controller to monitor processing of instructions through the various pipelines. The thread controller also controls switch events to move from one thread to another within the pipelines. The controller may modify the urgency of any thread such as by issuing an additional instruction. The thread controller preferably utilizes certain heuristics in making switch event decisions. A time slice expiration unit may also monitor expiration of threads for a given time slice.
摘要:
In a computer system including a plurality of resources, a device receives a request from a software program to access a specified one of the plurality of resources, determines whether the specified one of the plurality of resources is a protected resource. If the specified one of the plurality of resources is a protected resource, the device denies the request if the computer system is operating in a protected mode of operation, and processes the request based on access rights associated with the software program if the computer system is not operating in the protected mode of operation.
摘要:
Virtual-machine-monitor operation and implementation is facilitated by number of easily implemented features and extensions added to the features of a processor architecture. These features, one or more of which are used in various embodiments of the present invention, include a vmsw instruction that provides a means for transitioning between virtualization mode and non-virtualization mode without an interruption, a virtualization fault that faults on an attempt by a priority-0 routine in virtualization mode attempting to execute a privileged instruction, and a flexible highest-implemented-address mechanism to partition virtual address space into a virtualization address space and a non-virtualization address space.
摘要:
A system and a method of providing error detection and correction of transmission of multiple flits between sending and receiving agents connected together in a network or computer interconnect environment is disclosed that comprises embedding a sequence identifier in each flit prior to transmission, sending each flit to a connected receiving agent, examining the sequence identifiers of each flit being received and requesting the sending agent to resend a flit if the sequence identifier for that flit is determined to be incorrect.In a preferred embodiment of the present invention, the sequence identifier is embedded in the control portion of the flit and comprises a sequence number that is incremented or otherwise changed in a predictable manner, so that the order of flits being received is predicted. If the sequence number for a flit is different that expected, the receiving agent requests that it be resent.
摘要:
The invention provides a processor with two or more parallel instruction paths for processing instructions. The instruction paths may be implemented with a plurality of cores on a common die. Instructions of the invention are preferably processed within a bundle of two or more instructions of a common program thread; and each of the instruction paths preferably forms a cluster to process bundled instructions. Each of the instruction paths has an array of pipelined execution units. Initially, two or more of the parallel instruction paths processes the same program thread (one or more bundles) through the execution units, but with different optimization characteristics set for each path. Assessment logic monitors the processing of the initial program thread through the execution units and selects the heuristics defining which path is in the lead. The other instruction paths are then reallocated, or synchronized, with the optimization characteristics of the lead instruction path, or with similarly optimized characteristics, to process other bundles of the program thread; preferably, the lead path continues processing of the initial thread without being disturbed. For other program threads, the process may repeat in processing like bundles through multiple instruction paths to identify the preferred heuristics; and then synchronizing the multiple instruction paths to the optimized characteristics to improve per thread performance.
摘要:
Multithreaded hardware systems and methods are disclosed. One embodiment of a system may comprise a multithreaded processor comprising a register file having N hardware threads, where N is an integer greater than or equal to one, and an offline storage structure having M hardware threads, where M is an integer greater than or equal to one. The multithreaded processor system may further comprise a thread control that transfers register values associated with at least one of the N hardware threads to registers of at least one of the M hardware threads and transfers register values of at least of one of the M hardware threads to registers of at least one of the N hardware threads.
摘要:
The invention provides a processor architecture that bypasses data hazards. The architecture has an array of pipelines and a register file. Each of the pipelines includes an array of execution units. The register file has a first section of n registers (e.g., 128 registers) and a second section of m registers (e.g., 16 registers). A write mux couples speculative data from the execution units to the second set of m registers and non-speculative data from a write-back stage of the execution units to the first section of n registers. A read mux couples the speculative data from the second set of m registers to the execution units to bypass data hazards within the execution units. The register file preferably includes column decode logic for each of the registers in the second section of m registers to architect speculative data without moving data. The decode logic first decodes, and then selects, an age of the producer of the speculative state; the newest producer enables the decode.
摘要:
Automatic manufacturing test case generation mechanism receives a plurality of design verification test cases (DVTCs) from one or more sources and based thereon automatically generates manufacturing test cases (MTCs). Each of the manufacturing test cases (MTCs) generated by the automatic manufacturing test includes at least a first design verification test case and a second design verification test case.
摘要:
A processor integrated circuit capable of executing more than one instruction stream has two or more processors. Each processor accesses instructions and data through a cache controller. There are multiple blocks of cache memory. Some blocks of cache memory may optionally be directly attached to particular cache controllers. The cache controllers access at least some of the multiple blocks of cache memory through high speed interconnect, these blocks being dynamically allocable to more than one cache controller. A resource allocation controller determines which cache memory controller has access to the dynamically allocable cache memory block. In an embodiment the cache controllers and cache memory blocks are associated with second level cache, each processor accesses the second level cache controllers upon missing in a first level cache of fixed size.