Abstract:
A method and logic apparatus provide a mechanism for switching a simultaneous multi-threaded (SMT) processor between single-threaded and multi-threaded execution states. The processor receives an instruction specifying a transition from single-threaded to multi-threaded mode, or vice versa, and halts execution of all threads executing on the processor. Internal control logic then sequences a series of events that ends instruction prefetching, dispatch of new instructions, interrupt processing, and maintenance operations, and waits for instructions already in process to complete. Next, the logic determines which thread or threads to start according to a thread enable state that specifies the enable state of each of the multiple threads. It then reallocates processor resources, dividing them among the threads if multiple threads are enabled for further execution (multi-threaded mode), or allocating substantially all of the resources to a single thread if single-threaded mode is specified. Finally, the processor starts execution of the remaining enabled threads.
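The switch sequence above can be sketched in software as follows. This is a minimal, non-authoritative model, not the patented logic: the names `QUIESCE_STEPS` and `switch_mode` are hypothetical, a single resource pool of `total_entries` entries stands in for "various resources", and the even division among enabled threads is an assumed policy. The sketch gives the lone thread the entire pool where the abstract says "substantially all".

```python
# Hypothetical names throughout: QUIESCE_STEPS, switch_mode, total_entries.
QUIESCE_STEPS = (
    "end instruction prefetching",
    "end dispatch of new instructions",
    "end interrupt processing",
    "end maintenance operations",
    "wait for in-process instructions to complete",
)


def switch_mode(thread_enable, total_entries):
    """Sketch of an SMT <-> single-thread mode switch.

    thread_enable: list of booleans, one per hardware thread (the thread enable state).
    total_entries: size of one shared resource pool (assumed) to reallocate.
    Returns the quiesce steps performed and a {thread_id: entries} allocation.
    """
    steps = list(QUIESCE_STEPS)  # all threads are halted and drained before reallocation
    enabled = [tid for tid, on in enumerate(thread_enable) if on]
    if not enabled:
        raise ValueError("at least one thread must be enabled")
    if len(enabled) == 1:
        # Single-threaded mode: the one enabled thread receives the whole pool.
        alloc = {enabled[0]: total_entries}
    else:
        # Multi-threaded mode: divide the pool as evenly as possible (assumed policy).
        share, extra = divmod(total_entries, len(enabled))
        alloc = {tid: share + (1 if i < extra else 0) for i, tid in enumerate(enabled)}
    return steps, alloc
```

For example, `switch_mode([False, True], 64)` hands all 64 entries to thread 1, while `switch_mode([True, True], 64)` splits them 32/32.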
Abstract:
A method, system, and computer program product for processing in a multiprocessor data processing system are disclosed. In the method, in response to executing a load-and-reserve instruction, a processor core sends a load-and-reserve operation for an address to a lower-level cache of the memory hierarchy, invalidates the data for that address in a store-through upper-level cache, and places the data returned from the lower-level cache into the store-through upper-level cache.
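A toy two-level model helps show the order of effects described above. It is only a sketch under stated assumptions: `LowerCache`, `Core`, `larx`, and the single `reservation` register are hypothetical stand-ins, caches are plain dictionaries, and coherence traffic from other cores is not modeled.

```python
class LowerCache:
    """Hypothetical lower-level cache that tracks one load-and-reserve reservation."""

    def __init__(self, contents):
        self.data = dict(contents)
        self.reservation = None  # (core_id, address) of the outstanding reservation

    def load_and_reserve(self, core_id, addr):
        # Record the reservation at the lower level and return the current data.
        self.reservation = (core_id, addr)
        return self.data[addr]


class Core:
    """Hypothetical core with a store-through upper-level (L1) cache."""

    def __init__(self, core_id, lower):
        self.core_id = core_id
        self.l1 = {}  # store-through: stores would also be written to `lower`
        self.lower = lower

    def larx(self, addr):
        # Invalidate any possibly stale copy of the address in the upper-level cache.
        self.l1.pop(addr, None)
        # Send the load-and-reserve operation to the lower-level cache.
        value = self.lower.load_and_reserve(self.core_id, addr)
        # Place the returned data into the upper-level cache.
        self.l1[addr] = value
        return value
```

Because the upper-level copy is discarded and refilled from the level that holds the reservation, the core observes the same value that the reservation was taken on.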
Abstract:
A method, apparatus, and computer program product are disclosed for ensuring processing fairness in a data processing system with simultaneous multi-threading (SMT) microprocessors, which concurrently execute multiple threads during each clock cycle. During a standard selection state, which lasts for an expected number of clock cycles, a clock cycle priority is assigned to a first thread and a second thread according to a standard selection definition: the first thread is selected as the primary thread and the second thread as the secondary thread. If a condition exists that requires overriding the standard selection definition, an override state is executed in which the second thread is selected as the primary thread and the first thread as the secondary thread. The override state is forced to last for an override period equal to the expected number of clock cycles plus a forced number of clock cycles. When the first thread again becomes the primary thread, it is granted the forced number of clock cycles.
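One reading of the fairness rule can be sketched as a per-cycle schedule of the primary thread. The function `primary_per_cycle` and the encoding (0 = first thread, 1 = second thread) are hypothetical, and modeling the compensation as the first thread receiving its expected cycles plus the forced cycles is an assumption about how the grant is applied.

```python
from typing import List


def primary_per_cycle(expected: int, forced: int, override: bool) -> List[int]:
    """Return the primary thread id (0 = first, 1 = second) for each clock cycle."""
    if not override:
        # Standard selection state: the first thread is primary for the expected cycles.
        return [0] * expected
    # Override state: the second thread is primary for expected + forced cycles.
    schedule = [1] * (expected + forced)
    # Assumed compensation: when the first thread becomes primary again, it is granted
    # the forced cycles on top of its expected cycles.
    schedule += [0] * (expected + forced)
    return schedule
```

Under this assumption both threads end up with the same number of primary cycles across an override and its compensation, which is the fairness property the abstract targets.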
Abstract:
Arrangements and a method are provided for enabling and disabling cache bypass in a computer system with a cache hierarchy. Cache bypass status is identified for at least one cache line. A cache line identified as cache-bypass enabled is transferred to one or more higher-level caches of the hierarchy with the next higher-level cache bypassed, whereas a cache line identified as cache-bypass disabled is transferred to one or more higher-level caches without bypassing the next higher-level cache. An arrangement is also included for selectively enabling or disabling cache bypass for at least one cache line based on historical cache access information.
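The history-based bypass decision can be sketched as below. The class `BypassPolicy`, the three-level L3 to L2 to L1 naming (L1 being the highest level, closest to the core), and the rule "bypass L2 until the line has shown reuse there" are all assumptions chosen for illustration; the abstract does not specify what historical information is kept.

```python
class BypassPolicy:
    """Hypothetical per-line bypass policy driven by past hits in the intermediate cache."""

    def __init__(self, reuse_threshold: int = 1):
        self.l2_hits = {}  # cache line address -> hits observed in L2 (assumed history)
        self.reuse_threshold = reuse_threshold

    def record_l2_hit(self, line: int) -> None:
        self.l2_hits[line] = self.l2_hits.get(line, 0) + 1

    def bypass_enabled(self, line: int) -> bool:
        # Enable bypass for lines that have not yet shown enough reuse in L2.
        return self.l2_hits.get(line, 0) < self.reuse_threshold

    def fill_path(self, line: int) -> list:
        """Caches that receive the line when it is transferred up from L3."""
        return ["L1"] if self.bypass_enabled(line) else ["L2", "L1"]
```

A line with no recorded L2 reuse goes straight to L1; once it has hit in L2, later fills install it in both L2 and L1.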
Abstract:
A microprocessor includes a functional block having dynamic power-saving circuitry, a functional block control unit, and a thermal control unit. The functional block control units can automatically alter the performance characteristics of their associated functional blocks when an over-temperature condition is detected. The thermal control unit receives an over-temperature signal indicating that the processor temperature exceeds a threshold and, in response, invokes one or more of the functional block control units. The functional block control units respond to signals from the thermal control unit by reducing processor activity, slowing processor performance, or both. The resulting reduction in activity causes the dynamic power-saving circuitry to engage. The functional block control units can throttle performance by numerous means, such as reducing the exploitable parallelism within the processor, suspending out-of-order execution, and reducing effective resource size.
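The control flow from the thermal control unit to the functional block control units can be sketched as follows. Every name here is hypothetical (`CoreConfig`, `FunctionalBlockControl`, `ThermalControlUnit`), as are the specific throttles and their amounts: halving issue width models reduced parallelism, clearing `out_of_order` models suspending out-of-order execution, and halving `queue_entries` models reduced effective resource size.

```python
from dataclasses import dataclass


@dataclass
class CoreConfig:
    issue_width: int = 4       # exploitable parallelism (assumed knob)
    out_of_order: bool = True  # out-of-order execution enabled
    queue_entries: int = 64    # effective resource size (assumed knob)


class FunctionalBlockControl:
    """Hypothetical control unit that throttles one functional block."""

    def __init__(self, cfg: CoreConfig):
        self.cfg = cfg

    def throttle(self) -> None:
        self.cfg.issue_width = max(1, self.cfg.issue_width // 2)
        self.cfg.out_of_order = False
        self.cfg.queue_entries = max(8, self.cfg.queue_entries // 2)


class ThermalControlUnit:
    """Invokes the block control units when the temperature exceeds a threshold."""

    def __init__(self, threshold_c: float, controls: list):
        self.threshold_c = threshold_c
        self.controls = controls

    def sample(self, temp_c: float) -> bool:
        if temp_c > self.threshold_c:  # over-temperature signal asserted
            for control in self.controls:
                control.throttle()
            return True
        return False
```

The sketch only lowers activity; the power saving itself would come from the dynamic power-saving circuitry reacting to that lower activity, which is not modeled here.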
Abstract:
A method of generating a Global History Vector (GHV) includes determining whether a selected group of instructions contains a branch instruction. If the selected group contains no branch instruction, the current GHV is held unchanged in a shift register. If the selected group contains a branch instruction predicted to be taken, a first value is shifted into the shift register to generate a second vector. If the selected group contains at least one branch instruction but none predicted to be taken, a second value is shifted into the shift register to generate the second vector.
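The per-group update rule can be sketched as a fixed-width shift register. The class name `GlobalHistoryVector`, the 12-bit default width, and the choice of 1 and 0 as the "first" and "second" values are assumptions; the group encoding (None for a non-branch, True/False for a branch predicted taken/not taken) is likewise hypothetical.

```python
class GlobalHistoryVector:
    """Sketch of a GHV kept in a fixed-width shift register (one bit per branch group)."""

    TAKEN_BIT = 1      # assumed "first value"
    NOT_TAKEN_BIT = 0  # assumed "second value"

    def __init__(self, width: int = 12):
        self.width = width
        self.value = 0

    def update(self, group) -> int:
        """group: per-instruction predictions, None for non-branches, True/False for branches."""
        branches = [p for p in group if p is not None]
        if not branches:
            return self.value  # no branch in the group: hold the current vector
        bit = self.TAKEN_BIT if any(branches) else self.NOT_TAKEN_BIT
        mask = (1 << self.width) - 1
        self.value = ((self.value << 1) | bit) & mask  # shift in one bit per group
        return self.value
```

Note that a group shifts in exactly one bit no matter how many branches it contains, so the register records group-level history rather than per-branch history.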
Abstract:
A branch prediction method includes retrieving prediction values from a local branch history table and a global branch history table. A branch prediction is selectively made using the value retrieved from the local branch history table when that value falls within first predetermined limits. A branch prediction is selectively made using the value retrieved from the global branch history table when that value falls within second predetermined limits.
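The selection between the two tables can be sketched with 2-bit saturating counters. The counter encoding (0 and 1 predict not taken, 2 and 3 predict taken), treating only saturated states as "within limits", checking the local table first, and the fallback prediction are all assumptions; the function name `predict` is hypothetical.

```python
# Assumed 2-bit saturating counters: 0, 1 -> not taken; 2, 3 -> taken.
STRONG_STATES = {0, 3}  # assumed "predetermined limits": saturated (high-confidence) states


def predict(local_ctr, global_ctr,
            local_limits=STRONG_STATES, global_limits=STRONG_STATES, default=False):
    """Return (predicted_taken, source) using whichever table is within its limits."""
    if local_ctr in local_limits:
        return local_ctr >= 2, "local"
    if global_ctr in global_limits:
        return global_ctr >= 2, "global"
    return default, "fallback"  # neither value is within limits (assumed behavior)
```

Passing different `local_limits` and `global_limits` sets models the separate first and second limits.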
Abstract:
In a first aspect of the present invention, a method for prefetching instructions in a superscalar processor is disclosed. The method comprises fetching a set of instructions along a predicted path, prefetching a predetermined number of instructions if a low-confidence branch is fetched, and storing the predetermined number of instructions in a prefetch buffer. In a second aspect of the present invention, a system for prefetching instructions in a superscalar processor is disclosed. The system comprises a cache for fetching a set of instructions along a predicted path, a prefetching mechanism coupled to the cache for prefetching a predetermined number of instructions if a low-confidence branch is fetched, and a prefetch buffer coupled to the prefetching mechanism for storing the predetermined number of instructions. The method and system in accordance with the present invention improve existing prefetching algorithms with minimal additional hardware cost.
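The trigger condition can be sketched as a loop over the fetched instructions. The abstract does not say where the prefetched instructions come from; this sketch assumes they start at the branch's alternate (not-predicted) target. It also assumes fixed 4-byte instructions and a bounded buffer. `fetch_with_prefetch` and its tuple format are hypothetical.

```python
from collections import deque


def fetch_with_prefetch(fetched, memory, n_prefetch=4, buffer_size=16):
    """Return the prefetch buffer after scanning instructions on the predicted path.

    fetched: list of (addr, is_branch, low_confidence, alt_target) tuples.
    memory:  dict mapping address -> instruction.
    """
    buffer = deque(maxlen=buffer_size)  # oldest entries drop out when full
    for addr, is_branch, low_confidence, alt_target in fetched:
        if is_branch and low_confidence:
            # Assumed policy: prefetch n_prefetch instructions from the alternate path.
            for i in range(n_prefetch):
                a = alt_target + 4 * i  # assumed fixed 4-byte instructions
                if a in memory:
                    buffer.append((a, memory[a]))
    return buffer
```

High-confidence branches trigger nothing, so the extra fetch bandwidth is spent only where a misprediction is comparatively likely.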
Abstract:
Mechanisms are provided for partial flush handling with multiple branches per instruction group. The instruction fetch unit sorts instructions into groups, and a group may include a floating branch instruction and a boundary branch instruction. For each group of instructions, the instruction sequencing unit creates an entry in a global completion table (GCT), also referred to herein as a group completion table, and uses the GCT to manage completion of the instructions within each outstanding group. Because each group may include up to two branches, the instruction sequencing unit may dispatch instructions beyond the first branch, i.e., the floating branch. Therefore, if the floating branch is mispredicted, the processor performs a partial flush of that group and flushes every group younger than it.
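The flush rule can be sketched on a GCT modeled as a list of groups ordered from oldest to youngest. `Group`, `partial_flush`, and the `floating_branch_idx` field are hypothetical; the sketch assumes the mispredicted floating branch itself is kept and only the instructions after it in its group are discarded.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Group:
    gid: int
    instrs: List[str]
    floating_branch_idx: Optional[int] = None  # position of the first (floating) branch


def partial_flush(gct: List[Group], gid: int) -> List[Group]:
    """Handle a mispredicted floating branch in group `gid`; gct is ordered oldest first."""
    survivors = []
    for g in gct:
        if g.gid == gid:
            assert g.floating_branch_idx is not None, "group has no floating branch"
            # Partial flush: keep instructions up to and including the floating branch.
            kept = g.instrs[: g.floating_branch_idx + 1]
            survivors.append(Group(g.gid, kept, g.floating_branch_idx))
            break  # every younger group is flushed entirely
        survivors.append(g)  # older groups are untouched
    return survivors
```

In the usage below, group 1 holds a floating branch `bc` and a boundary branch `b`; after the floating branch mispredicts, only `cmp` and `bc` survive from that group and group 2 is removed.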
Abstract:
A method and system for scheduling threads on simultaneous multithreaded processors are disclosed. The hardware and the operating system communicate with each other, exchanging information about the attributes of threads executing on the processing elements. The operating system makes thread scheduling decisions based on this information.