Abstract:
A multithreading processor for concurrently executing multiple threads is provided. The processor includes an execution pipeline and a thread scheduler that dispatches instructions of the threads to the execution pipeline. The execution pipeline is configured to generate a thread context (TC) flush indicator associated with a thread context when one or more instructions of that thread context would stall in the execution pipeline. One or more instructions of the thread context associated with the TC flush indicator can then be flushed or nullified in the pipeline.
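A small C sketch of the pipeline-side behavior described above: when an instruction of a thread context would stall, the pipeline raises a per-TC flush indicator and the instructions of that TC currently in the pipeline are nullified. The stage count, field names, and the would_stall() predicate are assumptions made for illustration, not details taken from the patent.

```c
#include <stdbool.h>

#define STAGES 6                     /* assumed execution-pipeline depth */
#define TCS    4                     /* assumed number of thread contexts */

/* Hypothetical view of one pipeline stage register. */
struct stage_reg {
    bool     valid;
    unsigned tc;                     /* thread context the instruction belongs to */
    unsigned inst;
};

/* Placeholder for the stall-detection condition (cache miss, busy divider, ...). */
static bool would_stall(const struct stage_reg *s) { (void)s; return false; }

/* Evaluate the pipeline: if any instruction would stall, raise the flush
 * indicator for its thread context and nullify that TC's instructions so
 * other thread contexts keep flowing. */
static void evaluate(struct stage_reg pipe[STAGES], bool tc_flush[TCS]) {
    for (unsigned i = 0; i < STAGES; i++)
        if (pipe[i].valid && would_stall(&pipe[i]))
            tc_flush[pipe[i].tc] = true;      /* TC flush indicator */

    for (unsigned i = 0; i < STAGES; i++)
        if (pipe[i].valid && tc_flush[pipe[i].tc])
            pipe[i].valid = false;            /* flush/nullify this TC's instructions */
}
```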
Abstract:
An apparatus for scheduling dispatch of instructions among a plurality of threads being concurrently executed in a multithreading processor is provided. The apparatus includes an instruction decoder that generates register usage information for an instruction from each of the threads, a priority generator that generates a priority for each instruction based on the register usage information and state information of instructions currently executing in an execution pipeline, and selection logic that dispatches at least one instruction from at least one thread based on the priorities of the instructions. The priority indicates the likelihood that the instruction will execute in the execution pipeline without stalling. For example, an instruction may have a high priority if it has few or no register dependencies or its data is known to be available, or a low priority if it has strong register dependencies or is a load from uncacheable or synchronized storage space.
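As a rough illustration of the priority idea described above, the following C sketch assigns a dispatch priority from register-usage and pipeline-state summaries. The field names, the three-level priority scale, and the specific rules are assumptions for illustration; the abstract does not specify them.

```c
/* Hypothetical priority levels. */
enum prio { PRIO_LOW = 0, PRIO_MED = 1, PRIO_HIGH = 2 };

/* Hypothetical summary of decoder output and pipeline state for one candidate. */
struct insn_info {
    unsigned src_regs;          /* bitmap of source registers the instruction reads */
    unsigned pending_loads;     /* registers awaiting a load already in the pipeline */
    unsigned pending_long_ops;  /* registers awaiting a long-latency op (divide, etc.) */
    int is_uncacheable_load;    /* load from uncacheable/synchronized storage space */
};

static enum prio dispatch_priority(const struct insn_info *i) {
    if (i->is_uncacheable_load || (i->src_regs & i->pending_long_ops))
        return PRIO_LOW;        /* likely to stall in the execution pipeline */
    if (i->src_regs & i->pending_loads)
        return PRIO_MED;        /* data may still arrive in time via bypass */
    return PRIO_HIGH;           /* no dependencies: unlikely to stall */
}
```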
Abstract:
A multithreading processor for concurrently executing multiple threads is provided. The processor includes an execution pipeline and a thread scheduler that dispatches instructions of the threads to the execution pipeline. The execution pipeline detects a stalling event caused by a dispatched instruction and flushes the execution pipeline to enable instructions of other threads to continue executing. The execution pipeline communicates to the scheduler which thread caused the stalling event, and the scheduler stops dispatching instructions for that thread until the stalling condition terminates. In one embodiment, the execution pipeline flushes only the thread that includes the instruction that caused the event. In one embodiment, the execution pipeline stalls rather than flushing if the thread is the only runnable thread. In one embodiment, the processor includes skid buffers to which the flushed instructions are rolled back, so that only the execution pipeline, and not the instruction fetch pipeline, need be flushed.
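A minimal behavioral sketch of the flush-and-block reaction this abstract describes, assuming a per-thread skid buffer and a blocked flag; the structure layout and field names are hypothetical and only meant to illustrate the control flow, not the actual hardware.

```c
#include <stdbool.h>

#define SKID_DEPTH 8                /* assumed per-thread skid-buffer depth */

/* Hypothetical per-thread scheduler state. */
struct thread_ctx {
    bool     blocked;               /* true while the stalling condition persists */
    unsigned skid[SKID_DEPTH];      /* dispatched-but-uncommitted instructions */
    unsigned skid_head, skid_count; /* oldest entry and occupancy */
};

/* The execution pipeline reports a stalling event for this thread context:
 * the thread's instructions are flushed from the execution pipeline only and
 * will later be replayed from the skid buffer, so the fetch pipeline keeps
 * its contents and no refetch is needed. */
static void on_stall_event(struct thread_ctx *tc) {
    tc->blocked = true;             /* scheduler stops dispatching this thread */
}

/* The stalling condition terminates (e.g., the missing cache line returns). */
static void on_stall_cleared(struct thread_ctx *tc) {
    tc->blocked = false;            /* dispatch resumes, replaying skid entries first */
}

/* Dispatch: a blocked thread is skipped; an unblocked thread replays any
 * instructions still sitting in its skid buffer before taking new ones. */
static bool next_instruction(struct thread_ctx *tc, unsigned *inst) {
    if (tc->blocked)
        return false;
    if (tc->skid_count) {
        *inst = tc->skid[tc->skid_head];
        tc->skid_head = (tc->skid_head + 1) % SKID_DEPTH;
        tc->skid_count--;
        return true;
    }
    return false;                   /* would fetch a new instruction here */
}
```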
Abstract:
An instruction dispatching apparatus in a multithreading microprocessor that concurrently executes N threads, each belonging to one of G groups, each group having one of P priorities. G round-robin vectors each have N bits corresponding to the threads; each vector is a 1-bit left-rotated and subsequently sign-extended version of an N-bit vector having a single true bit corresponding to the last thread selected for dispatching in the group. Each of N G-input muxes receives the corresponding one of the N bits of each round-robin vector and selects for output the input specified by the corresponding thread's group. Selection logic selects for dispatching the one of the N instructions corresponding to the thread whose dispatch value is greater than or equal to the dispatch value of every thread to its left. Each dispatch value comprises the corresponding mux output as its least-significant bit, a dispatchable-instruction bit as its most-significant bit, and the thread's group priority bits in between.
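The following C sketch shows one plausible software reading of the scheme above. It assumes thread 0 occupies the least-significant (rightmost) bit, "sign-extended" means filling every bit to the left of the rotated bit with ones, group priorities are 2 bits wide, and ties resolve toward the lower-numbered thread; the sizes N=4 and G=2 are illustrative, not taken from the patent.

```c
#include <stdint.h>

#define N 4   /* threads (assumed) */
#define G 2   /* groups (assumed) */

/* Build the per-group round-robin vector: a one-hot of the last thread
 * dispatched in the group, rotated left by one bit, then "sign-extended"
 * so every more-significant bit is also set. */
static uint32_t rr_vector(int last_thread) {
    uint32_t mask = (1u << N) - 1;
    uint32_t onehot = 1u << last_thread;
    uint32_t rotated = ((onehot << 1) | (onehot >> (N - 1))) & mask;
    return ~(rotated - 1) & mask;        /* fill bits left of the rotated bit */
}

/* Select a thread to dispatch. Each thread's dispatch value packs
 * { dispatchable, 2-bit group priority, round-robin bit }; the thread with
 * the highest value wins, and the round-robin LSB breaks ties toward the
 * thread following the last one dispatched in its group. */
static int select_thread(const int group[N], const unsigned prio[G],
                         const int dispatchable[N], const int last_in_group[G]) {
    uint32_t rr[G];
    for (int g = 0; g < G; g++)
        rr[g] = rr_vector(last_in_group[g]);

    int best = -1;
    uint32_t best_val = 0;
    for (int t = 0; t < N; t++) {
        uint32_t rr_bit = (rr[group[t]] >> t) & 1;          /* the per-thread mux */
        uint32_t val = ((uint32_t)dispatchable[t] << 3) |   /* MSB: dispatchable */
                       (prio[group[t]] << 1) | rr_bit;      /* middle: priority, LSB: RR */
        if (val > best_val) {            /* strict '>' keeps the lower-numbered thread on a tie */
            best_val = val;
            best = t;
        }
    }
    return (best >= 0 && dispatchable[best]) ? best : -1;   /* -1: nothing dispatchable */
}
```

For example, if the last thread dispatched in a group was thread 1, rr_vector yields 0b1100, so threads 2 and 3 carry the round-robin bit and thread 2 wins among equally dispatchable, equal-priority threads, which is the intended round-robin order.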
Abstract:
A three-tiered TLB architecture in a multithreading processor that concurrently executes multiple instruction threads is provided. A macro-TLB caches address translation information for memory pages for all the threads. A micro-TLB caches the translation information for a subset of the memory pages cached in the macro-TLB. A respective nano-TLB for each of the threads caches translation information only for the respective thread. The nano-TLBs also include replacement information to indicate which entries in the nano-TLB/micro-TLB hold recently used translation information for the respective thread. Based on the replacement information, recently used information is copied to the nano-TLB if evicted from the micro-TLB.
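A compact C sketch of the lookup and eviction flow this abstract describes, under several simplifying assumptions: fully associative arrays with made-up sizes, a rotating micro-TLB victim pointer, and a single nano-TLB slot reused on eviction. Only the nano -> micro -> macro lookup order and the copy-on-eviction behavior come from the abstract; everything else is illustrative.

```c
#include <stdint.h>

#define THREADS    4    /* assumed */
#define NANO_WAYS  2    /* assumed sizes */
#define MICRO_WAYS 4
#define MACRO_WAYS 16

typedef struct { uint64_t vpn, pfn; int valid; } tlb_entry_t;

typedef struct {
    tlb_entry_t nano[THREADS][NANO_WAYS];   /* per-thread nano-TLBs */
    tlb_entry_t micro[MICRO_WAYS];          /* shared micro-TLB */
    tlb_entry_t macro[MACRO_WAYS];          /* shared macro-TLB */
    int recent_micro[THREADS];              /* replacement info: which micro entry holds the
                                               thread's recently used translation (-1 = none) */
    int micro_victim;                       /* simple rotating victim pointer */
} tlb3_t;

/* Look up a translation for one thread: nano first, then micro, then macro.
 * A macro hit is promoted into the micro-TLB; the evicted micro entry is
 * copied into the nano-TLB of any thread that recently used it, so that
 * thread does not lose its hot translation. */
static int tlb_lookup(tlb3_t *t, int tid, uint64_t vpn, uint64_t *pfn) {
    for (int i = 0; i < NANO_WAYS; i++)
        if (t->nano[tid][i].valid && t->nano[tid][i].vpn == vpn) {
            *pfn = t->nano[tid][i].pfn;
            return 1;                                /* nano hit */
        }
    for (int i = 0; i < MICRO_WAYS; i++)
        if (t->micro[i].valid && t->micro[i].vpn == vpn) {
            *pfn = t->micro[i].pfn;
            t->recent_micro[tid] = i;                /* remember for eviction time */
            return 1;                                /* micro hit */
        }
    for (int i = 0; i < MACRO_WAYS; i++)
        if (t->macro[i].valid && t->macro[i].vpn == vpn) {
            int v = t->micro_victim;
            t->micro_victim = (v + 1) % MICRO_WAYS;
            for (int th = 0; th < THREADS; th++)     /* save victim into owners' nano-TLBs */
                if (t->recent_micro[th] == v) {
                    t->nano[th][0] = t->micro[v];    /* simplistic placement */
                    t->recent_micro[th] = -1;
                }
            t->micro[v] = t->macro[i];               /* promote to micro-TLB */
            t->recent_micro[tid] = v;
            *pfn = t->macro[i].pfn;
            return 1;                                /* macro hit */
        }
    return 0;                                        /* TLB miss */
}
```

The point the sketch tries to capture is that a thread's hot translation survives micro-TLB pressure from other threads by migrating into that thread's private nano-TLB.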
Abstract:
A concurrent instruction dispatch apparatus includes a group indicator for each of a plurality of threads that indicates which one of a plurality of thread groups the thread belongs to. A group priority indicator for each group indicates an instruction dispatch priority relative to the other groups. Selection logic selects a thread for dispatching an instruction thereof based on the group and group priority indicators. A bifurcated scheduler includes first scheduler logic that issues instructions of the threads to an execution unit, second scheduler logic that enforces a thread scheduling policy, and an interface between them. The interface conveys a group indicator that indicates which group each thread belongs to, a priority for each group, and instruction execution information for each thread. The first scheduler logic issues the instructions based on the group priorities and group indicators, and the second scheduler logic updates the group indicators based on the instruction execution information.
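A small C sketch of how the bifurcated split might look in software terms: the policy side owns the group and priority indicators, the issue side reads them and reports execution information back through the shared interface. The struct layout, sizes, and the quota-based policy are invented for illustration and are not part of the abstract.

```c
#define NUM_THREADS 4   /* assumed */
#define NUM_GROUPS  2   /* assumed */

/* Interface between the two halves of the bifurcated scheduler. */
struct sched_interface {
    /* written by the second (policy) scheduler logic */
    unsigned group[NUM_THREADS];               /* which group each thread belongs to */
    unsigned group_prio[NUM_GROUPS];           /* dispatch priority of each group */
    /* written by the first (issue) scheduler logic */
    unsigned long insts_issued[NUM_THREADS];   /* execution feedback */
};

/* First scheduler logic: pick a thread with a dispatchable instruction from
 * the highest-priority group (round-robin within a group omitted here). */
static int pick_thread(const struct sched_interface *si,
                       const int dispatchable[NUM_THREADS]) {
    int best = -1;
    for (int t = 0; t < NUM_THREADS; t++)
        if (dispatchable[t] &&
            (best < 0 || si->group_prio[si->group[t]] > si->group_prio[si->group[best]]))
            best = t;
    return best;
}

/* Second scheduler logic: update group membership from execution feedback,
 * e.g., move a thread that exceeded its issue quota into a lower-priority
 * group (here group 0 is assumed to be the lower-priority one). */
static void update_policy(struct sched_interface *si, unsigned long quota) {
    for (int t = 0; t < NUM_THREADS; t++)
        si->group[t] = (si->insts_issued[t] > quota) ? 0 : 1;
}
```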
Abstract:
An apparatus selects one of N transaction queues from which to transmit a transaction out a port of a switch. A first input value specifies the last-selected queue. Only one of the N bits of the first value corresponding to the last selected queue is true. A second input value specifies which queue is enabled for selection. Each of the N bits of the second value whose corresponding queue is enabled is false. A barrel incrementer 1-bit left-rotatively increments the second value by the first value to generate a sum. Combinational logic generates a third value specifying which queue is selected next. The third value is a Boolean AND of the sum and an inverted version of the second value. Only one of the N bits of the third value corresponding to the next selected one of the queues is true.
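A C sketch of the selection computation described above, emulating the end-around (rotative) carry of the barrel incrementer in software. The queue count and the exact placement of the one-bit left rotation are assumptions based on a plausible reading of the abstract.

```c
#include <stdint.h>

#define N 8   /* number of transaction queues (assumed width) */

/* Rotate an N-bit value left by one position. */
static uint32_t rotl1(uint32_t v) {
    return ((v << 1) | (v >> (N - 1))) & ((1u << N) - 1);
}

/* Round-robin selection: 'last' is a one-hot vector of the last-selected
 * queue, 'enable' has a 1 for each queue eligible for selection. Returns a
 * one-hot vector of the next selected queue, or 0 if no queue is enabled. */
static uint32_t rr_select(uint32_t last, uint32_t enable) {
    uint32_t mask   = (1u << N) - 1;
    uint32_t second = ~enable & mask;      /* bits of enabled queues are false */
    uint32_t first  = rotl1(last);         /* start the search after the last-selected queue */
    uint64_t sum    = (uint64_t)second + first;
    if (sum > mask)                        /* barrel increment: carry out of the top */
        sum = (sum & mask) + 1;            /* wraps around to bit 0 */
    return (uint32_t)sum & enable;         /* third value: one-hot next selection */
}
```

For example, with N = 8, last = 0x04 (queue 2) and enable = 0x21 (queues 0 and 5), the sketch returns 0x20, selecting queue 5. Because the carry wraps from the top bit back to bit 0, the addition effectively searches circularly from the queue after the last-selected one and stops at the first enabled queue, which is why the final AND yields a one-hot result.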