摘要:
A microprocessor includes an externally accessible port and a serial communication bus connected to the port. An execution pipeline of the processor includes a pipeline satellite circuit coupling the pipeline to the bus. The satellite enables an external agent to provide an instruction directly to the pipeline via the serial bus. A dedicated register and register satellite circuit couple the register to the communication bus. The execution pipeline can access the dedicated register during execution of the instruction. In this manner, the satellite circuits enable the external agent to access architected state. The communication bus enables access to the satellites while a system clock to the processor remains active. In one embodiment, the pipeline satellite accesses the pipeline “downstream” of the decode stage such that the set of instructions that may be “rammed” into the pipeline is not limited to the set of instructions that the decode stage can generate.
摘要:
A real address range check mechanism verifies real addresses generated in a computer system which translates real addresses from effective addresses, some of the effective addresses being real addresses not requiring translation. The system has at least two operating modes. In one mode, the range checking mechanism generates an error signal responsive to detecting a real address outside a predetermined range, and in the other operating mode no error signal is generated. Preferably, the computer system's hardware resources, including real address space, is logically partitioned, partitioning being managed by an ultra-privileged process called a hypervisor. Preferably, the processor supports hardware multithreading, each thread independently capable of being in either hypervisor, supervisor, or problem state, real address range checking error signals being disabled in the hypervisor state.
摘要:
A processor supports logical partitioning of a computer system. Logical partitions isolate the real address spaces of processes executing on different processors and the hardware resources that include processors. However, this multithreaded processor system can dynamically reallocate hardware resources including the processors among logical partitions. An ultra-privileged supervisor process, called a hypervisor, regulates the logical partitions. Preferably, the processor supports hardware multithreading, each thread independently capable of being in either hypervisor, supervisor, or problem state. The processor assigns certain generated addresses to its logical partition, preferably by concatenating certain high order bits from a special register with lower order bits of the generated address. A separate range check mechanism concurrently verifies that these high order effective address bits are in fact 0, and generates an error signal if they are not. In the preferred embodiment, instruction addresses from either active or dormant threads can be pre-fetched in anticipation of execution. In the preferred embodiment, the processor supports different environments which use the hypervisor, supervisor and problem states differently.
摘要:
A processor supports logical partitioning of hardware resources including real address spaces of a computer system. An ultra-privileged supervisor process, called a hypervisor, regulates the logical partitions and can dynamically re-allocate resources. Preferably, the processor supports hardware multithreading, each thread independently capable of being in either hypervisor, supervisor, or problem state, and is capable of entering hypervisor state only upon occurrence of certain pre-defined events. A logical partition identifier is stored in a processor register, and can be altered by the processor only when in hypervisor state. Certain bus communications contain a logical partition identifier tag, and the processor ignores such communications if the tag does not match its own logical partition identifier in its register.
摘要:
A multithreaded processor includes a level one instruction cache shared by all threads. The I-cache is accessed with an instruction unit generated effective address, the I-cache directory containing real page numbers of the corresponding cache lines. A separate line fill sequencer exists for each thread. Preferably, the I-cache is N-way set associative, where N is the number of threads, and includes an effective-to-real address table (ERAT), containing pairs of effective and real page numbers. ERAT entries are accessed by hashing the effective address. The ERAT entry is then compared with the effective address of the desired instruction to verify an ERAT hit. The corresponding real page number is compared with a real page number in the directory array to verify a cache hit. Preferably, the line fill sequencer operates in response to a cache miss, where there is an ERAT hit. In this case, the full real address of the desired instruction can be constructed from the effective address and the ERAT, making it unnecessary to access slower address translation mechanisms for main memory. Because there is a separate line fill sequencer for each thread, threads are independently able to satisfy cache fill requests without waiting for each other. Additionally, because the I-cache index contains real page numbers, cache coherency is simplified. Furthermore, the ERAT avoids the need in many cases to access slower memory translation mechanisms. Finally, the n-way associative nature of the cache reduces thread contention.
摘要:
A method and processor for reducing the fetch time of target instructions of a predicted taken branch instruction. Each entry in a buffer, referred to herein as a “branch target buffer”, may store an address of a branch instruction predicted taken and the instructions beginning at the target address of the branch instruction predicted taken. When an instruction is fetched from the instruction cache, a particular entry in the branch target buffer is indexed using particular bits of the fetched instruction. The address of the branch instruction in the indexed entry is compared with the address of the instruction fetched from the instruction cache. If there is a match, then the instructions beginning at the target address of that branch instruction are dispatched directly behind the branch instruction. In this manner, the fetch time of target instructions of a predicted taken branch instruction is reduced.
摘要:
Disclosed are a method and a system for grouping processor instructions for execution by a processor, where the group of processor instructions includes at least two branch processor instructions. In one or more embodiments, an instruction buffer can decouple an instruction fetch operation from an instruction decode operation by storing fetched processor instructions in the instruction buffer until the fetched processor instructions are ready to be decoded. Group formation can involve removing processor instructions from the instruction buffer and routing the processor instruction to latches that convey the processor instructions to decoders. Processor instructions that are removed from instruction buffer in a single clock cycle can be called a group of processor instructions. In one or more embodiments, the first instruction in the group must be the oldest instruction in the instruction buffer and instructions must be removed from the instruction buffer ordered from oldest to youngest.
摘要:
An information handling system includes a memory, a processor, and an instruction tracking unit. The processor executes program code and, while the program code executes, the instruction tracking unit decodes a multi-purpose no-op instruction within the program code. In turn, the instruction tracking unit sends an interrupt to the processor, which invokes a profiling module to collect and store profiling data in a profiling buffer.
摘要:
Disclosed are a method and a system for grouping processor instructions for execution by a processor, where the group of processor instructions includes at least two branch processor instructions. In one or more embodiments, an instruction buffer can decouple an instruction fetch operation from an instruction decode operation by storing fetched processor instructions in the instruction buffer until the fetched processor instructions are ready to be decoded. Group formation can involve removing processor instructions from the instruction buffer and routing the processor instruction to latches that convey the processor instructions to decoders. Processor instructions that are removed from instruction buffer in a single clock cycle can be called a group of processor instructions. In one or more embodiments, the first instruction in the group must be the oldest instruction in the instruction buffer and instructions must be removed from the instruction buffer ordered from oldest to youngest.
摘要:
A method, and apparatus are provided for preventing livelocks in processor selection of load requests in a multiprocessor (MP) system. On random occasions a selection mechanism is changed for first holding up all requests and then a random selection is made. Then a round robin selection mechanism is used for further requests. A livelock-preventing selection mechanism includes a pair of linear feedback shift registers (LFSRs), each LFSR for generating pseudo random values.