摘要:
A data processing system comprised of a memory, a translation lookaside buffer (TLB) providing access to the memory, and an instruction cache connected to the memory. A two entry translation write buffer (TWB) has a first entry that is a first logical register and an associated first physical address register and a second entry that is a second logical register and an associated second physical address register. A physical address bus is connected to the TWB and a logical address bus is connected to the TLB and to the TWB, the logical address bus presenting an instruction pointer to the TLB and to the TWB. The instruction pointer is comprised of logical address bits including upper order bits, lower order bits, and a single bit having a first value or a second value. The single bit provides for translation of even-number pages for which the single bit has the first value and for odd-number pages for which the single bit has the second value. The upper order bits of the logical address are compared with the stored address values in the first and second logical registers in the TWB resulting in a first hit signal with respect to the first logical register or a second hit signal with respect to the second logical register. The first logical register is selected if the single bit has the first value and the second logical register is selected if the single bit has the second value. The physical address bus is driven with the corresponding first or second physical address associated with a selected first or second logical register upon a condition that the upper order bits of the logical address equal a stored value in the first or second logical register in the TWB.
摘要:
An instruction fetch unit in which an early instruction fetch is initiated to access a main memory simultaneously with checking a cache for the desired instruction. On a slow path to main memory is a large main translation lookaside buffer (TLB) that holds address translations. On a fast path is a smaller translation write buffer (TWB), a mini-TLB, that holds recently used address translations. A guess fetch access in initiated by presenting an address to the main memory in parallel with presenting the address to the cache. The address is compared with the contents of the TWB for a hit and with the contents of the cache for a hit. The guess access is allowed to proceed upon the condition that there is a hit in the TWB (the TWB is able to translate the logical address into a physical address) and a miss in the I-cache (the data are not available in the I-cache and hence the guess access of main memory is necessary to get the data). The guess access is canceled upon the condition that there is either a miss in the TWB (the TWB is unable to translate the logical address into a physical address) or a hit in the I-cache (the data are available in the I-cache and hence the guess access of main memory is not necessary). In this case a fetch access is reissued on the "slow" path that goes through the large main TLB.
摘要:
In one embodiment, the present invention includes a method for controlling redundant execution such that if an exceptional event occurs, the redundant execution is stopped, non-redundant execution is performed in one of the threads until the exceptional event has been-resolved, after which a state of the threads is synchronized, and redundant execution is continued. Other embodiments are described and claimed.
摘要:
A technology for implementing a method for a machine check architecture environment. A method of the disclosure includes obtaining an occurrence of an error. The occurrence of the error causes a non-microcoded processing device to enter an error monitoring state. The method further processes the error using a dedicated memory portion for the error monitoring state while the non-microcoded processing device is in the error monitoring state. The error monitoring state is dedicated to error processing. The method further determines information associated with the error. The information associated with the error is in a predefined format.
摘要:
A system and method are described for integrating a memory and storage hierarchy including a non-volatile memory tier within a computer system. In one embodiment, PCMS memory devices are used as one tier in the hierarchy, sometimes referred to as “far memory.” Higher performance memory devices such as DRAM placed in front of the far memory and are used to mask some of the performance limitations of the far memory. These higher performance memory devices are referred to as “near memory.” In one embodiment, the “near memory” is configured to operate in a plurality of different modes of operation including (but not limited to) a first mode in which the near memory operates as a memory cache for the far memory and a second mode in which the near memory is allocated a first address range of a system address space with the far memory being allocated a second address range of the system address space, wherein the first range and second range represent the entire system address space.
摘要:
A processor including a first execution core section clocked to perform execution operations at a first clock frequency, and a second execution core section clocked to perform execution operations at a second clock frequency which is different than the first clock frequency. The second execution core section runs faster and includes a data cache and critical ALU functions, while the first execution core section includes latency-tolerant functions such as instruction fetch and decode units and non-critical ALU functions. The processor may further include an I/O ring which may be still slower than the first execution core section. Optionally, the first execution core section may include a third execution core section whose clock rate is between that of the first and second execution core sections. Clock multipliers/dividers may be used between the various sections to derive their clocks from a single source, such as the I/O clock.
摘要:
A processor including a first execution core section clocked to perform execution operations at a first clock frequency, and a second execution core section clocked to perform execution operations at a second clock frequency which is different than the first clock frequency. The second execution core section runs faster and includes a data cache and critical ALU functions, while the first execution core section includes latency-tolerant functions such as instruction fetch and decode units and non-critical ALU functions. The processor may further include an I/O ring which may be still slower than the first execution core section. Optionally, the first execution core section may include a third execution core section whose clock rate is between that of the first and second execution core sections. Clock multipliers/dividers may be used between the various sections to derive their clocks from a single source, such as the I/O clock.
摘要:
A redundant multithreading processor is presented. In one embodiment, the processor performs execution of a thread and its duplicate thread in parallel and determines, when in a redundant multithreading mode, whether or not to synchronize an operation of the thread and an operation of the duplicate thread.
摘要:
In one embodiment, the present invention includes a method for controlling redundant execution such that if an exceptional event occurs, the redundant execution is stopped, non-redundant execution is performed in one of the threads until the exceptional event has been-resolved, after which a state of the threads is synchronized, and redundant execution is continued. Other embodiments are described and claimed.
摘要:
A processor including a first execution core section clocked to perform execution operations at a first clock frequency, and a second execution core section clocked to perform execution operations at a second clock frequency which is different than the first clock frequency. The second execution core section runs faster and includes a data cache and critical ALU functions, while the first execution core section includes latency-tolerant functions such as instruction fetch and decode units and non-critical ALU functions. The processor may further include an I/O ring which may be still slower than the first execution core section optionally, the first execution core section may include a third execution core section whose clock rate is between that of the first and second execution core sections. Clock multipliers/dividers may be used between the various sections to derive their clocks from a single source, such as the I/O clock.