摘要:
An instruction cache employing a cache holding register is provided. When a cache line of instruction bytes is fetched from main memory, the instruction bytes are temporarily stored into the cache holding register as they are received from main memory. The instruction bytes are predecoded as they are received from the main memory. If a predicted-taken branch instruction is encountered, the instruction fetch mechanism within the instruction cache begins fetching instructions from the target instruction path. This fetching may be initiated prior to receiving the complete cache line containing the predicted-taken branch instruction. As long as instruction fetches from the target instruction path continue to hit in the instruction cache, these instructions may be fetched and dispatched into a microprocessor employing the instruction cache. The remaining portion of the cache line of instruction bytes containing the predicted-taken branch instruction is received by the cache holding register. In order to reduce the number of ports employed upon the instruction bytes storage used to store cache lines of instructions, the cache holding register retains the cache line until an idle cycle occurs in the instruction bytes storage. The same port ordinarily used for fetching instructions is then used to store the cache line into the instruction bytes storage. In one embodiment, the instruction cache prefetches a succeeding cache line to the cache line which misses. A second cache holding register is employed for storing the prefetched cache line.
摘要:
An instruction cache employing a cache holding register is provided. When a cache line of instruction bytes is fetched from main memory, the instruction bytes are temporarily stored into the cache holding register as they are received from main memory. The instruction bytes are predecoded as they are received from the main memory. If a predicted-taken branch instruction is encountered, the instruction fetch mechanism within the instruction cache begins fetching instructions from the target instruction path. This fetching may be initiated prior to receiving the complete cache line containing the predicted-taken branch instruction. As long as instruction fetches from the target instruction path continue to hit in the instruction cache, these instructions may be fetched and dispatched into a microprocessor employing the instruction cache. The remaining portion of the cache line of instruction bytes containing the predicted-taken branch instruction is received by the cache holding register. In order to reduce the number of ports employed upon the instruction bytes storage used to store cache lines of instructions, the cache holding register retains the cache line until an idle cycle occurs in the instruction bytes storage. The same port ordinarily used for fetching instructions is then used to store the cache line into the instruction bytes storage. In one embodiment, the instruction cache prefetches a succeeding cache line to the cache line which misses. A second cache holding register is employed for storing the prefetched cache line.
摘要:
A predecode unit is configured to predecode variable byte-length instructions prior to their storage within an instruction cache of a superscalar microprocessor. The predecode unit generates three predecode bits associated with each byte of instruction code: a "start" bit, an "end" bit, and a "functional" bit. The start bit is set if the associated byte is the first byte of the instruction. Similarly, the end bit is set if the byte is the last byte of the instruction. The functional bits convey information regarding the location of an opcode byte for a particular instruction as well as an indication of whether the instruction can be decoded directly by the decode logic of the processor or whether the instruction is executed by invoking a microcode procedure controlled by an MROM unit. For fast path instructions, the functional bit is set for each prefix byte included in the instruction, and cleared for other bytes. For MROM instructions, the functional bit is cleared for each prefix byte and is set for other bytes. The type of instruction (either fast path or MROM) may thus be determined by examining the functional bit corresponding to the end byte of the instruction. If that functional bit is clear, the instruction is a fast path instruction. Conversely, if that functional bit is set, the instruction is an NMOM instruction. After an MROM instruction is identified, the functional bits for the instruction may be inverted. Subsequently, the opcode for both fast path and MROM instructions may readily be located (by the alignment logic) by determining the first byte within the instruction that has a cleared functional bit.
摘要:
A branch prediction unit stores a set of branch prediction history bits and branch selectors corresponding to each of a group of contiguous instruction bytes stored in an instruction cache. While only one bit is used to represent branch prediction history, three distinct states are represented in conjunction with the absence of a branch prediction. This provides for the storage of fewer bits, while maintaining a high degree of branch prediction accuracy. Each branch selector identifies the branch prediction to be selected if a fetch address corresponding to that branch selector is presented. In order to minimize the number of branch selectors stored for a group of contiguous instruction bytes, the group is divided into multiple byte ranges. The largest byte range may include a number of bytes comprising the shortest branch instruction in the instruction set (exclusive of the return instruction). For example, the shortest branch instruction may be two bytes in one embodiment. Therefore, the largest byte range is two bytes in the example. Since the branch selectors as a group change value (i.e. indicate a different branch instruction) only at the end byte of a predicted-taken branch instruction, fewer branch selectors may be stored than the number of bytes within the group.
摘要:
A branch prediction unit includes a branch prediction entry corresponding to a group of contiguous instruction bytes. The branch prediction entry stores branch predictions corresponding to branch instructions within the group of contiguous instruction bytes. Additionally, the branch prediction entry stores a set of branch selectors corresponding to the group of contiguous instruction bytes. The branch selectors identify which branch prediction is to be selected if the corresponding byte (or bytes) is selected by the offset portion of the fetch address. Still further, a predicted branch selector is stored. The predicted branch selector is used to select a branch prediction for forming the fetch address. In parallel, a selected branch selector is selected from the set of branch selectors. The predicted branch selector is verified using the selected branch selector. If the selected branch selector and the predicted branch selector mismatch, the correct branch prediction is generated and the predicted branch selector is updated to indicate the selected branch selector.
摘要:
A microprocessor stores cache-line-related data (e.g. branch predictions or predecode data, in the illustrated embodiments) in a storage which includes fewer storage locations than the number of cache lines in the instruction cache. Each storage location in the storage is mappable to multiple cache lines, any one of which can be associated with the data stored in the storage location. The storage may thereby be smaller than a storage which provides an equal number of storage locations as the number of cache lines in the instruction cache. Access time to the storage may be reduced, therefore providing for a higher frequency implementation. Still further, semiconductor substrate area occupied by the storage may be decreased. In one embodiment, the storage is indexed by a subset of the index bits used to index the instruction cache. The subset comprises the least significant bits of the cache index. In other words, the cache lines which share a particular storage location within the storage differ in the most significant cache index bits. Therefore, code which exhibits spatial locality may experience little conflict for the storage locations.
摘要:
A memory including first storage circuits for storing first values and second storages circuit for storing second values is provided. The first value may be retired branch prediction information, while the second value may be speculative branch prediction information. The speculative branch prediction information is updated when the corresponding instructions are fetched, and the retired branch prediction value is updated when the corresponding branch instruction is retired. The speculative branch prediction information is used to form branch predictions. Therefore, the speculatively fetched and executed branches influence subsequent branch predictions. Upon detection of a mispredicted branch or an instruction which causes an exception, the speculative branch prediction information is updated to the corresponding retired branch prediction information. An update circuit is coupled between the first and second storage circuits for transmitting the updated information upon assertion of a control signal. The control signal may be asserted to cause the update of each speculative branch prediction by the corresponding retired branch prediction. The updates occur substantially simultaneously, restoring any corruption to speculative branch predictions due to speculatively fetched branch instructions which were flushed from the instruction processing pipeline. Although discussed herein in terms of a branch prediction array, the memory may be adapted to many other applications.
摘要:
An apparatus including circuitry configured to detect and correct an ECC error in a non-targeted portion of a load access to a first data in a memory. An ECC error check circuit is configured to convey a first indication in response to detecting an error in a non-targeted first portion of the first data. A microcode unit is coupled to receive the first indication that the ECC check circuit has detected the ECC error and in response to the indication dispatch a first microcode routine stored by the microcode unit. The first microcode routine includes instructions which, when executed, correct the ECC error in the first portion. Correction of the error in the first portion does not include cancellation of data corresponding to the load access.
摘要:
A system for estimating an amount of soot in an exhaust particulate filter includes a delta P soot load estimate generator configured to generate a first soot load estimate as a function of a pressure drop and a mass flow of exhaust. The system further includes a model estimate generator configured to generate a second soot load estimate as a function of a modeled engine performance. A trust factor generator is configured to determine a trust factor signal as a function of at least one engine operating characteristic, and a decision generator is configured to determine whether to use the first soot load estimate or the second soot load estimate as a function of the trust factor signal.
摘要:
A system for estimating an amount of soot in an exhaust particulate filter includes a delta P soot load estimate generator configured to generate a first soot load estimate as a function of a pressure drop and a mass flow of exhaust. The system further includes a model estimate generator configured to generate a second soot load estimate as a function of a modeled engine performance. A trust factor generator is configured to determine a trust factor signal as a function of at least one engine operating characteristic, and a decision generator is configured to determine whether to use the first soot load estimate or the second soot load estimate as a function of the trust factor signal.