摘要:
Various embodiments of methods and systems for storing multiple groups of microcode operations and corresponding control sequences per row of microcode ROM are disclosed. In one embodiment, an integrated circuit may include a microcode ROM coupled to a control sequence logic unit. The microcode ROM may store multiple groups of microcode operations per row. For each group of microcode operations stored in a row, a corresponding control sequence may also be stored in the row. Each group of microcode operations may be included in a microcode routine. The groups of microcode operations stored in a row may be included in the same microcode routine, or some of the groups may be included in different microcode routines.
摘要:
A system and method for transferring data using an early response signal to indicate subsequent transmission of data after a fixed latency, wherein the signal and data are transferred from a first clock domain to a second clock domain using a clock skipping technique. In one embodiment, an early response signal is transmitted by a first device k clock pulses prior to transmission of the data. The receiving device, which is operating at a higher clock rate, receives the early response signal and delays the signal by the number of skipped pulses which will occur in the second clock domain before the occurrence of the kth valid pulse. The second device employs a skip pattern generator to generate a signal indicative of this number of skipped pulses and provides the number to a delay circuit which delays the early response signal for an this number of clock pulses. The delayed early response signal is then output to the appropriate logic to indicate the latency of the subsequent data transfer.
摘要:
For a memory access at a processor, only a subset (less than all) of the ways of a cache associated with a memory address is prepared for access. The subset of ways is selected based on stored information indicating, for each memory access, which corresponding way of the cache was accessed. The subset of ways is selected and preparation of the subset of ways is initiated prior to the final determination as to which individual cache way in the subset is to be accessed.
摘要:
A cache controller configured to speculatively invalidate a cache line may respond to an invalidating request or instruction immediately instead of waiting for error checking to complete. In case the error checking determines that the invalidation is erroneous and thus should not be performed, the cache controller protects the speculatively invalidated cache line from modification until error checking is complete. This way, if the invalidation is later found to be erroneous, the speculative invalidation can be reversed. If error checking completes without detecting any errors, the speculative invalidation becomes non-speculative.
摘要:
A superscalar microprocessor implements a reorder buffer to support out-of-order execution of instructions. The reorder buffer stores speculatively executed instructions until the instructions prior to the speculatively instruction have completed without exception. When an exception, such as a branch misprediction, occurs, the reorder buffer may cancel the instructions in the reorder buffer after the exception and restore the state of the reorder buffer to the state prior to the execution of the exception. Properly restoring the state of the reorder buffer requires instruction status information about the mispredicted branch instruction, which is stored in the reorder buffer. Additionally, if multiple branch mispredictions are detected, the mispredictions must be prioritized to determine the misprediction that occurred earliest in the program order. To reduce the time delay for identifying mispredicted instructions, prioritizing mispredicted instructions, canceling instructions subsequent to the mispredicted instruction and reading status information from the reorder buffer, the availability of an instruction tag, which identifies the instruction being executed, during the execution of the instruction is utilized. The reorder buffer receives the tag of the instruction issued to the functional unit. In parallel with the execution of the instruction, the reorder buffer generates hit masks identifying instructions to be canceled in the event of a mispredicted branch. In parallel, status information from the instruction (or instructions) being executed is selected from the reorder buffer and prioritization masks are generated. Therefore, if a mispredicted branch is detected, the instructions that need to be canceled can be readily identified and the instruction status information is readily available.
摘要:
A method and mechanism for performing shift operations using a shuffle unit. A processor includes a shuffle unit configured to perform shuffle operations responsive to shuffle instructions. The shuffle unit is adapted to support shift operations as well. In response to determining a shuffle instruction is received, selected bits of an immediate value of the shuffle instruction are used to generate byte selects for relocating bytes of a source operand. In response to determining the instruction is a shift instruction, the shuffle unit performs an arithmetic operation on a first and second value, where the first value corresponds to a particular destination byte position, and the second value corresponds to the immediate value. The result of the arithmetic operation comprises a byte select which selects one of the bytes of a source operand for conveyance to the particular destination byte position. In the event a destination byte position should be cleared by the shift operation, the output for the particular destination byte position is forced to zero.
摘要:
A microprocessor including a level two cache memory including asynchronously accessible cache blocks. The microprocessor includes an execution unit coupled to a cache memory subsystem which includes a plurality of storage blocks, each configured to store a plurality of data units. Each of the plurality of storage blocks may be accessed asynchronously. In addition, the cache subsystem includes a plurality of tag units which are coupled to the plurality of storage blocks. Each of the tag units may be configured to store a plurality of tags each including an address tag value which corresponds to a given unit of data stored within the plurality of storage blocks. Each of the plurality of tag units may be accessed synchronously.
摘要:
A processor is described which includes a stride detect table. The stride detect table includes one or more entries, each entry used to track a potential stride pattern. Additionally, each entry includes a confidence counter. The confidence counter may be incremented each time another address in the pattern is detected, and thus may be indicative of the strength of the pattern (e.g., the likelihood of the pattern repeating). At a first threshold of the confidence counter, prefetching of the next address in the pattern (the most recent address plus the stride) may be initiated. At a second, greater threshold, a more aggressive prefetching may be initiated (e.g. the most recent address plus twice the stride). In some implementations, the prefetch mechanism including the stride detect table may replace a prefetch buffer and prefetch logic in the memory controller.
摘要:
A method and mechanism for performing division. A processor includes a divider configured to perform arithmetic division operations. Prior to dividing a dividend by a divisor, the divider manipulates the dividend and divisor to reduce the number of bits considered and the computations required to perform the division. The divisor is normalized by eliminating sign bits. The dividend is prescaled to eliminate one or more sign bits. Prescaling of the dividend may not be precise as sign bits of the dividend may be shifted out as groups of bits, rather than individual bits. Prescaling of the dividend may be adjusted to account for the fact that the divider considers multiple bits of the dividend at a time. Subsequent to prescaling and adjustment, the dividend may be adjusted in dependence upon the normalization of the divisor. Further adjustment may be utilized to maintain a significance relationship between the divisor and dividend. Subsequent to further adjustment, the division operation may be completed.
摘要:
A cache system comprises a plurality of cache banks, a translation look-aside buffer (TLB), and a scheduler. The TLB is used to translate a virtual address (VA) to a physical address (PA). The scheduler, before the VA has been completely translated to the PA, uses a subset of the VA's bits to schedule access to the plurality of cache banks.