Abstract:
A cache system provides for accessing set associative caches with no increase in critical path delay, for reducing the latency penalty for cache accesses, for reducing snoop busy time, and for responding to MRU misses and cache misses. A multiway cache includes a single array partitioned into a plurality of cache slots and a directory, with both the directory and the cache slots connected to the same data bus. A first cache slot is selected and accessed; corresponding data is then accessed from the alternate slots while the directory is searched, thereby reducing the latency penalty for cache access.
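The access order described above can be illustrated with a minimal Python sketch. It is not the patented hardware: the names `access`, `slots`, `directory`, and `first_slot` are hypothetical, the way the first slot is chosen is left to the caller, and the overlap in time between the alternate-slot reads and the directory search is only modeled as sequential steps.

```python
def access(slots, directory, set_index, tag, first_slot):
    """Model one lookup in a multiway cache (illustrative assumption only).

    slots[w][set_index]      -> data held by slot (way) w for this set
    directory[set_index][w]  -> tag recorded for slot w of this set
    Returns (data, hit_on_first_slot), or (None, None) on a cache miss.
    """
    # Step 1: read the selected first slot before the directory result is known.
    reads = {first_slot: slots[first_slot][set_index]}

    # Step 2: read the alternate slots; in hardware this would overlap
    # with the directory search performed below.
    for way in range(len(slots)):
        if way != first_slot:
            reads[way] = slots[way][set_index]

    # Directory search: find which slot, if any, holds the requested tag.
    hit_way = None
    for way, stored_tag in enumerate(directory[set_index]):
        if stored_tag == tag:
            hit_way = way
            break

    if hit_way is None:
        return None, None  # cache miss
    return reads[hit_way], hit_way == first_slot
```

When the second element of the result is true, the data read in step 1 was already the right one, which is the case where the latency penalty is avoided in this model.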
Abstract:
A computer-implemented method, a processor chip, a data processing system, and a computer program product process information in a store cache of a data processing system. The store cache receives a first entry that includes a first address indicating a first segment of a cache line. The store cache then receives a second entry that includes a second address indicating a second segment of the same cache line. Responsive to the first segment not being equal to the second segment, the first entry is chained to the second entry.
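The chaining rule can be sketched in Python as follows. This is only an illustration under stated assumptions: the line size, the segment size, and the names `Entry`, `StoreCache`, and `receive` are hypothetical. The abstract does not say what happens when both entries name the same segment, so the sketch simply leaves them unchained.

```python
from dataclasses import dataclass
from typing import List, Optional

LINE_BYTES = 128     # assumed cache line size, not given in the abstract
SEGMENT_BYTES = 16   # assumed segment size, not given in the abstract


@dataclass
class Entry:
    address: int
    data: bytes
    next: Optional["Entry"] = None  # chain pointer to a later entry

    @property
    def line(self) -> int:
        return self.address // LINE_BYTES

    @property
    def segment(self) -> int:
        return (self.address % LINE_BYTES) // SEGMENT_BYTES


class StoreCache:
    def __init__(self) -> None:
        self.entries: List[Entry] = []

    def receive(self, address: int, data: bytes) -> Entry:
        new = Entry(address, data)
        # Look for the most recent unchained entry for the same cache line.
        for old in reversed(self.entries):
            if old.line == new.line and old.next is None:
                # Chain only when the two entries name different segments.
                if old.segment != new.segment:
                    old.next = new
                break
        self.entries.append(new)
        return new
```

For example, receiving stores to addresses `0x1000` and `0x1010` places them in segments 0 and 1 of the same assumed 128-byte line, so the first entry's `next` field ends up pointing at the second.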
Abstract:
A processor that efficiently obtains target path instructions in the presence of tight program loops includes at least one execution unit for executing instructions and instruction sequencing logic that supplies instructions to the at least one execution unit for execution. The instruction sequencing logic includes an instruction fetch buffer and a branch prediction unit including a branch target cache. In response to prediction of a branch instruction as taken, the branch target cache causes multiple copies of a target instruction group to be loaded into the instruction fetch buffer under the assumption that the branch instruction is a member of the target instruction group. Thereafter, the branch target cache causes all but one of the multiple copies to be canceled from the instruction fetch buffer prior to dispatch if the branch instruction does not belong to the target instruction group. Thus, the branch target cache can meet the instruction fetch cycle time of the processor even for the worst case condition in which the branch instruction is within the target instruction group.
Abstract:
Size exception detection hardware for use with a digital data processor arithmetic unit for providing high-speed detection of lost data which results from storing an arithmetic result in a destination which is smaller than one or both of the source operands. In response to data processing machine instructions, the arithmetic unit performs arithmetic operations on variable length operands and sends the arithmetic results to variable length destinations. The operand and destination lengths are specified by length fields in the machine instruction. The destination length is specified independently of at least one of the operand lengths and hence may be less than such operand length. The size exception detection hardware looks at both the output field of the arithmetic unit and the destination length field in the machine instruction and generates a size exception program interrupt signal when the part of the arithmetic unit output field located outside of the destination length contains significant data. The size exception interrupt is generated during the same machine control cycle during which the arithmetic unit performs the arithmetic operation which gives rise to the size exception.
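The detection condition can be expressed as a short Python sketch. It rests on assumptions the abstract does not state: the result is treated as an unsigned binary value, lengths are counted in bytes, and "significant data" is taken to mean any nonzero bit outside the destination. The name `size_exception` and the default unit width are hypothetical.

```python
def size_exception(result: int, dest_len_bytes: int, unit_width_bytes: int = 8) -> bool:
    """Return True if the arithmetic result does not fit in the destination.

    Assumes an unsigned result and byte-granular lengths; any nonzero bit
    in the part of the output field beyond the destination length counts
    as significant (lost) data.
    """
    if not 0 < dest_len_bytes <= unit_width_bytes:
        raise ValueError("destination length must be within the unit width")

    full_mask = (1 << (8 * unit_width_bytes)) - 1   # whole output field
    dest_mask = (1 << (8 * dest_len_bytes)) - 1     # bits that will be stored

    # Bits of the output field that fall outside the destination.
    outside = (result & full_mask) & ~dest_mask
    return outside != 0
```

Because the check needs only the output field and the destination length field, it can be evaluated alongside the arithmetic operation itself, which matches the same-cycle behavior the abstract describes.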
Abstract:
A cache system provides for accessing set associative caches with no increase in critical path delay, for reducing the latency penalty for cache accesses, for reducing snoop busy time, and for responding to MRU misses and cache misses. The cache array is accessed by multiplexing two most-recently-used (MRU) arrays, which are addressed and accessed substantially in parallel with effective address generation. One MRU array output is generated by assuming a carry-in of zero, and the other by assuming a carry-in of one, to the least significant bit of the portion of the effective address used to access the MRU arrays. The hit rate in the MRU array is improved by hashing, within an adder, the adder's input operands with predetermined additional operand bits.
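The carry-in selection can be illustrated with a small Python sketch. It is a simplified model, not the patented circuit: the function name `mru_lookup` and its parameters are hypothetical, a single list stands in for both MRU arrays, the effective address is assumed to be the sum of a base and a displacement, and the hashing step is omitted.

```python
def mru_lookup(mru, base, disp, low_bits, index_bits):
    """Select an MRU entry without waiting for the full address add.

    mru        -- list with 2**index_bits entries (models both MRU arrays)
    base, disp -- non-negative operands whose sum is the effective address
    low_bits   -- number of address bits below the MRU index field
    index_bits -- width of the MRU index field
    """
    mask = (1 << index_bits) - 1
    base_field = (base >> low_bits) & mask
    disp_field = (disp >> low_bits) & mask

    # Two speculative indices, one per assumed carry-in to the index field.
    index_carry0 = (base_field + disp_field) & mask
    index_carry1 = (base_field + disp_field + 1) & mask
    out0 = mru[index_carry0]
    out1 = mru[index_carry1]

    # The real carry out of the low-order bits arrives later and only
    # steers the final multiplexer.
    low_mask = (1 << low_bits) - 1
    carry = ((base & low_mask) + (disp & low_mask)) >> low_bits
    return out1 if carry else out0
```

The result equals the entry that would be read by first computing `base + disp` and then extracting the index field, but both candidate reads can start before the low-order carry is known.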
Abstract:
A certain class of computer has been previously described which improves performance by analyzing the instructions that make up the computer's control program and appending control information to the instructions in the form of tags. One such computer analyzes instruction cache lines as they are loaded into the cache to create the tags. A disadvantage of that design is its inability to create control information for portions of the cache line whose control tags depend on instructions in another cache line as well as in the line being loaded. A method and apparatus are described herein which facilitate creation of control tags based on instructions which reside in different cache lines. The method permits a more complete analysis to be performed, thereby improving processor performance.