Abstract:
An apparatus (2) comprises processing circuitry (4) for performing data processing in response to instructions. The processing circuitry (4) supports a cache maintenance instruction (50) specifying a virtual page address (52) identifying a virtual page of a virtual address space. In response to the cache maintenance instruction, the processing circuitry (4) triggers at least one cache (18, 20, 22) to perform a cache maintenance operation on one or more cache lines for which a physical address of the data stored by the cache line is within a physical page that corresponds to the virtual page identified by the virtual page address provided by the cache maintenance instruction.
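As an illustration only (the patent claims hardware, not software): a minimal C sketch of the operation described, in which the virtual page address supplied by the instruction is translated once and the maintenance operation is applied line by line across the corresponding physical page. The helper names translate_page and clean_invalidate_line, and the 4 KB/64-byte geometry, are assumptions, not taken from the patent.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u   /* assumed page size */
#define LINE_SIZE 64u     /* assumed line size */

/* Hypothetical helpers: translate_page() stands in for the MMU walk,
 * clean_invalidate_line() for the per-line maintenance operation. */
uint64_t translate_page(uint64_t virt_page_addr);
void clean_invalidate_line(uint64_t phys_addr);

/* Apply cache maintenance to every line whose physical address lies
 * in the physical page corresponding to the given virtual page. */
void cache_maintain_virtual_page(uint64_t virt_page_addr)
{
    uint64_t phys_page = translate_page(virt_page_addr & ~(uint64_t)(PAGE_SIZE - 1));
    for (uint64_t off = 0; off < PAGE_SIZE; off += LINE_SIZE)
        clean_invalidate_line(phys_page + off);
}
```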
Abstract:
In a data processing system in which the execution unit is implemented to process aligned double word operands, an apparatus and an associated method provide for the alignment of a double word operand that is stored across a double word boundary. The two double words, each storing one word of the unaligned double word operand, are identified, and their access attributes are compared with the ring number of the associated program. When the comparisons indicate that both words of the non-aligned double word operand are available to the program, the two double words containing the non-aligned words of the operand are fetched, and the two non-aligned words are stored in a register in an aligned orientation for processing by the execution unit.
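A minimal C sketch of the splicing step, assuming a little-endian machine with 32-bit words and 64-bit (8-byte) double words; the ring-number access check is reduced to a hypothetical ring_allows() predicate, and none of the names come from the patent.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helpers standing in for the patent's hardware. */
uint64_t load_aligned_dword(uint64_t addr);     /* addr is 8-byte aligned   */
bool     ring_allows(uint64_t addr, int ring);  /* ring-number access check */

/* Assemble a double word operand whose two 32-bit words straddle a
 * double word boundary (addr is 4-byte but not 8-byte aligned). */
bool load_unaligned_dword(uint64_t addr, int ring, uint64_t *out)
{
    uint64_t first  = addr & ~7ull;   /* aligned dword holding word 0 */
    uint64_t second = first + 8;      /* aligned dword holding word 1 */

    /* Both containing double words must be accessible at this ring. */
    if (!ring_allows(first, ring) || !ring_allows(second, ring))
        return false;

    uint64_t lo = load_aligned_dword(first) >> 32;           /* word 0 */
    uint64_t hi = load_aligned_dword(second) & 0xffffffffu;  /* word 1 */
    *out = (hi << 32) | lo;   /* aligned orientation for the execution unit */
    return true;
}
```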
Abstract:
A data processing system includes a prefetch circuit for use with a memory (14). The prefetch circuit includes a storage buffer (204) for receiving a command from the memory (14) and a decoding circuit for decoding the command to determine the address of an index register identified in the command, for fetching the contents of the index register. The prefetch circuit also includes virtual and real address storage registers (221, 224) for receiving and storing the virtual and real addresses of the command, an adding circuit (236) for adding a predetermined offset to the virtual and real addresses of the command to obtain new virtual and real addresses, a comparison circuit (240) for determining whether the new virtual address from the adding circuit (236) has crossed a virtual page boundary, and a transfer circuit, responsive to the comparison circuit (240), for transferring the real address in the real address storage register (224) to the adding circuit for adding the offset thereto, thereby obtaining a new real address. The prefetch circuit then prefetches a command from the memory (14) at the new real address. The storage buffer (204) also includes registers for storing prefetched data and a prefetched index register.
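A C sketch of the address arithmetic outlined above, on the reading that the real address may only be incremented by the offset while the new virtual address stays on the same virtual page, while a crossing forces a fresh translation. The translate() helper and the 4 KB page size are assumptions.

```c
#include <stdint.h>

#define PAGE_SHIFT 12   /* assumed 4 KB pages */

uint64_t translate(uint64_t vaddr);   /* hypothetical full translation */

/* Compute the real address for the next prefetch from the command's
 * current virtual/real addresses and the predetermined offset. */
uint64_t next_prefetch_addr(uint64_t vaddr, uint64_t raddr, uint64_t offset)
{
    uint64_t new_vaddr = vaddr + offset;

    if ((new_vaddr >> PAGE_SHIFT) == (vaddr >> PAGE_SHIFT))
        return raddr + offset;     /* no page crossing: reuse real address */

    return translate(new_vaddr);   /* page boundary crossed: retranslate */
}
```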
Abstract:
In computing environments that use virtual addresses (or other indirectly usable addresses) to access memory, the virtual addresses are translated to absolute addresses (or other directly usable addresses) prior to accessing memory. To facilitate memory access, however, address translation is omitted in certain circumstances, including when the data to be accessed is within the same unit of memory as the instruction accessing the data. In this case, the absolute address of the data is derived from the absolute address of the instruction, thus avoiding address translation for the data. Further, in some circumstances, access checking for the data is also omitted.
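The derivation step admits a one-line illustration: when the data lies in the same unit of memory (here taken to be a page) as the instruction, its absolute address is the instruction's already-translated page frame combined with the data's page offset, and no translation of the data address is needed. The 4 KB page size is an assumption.

```c
#include <stdint.h>

#define PAGE_MASK 0xfffull   /* assumed 4 KB pages */

/* Derive the data's absolute address from the instruction's absolute
 * address; valid only when both lie within the same page. */
uint64_t derive_data_abs(uint64_t instr_abs, uint64_t data_virt)
{
    return (instr_abs & ~PAGE_MASK) | (data_virt & PAGE_MASK);
}
```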
Abstract:
Address translation for instruction fetching can be obviated for sequences of instruction instances that reside on the same page. Obviating address translation reduces power consumption and increases pipeline efficiency, since accesses to an address translation buffer can be avoided. Certain events, such as branch mispredictions and exceptions, can be designated as page boundary crossing events. In addition, a carry at a particular bit position when computing a branch target or a next instruction instance fetch target can also be designated as a page boundary crossing event. An address translation buffer is accessed to translate an address representation of a first instruction instance. However, until a page boundary crossing event occurs, the address representations of subsequent instruction instances are not translated. Instead, the translated portion of the address representation for the first instruction instance is recycled for the subsequent instruction instances.
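A C sketch of the carry test described, assuming 4 KB pages and a forward (unsigned) increment or displacement: a carry out of the page-offset bits, or a delta of a page or more, marks a page boundary crossing event; otherwise the cached page frame translation can be recycled.

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGE_SHIFT 12                           /* assumed 4 KB pages */
#define OFFSET_MASK ((1ull << PAGE_SHIFT) - 1)

/* Page boundary crossing event: a carry out of the page-offset bits
 * when computing the next fetch target or a branch target. */
bool crosses_page(uint64_t pc, uint64_t delta)
{
    return (delta >> PAGE_SHIFT) != 0                                  /* a page or more away */
        || ((pc & OFFSET_MASK) + (delta & OFFSET_MASK)) > OFFSET_MASK; /* carry out of offset */
}
```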
Abstract:
A fetch section of a processor comprises an instruction cache and a pipeline of several stages for obtaining instructions. Instructions may cross cache line boundaries. The pipeline stages process two addresses to recover a complete boundary-crossing instruction. During such processing, if the second piece of the instruction is not in the cache, the fetch with regard to the first line is invalidated and recycled. On this first pass, processing of the address for the second part of the instruction is treated as a pre-fetch request to load instruction data into the cache from higher-level memory, without passing any of that data to the later stages of the processor. When the first line address passes through the fetch stages again, the second line address follows in the normal order, and both pieces of the instruction can be fetched from the cache and combined in the normal manner.
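The recycling itself is pipeline control, but the triggering condition is simple to state; a minimal C sketch, assuming 64-byte lines and a variable-length instruction set, of detecting that an instruction straddles a cache line boundary and therefore needs a second line address.

```c
#include <stdint.h>
#include <stdbool.h>

#define LINE_SIZE 64u   /* assumed cache line size */

/* True when the instruction's bytes span two cache lines, so the
 * fetch pipeline must issue a second line address (and, if that
 * second line misses, invalidate and recycle the first fetch). */
bool straddles_line(uint64_t addr, unsigned len)
{
    return (addr & (LINE_SIZE - 1)) + len > LINE_SIZE;
}
```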
Abstract:
In a pipelined processor, a pre-decoder in advance of an instruction cache calculates the branch target address (BTA) of PC-relative and absolute address branch instructions. The pre-decoder compares the BTA with the branch instruction address (BIA) to determine whether the target and instruction are in the same memory page. A branch target same page (BTSP) bit indicating this is written to the cache and associated with the instruction. When the branch is executed and evaluated as taken, a TLB access to check permission attributes for the BTA is suppressed if the BTA is in the same page as the BIA, as indicated by the BTSP bit. This reduces power consumption as the TLB access is suppressed and the BTA/BIA comparison is only performed once, when the branch instruction is first fetched. Additionally, the pre-decoder removes the BTA/BIA comparison from the BTA generation and selection critical path.
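A C sketch of the pre-decode comparison, assuming 4 KB pages: the BTA is computed once from the BIA and the instruction's displacement, compared with the BIA at page granularity, and the resulting same-page bit is stored with the instruction so the later TLB permission check can be gated off.

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGE_SHIFT 12   /* assumed 4 KB pages */

/* Computed once at pre-decode, before the instruction cache; the
 * result is written to the cache alongside the branch instruction. */
bool btsp_bit(uint64_t bia, int64_t displacement)
{
    uint64_t bta = bia + (uint64_t)displacement;        /* PC-relative BTA   */
    return (bta >> PAGE_SHIFT) == (bia >> PAGE_SHIFT);  /* same memory page? */
}
```

When the branch later resolves as taken, a set bit suppresses the TLB access for the BTA, since the permission attributes already checked for the BIA's page cover the target.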
Abstract:
In a computer system having a number of page partitioned and virtually addressed address spaces, a physically addressed data storage structure and its complementary selection data storage structure are provided with a complementary memory page crossing prediction storage structure, a latch, and a comparator. The memory page crossing prediction storage structure is used to store a number of memory page crossing predictive annotations corresponding to the contents of the data and selection data storage structures. Each memory page crossing predictive annotation predicts whether the current access crosses into a new memory page. The latch is used to successively record a first portion of each accessing physical address translated from a corresponding portion of each accessing virtual address. The recorded first portion of the physical address of the immediately preceding access is used to select data currently being read out of the storage structures, if the memory page crossing predictive annotation currently being read out predicts no memory page crossing. The comparator is used to determine whether the first portions of the physical addresses of the current and immediately preceding accesses are equal, if the first portion of the physical address of the immediately preceding access is used to select data for the current access. Remedial actions, including invalidating the selected data and correcting the incorrect memory page crossing predictive annotation, are taken if the two physical address portions are determined to be unequal. As a result, most of the data retrievals are made without having to wait for the first portions of the accessing physical addresses to be translated, thereby improving the performance of retrieving data from the physically addressed data storage structure.
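A C sketch of the predict-then-verify flow, with all names illustrative rather than from the patent: when the annotation read out with the data predicts no page crossing, the latched physical page of the immediately preceding access selects the data at once; when the translation completes, a mismatch triggers the remedial actions.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint64_t latched_phys_page;    /* first portion of previous physical address */
    bool     predict_no_crossing;  /* predictive annotation read with the data   */
} access_state;

uint64_t translate_page(uint64_t vaddr);   /* hypothetical full translation */
void invalidate_selected_data(void);
void correct_annotation(void);

/* Physical page used to select data for the current access. */
uint64_t select_page(const access_state *s, uint64_t vaddr)
{
    if (s->predict_no_crossing)
        return s->latched_phys_page;   /* proceed without waiting */
    return translate_page(vaddr);      /* wait for the translation */
}

/* Once the real translation arrives, verify the prediction. */
void verify(access_state *s, uint64_t used_page, uint64_t actual_page)
{
    if (used_page != actual_page) {
        invalidate_selected_data();     /* remedial action 1 */
        correct_annotation();           /* remedial action 2 */
    }
    s->latched_phys_page = actual_page; /* latch for the next access */
}
```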
Abstract:
A prediction logic device operating in conjunction with a vector processor predicts, before the completion of the translation of the virtual addresses of all of the data elements of a vector, the valid performance of all virtual-address to physical-address translations for the data elements of the vector. The prediction logic device asserts an MMOK signal to a scalar processor when it becomes known that no memory management fault and/or translation buffer miss will occur, so that the scalar processor can resume vector instruction issue to the vector processor at the earliest possible time.
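As a software illustration only (the patent describes hardware prediction logic): a C sketch, assuming 4 KB pages and a hypothetical tlb_translates_ok() check, that verifies each distinct page touched by a strided vector once, so the MMOK signal can be asserted as soon as the last new page checks out, typically before all element translations complete.

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGE_SHIFT 12   /* assumed 4 KB pages */

bool tlb_translates_ok(uint64_t vaddr);  /* no TB miss or MM fault on this page */
void assert_mmok(void);                  /* signal the scalar processor */

/* Predict that every element translation of the vector will succeed. */
bool predict_vector_ok(uint64_t base, int64_t stride, unsigned n)
{
    uint64_t prev_page = ~0ull;          /* sentinel: no page checked yet */
    for (unsigned i = 0; i < n; i++) {
        uint64_t page = (base + (uint64_t)i * (uint64_t)stride) >> PAGE_SHIFT;
        if (page != prev_page) {
            if (!tlb_translates_ok(page << PAGE_SHIFT))
                return false;            /* a fault would occur: no MMOK */
            prev_page = page;
        }
    }
    assert_mmok();                       /* scalar unit may resume vector issue */
    return true;
}
```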