Abstract:
Monitoring apparatus is provided to allow out-of-sequence fetching of operands while preserving the appearance of in-sequence fetching to the processor of a computer. The key elements include a stack (119) of N entries holding the addresses of the last M (M ≤ N) out-of-sequence fetches. A comparator (103) compares addresses in the stack with a test address, which is supplied via an OR gate (107) and is either a store address or a cross-invalidate address, the latter arising in a multiprocessor system. Stack addresses that match the test address are marked invalid. In addition, all addresses in the stack are marked invalid when a cache miss or a serializing instruction occurs. Finally, a select and check entry function (113) associates an address in the stack with the instruction it represents and deletes the address from the stack when the instruction is handled in its proper sequence.
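A minimal Python sketch of the monitoring-stack behavior described above; the class and method names (OOSFetchMonitor, record_fetch, retire) and the dictionary layout are illustrative assumptions, not the patented hardware:

```python
class OOSFetchMonitor:
    """Tracks out-of-sequence operand fetches so they can be validated later."""

    def __init__(self, n_entries):
        self.entries = {}            # instruction tag -> {"addr": ..., "valid": ...}
        self.capacity = n_entries    # the stack holds at most N addresses

    def record_fetch(self, tag, addr):
        """Log an out-of-sequence operand fetch (instruction tag, fetch address)."""
        if len(self.entries) >= self.capacity:
            raise RuntimeError("stack full: stall further out-of-sequence fetches")
        self.entries[tag] = {"addr": addr, "valid": True}

    def test_address(self, addr):
        """Store or cross-invalidate address: invalidate every matching entry."""
        for entry in self.entries.values():
            if entry["addr"] == addr:
                entry["valid"] = False

    def invalidate_all(self):
        """Cache miss or serializing instruction: invalidate every entry."""
        for entry in self.entries.values():
            entry["valid"] = False

    def retire(self, tag):
        """Instruction reached its in-sequence point; True means the early fetch
        is still usable, False means the operand must be refetched."""
        return self.entries.pop(tag)["valid"]
```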
Abstract:
A digital computer includes a main and an auxiliary pipeline processor configured to concurrently execute contiguous groups of instructions taken from a single instruction sequence. The instructions in a sequence may be divided into groups using either taken-branch instructions or certain instructions that may change the contents of the general-purpose registers as group delimiters. Both methods of grouping use a branch history table to predict the sequence in which the instructions will be executed.
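A hedged Python sketch of the grouping idea using taken branches as delimiters; the instruction representation, the `predict_taken` stand-in for a branch-history-table lookup, and the alternating dispatch policy are assumptions for illustration only:

```python
def split_into_groups(instructions, predict_taken):
    """Cut the instruction sequence into contiguous groups at predicted taken branches."""
    groups, current = [], []
    for ins in instructions:
        current.append(ins)
        if predict_taken(ins):        # group delimiter: predicted taken branch
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


def dispatch(groups):
    """Alternate contiguous groups between the main and auxiliary pipelines."""
    main, auxiliary = [], []
    for i, group in enumerate(groups):
        (main if i % 2 == 0 else auxiliary).append(group)
    return main, auxiliary
```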
Abstract:
In an approach to reducing delays resulting from the resolution of conditional branch instructions, such instructions are pre-executed in a coprocessor which precedes a pipeline processor and prepares an instruction stream for input to the pipeline processor. Because of this pre-execution, the input instruction stream has fewer conditional branches for the pipeline processor to resolve. The coprocessor may also handle address-generation interlock situations, which would otherwise cause execution delays in the pipeline processor.
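A minimal Python sketch of the pre-execution idea: a software stand-in for the coprocessor walks the fetched stream, resolves forward conditional branches whose condition registers it already knows, and hands the pipeline a straightened stream. The tuple-based instruction encoding and the `regs` model are assumptions, not the patent's design:

```python
def pre_execute(stream, regs):
    """Resolve forward conditional branches whose condition value is already known."""
    output, pc = [], 0
    while pc < len(stream):
        op = stream[pc]                              # e.g. ("branch_if_zero", "r3", 7)
        if op[0] == "branch_if_zero" and op[1] in regs and op[2] > pc:
            # Condition already known in the coprocessor: follow the resolved path,
            # so the pipeline never sees this conditional branch.
            pc = op[2] if regs[op[1]] == 0 else pc + 1
        else:
            output.append(op)                        # unresolved work passes through
            pc += 1
    return output
```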
Abstract:
Method and apparatus for correctly predicting the outcome of a branch instruction in a system of the type that includes a Branch History Table (BHT) and branch instructions that implement non-explicit subroutine calls and returns. Entries in the BHT have two additional state fields: a CALL field, indicating that the entry corresponds to a branch that may implement a subroutine call, and a PSEUDO field. The PSEUDO field represents linkage information and creates a link between a subroutine entry and a subroutine return. The target address of a successful branch instruction is used to search the BHT. The branch is known to be a subroutine return if the target quadword contains an entry, prior to the target halfword, that has the CALL field set. The entry with the CALL bit set is thus known to be the corresponding subroutine call, and the entry point of the subroutine is given by the target address stored within that entry. A PSEUDO entry is inserted into the BHT at the location corresponding to the entry point of the subroutine, the PSEUDO entry being designated as such by having the PSEUDO field asserted. The PSEUDO entry contains the address of the returning branch instruction in place of the target address field.
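A simplified Python model of the CALL/PSEUDO bookkeeping described above; the dictionary-based BHT, the field names, and the 16-byte quadword granularity are assumptions chosen only to make the linkage idea concrete:

```python
QUADWORD = 16

def on_taken_branch(bht, branch_addr, target_addr):
    """bht maps a branch address to {'target', 'call', 'pseudo'} fields."""
    quad_base = target_addr - (target_addr % QUADWORD)
    for addr in range(quad_base, target_addr):        # entries prior to the target halfword
        entry = bht.get(addr)
        if entry and entry.get("call"):
            # The taken branch lands just past a CALL-marked branch, so it is a
            # subroutine return; that CALL entry's target is the subroutine entry point.
            entry_point = entry["target"]
            bht[entry_point] = {"target": branch_addr,  # address of the returning branch
                                "call": False,
                                "pseudo": True}
            return "subroutine-return"
    # Ordinary taken branch: record or update its own entry.
    bht[branch_addr] = {"target": target_addr, "call": False, "pseudo": False}
    return "ordinary-branch"
```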
Abstract:
A data-dependent branch table is a mechanism that is sensitive to the operands that will be tested to determine branch outcomes. The data-dependent branch table operates in conjunction with a branch history table, anticipating those instances where the branch history table will make an erroneous prediction and correcting it before the actual prediction is made.
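A minimal sketch of how a data-dependent branch table might override the history-based guess before it is consumed; the dictionary tables and the per-branch operand test are illustrative assumptions, not the patent's circuits:

```python
def predict(branch_addr, bht, ddbt, operand_value):
    """Combine a history-based prediction with a data-dependent correction."""
    bht_prediction = bht.get(branch_addr, False)     # taken / not-taken history
    rule = ddbt.get(branch_addr)
    if rule is not None:
        # The data-dependent table knows which operand test decides this branch,
        # so it can detect and correct a wrong history-based guess ahead of time.
        data_prediction = rule(operand_value)
        if data_prediction != bht_prediction:
            return data_prediction                   # corrected prediction
    return bht_prediction


# Example: the branch at address 0x40 is taken exactly when its tested operand is zero.
ddbt = {0x40: lambda operand: operand == 0}
bht = {0x40: True}
print(predict(0x40, bht, ddbt, operand_value=5))     # False: correction applied
```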
Abstract:
A system is provided for management of data in cache memories in a multiprocessor environment which allows portions of lines to be valid and exclusive while other portions are valid but not exclusive, or invalid. A processor may store into portions of a line under its exclusive control without invalidating copies of the line held in the cache memories of the other processors. The system includes at least two processors, a shared main memory and a system control element; each processor has a corresponding cache memory, a modified line stack and a sectored line directory. The modified line stack identifies data lines which have been changed since being made resident in cache memory, and it also identifies the change status of each word within those lines. A "shared exclusive" flag in the system control element identifies each line for which portions of the line are under the exclusive control of more than one processor. The sectored line directory identifies the control status and change status of individual words within a line flagged as "shared exclusive." If a line is shared exclusive, an entry for that line is recorded in the sectored line directory. For those lines with entries in the sectored line directory, a processor may store into words within its exclusive control, and fetch words within its exclusive or read-only control. Remote processors may fetch words which are held read-only by the local processor, and store into words which are marked invalid in the cache memory of the local processor.
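An illustrative Python sketch of word-granular sharing within one "shared exclusive" line; the state names, the per-(processor, word) status table, and the permission checks are assumptions made to show the access rules, not the patented directories:

```python
EXCLUSIVE, READ_ONLY, INVALID = "E", "RO", "INV"

class SectoredLine:
    """Per-word control status of a single shared-exclusive line."""

    def __init__(self, cpus, n_words):
        # Initially every processor holds every word read-only.
        self.status = {(c, w): READ_ONLY for c in cpus for w in range(n_words)}

    def grant_exclusive(self, cpu, word):
        """Give one processor exclusive control of a word; others see it as invalid."""
        for (c, w) in self.status:
            if w == word:
                self.status[(c, w)] = EXCLUSIVE if c == cpu else INVALID

    def store(self, cpu, word):
        # A store touches only a word under this processor's exclusive control and
        # does not invalidate the rest of the line in the other caches.
        assert self.status[(cpu, word)] == EXCLUSIVE, "word not exclusively owned"

    def fetch(self, cpu, word):
        # Fetches are allowed for words held exclusive or read-only by this processor.
        assert self.status[(cpu, word)] in (EXCLUSIVE, READ_ONLY), "word not readable"
```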
Abstract:
Apparatus for fetching instructions in a computing system. A broadband branch history table, organized by cache line, determines from the history of branches the next cache line to be referenced and uses that information to prefetch lines into the cache.
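A short sketch of line-granular next-line prediction and prefetch; the `line_bht` mapping, the 128-byte line size, and the set-based cache stand-in are assumptions for illustration:

```python
LINE_SIZE = 128

def line_of(addr):
    """Cache-line base address containing `addr`."""
    return addr - (addr % LINE_SIZE)

def on_taken_branch(line_bht, branch_addr, target_addr):
    """Record, per cache line, which line the instruction stream went to next."""
    line_bht[line_of(branch_addr)] = line_of(target_addr)

def prefetch_next(line_bht, current_line, cache):
    """Prefetch the predicted next line (fall through to sequential if no history)."""
    predicted = line_bht.get(current_line, current_line + LINE_SIZE)
    if predicted not in cache:
        cache.add(predicted)          # stand-in for issuing a prefetch to the cache
```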
Abstract:
An information processing unit and storage system comprising at least one low-speed, high-capacity main memory having a relatively long access time and storing a plurality of data pages, and at least one high-speed, low-capacity cache memory having a relatively short access time and adapted to store a predetermined plurality of subsets of the information stored in said main memory data pages. Instruction decoding means are located in the communication channel between the main memory and the cache and are operative to at least partially decode instructions being transferred from main memory to cache. The at least partial decoding comprises expanding the instruction format from that utilized in main memory storage to one more readily executable by the processor, prior to storing said instructions in the cache. Said decoding means includes logic circuit means for determining whether a given instruction is susceptible of partial decoding and means for determining that a particular instruction has already been partially decoded (i.e., after a first accessing of said instruction by the processor from the cache). In the preferred embodiment the assumption is made that the system utilizes separate cache storage means for data and instructions respectively, whereby only instructions being transferred from main memory to cache will pass through said decoding means.
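A hedged Python sketch of the decode-on-fill idea: instructions passing from main memory to the instruction cache are expanded into a wider, easier-to-execute form and tagged so that a later fill can tell they were already predecoded. The instruction format, the decodability rule, and the field-splitting step are invented purely for illustration:

```python
def is_partially_decodable(ins):
    # Illustrative rule: only register-to-register opcodes are expanded here.
    return ins["opcode"].startswith("RR_")

def split_fields(ins):
    # Illustrative expansion: pull the packed operand byte apart into two fields.
    packed = ins["operands"]
    return {"r1": packed >> 4, "r2": packed & 0x0F}

def predecode_line(raw_instructions):
    """Expand eligible instructions on their way from main memory into the cache."""
    expanded = []
    for ins in raw_instructions:
        if ins.get("predecoded"):                    # already expanded on an earlier fill
            expanded.append(ins)
        elif is_partially_decodable(ins):
            expanded.append({"opcode": ins["opcode"],
                             "operand_fields": split_fields(ins),
                             "predecoded": True})
        else:
            expanded.append(ins)                     # leave it for the full decoder
    return expanded
```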
Abstract:
Each of a plurality of stored pointers identifies and accesses one of a plurality of hardware registers in a central processing unit (CPU). Each pointer is associated with and corresponds to one of a limited number of general purpose registers addressable by various fields in a program instruction of the data processing system. At least one program instruction calls for the transfer of data from a particular main storage location to a general purpose register (GPR) identified by a field in the program instruction. The GPR identified as the destination for the data is renamed by assigning a pointer value that provides access to one of the plurality of associated hardware registers. A subsequent load instruction involving the same particular main storage location determines whether the data from the previous load instruction is still stored in one of the hardware registers and determines the associated pointer value. The data in the hardware register is made immediately available to the CPU before completion of the access to main storage. The pointer value is associated with, and made to correspond to, the destination GPR of the subsequent load instruction. Other instructions which require access to instruction-addressable GPRs access the corresponding pointer value, which in turn provides access to the corresponding hardware register for purposes of data processing.
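A small Python model of the pointer (rename) table described above; the table shapes, the `RenameTable` name, and the `read_memory` callback are assumptions chosen to make the load-reuse idea concrete:

```python
class RenameTable:
    def __init__(self, n_hw_regs):
        self.free = list(range(n_hw_regs))   # hardware registers not yet assigned
        self.gpr_ptr = {}                    # GPR number -> hardware-register pointer
        self.loaded_from = {}                # main-storage address -> hardware register
        self.hw_value = {}                   # hardware register -> current contents

    def load(self, gpr, address, read_memory):
        """LOAD gpr,address: reuse a prior load's hardware register when possible."""
        hw = self.loaded_from.get(address)
        if hw is None:
            hw = self.free.pop()                       # assign a fresh hardware register
            self.hw_value[hw] = read_memory(address)   # slow main-storage access
            self.loaded_from[address] = hw
        # Rename: the destination GPR's pointer now designates that hardware register,
        # so a repeated load is satisfied before the main-storage access would complete.
        self.gpr_ptr[gpr] = hw
        return self.hw_value[hw]

    def read_gpr(self, gpr):
        """Other instructions reach operands through the pointer, not the GPR itself."""
        return self.hw_value[self.gpr_ptr[gpr]]
```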
Abstract:
An incremental method is described for distributing the instructions of an execution sequence among a plurality of processing elements for execution in parallel. The distribution is based upon anticipated availability times of the needed input values for each instruction as well as the anticipated availability times of each processing element for handling each instruction. A self-parallelizing computer system and method are also described for asynchronously processing the distributed instructions in two modes of execution on a set of processing elements which communicate with each other.
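A compact Python sketch of availability-time-based distribution: each instruction is assigned to the processing element that can start it earliest, given when its input values and each element are anticipated to become available. The instruction tuple format and latency model are assumptions for illustration only:

```python
def distribute(instructions, n_elements):
    """Greedy, incremental assignment of instructions to processing elements."""
    element_free_at = [0] * n_elements    # anticipated availability of each element
    value_ready_at = {}                   # value name -> anticipated availability time
    assignment = {}
    for name, inputs, output, latency in instructions:
        inputs_ready = max((value_ready_at.get(v, 0) for v in inputs), default=0)
        # Choose the element on which this instruction can begin soonest.
        pe = min(range(n_elements),
                 key=lambda e: max(element_free_at[e], inputs_ready))
        start = max(element_free_at[pe], inputs_ready)
        element_free_at[pe] = start + latency
        value_ready_at[output] = start + latency
        assignment[name] = pe
    return assignment


# Example: i3 depends on i1 and i2, so it lands wherever it can start earliest.
program = [("i1", [], "a", 2), ("i2", [], "b", 3), ("i3", ["a", "b"], "c", 1)]
print(distribute(program, n_elements=2))
```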