摘要:
A system is provided for management of data in cache memories in a multiprocessor environment which allows portions of lines to be valid and exclusive, while other portions are valid, but not exclusive, or invalid. A processor may store into portions of a line under its exclusive control without invalidating copies of the line held in the cache memories of the other processors. The system includes at least two processors, a shared main memory and a system control element, and each processor has a corresponding cache memory, a modified line stack and a sectored line directory. The modified line stack identifies data lines which have been changed since being made resident in cache memory. It also identifies the status of change of each word within those lines. A "shared exclusive" flag in the system control element identifies each line for which portions of the line are under exclusive control of more than one processor. The sectored line directory identifies the control status and change status of individual words within a line flagged as "shared exclusive." If a line is shared exclusive, an entry for that line is recorded in the sectored line directory. For those lines with entries in the sectored line directory, a processor may store into words within its exclusive control, and fetch words within its exclusive or read-only control. Remote processors may fetch words which are held read-only by the local processor, and store into words which are marked invalid in the cache memory of the local processor.
摘要:
A fast queue mechanism is provided which keeps a queue of changes (i.e. store actions) issued by each processor, which queue is accessible by all processors. When any processor issues a store action to a line of memory in the queue, the old data is overwritten with the new data. If the queue does not currently have a corresponding entry, a new entry is activated. Room for the new entry is made by selecting some existing entry, either the oldest or the least recently used, to be removed. An entry that is to be removed is first used to update the line corresponding to it in main memory. After the changes held in the entry to be removed are applied to the old value of the line (from main memory) and the updated value is put back into main memory, the entry in the queue is removed by marking it "empty". When a processor accesses a line of data not in its cache, a cache miss occurs and it is necessary to fetch the line from main memory. Such fetches are monitored by the queue mechanism to see if it is holding changes to the line being fetched. If so, the changes are applied to the line coming from main memory before the line is sent to the requesting processor. After a new entry is made in the queue mechanism, other store actions to the same entry by any processor may occur and usually a number of store actions will occur to the entry before it is removed to make room for another.
摘要:
Methods and apparatus are described for processing branch instructions using a history based branch prediction mechanism (such as a branch history table) in combination with a data dependent branch table (DDBT), where the branch instructions can vary in both outcome and test operand location. The novel methods and apparatus are sensitive to branch mispredictions and to operand addresses used by the DDBT, to identify irrelevant DDBT entries. Irrelevant DDBT entries are identified within the prediction mechanism using state bits which, when set, indicate that: (1) a given entry in the prediction mechanism was updated by the DDBT and (2) subsequent to such update a misprediction occurred making further DDBT updates irrelevant. Once a DDBT entry is determined to be irrelevant, it is prevented from updating the prediction mechanism. The invention also provides methods and apparatus for locating and removing irrelevant entries from the DDBT. The update packet, sent by the DDBT to the history based prediction mechanism, is expanded to include the test operand address actually used by the DDBT. If the state bits indicate the update is irrelevant, then the operand address can be used to locate and delete the offending DDBT entry since the DDBT is organized based on operand addresses. Additionally, the invention provides for inhibiting creation of further DDBT entries when a Branch Wrong Guess event occurs subsequent to a DDBT update to a given prediction mechanism entry.
摘要:
Method and apparatus for correctly predicting an outcome of a branch instruction in a system of the type that includes a Branch History Table (BHT) and branch instructions that implement non-explicit subroutine calls and returns. Entries in the BHT have two additional stage fields including a CALL field to indicate that the branch entry corresponds to a branch that may implement a subroutine call and a PSEUDO field. The PSEUDO field represents linkage information and creates a link between a subroutine entry and a subroutine return. A target address of a successful branch instruction is used to search the BHT. The branch is known to be a subroutine return if a target quadword contains an entry prior to a target halfword that has the CALL field set. The entry with the CALL bit set is thus known to be the corresponding subroutine call, and the entry point to the subroutine is given by the target address stored within the entry. A PSEUDO entry is inserted into the BHT at the location corresponding to the entry point of the subroutine, the PSEUDO entry being designated as such by having the PSEUDO field asserted. The PSEUDO entry contains the address of the returning branch instruction in place of the target address field.
摘要:
A digital computer includes a main and an auxiliary pipeline processor which are configured to concurrently execute contiguous groups of instructions taken from a single instruction sequence. The instructions in a sequence may be divided into groups by using either taken-branch instructions or certain instructions which may change the contents of the general purpose registers as group delimiters. Both methods of grouping the instructions use a branch history table to predict the sequence in which the instructions will be executed.
摘要:
A store through cache environment managed exclusively grants exclusivity on a large granularity basis. A cross-invalidate is realized for all changed lines via a single transmission when exclusivity is released. A dynamic table that operates in conjunction with a directory look-aside table (DLAT) determines a number of pages that can be held exclusive simultaneously. For adequate operating speed, the special table must be either fully associative or at least set associative. Alternatively, the table can be incorporated into the DLAT. Each DLAT entry is also extended to include a set of "resident" bits and a "valid nonresident" bit. When exclusively is released, the set of local change bits is broadcast to all processors. Upon receipt of such broadcast, the appropriate action is to change the "valid nonresident" indication to read-only and to clear residence bits whose corresponding local change bit is set.
摘要:
A cache memory system develops an optimum sequence for transferring data values between a main memory and a line buffer internal to the cache. At the end of a line transfer, the data in the line buffer is written into the cache memory as a block. Following an initial cache miss, the cache memory system monitors the sequence of data requests received for data in the line that is being read in from main memory. If the sequence being used to read in the data causes the processor to wait for a specific data value in the line, a new sequence is generated in which the specific data value is read at an earlier time in the transfer cycle. This sequence is associated with the instruction that caused the first miss and is used for subsequent misses caused by the instruction. If, in the process of handling a first miss related to a specific instruction, a second miss occurs which is caused by the same instruction but which is for data in a different line of memory, the sequence associated with the instruction is marked as an ephemeral miss. Data transferred to the line buffer in response to an ephemeral miss is not stored in the cache memory and limited to that portion of the line accessed within the line buffer.
摘要:
A multi-prediction branch prediction mechanism predicts each conditional branch at least twice, first during the instruction-fetch phase of the pipeline and then again during the decode phase of the pipeline. The mechanism uses at least two different branch prediction mechanisms, each a separate and independent mechanism from the other. A set of rules are used to resolve those instances as to when the predictions differ.
摘要:
A method for addressing data in a cache unit which has a plurality of congruence classes, following a failure which disables one or more of the congruence classes in the cache unit. A plurality of synonym classes are established. A subset of the congruence classes is assigned to each of the synonym classes. Any disabled congruence classes are identified. The synonym class to which the disabled congruence class belongs is identified. An alternate congruence class is selected which belongs to the same synonym class as the disabled congruence class. When a request is received by the cache to store a line of data into the disabled congruence class, the line is stored into the alternate congruence class in response to the request.
摘要:
System and method for predicting a multiplicity of future branches simultaneously (parallel) from an executing program, to enable the simultaneous fetching of multiple disjoint program segments. Additionally, the present invention detects divergence of incorrect branch predictions and provides correction for such divergence without penalty. By predicting an entire sequence of branches in parallel, the present invention removes restrictions that decoding of multiple instructions in a superscalar environment must be limited to a single branch group. As a result, the speed of today's superscalar processors can be significantly increased. The present invention includes three main embodiments: (1) the first embodiment is directed to a simplex multibranch prediction device, that can predict a plurality of branch groups in one cycle and provide early detection of wrong predictions; (2) the second embodiments is directed to a duplex multibranch prediction device that can detect divergence in a predicted stream, and provide redirection (correction) within the stream; and (3) the third embodiment is directed to an n-plex multibranch prediction device, that can predict n multiplicity of branch predictions simultaneously and provide an early detection of wrong predictions as well as correction of wrong predictions.