Abstract:
There is provided a mechanism for propagating exception conditions in a computer system when instructions are subject to exception conditions. The apparatus includes a set of data registers for storing data manipulated by the instructions of the computer system, and a set of state registers for storing speculative states of data manipulated by the instructions, there being one state register associated with each data register. Furthermore, the apparatus includes a logic circuit, coupled to the set of state registers, for propagating the states from a source one of the state registers to a destination one of the state registers, if data stored in an associated source one of the data registers are used as a source for an associated destination one of the data registers, and if the data stored in the source data register were manipulated by a particular instruction subject to an exception condition.
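To make the propagation rule concrete, here is a minimal Python sketch (all names are invented for illustration, not taken from the patent): each data register carries one state bit, and writing a destination register ORs together the state bits of its sources, so an exception flag raised during speculative execution follows the data through subsequent instructions.

```python
# Minimal model of per-register exception-state propagation.
# RegisterFile and execute are illustrative names, not from the patent.

class RegisterFile:
    def __init__(self, n):
        self.data = [0] * n        # data registers
        self.state = [False] * n   # one state bit per data register

    def execute(self, op, dst, srcs, raised_exception=False):
        # The destination's state bit is set if any source register
        # carries a propagated exception state, or if this instruction
        # itself raised an exception while executing speculatively.
        self.data[dst] = op(*(self.data[s] for s in srcs))
        self.state[dst] = raised_exception or any(self.state[s] for s in srcs)

rf = RegisterFile(8)
rf.execute(lambda a, b: a + b, dst=2, srcs=[0, 1], raised_exception=True)
rf.execute(lambda a, b: a * b, dst=3, srcs=[2, 4])  # r2 feeds r3
assert rf.state[3]  # r3 inherits the exception state propagated from r2
```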
Abstract:
An apparatus for enforcing that selected instructions are executed in a correct order, comprising a first content addressable memory for storing the load addresses of data read from memory by the selected instructions. The first content addressable memory compares the stored load addresses with the store addresses of data to be written to memory, and generates a first signal if one of the load addresses is identical to a subsequently compared one of the store addresses. The apparatus further includes a second content addressable memory for storing and comparing the states of the data read and written by the selected instructions; it generates a second signal if one of the stored states is identical to one of the compared states. The stored states include a program counter used to repeat the execution of the selected instructions upon detection of the first and second signals.
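A rough software analogue of the first content addressable memory's conflict check (hypothetical names, with a dict standing in for the associative hardware; a sketch, not the patented circuit):

```python
# Sketch of memory-conflict detection for speculatively reordered loads.
# A dict keyed by address stands in for the associative lookup that the
# content addressable memory performs in hardware.

class ConflictDetector:
    def __init__(self):
        self.loads = {}  # load address -> recovery program counter

    def record_load(self, address, recovery_pc):
        # A load hoisted above an earlier store records its address and
        # the program counter from which execution should be repeated.
        self.loads[address] = recovery_pc

    def check_store(self, address):
        # A later store to a recorded address means the load read stale
        # data; return the saved PC so the instructions can be repeated.
        return self.loads.pop(address, None)

det = ConflictDetector()
det.record_load(address=0x1000, recovery_pc=42)
assert det.check_store(0x2000) is None  # no conflict
assert det.check_store(0x1000) == 42    # conflict detected: replay from PC 42
```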
Abstract:
A mechanism for executing computer instructions in parallel includes a compiler for generating and grouping instructions into a plurality of sets of instructions to be executed in parallel, each set having a unique identification. A computer system having a real state and a speculative state executes the sets in parallel, executing a particular set of instructions in the speculative state if the instructions of that set have dependencies which cannot be resolved until the instructions are actually executed. The computer system generates speculative data while executing instructions in the speculative state. Logic circuits are provided to detect any exception conditions which occur while executing the particular set in the speculative state. If the particular set is subject to an exception condition, the instructions of the set are re-executed to resolve the exception condition and to incorporate the speculative data into the real state of the computer system.
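The speculate, detect, and re-execute cycle can be sketched as follows (a self-contained Python model with assumed interfaces; `ArithmeticError` merely stands in for a hardware exception condition):

```python
# Self-contained sketch of the speculate/detect/re-execute cycle.

def execute_set(instructions, state, suppress=True):
    new_state = dict(state)
    raised = False
    for instr in instructions:
        try:
            instr(new_state)
        except ArithmeticError:
            if not suppress:
                raise       # real-state execution takes the exception
            raised = True   # speculative execution only flags it
    return new_state, raised

def run_set(instructions, real_state, unresolved_deps):
    if unresolved_deps:
        spec_state, raised = execute_set(instructions, real_state)
        if not raised:
            return spec_state  # fold the speculative data into the real state
        # Exception condition detected: fall through and re-execute the
        # set non-speculatively so the exception is resolved precisely.
    return execute_set(instructions, real_state, suppress=False)[0]
```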
Abstract:
A compiler groups instructions into sets. The sets of instructions are related by data and control dependencies which are unresolvable by the compiler. Sets of instructions having unresolved dependencies are executed in a speculative state of the computer system under the assumption that an exception condition will not occur. However, if an exception condition does occur while executing a set of instructions in the speculative state, that exception condition is detected and the set of instructions is re-executed in a real state of the computer system to resolve the exception condition.
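A toy grouping pass illustrates how a compiler might separate sets with resolvable dependencies from sets that must run speculatively (the `Instr` record and the fixed set size of four are assumptions of the sketch):

```python
# Toy grouping pass: instructions whose dependencies the compiler can
# resolve go into ordinary sets; a set containing any unresolvable
# dependency is marked for speculative execution.

from collections import namedtuple

Instr = namedtuple("Instr", "name deps_resolvable")

def group_instructions(instrs, set_size=4):
    real_sets, speculative_sets = [], []
    current, speculative = [], False
    for ins in instrs:
        current.append(ins)
        speculative |= not ins.deps_resolvable
        if len(current) == set_size:
            (speculative_sets if speculative else real_sets).append(current)
            current, speculative = [], False
    if current:  # flush the final, possibly partial, set
        (speculative_sets if speculative else real_sets).append(current)
    return real_sets, speculative_sets
```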
Abstract:
A method and apparatus including means for storing an executable file which includes a group of bits defining functional operations and cycle bits associated with each functional operation, and means for completing a variable number of the functional operations in parallel during a single execution cycle in accordance with the state of the associated cycle bits. The method and apparatus eliminate the need for complex data-dependency-checking hardware and allow execution of executable files with a minimum amount of control logic. They further minimize the need to add null operations (NOPs) to executable files, which reduces the storage space required for the executable files, allows executable files to be used on multiple hardware implementations, and allows register values to be used for multiple purposes during a single execution cycle.
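One plausible encoding of the cycle bits (an assumption; the abstract does not fix the encoding) is a toggle scheme: consecutive operations with the same bit value issue in the same cycle, and a toggle starts the next cycle, so no NOP padding is needed to delimit variable-width groups:

```python
# Illustrative decoder for a toggle encoding of cycle bits: operations
# whose bit matches the previous operation's bit belong to the same
# execution cycle; a change of bit value begins a new cycle.

def split_into_cycles(ops):
    """ops: list of (operation, cycle_bit) pairs -> list of parallel groups."""
    groups = []
    for op, bit in ops:
        if groups and groups[-1][0][1] == bit:
            groups[-1].append((op, bit))   # same cycle: issue in parallel
        else:
            groups.append([(op, bit)])     # bit toggled: new cycle begins
    return [[op for op, _ in g] for g in groups]

program = [("add", 0), ("mul", 0), ("load", 1), ("store", 1), ("sub", 0)]
assert split_into_cycles(program) == [["add", "mul"], ["load", "store"], ["sub"]]
```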
Abstract:
A hierarchical cache memory includes a high-speed primary cache memory and a lower speed secondary cache memory of greater storage capacity than the primary cache memory. To manage a huge number of data lines interconnecting the primary and secondary cache memories, the hierarchical cache memory is integrated on a plurality of integrated circuits which include all of the interconnecting data lines. Each integrated circuit includes a primary memory and a secondary memory for storing and retrieving data transferred over a first data input line and a first data output line that link the primary memory to a central processing unit. At any given time, a multi-bit word is addressed in the secondary memory, and a corresponding multi-bit word is addressed in the primary memory. The primary and secondary memories are interconnected by a first multi-line bus for transferring a multi-bit word read from the secondary memory to the primary memory, and by a second multi-line bus for transferring a multi-bit word read from the primary memory to the secondary memory. The secondary memory is linked to a main memory by a second data output line and a second data input line for sequential transmission of bits to exchange multi-bit words during a writeback and refill operation. In a preferred embodiment, data inputs of the primary memory and the secondary memory are wired in parallel to a serial-parallel shift register that is used as a common write buffer.
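The contrast between the wide on-chip buses and the bit-serial off-chip link can be modeled behaviorally (invented names and a 16-bit word width; this captures only the transfer granularity, not timing):

```python
# Behavioral sketch: whole words move between the on-chip primary and
# secondary memories over multi-line buses, while the off-chip link to
# main memory shifts one bit per step through a common write buffer.

WORD_BITS = 16  # assumed word width for the sketch

class CacheSlice:
    def __init__(self):
        self.primary = {}    # address -> word (small, fast)
        self.secondary = {}  # address -> word (larger, slower)

    def refill_primary(self, addr):
        # First multi-line bus: a whole word in one transfer.
        self.primary[addr] = self.secondary[addr]

    def writeback_primary(self, addr):
        # Second multi-line bus: a whole word back in one transfer.
        self.secondary[addr] = self.primary[addr]

    def exchange_with_main(self, addr, incoming_bits):
        # One data line each way: shift the victim word out bit by bit
        # while the refill word shifts in (writeback-and-refill exchange).
        victim = self.secondary.get(addr, 0)
        out_bits = [(victim >> i) & 1 for i in range(WORD_BITS)]
        word = 0
        for i, bit in enumerate(incoming_bits):
            word |= bit << i  # serial-parallel shift register assembles the word
        self.secondary[addr] = word
        return out_bits
```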
Abstract:
A technique for performing distributed memory operations. A method of the disclosure includes obtaining distributed channel information for an algorithm to be executed by a plurality of spatially distributed processing elements. For each distributed channel in the distributed channel information, the method associates one or more of the plurality of spatially distributed processing elements with the distributed channel, based on the algorithm.
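The association step reads naturally as a small mapping routine (the data shapes and the `uses_channel` query are hypothetical, not from the disclosure):

```python
# Sketch: bind spatially distributed processing elements (PEs) to the
# distributed channels an algorithm declares. Data shapes are assumed.

def assign_pes_to_channels(channel_info, pes, algorithm):
    """channel_info: iterable of channel ids declared for the algorithm.
    Returns {channel_id: [pe, ...]}, associating each channel with the
    PEs that the algorithm says produce into or consume from it."""
    assignment = {}
    for channel in channel_info:
        assignment[channel] = [
            pe for pe in pes if algorithm.uses_channel(pe, channel)
        ]
    return assignment
```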
Abstract:
An apparatus may comprise a cache file having a plurality of cache lines and a hit predictor. The hit predictor may contain a table of counter values indexed by signatures that are associated with the plurality of cache lines. The apparatus may fill cache lines into the cache file with either low or high priority. Low-priority lines may be chosen for replacement by a replacement algorithm before high-priority lines; in this way, the cache naturally comes to contain more high-priority lines than low-priority ones. This priority-based fill may improve the performance of most replacement schemes, including the best-known schemes, which already outperform LRU.
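The predictor can be sketched in the style of signature-based hit prediction (table size, counter width, and the update rule below are assumptions for the sketch):

```python
# Sketch of priority-based fill: a table of saturating counters indexed
# by a signature (e.g., a hash tied to the requesting instruction). A
# high counter suggests the signature's lines are reused, so new fills
# for that signature are inserted with high priority.

TABLE_SIZE = 1024
COUNTER_MAX = 3

class HitPredictor:
    def __init__(self):
        self.counters = [1] * TABLE_SIZE  # weakly low priority initially

    def _index(self, signature):
        return hash(signature) % TABLE_SIZE

    def on_hit(self, signature):           # reuse observed: raise the counter
        i = self._index(signature)
        self.counters[i] = min(self.counters[i] + 1, COUNTER_MAX)

    def on_evict_unused(self, signature):  # evicted without reuse: lower it
        i = self._index(signature)
        self.counters[i] = max(self.counters[i] - 1, 0)

    def fill_priority(self, signature):
        # Low-priority fills become preferred victims for the replacement
        # algorithm, so the cache retains more high-priority lines.
        return "high" if self.counters[self._index(signature)] >= 2 else "low"
```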
Abstract:
An apparatus and method for improving cache performance in a computer system having a multi-level cache hierarchy. For example, one embodiment of a method comprises: selecting a first line in a cache at level N for potential eviction; and querying a cache at level M in the hierarchy to determine whether the first cache line is resident in the cache at level M, wherein M < N.
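The surviving steps suggest victim selection along these lines (the policy applied after the query is an assumption, since the source abstract is cut off at this point):

```python
# Sketch: before evicting a line from the level-N cache, ask the nearer
# level-M cache (M < N) whether it still holds the line. A line resident
# closer to the core is likely still hot, so prefer a different victim.
# The fallback policy is an assumption; the abstract is truncated here.

def choose_victim(level_n_candidates, level_m_cache):
    for line in level_n_candidates:      # candidates in replacement order
        if line not in level_m_cache:    # not resident closer to the core
            return line                  # safe to evict
    return level_n_candidates[0]         # all resident: fall back to first pick
```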
Abstract:
A multi-threaded processor provides for efficient flow-control from a pool of un-executed stores in an instruction queue to a store queue. The processor also includes similar capabilities with respect to load instructions. The processor includes logic organized into a plurality of thread processing units (“TPUs”) and allocation logic that monitors each TPU's demand for entries in the store queue. Demand is determined by subtracting an adjustable threshold value from the most recently assigned store identifier value: if the difference between the most recently assigned store identifier for a TPU and the TPU's threshold is non-zero, the TPU is determined to have demand for at least one entry in the store queue. The allocation logic includes arbitration logic that determines which one of a plurality of TPUs with store queue demand should be allocated a free entry in the store queue.
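The demand computation and arbitration can be modeled directly (the TPU state layout and the round-robin policy are assumptions of the sketch):

```python
# Sketch: a TPU shows demand for store-queue entries when its most
# recently assigned store identifier minus its adjustable threshold is
# non-zero; free entries are granted to one demanding TPU at a time.

from itertools import cycle

class Tpu:
    def __init__(self, name, threshold):
        self.name = name
        self.threshold = threshold  # adjustable per-TPU threshold value
        self.last_store_id = 0      # most recently assigned store identifier

    def demand(self):
        return (self.last_store_id - self.threshold) != 0

def allocate_free_entry(tpus, rr_state):
    # Round-robin arbitration (an assumed policy) among demanding TPUs.
    for _ in range(len(tpus)):
        tpu = next(rr_state)
        if tpu.demand():
            return tpu  # grant the free store-queue entry to this TPU
    return None         # no TPU currently has demand

tpus = [Tpu("tpu0", threshold=0), Tpu("tpu1", threshold=4)]
tpus[1].last_store_id = 7           # 7 - 4 != 0, so tpu1 has demand
rr = cycle(tpus)
assert allocate_free_entry(tpus, rr) is tpus[1]
```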