Abstract:
A digital computer system is described which is capable of processing two or more computer instructions in parallel and which has the capability of generating compounding tag information for those instructions, the compounding tag information being associated with instructions for the purpose of indicating groups of instructions which are to be concurrently executed. A compounding tag has a value which indicates the size of the group of instructions which are to be concurrently executed. The computer system includes a hierarchically arranged memory which provides instructions to a CPU for execution. The instructions are compounded in the memory, and provision is made in the memory for storage of their compounding tags. In the event of modification of an instruction in memory, the invention provides for reduction of the value of the compounding tags for the modified instruction and instructions which are capable of being compounded with the modified instruction, or for generation of new tag values for the modified instruction and instructions which are adjacent to it in memory.
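The tag-reduction step on instruction modification can be illustrated with a small sketch. It is not the patented mechanism: it assumes, purely for illustration, that `tags[i]` holds the size of the group that starts at instruction `i`, that a value of 1 means the instruction executes alone, and that the most conservative reduction (resetting the tag to 1) is acceptable. The function name and the list-based memory model are hypothetical.

```python
def reset_tags_on_store(tags, modified_index):
    """Return a copy of tags with every group that covers modified_index reduced to 1.

    Assumption (not from the abstract): tags[i] is the size of the group
    that starts at instruction i, and 1 means "execute alone".
    """
    updated = list(tags)
    for start in range(modified_index + 1):
        # A group starting at `start` covers indices start .. start + tags[start] - 1.
        if start + tags[start] > modified_index:
            updated[start] = 1
    return updated


tags = [2, 1, 3, 2, 1, 2, 1]
print(reset_tags_on_store(tags, 3))
```

Only the groups that overlap the modified instruction lose their compounding; groups elsewhere in memory keep their tags.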
Abstract:
A dynamic multiple instruction stream, multiple data, multiple pipeline (MIMD) apparatus simultaneously executes more than one instruction associated with a multiple number of instruction streams utilizing multiple data associated with the multiple number of instruction streams in a multiple number of pipeline processors. Since instructions associated with a multiple number of instruction streams are being executed simultaneously by a multiple number of pipeline processors, a tracking mechanism is needed for keeping track of the pipe in which each instruction is executing. As a result, a dynamic history table maintains a record of the pipeline processor number in which each incoming instruction is executing, and other characteristics of the instruction. When a particular instruction is received, it is decoded and its type is determined. Each pipeline processor handles a certain category of instructions; the particular instruction is transmitted to the pipeline processor having its corresponding category. However, before transmission, the pipeline processor is checked for completion of its oldest instruction by consulting the dynamic history table. If the table indicates that the oldest instruction in the pipeline processor should complete, execution of the oldest instruction in such processor completes, leaving room for insertion of the particular instruction therein for execution. When the particular instruction is transmitted to its associated pipeline processor, information including the pipe number is stored in the dynamic history table for future reference.
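The dispatch flow described above (decode, select the pipe by instruction category, retire the oldest instruction if needed, record the pipe number) might be modeled roughly as follows. This is a simplified sketch under stated assumptions: the instruction categories, the pipe depth of 2, and the rule that the oldest instruction completes whenever its pipe is full are all hypothetical choices, not details taken from the abstract.

```python
from collections import deque

# Hypothetical mapping from instruction category to pipeline number.
PIPE_FOR_TYPE = {"load": 0, "add": 1, "mul": 2}
PIPE_DEPTH = 2  # hypothetical number of instructions a pipe can hold


class Dispatcher:
    def __init__(self):
        self.pipes = {pipe: deque() for pipe in PIPE_FOR_TYPE.values()}
        self.history = {}    # instruction id -> {"pipe": ..., "type": ...}
        self.completed = []  # instruction ids in completion order

    def dispatch(self, instr_id, instr_type):
        """Send an instruction to the pipe for its category and record it."""
        pipe = PIPE_FOR_TYPE[instr_type]
        queue = self.pipes[pipe]
        if len(queue) == PIPE_DEPTH:
            # Simplification: a full pipe completes its oldest instruction
            # to make room for the incoming one.
            oldest = queue.popleft()
            del self.history[oldest]
            self.completed.append(oldest)
        queue.append(instr_id)
        self.history[instr_id] = {"pipe": pipe, "type": instr_type}
        return pipe


d = Dispatcher()
for instr_id, instr_type in [(1, "add"), (2, "add"), (3, "load"), (4, "add")]:
    d.dispatch(instr_id, instr_type)
print(d.completed, d.history)
```

The dictionary plays the role of the dynamic history table: it is consulted implicitly when the oldest entry of a pipe is retired, and it is updated with the pipe number on every dispatch.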
Abstract:
Parity for every byte of the sum produced by addition of two operands is predicted based upon segmentation of each sum byte into three groups of adjacent bits, which leads to Boolean minterm circuitry employing a minimum of exclusive-OR gates.
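As background for the idea of predicting sum parity, the following reference model derives the parity of each sum byte from the operand bits and the carries alone, using the identity that each sum bit equals `a_i XOR b_i XOR carry_in_i`. It does not reproduce the three-group segmentation or the minterm circuitry of the abstract; the function names and the 32-bit default width are assumptions made for illustration.

```python
def parity(x):
    """Parity (0 or 1) of the set bits in a non-negative integer."""
    return bin(x).count("1") & 1


def predicted_sum_parity(a, b, width=32):
    """Parity of each byte of (a + b) mod 2**width, least significant byte first."""
    carry = 0
    result = []
    for byte in range(width // 8):
        p = 0
        for bit in range(8 * byte, 8 * byte + 8):
            ai = (a >> bit) & 1
            bi = (b >> bit) & 1
            p ^= ai ^ bi ^ carry                     # parity contribution of this sum bit
            carry = (ai & bi) | (carry & (ai ^ bi))  # carry into the next bit
        result.append(p)
    return result


a, b = 0x12345678, 0x0FEDCBA9
total = (a + b) & 0xFFFFFFFF
direct = [parity((total >> (8 * k)) & 0xFF) for k in range(4)]
print(predicted_sum_parity(a, b) == direct)
```

A hardware predictor would compute the same values in parallel with the adder rather than bit by bit; the loop here only serves as a check of the underlying identity.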
Abstract:
An apparatus for branch prediction for computer instructions predicts the outcome of an executing branch instruction in response to instruction operands Q, R, and B. The apparatus includes combinatorial logic for predicting a first branch condition, ((Q+R)-B) > 0, or a second branch condition, ((Q+R)-B) ≤ 0.
Abstract:
A multi-bit overlapped scanning multiplication system assembles modified partial products in a reduced, non-rectangular banded matrix. The rows of the matrix, except for the first and last, are extended with bands of encoded extensions of limited length at the right and left ends of the partial product terms. The width of the significant bits of each partial product term is equal to q-1+S-2, where q is the width of the significant bits plus sign of the multiplicand and S is the number of bits which are overlapped scanned. Each partial product term is shifted S-1 bits from adjacent terms and is banded by encoded extensions to the terms. S-1 bits of encode are placed to the right of every term except the last, the encode being based on the sign of the next partial product term; and S-1 bits of encoded sign extension are placed to the left of every term except the first, which has no sign extension, and the last, to the left of which is placed an S-bit encode. The bits of negative partial product terms are inverted, and a "hot 1" is encoded in the right extension in the previous row. The first bit of the multiplier is forced to zero so that the first partial product term is always positive or zero. Carry save adder trees are used to reduce each column of the matrix to two terms. When inputs to a carry save adder are known, the logic of the carry save adder is simplified to save chip space.
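The arithmetic behind scanning S multiplier bits at a time with one bit of overlap can be sketched with a generic Booth-style recoding, in which each signed digit has weight 2^((S-1)k) and so each partial product is shifted S-1 bits from its neighbors. This is a minimal model under assumptions: it uses the textbook recoding of a two's-complement multiplier and does not model the banded matrix, the encoded extensions, the "hot 1", or the forced-zero first multiplier bit described above. All function names are hypothetical.

```python
def scan_digits(y, n, s):
    """Recode an n-bit two's-complement multiplier y into signed digits,
    scanning s bits per step with one bit of overlap (least significant first)."""
    step = s - 1

    def bit(i):
        if i < 0:
            return 0          # implicit zero to the right of bit 0
        if i >= n:
            i = n - 1         # sign extension
        return (y >> i) & 1

    digits = []
    for low in range(0, n, step):
        d = bit(low - 1)                        # overlap bit from the previous scan
        for j in range(step - 1):
            d += bit(low + j) << j              # positively weighted bits
        d -= bit(low + step - 1) << (step - 1)  # top bit of the scan is negative
        digits.append(d)
    return digits


def multiply(x, y, n, s):
    """Sum the partial products d * x, each shifted s-1 bits from the previous one."""
    step = s - 1
    return sum(d * x * (1 << (step * k)) for k, d in enumerate(scan_digits(y, n, s)))


print(multiply(-37, 113, 8, 3) == -37 * 113)
```

With S = 3 this reduces to the familiar radix-4 modified Booth recoding; larger S gives fewer partial products at the cost of more multiples of the multiplicand per digit.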
Abstract:
A general massively parallel computer architecture supporting neural networks is developed utilizing a novel method of separating a triangular array containing N processing elements on each edge into multiple smaller triangular arrays, each of dimension X and each representing a common building block processor group chip, that can be interconnected for various size parallel processing implementations. The group chips are interconnected by a unique switching tree mechanism that maintains the complete connectivity capability and functionality possessed by the original triangular array of dimension N. A partitioning approach is presented first, where for given sizes K and X, with K divisible by X, it is proven that a triangular array containing K processor elements located on each edge of an equilateral triangular array can be partitioned into K/X triangular arrays of dimension X and K(K-X)/2X² square processor arrays of dimension X. An algorithm is presented next which partitions a square array into two triangular arrays, each of dimension X. Assuming K=N and the chosen technology supports the placement of a triangular processor group chip of dimension X on a single chip, the final scalable massively parallel computing structure for N root tree processors utilizes N²/X² triangular processor group chips. Examples of using the partitioning methodology to create the scalable organization of processor elements are presented. Following these examples, an interconnection mechanism is developed which is shown to preserve the functionality of the original triangular array of dimension N in the implemented structure constructed of multiple triangular arrays of dimension X. Examples of the interconnection mechanism for two scaled neural network emulation massively parallel computers utilizing the same size X processor group chip are presented. Finally, an alternative scaling mechanism and implementation considerations for the interconnection mechanisms are discussed.
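The counts quoted above can be checked with simple arithmetic. The sketch below computes the number of triangular and square arrays for given K and X and confirms that, once each square is split into two triangular arrays, the total equals K²/X². The function name is hypothetical, and the code checks only the counts, not the partitioning algorithm itself.

```python
def partition_counts(k, x):
    """Return (triangular arrays, square arrays, total triangular chips)
    for a triangular array of dimension k partitioned with dimension x."""
    if k % x != 0:
        raise ValueError("K must be divisible by X")
    triangles = k // x
    squares = k * (k - x) // (2 * x * x)
    chips = triangles + 2 * squares  # each square yields two triangular arrays
    return triangles, squares, chips


for k, x in [(8, 4), (12, 4), (64, 8)]:
    triangles, squares, chips = partition_counts(k, x)
    print(k, x, triangles, squares, chips, chips == (k // x) ** 2)
```

Writing K = mX makes the identity visible: there are m triangular arrays and m(m-1)/2 squares, so the chip count is m + m(m-1) = m².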
Abstract:
The neural computing paradigm is characterized as a dynamic, highly parallel, and computationally intensive system typically consisting of input weight multiplications, product summation, neural state calculations, and complete connectivity among the neurons. Herein is described a neural network architecture called SNAP which uses a unique intercommunication scheme within an array structure that provides high performance for completely connected network models such as the Hopfield model. SNAP's packaging and expansion capabilities are addressed, demonstrating SNAP's scalability to larger networks.
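The three computational steps named above (weight multiplication, product summation, neural state calculation) can be illustrated with a plain software model of a completely connected network. This is not the SNAP array or its intercommunication scheme; it is a sequential sketch that assumes bipolar states and a sign activation, as commonly used with the Hopfield model, and the function names are hypothetical.

```python
def update_neurons(weights, states):
    """One synchronous update of a completely connected network."""
    new_states = []
    for row in weights:
        # Input weight multiplications followed by product summation.
        total = sum(w * y for w, y in zip(row, states))
        # Neural state calculation (assumed sign activation).
        new_states.append(1 if total >= 0 else -1)
    return new_states


def outer_product_weights(pattern):
    """Weights that store one bipolar pattern, with a zero diagonal."""
    n = len(pattern)
    return [[0 if i == j else pattern[i] * pattern[j] for j in range(n)]
            for i in range(n)]


stored = [1, -1, 1, -1]
weights = outer_product_weights(stored)
noisy = [1, -1, 1, 1]  # last element flipped
print(update_neurons(weights, noisy) == stored)
```

Every neuron needs every other neuron's state on each update, which is the complete-connectivity requirement that motivates a dedicated intercommunication scheme in hardware.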
Abstract:
A massively parallel processor apparatus having an instruction set architecture for each of the N² PEs of the structure. The apparatus which we prefer will have a PE structure consisting of PEs that contain instruction and data storage units, receive instructions and data, and execute instructions. The N² structure should contain "N" communicating ALU trees, "N" programmable root tree processor units, and an arrangement for communicating instructions, data, and the root tree processor outputs back to the input processing elements by means of the communicating ALU trees. The apparatus can be structured as a bit-serial or word parallel system. The preferred structure contains N² PEs, identified as PE column,row, in an N root tree processor system, placed in the form of an N by N processor array that has been folded along the diagonal and made up of diagonal cells and general cells. The Diagonal-Cells are comprised of a single processing element identified as PE i,i of the folded N by N processor array and the General-Cells are comprised of two PEs merged together, identified as PE i,j and PE j,i of the folded N by N processor array. Matrix processing algorithms are discussed followed by a presentation of the Diagonal-Fold Tree Array Processor architecture. The Massively Parallel Diagonal-Fold Tree Array Processor supports completely connected root tree processors through the use of the array of PEs that are interconnected by folded communicating ALU trees.
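The folded layout can be made concrete by enumerating the cells of an N by N array folded along its diagonal. The sketch below lists the Diagonal-Cells (one PE each) and the General-Cells (PE i,j merged with PE j,i) and confirms that together they account for all N² PEs. The function name and the zero-based indexing are assumptions for illustration; the ALU trees and root tree processors are not modeled.

```python
def fold_cells(n):
    """Return (diagonal_cells, general_cells) for an n by n array folded on its diagonal."""
    diagonal = [(i, i) for i in range(n)]
    general = [((i, j), (j, i)) for i in range(n) for j in range(i)]
    return diagonal, general


diagonal, general = fold_cells(4)
total_pes = len(diagonal) + 2 * len(general)
print(len(diagonal), len(general), total_pes)
```

For any N this gives N diagonal cells and N(N-1)/2 general cells, so the folded structure holds N + N(N-1) = N² PEs in N(N+1)/2 cells.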