摘要:
A method and apparatus for non-sequential access to shared resources in a multiple requestor system uses a variety of tags to effectively re-order the data at its destination. In simplest form, the tag directs switching logic to where in a buffer to locate another tag for direction information or where in a buffer or processor (register) to put the response associated with the tag. For example, loading data from memory requires that the requestor provide a request signal, an address, and a request tag. The request signal validates the address and request tag. The address specifies the location of the requested data in memory. The request tag specifies where to put the data when it is returned to the processor. The switching logic for the requestor includes a tag queue for storing the request tags associated with the resource requests, logic means for associating the respective request tag from the tag queue with a resource response, and means for returning the resource response and respective request tag to the requestor. The switching logic associated with the shared resource includes switching means to route the request into and out of the shared resource, control logic to correctly route the request, logic to handle multiple decision requests, and logic to store or retrieve the ultimate data entity being requested.
摘要:
Global registers for a multiprocessor system support multiple parallel access paths for simultaneous operations on separate sets of global registers, each set of global registers referred to as a global register file. An arbitration mechanism associated with the global registers is used for resolving multiple, simultaneous requests to a single global register file. An arithmetic and logical unit (ALU) is also associated with each global register file for allowing atomic arithmetic operations to be performed on the entire register value for any of the global registers in that global register file.
摘要:
A global register system provides communication and coordination among a plurality of processors sharing a common memory in a multiprocessor system which access one or more registers within a shared resource circuit that is separate from the common memory and is symmetrically accessible by the plurality of processors in the multiprocessor system. The global register system is accessed by direct addresses determined by the processor from a previously assigned indirect address and an instruction accessing the data stored in global registers. Arithmetic or logic operation on a data value stored in a selected one of the registers are performed by the global register system independent from the processors or the common memory in order to modify the data value in the selected global register as part of an atomic operation performed in response to a single read-and-modify instruction received from one of the processors.
摘要:
A cluster architecture for a highly parallel multiprocessor computer processing system is comprised of one or more clusters of tightly-coupled, high-speed processors capable of both vector and scalar parallel processing that can symmetrically access shared resources associated with the cluster, as well as the shared resources associated with other clusters.
摘要:
A method of accessing common memory in a cluster architecture for a highly parallel multiprocessor scaler/factor computer system using a plurality of segment registers in which is first determined whether a logical address is within a start and end range as defined by the segment registers and then relocating the logical address to a physical address using a displacement value in another segment register.
摘要:
The present invention is an improved high performance scalar/vector processor. In the preferred embodiment, the scalar/vector processor is used in a multiprocessor system. The scalar/vector processor is comprised of a scalar processor for operating on scalar and logical instructions, including a plurality of independent functional units operably connected to the scalar processor, a vector processor for operating on vector instructions, including a plurality of independent functional units operably connected to the vector processor, and an instruction control mechanism for fetching both the scalar and vector instructions from an instruction cache and controlling the operation of those instructions in both the scalar and vector processor. The instruction control mechanism is designed to enhance the performance of the scalar/vector processor by keeping a multiplicity of pipelines substantially filled with a minimum number of gaps.
摘要:
A delayed branch mechanism maintains the flow of an instruction pipeline in a scalar/vector processor having an instruction cache and including instruction fetch means, a program counter, and instruction decode/issue means coupled to the instruction cache by means of the instruction pipeline. Conditional branch instructions are rated as likely conditional branch instructions or unlikely conditional branch instructions based on a probability that their branch conditions will be met. A number of pipeline clock periods required for testing the branch conditions are determined. The likely conditional branch instructions are issued and executed including transferring a branch-to-address to the program counter during the number of pipeline clock periods irrespective of a successful meeting of the branch conditions. A number of useful instructions sufficient to issue within the number of pipeline clock periods are placed into the instruction stream following the likely conditional branch instructions. A conditional branch instruction is canceled and returned to an instruction which would have followed the conditional branch instruction if the branch is not taken. No gap occurs in the instruction stream if the corresponding branch is successfully taken.
摘要:
A sequence of conditional vector IF statements is processed by employing a mask register and a condition register. Each conditional vector IF statement is typically performed on two vector registers, each having vector elements. A first conditional vector IF statement in the sequence is processed for those vector elements corresponding to set bits in the mask register. Bits are set in the condition register to reflect those vector elements which correspond to the set bits in the mask register for which the conditional vector IF statement is satisfied. The contents of the condition register are then moved into the mask register. A next conditional vector IF statement in the sequence is then processed for those vector elements corresponding to the new set bits in the mask register. Bits are then set in the condition register to reflect those vector elements which correspond to the new set bits in the mask register for which the conditional vector IF statement is satisfied.
摘要:
A scalar/vector processor capable of concurrent scaler and vector operations includes scalar resources to process scalar instructions, and vector resources adapted to be operated concurrently with the scalar resources and with one another to process vector instructions. The scalar resources include scalar registers, and the vector resources include vector registers. Decoding means decodes each of a number of address fields. Each field represents a register address to access alternatively one of the scalar registers or one of the vector registers depending on a value of the register address being above or below a selected moveable address value within a range of addresses encompassed by the address field.
摘要:
A vector processing system includes a main memory, vector registers, vector resources for accessing memory to transfer vector data between main memory and the vector registers and to perform operations on the vector data. Data words stored in non-consecutive address locations of a segment of main memory are accessed for processing. Offset address values of a number of the data words are stored in consecutive elements of a first vector register. A vector gather instruction is executed which adds each offset address value to a base address value to calculate main memory addresses representing main memory storage locations of the data words, retrieves the data words from the main memory, and stores the data words in consecutive elements of a second vector register in an order corresponding to that in which the offset address values are stored in the first vector register. A second vector instruction is chained to the gather instruction for performing an operation upon the retrieved data words and storing the results in a third vector register. A vector scatter instruction is chained to the second vector instruction to return the results to the main memory.