摘要:
A global register system provides communication and coordination among a plurality of processors sharing a common memory in a multiprocessor system which access one or more registers within a shared resource circuit that is separate from the common memory and is symmetrically accessible by the plurality of processors in the multiprocessor system. The global register system is accessed by direct addresses determined by the processor from a previously assigned indirect address and an instruction accessing the data stored in global registers. Arithmetic or logic operation on a data value stored in a selected one of the registers are performed by the global register system independent from the processors or the common memory in order to modify the data value in the selected global register as part of an atomic operation performed in response to a single read-and-modify instruction received from one of the processors.
摘要:
A method and apparatus for non-sequential access to shared resources in a multiple requestor system uses a variety of tags to effectively re-order the data at its destination. In simplest form, the tag directs switching logic to where in a buffer to locate another tag for direction information or where in a buffer or processor (register) to put the response associated with the tag. For example, loading data from memory requires that the requestor provide a request signal, an address, and a request tag. The request signal validates the address and request tag. The address specifies the location of the requested data in memory. The request tag specifies where to put the data when it is returned to the processor. The switching logic for the requestor includes a tag queue for storing the request tags associated with the resource requests, logic means for associating the respective request tag from the tag queue with a resource response, and means for returning the resource response and respective request tag to the requestor. The switching logic associated with the shared resource includes switching means to route the request into and out of the shared resource, control logic to correctly route the request, logic to handle multiple decision requests, and logic to store or retrieve the ultimate data entity being requested.
摘要:
A scalar/vector processor capable of concurrent scaler and vector operations includes scalar resources to process scalar instructions, and vector resources adapted to be operated concurrently with the scalar resources and with one another to process vector instructions. The scalar resources include scalar registers, and the vector resources include vector registers. Decoding means decodes each of a number of address fields. Each field represents a register address to access alternatively one of the scalar registers or one of the vector registers depending on a value of the register address being above or below a selected moveable address value within a range of addresses encompassed by the address field.
摘要:
A vector processing system includes a main memory, vector registers, vector resources for accessing memory to transfer vector data between main memory and the vector registers and to perform operations on the vector data. Data words stored in non-consecutive address locations of a segment of main memory are accessed for processing. Offset address values of a number of the data words are stored in consecutive elements of a first vector register. A vector gather instruction is executed which adds each offset address value to a base address value to calculate main memory addresses representing main memory storage locations of the data words, retrieves the data words from the main memory, and stores the data words in consecutive elements of a second vector register in an order corresponding to that in which the offset address values are stored in the first vector register. A second vector instruction is chained to the gather instruction for performing an operation upon the retrieved data words and storing the results in a third vector register. A vector scatter instruction is chained to the second vector instruction to return the results to the main memory.
摘要:
A vector processor includes functional unit paths, each having an input and an output, and with at least one functional unit path including a plurality of pipelined functional elements coupled to the respective path input and output in parallel. The functional elements have different pipeline lengths to complete processing of operands applied to the path input. Program instruction initiation means responds to a first instruction to initiate processing of first operand data in a first of the functional elements, and responds to a second instruction to initiate the processing of second operand data in a second of the functional elements dependent upon completion of the first instruction but only if the second functional element has a pipeline length equal to or greater than the pipeline length of the first functional element.
摘要:
Global registers for a multiprocessor system support multiple parallel access paths for simultaneous operations on separate sets of global registers, each set of global registers referred to as a global register file. An arbitration mechanism associated with the global registers is used for resolving multiple, simultaneous requests to a single global register file. An arithmetic and logical unit (ALU) is also associated with each global register file for allowing atomic arithmetic operations to be performed on the entire register value for any of the global registers in that global register file.
摘要:
The present invention is an improved high performance scalar/vector processor. In the preferred embodiment, the scalar/vector processor is used in a multiprocessor system. The scalar/vector processor is comprised of a scalar processor for operating on scalar and logical instructions, including a plurality of independent functional units operably connected to the scalar processor, a vector processor for operating on vector instructions, including a plurality of independent functional units operably connected to the vector processor, and an instruction control mechanism for fetching both the scalar and vector instructions from an instruction cache and controlling the operation of those instructions in both the scalar and vector processor. The instruction control mechanism is designed to enhance the performance of the scalar/vector processor by keeping a multiplicity of pipelines substantially filled with a minimum number of gaps.
摘要:
A fast interrupt mechanism is capable of simultaneously interrupting a community of associated processors in a multiprocessor system. The fast interrupt mechanism enables the more effective debugging of software executing on a multiprocessor system by allowing all of the processors in a community associated with a parallel process to be halted within a limited number of clock cycles following a hardware exception or processor breakpoint. The fast interrupt mechanism consists of a set of registers that are used to identify associations among multiple processors, a comparison matrix that is used to select processors to be interrupted, a network of interconnections that transmit interrupt events to and from the processors, and elements in the processors that create and respond to fast interrupt events.
摘要:
A delayed branch mechanism maintains the flow of an instruction pipeline in a scalar/vector processor having an instruction cache and including instruction fetch means, a program counter, and instruction decode/issue means coupled to the instruction cache by means of the instruction pipeline. Conditional branch instructions are rated as likely conditional branch instructions or unlikely conditional branch instructions based on a probability that their branch conditions will be met. A number of pipeline clock periods required for testing the branch conditions are determined. The likely conditional branch instructions are issued and executed including transferring a branch-to-address to the program counter during the number of pipeline clock periods irrespective of a successful meeting of the branch conditions. A number of useful instructions sufficient to issue within the number of pipeline clock periods are placed into the instruction stream following the likely conditional branch instructions. A conditional branch instruction is canceled and returned to an instruction which would have followed the conditional branch instruction if the branch is not taken. No gap occurs in the instruction stream if the corresponding branch is successfully taken.
摘要:
A sequence of conditional vector IF statements is processed by employing a mask register and a condition register. Each conditional vector IF statement is typically performed on two vector registers, each having vector elements. A first conditional vector IF statement in the sequence is processed for those vector elements corresponding to set bits in the mask register. Bits are set in the condition register to reflect those vector elements which correspond to the set bits in the mask register for which the conditional vector IF statement is satisfied. The contents of the condition register are then moved into the mask register. A next conditional vector IF statement in the sequence is then processed for those vector elements corresponding to the new set bits in the mask register. Bits are then set in the condition register to reflect those vector elements which correspond to the new set bits in the mask register for which the conditional vector IF statement is satisfied.