摘要:
A parallel processor has a plurality of communication buses advantageously interconnecting the arithmetic processor elements, the memory controller elements, a global controller circuitry, and input/output processors. The processor preferably has at least one central processing unit cluster, the cluster having at least one integer processor and one floating point processor. A plurality of I/F buses interconnect the integer and floating point processors of a cluster for communications therebetween. Integer load buses connect the integer processors of each cluster and selectively connect those processors to the memory controllers for transferring data from memory to the clusters and for providing inter-integer processor data communications. A plurality of floating point load buses connect the floating point processors of the clusters to selected memory controllers for transferring data from the controllers to the floating point processors and for providing inter-floating point processor data communications. A plurality of physical address buses provide one-way communications for transferring memory addresses from the integer processors to the memories and a plurality of storage buses connect the floating point processors to the memory controllers along a one-way communications path for transferring data to be stored in the memories. The hardware architecture provides advantageous communications between the elements of the data processing system for enabling wide bandwidth communications and high instruction throughput.
摘要:
A method and apparatus for storing an instruction word in a compacted form on a storage media, the instruction word having a plurality of instruction fields, features associating with each instruction word, a mask word having a length in bits at least equal to the number of instruction fields in the instruction word. Each instruction field is associated with a bit of the mask word and accordingly, using the mask word, only non-zero instruction fields need to be stored in memory. The instruction compaction method is advantageously used in a high speed cache miss engine for refilling portions of instruction cache after a cache miss occurs.
摘要:
A method and apparatus for storing an instruction word in a compacted form on a storage media, the instruction word having a plurality of instruction fields, features associating with each instruction word, a mask word having a length in bits at least equal to the number of instruction fields in the instruction word. Each instruction field is associated with a bit of the mask word and accordingly, using the mask word, only non-zero instruction fields need to be stored in memory. The instruction compaction method is advantageously used in a high speed cache miss engine for refilling portions of instruction cache after a cache miss occurs.
摘要:
A data processor has a central processing unit and at least one pipelined memory controller circuitry. The central processing unit addresses data in the memory using a virtual address memory table lookaside buffer and features a data miss recovery circuitry wherein, after a memory access error condition has been detected, the instruction causing the error condition, and those instructions entering the memory pipeline after the instruction causing the error condition, are replayed. The method and apparatus for replaying the instructions use first in-first out buffers for storing the virtual address data and instruction status data relating to each memory access instruction. That stored data is then retrieved after an error condition is detected so that the instruction sequence, beginning at the data miss, can be replayed.
摘要:
In a parallel data processing system having a plurality of separately operating arithmetic processing units, a method and apparatus allows a plurality of branch instructions to be operated upon in a single machine cycle. The branch instructions have associated therewith a hierarchical priority system and the method and apparatus determine which branch, if any, should be taken. In particular, the method and apparatus simultaneously determine, during the parallel execution of the branch instructions, whether any branch test condition associated with a branch instruction is true, and independently, the target address for each branch instruction and a fall-through instruction address if a branch instruction is not taken.
摘要:
A multiprocessor data processing system in which a number of independent processors can concurrently operate on a shared memory even when one processor is performing a read-modify-write (RMW) operation, the system having a locking, content-associative write buffer and a controller for identifying RMW requests, for addressing the buffer and, for issuing directives to lock the buffer, to validate particular data blocks in the buffer and to transfer data back and forth between the processors, the memory and the buffer.
摘要:
A data processing system, wherein the central processing unit has an arithmetic element for processing data in response to machine program instructions and a control store for microcode program storage responsive to the machine instructions for implementing the instruction, has an improved arithmetic unit for enabling higher throughput without substantially increasing hardware cost. The arithmetic unit has a reconfigurable arithmetic logic unit which is controlled in response to both hardware generated data signals and microcode generated data signals. A data string manipulation circuitry provides for aligning data strings for processing by the arithmetic logic unit. Circuitry is provided, responsive to a decoded machine instruction, for generating control signals for configuring the arithmetic unit and for controlling the data string manipulation circuitry. As a result, the number of microcode steps needed to implement particular decimal and string manipulation machine instructions is significantly reduced, thereby saving machine cycles, while the additional hardware cost is very modest.
摘要:
Systems, methods, and devices for MIMO communications with reduced compute complexity are disclosed. Spectrally whitened communications are received, magnitude distortion is removed and phase distortion is corrected. The magnitude distortion is removed separately from the correction of the phase distortion, thereby reducing the compute complexity.
摘要:
A system and procedure for placement optimization of input/output ports associated with edges of circuit blocks within an integrated circuit design. The integrated circuit design is composed of circuit blocks that communicate using inter-block signal wires coupled to input/output ports (IOPs) located along edges of circuit blocks. An arbitrary IOP placement is first received, e.g., from a global floorplanner, and indicates (1) the allowable edge placement domains for each IOP and can optionally include (2) an arbitrary IOP placement within these allowable edge domains. A cell placer (e.g., a quadratic based standard cell placer) receives the arbitrary IOP placement and, for each circuit block, places cells represented within internal netlists. The placer does not optimize the placement of the IOPs. For each IOP, the set of cells of the net that is coupled to the IOP is determined. Each IOP is then moved, within its allowable edge placement, to a position closest to the nearest cell that is within its associated net. The above sequence is then repeated a number of times (e.g., IOPs are moved and the placer is run again); upon each run the routability of the placement is estimated. After the above iterations, the present invention accepts the placement with the best estimated routability and this placement is then routed by a router. By taking into account the position of cells associated with an IOP, and displacing the IOP near these cells, the internal circuit is more efficiently placed which reduces the size of the circuit block up to 30 percent.
摘要:
A cache coherence system for a multiprocessor system including a plurality of data processors coupled to a common main memory. Each of the data processors includes an associated cache memory having storage locations therein corresponding to storage locations in the main memory. The cache coherence system for a data processor includes a cache invalidate table (CIT) memory having internal storage locations corresponding to locations in the cache memory of the data processor. The cache coherence system detects when the contents of storage locations in the cache memories of the one or more of the data processors have been modified in conjuction with the activity those data processors and is responsive to such detections to generate and store in its CIT memory a multiple element linked list defining the locations in the cache memories of the data processors having modified contents. Each element of the list defines one of those cache storage locations and also identifies the location in the CIT memory of the next element in the list.