摘要:
A hyperprocessor includes a control processor controlling tasks executed by a plurality of processor cores, each of which may include multiple execution units, or special hardware units. The control processor schedules tasks according to control threads for the tasks created during compilation and comprising a hardware context including register files, a program counter and status bits for the respective task. The tasks are dispatched to the processor cores or special hardware units for parallel, sequential, out-of-order or speculative execution. A universal register file contains data to be operated on by the task, and an interconnect couples at least the processor cores or special hardware units to each other and to the universal register file, allowing each node to communicate with any other node.
摘要:
A method and a computing system compute an incremental checksum corresponding to a data packet. The incremental checksum is computed within one processor cycle of a processor. A first register (102) stores first checksum information corresponding to a data packet. A second register (104) stores second checksum information corresponding to old information being deleted from the data packet. A third register (106) stores third checksum information corresponding to new information being added to the data packet. An incremental checksum circuit (100), electrically connected to the first register (102), to the second register (104), and to the third register (106), provides resulting checksum information corresponding to the data packet after deleting the old information from the data packet and adding the new information to the data packet. The resulting checksum information is selectively stored in the first register (102).
摘要:
A system for detecting multiple errors that may occur during transfer of data and for correcting up to two of these errors simultaneously. The system has a component for calculating a number of check bits associated with the data word. Also provided is a component for grouping all data bits into base groups and multiple groups, the sum of the number of base groups and multiple groups being equal to the number of check bits. Up to two weights are assigned for each data bit. The system distributes the data bits among the groups according to the weights assigned thereto. Also provided is a component for generating a check bit for each of the groups and for padding the data word with the check bits to form an appended data word. A generator creates a predetermined number of syndrome bits, the number being the number of check bits. Finally, a decoder is provided for decoding the syndrome bits to identify the erroneous bits in the data word.
摘要:
An apparatus to isolate a main memory in a multiprocessor computer is provided. The apparatus include a master processor and a management device communicating with the master processor. One or more slave processors communicate with the master processor and the management device. A volatile memory also communicates with the management device and the main memory communicating with the volatile memory. This Abstract is provided for the sole purpose of complying with the Abstract requirement rules that allow a reader to quickly ascertain the subject matter of the disclosure contained herein. This Abstract is submitted with the explicit understanding that it will not be used to interpret or to limit the scope or the meaning of the claims.
摘要:
A circular buffer storing packets for processing by one or more network processors employs an empty buffer address register identifying where a next received packet should be stored, a next packet address register identifying the next packet to be processed, and a packet-processing address register within each network processor identifying the packet being processed by that network processor. The n-bit addresses to the buffer are mapped or masked from/to the m-bit packet-processing address registers by software, allowing the buffer size to be fully scalable. A dedicated packet retrieval instruction supported by the network processor(s) retrieves a new packet for processing using the next packet address register and copies that into the associated packet-processing address register for use in subsequent accesses. Buffer management is thus independent of the network processor architecture.
摘要:
In lieu of branch prediction, a merged fetch-branch unit operates in parallel with the decode unit within a processor. Upon detection of a branch instruction within a group of one or more fetched instructions, any instructions preceding the branch are marked regular instructions, the branch instruction is marked as such, and any instructions following branch are marked sequential instructions. Within two cycles, sequential instructions following the last fetched instruction are retrieved and marked, target instructions beginning at the branch target address are retrieved and marked, and the branch is resolved. Either the sequential or target instructions are then dropped depending on the branch resolution, incurring a fixed, 1 cycle branch penalty.
摘要:
A method and apparatus for performing normalization of floating point numbers using a much smaller width register than would normally be required for the data operands which can be processed. As the registers are smaller, the number of circuits required to achieve the normalization is reduced, resulting in a decrease in the chip area required to perform such operation. The normalization circuitry was streamlined to efficiently operate on the more prevalent type of data being presented to the floating point unit. Data types and/or operations which statistically occur less frequently require multiple cycles of the normalization function. It was found that for the more prevalent data types and/or operations, the width of the registers required was substantially less than the width required for the less frequent data types and/or operations. Instead of expanding the register width to accommodate these lesser occurrences, the data is broken into smaller portions and normalized using successive cycles of the normalization circuitry. Thus, by sacrificing speed for the lesser occurring events, a significant savings was realized in the number of circuits required to implement normalization. As the slower speed operations occur infrequently, the overall performance of the normalization function is minimally impacted. Thus, considerable savings in integrated circuit real estate is achieved with minimal impact to the overall throughput of the system.
摘要:
An octagonal interconnection network for routing data packets. The interconnection network comprises: 1) eight switching circuits for transferring data packets with each other; 2) eight sequential data links bidirectionally coupling the eight switching circuits in sequence to thereby form an octagonal ring configuration; and 3) four crossing data links, wherein a first crossing data link bidirectionally couples a first switching circuit to a fifth switching circuit, a second crossing data link bidirectionally couples a second switching circuit to a sixth switching circuit, a third crossing data link bidirectionally couples a third switching circuit to a seventh switching circuit, and a fourth crossing data link bidirectionally couples a fourth switching circuit to an eighth switching circuit.
摘要:
A method and a bit counting device (100) count bits set to one in a data word of arbitrary size. The bit counting device (100) includes a first data register (110) for storing a data word, an offset register (112) for storing an offset value, a second data register (120), and a one-cycle shifter (114), electrically connected to the first data register (110), to the second data register (120), and to the offset register (112), for shifting the data word by a value stored in the offset register (112) and storing the shifted data word in the second data register (120). The device 100 also includes a third data register (124) and at least one carry save adder (CSA) device (122) organized in a tree structure, and electrically connected to the second data register (120) and to the third data register (124), for counting the number of bits set to one in the data word stored in the second data register (120) and storing in the third data register (124) a value representing the count of bits set to one in the data word.
摘要:
An architecture and method relating to a floating point operation which performs the mathematical computation of A*B+C. The multiplication is accomplished in two or more stages, each stage involving corresponding sets of partial products and concurrently accomplished incremental summations. A pipelined architecture provides for the summation of the least significant bits of an intermediate product with operand C at a stage preceding entry into a full adder. Thereby, a significant portion of the full adder can be replaced by a simpler and smaller incrementer circuit. Partitioning of the multiplication operation into two or more partial product operations proportionally reduces the size of the multiplier required. Pipelining and concurrence execution of multiplication and addition operation in the multiplier provides in two cycles the results of the mathematical operation A*B+C while using a full adder of three-quarters normal size.