Abstract:
A messaging facility is described that enables the passing of packets of data from one processing element to another in a globally addressable, distributed memory multiprocessor without having an explicit destination address in the target processing element's memory. The messaging facility can be used to accomplish a remote action by defining an opcode convention that permits one processor to send a message containing opcode, address and arguments to another. The destination processor, upon receiving the message after the arrival interrupt, can decode the opcode and perform the indicated action using the argument address and data. The messaging facility provides the primitives for the construction of an interprocessor communication protocol. Operating system communication and message-passing programming models can be accomplished using the messaging facility.
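The opcode convention can be illustrated with a minimal sketch. The opcode values, the `Message` and `PE` classes, and the queue that stands in for the hardware message buffer are all hypothetical; the abstract does not define a concrete encoding.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical opcodes; the abstract does not define a concrete encoding.
OP_STORE = 1
OP_ADD = 2


@dataclass
class Message:
    opcode: int
    address: int
    args: tuple


class PE:
    def __init__(self, mem_words):
        self.memory = [0] * mem_words
        # Messages land in a queue, not at a sender-chosen memory address.
        self.queue = deque()

    def deliver(self, msg):
        """Model message arrival; real hardware would raise an interrupt here."""
        self.queue.append(msg)

    def service_interrupt(self):
        """Drain the queue, decoding each opcode and performing its action."""
        while self.queue:
            msg = self.queue.popleft()
            if msg.opcode == OP_STORE:
                self.memory[msg.address] = msg.args[0]
            elif msg.opcode == OP_ADD:
                self.memory[msg.address] += msg.args[0]
            else:
                raise ValueError(f"unknown opcode {msg.opcode}")


def demo():
    target = PE(mem_words=8)
    target.deliver(Message(OP_STORE, 3, (10,)))
    target.deliver(Message(OP_ADD, 3, (5,)))
    target.service_interrupt()
    return target.memory[3]
```

Here the sender names only the target PE and the action; where the message itself is buffered is left to the receiver.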
Abstract:
A barrier mechanism provides a low-latency method of synchronizing all or some of the processing elements (PEs) in a massively parallel processing system. The barrier mechanism is supported by several physical barrier synchronization circuits, each receiving an input from every PE in the processing system. Each PE has two associated barrier synchronization registers, in which each bit is used as an input to one of several logical barrier synchronization circuits. The hardware supports both a conventional barrier function and an alternative eureka function. Each bit in each of the barrier synchronization registers can be programmed to perform either a barrier or a eureka function, and all bits of the registers and each barrier synchronization circuit function independently. Partitioning among PEs is accomplished by a barrier mask and interrupt register, which enables certain of the bits in the barrier synchronization registers to a defined group of PEs. Further partitioning is accomplished by providing bypass points in the physical barrier synchronization circuits to subdivide the physical barrier synchronization circuits into several types of PE barrier partitions of varying size and shape. The barrier mask and interrupt register and the bypass points are used in concert to accomplish flexible and scalable partitions corresponding to user-desired sizes and shapes with a latency several orders of magnitude lower than that of existing software implementations.
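The difference between the barrier and eureka functions can be sketched in software as an AND versus an OR reduction over the participating PEs. This is only an illustration under simplifying assumptions: the function names are hypothetical, one bit per PE is modeled, and the per-PE `mask` list stands in for the partitioning that the barrier mask and bypass points provide in hardware.

```python
def barrier_complete(bits, mask):
    """Barrier: complete when every participating PE has set its bit (AND)."""
    return all(b for b, m in zip(bits, mask) if m)


def eureka_fired(bits, mask):
    """Eureka: fires as soon as any participating PE sets its bit (OR)."""
    return any(b for b, m in zip(bits, mask) if m)


def demo():
    mask = [1, 1, 0, 0]  # only PE 0 and PE 1 belong to this partition
    results = {
        "barrier_waiting": barrier_complete([1, 0, 1, 1], mask),
        "barrier_done": barrier_complete([1, 1, 0, 0], mask),
        "eureka_inside": eureka_fired([0, 1, 0, 0], mask),
        "eureka_outside": eureka_fired([0, 0, 1, 0], mask),
    }
    return results
```

PEs outside the partition neither delay the barrier nor trigger the eureka, which is the effect the mask is meant to achieve.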
Abstract:
A digital optical serial communication system and encoding method comprises a transmitter responsive to an input of parallel information for parsing the information into 4-bit groups. The 4-bit groups are encoded into 5-bit codes having a 40/60 duty cycle and wherein no more than two consecutive bits are logical 1's or 0's on either end of the 5-bit code. The 5-bit codes are serially transmitted by an optical transmission medium for providing a conduit from the transmitter to a receiver. The receiver receives and decodes the serial information to 4-bit groups. The 4-bit groups are concatenated to form a parallel packet of information suitable for data processing. The encoding/decoding scheme has the advantages of (1) a worst case duty factor of 40/60%; (2) a maximum run of bits without transition equal to five; (3) an easily recaptured framing of packets due to a unique sync symbol; and (4) simple encoding and decoding of packets using combinational logic rather than lookup tables. In addition, data can be continuously sent via a communications protocol.
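The two stated code constraints can be checked by enumeration. The sketch below is not the patent's code table, which the abstract does not give; it assumes that a 40/60 duty cycle means two or three 1 bits per 5-bit code, and that the end constraint limits the run of identical bits at each end of a code to two.

```python
def run_at_start(bits):
    """Length of the run of identical bits at the start of the string."""
    n = 1
    while n < len(bits) and bits[n] == bits[0]:
        n += 1
    return n


def candidate_codes():
    """Enumerate 5-bit codes that satisfy both assumed constraints."""
    codes = []
    for value in range(32):
        bits = format(value, "05b")
        if bits.count("1") not in (2, 3):  # 40/60 duty cycle
            continue
        leading = run_at_start(bits)
        trailing = run_at_start(bits[::-1])
        if leading > 2 or trailing > 2:  # at most two equal bits at each end
            continue
        codes.append(bits)
    return codes
```

Under this reading, `candidate_codes()` returns exactly 16 codes, one for each possible 4-bit group. The four balanced codes it rejects are `00011`, `00111`, `11000`, and `11100`.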
Abstract:
Processing transaction requests in a shared memory multi-processor computer network is described. A transaction request is received at a servicing agent from a requesting agent. The transaction request includes a request priority associated with a transaction urgency generated by the requesting agent. The servicing agent provides an assigned priority to the transaction request based on the request priority, and then compares the assigned priority to an existing service level at the servicing agent to determine whether to complete or reject the transaction request. A reply message from the servicing agent to the requesting agent is generated to indicate whether the transaction request was completed or rejected, and to provide reply fairness state data for rejected transaction requests.
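The accept-or-reject decision can be sketched as follows. The clamping policy in `assign_priority`, the numeric priority range, and the use of the current service level as the fairness state are all assumptions made for illustration; the abstract specifies none of them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reply:
    completed: bool
    fairness_state: Optional[int] = None  # hypothetical: set only on rejection


class ServicingAgent:
    def __init__(self, service_level, max_priority=3):
        self.service_level = service_level
        self.max_priority = max_priority

    def assign_priority(self, request_priority):
        """Assumed policy: clamp the requester's priority into the valid range."""
        return max(0, min(request_priority, self.max_priority))

    def handle(self, request_priority):
        """Complete the request if its assigned priority meets the service level."""
        assigned = self.assign_priority(request_priority)
        if assigned >= self.service_level:
            return Reply(completed=True)
        # Assumed fairness state: the level the request failed to meet.
        return Reply(completed=False, fairness_state=self.service_level)
```

A requester that receives a rejection could use the returned state to decide how urgently to retry, although how the fairness data is actually consumed is not described in the abstract.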
Abstract:
A method and apparatus for accessing memory-mapped registers that are distributed across a large integrated circuit. Some embodiments provide a method for accessing memory-mapped registers that are distributed across a first integrated circuit, the first integrated circuit including a plurality of logic subset modules, wherein each of the plurality of logic subset modules includes one or more memory-mapped registers. This method includes receiving a memory-mapped register access request into the first integrated circuit, serially transmitting, through each of the plurality of logic subset modules, a first plurality of data packets based on the memory-mapped register access request, wherein the first plurality of data packets includes an address specification for a memory-mapped register associated with a first one of the logic subset modules, and within the first logic subset module, accessing the memory-mapped register associated with the first logic subset module. Another aspect of the present invention provides an MMR circuit for accessing memory-mapped registers that are distributed across a first integrated circuit chip, the first integrated circuit chip including a plurality of logic subset modules.
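The serial access path can be modeled as a packet that visits every logic subset module in turn, with only the addressed module acting on it. This sketch simplifies the described plurality of data packets to a single dictionary; the `LogicModule` class, the address ranges, and the packet fields are hypothetical.

```python
class LogicModule:
    def __init__(self, base, count):
        self.base = base          # first register address owned by this module
        self.regs = [0] * count   # the module's memory-mapped registers

    def handle(self, packet):
        """Service the packet if it addresses a local register, then pass it on."""
        offset = packet["addr"] - self.base
        if 0 <= offset < len(self.regs):
            if packet["write"]:
                self.regs[offset] = packet["data"]
            else:
                packet["data"] = self.regs[offset]
            packet["done"] = True
        return packet


def access(modules, addr, write=False, data=0):
    """Send one request through every module in series and return the result."""
    packet = {"addr": addr, "write": write, "data": data, "done": False}
    for module in modules:
        packet = module.handle(packet)
    return packet


def demo():
    chain = [LogicModule(0x00, 4), LogicModule(0x10, 4), LogicModule(0x20, 4)]
    access(chain, 0x12, write=True, data=0xAB)
    return access(chain, 0x12)["data"], access(chain, 0x40)["done"]
```

Because every module sees every packet, no module needs a dedicated connection to the point where requests enter the chip.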
Abstract:
Address translation means for distributed memory massively parallel processing (MPP) systems include means for defining virtual addresses for processing elements (PE's) and memory relative to a partition of PE's under program control, means for defining logical addresses for PE's and memory within a three-dimensional interconnected network of PE's in the MPP, and means for defining physical addresses for PE's and memory corresponding to identities and locations of PE modules within computer cabinetry. As physical PE's are mapped into or out of the logical MPP when spares are needed, logical addresses are updated. Address references generated by a PE within a partition in virtual address mode are converted to logical addresses and physical addresses for routing on the network.
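The three address levels can be sketched as two translation steps. Everything concrete here is an assumption: the partition is modeled as a rectangular block with a base coordinate and a shape, virtual PE numbers are assigned x-fastest within it, and the logical-to-physical mapping is a plain dictionary with made-up module labels.

```python
def virtual_to_logical(vpe, base, shape):
    """Map a partition-relative PE number to logical (x, y, z) coordinates."""
    sx, sy, sz = shape
    if not 0 <= vpe < sx * sy * sz:
        raise ValueError("virtual PE number outside the partition")
    x, rest = vpe % sx, vpe // sx
    y, z = rest % sy, rest // sy
    return (base[0] + x, base[1] + y, base[2] + z)


def logical_to_physical(logical, table):
    """Look up the physical module that currently backs a logical PE."""
    return table[logical]


def demo():
    base, shape = (2, 0, 0), (2, 2, 1)           # hypothetical partition
    table = {(3, 1, 0): "cabinet0/module7"}      # hypothetical module labels
    logical = virtual_to_logical(3, base, shape)
    before = logical_to_physical(logical, table)
    table[logical] = "cabinet0/spare1"           # swap in a spare module
    after = logical_to_physical(logical, table)
    return logical, before, after
```

Swapping in a spare changes only the table entry, so the program's virtual PE numbers stay the same.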
Abstract:
A system and address method for extracting a PE number and offset from an array index. According to one aspect of the present invention, a processing element number is assigned to each processing element, a local memory address is assigned to each memory location and a linearized index is assigned to each array element in an array. The processing element number of the processing element in which a particular array element is stored is computed as a function of a linearized index associated with the array element and a distribution specification associated with the array. In addition, a local memory address associated with the array element is computed as a function of the linearized index and the distribution specification.
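As a worked illustration, the sketch below computes a PE number and local offset for a one-dimensional block-cyclic distribution. The abstract does not say what form the distribution specification takes, so modeling it as a block size and a PE count is an assumption, as is the function name.

```python
def locate(index, block_size, num_pes):
    """Return (pe, offset) for a linearized index under block-cyclic layout."""
    if index < 0 or block_size <= 0 or num_pes <= 0:
        raise ValueError("index must be >= 0; block_size and num_pes must be > 0")
    block = index // block_size            # which block the element falls in
    pe = block % num_pes                   # blocks are dealt out round-robin
    local_block = block // num_pes         # how many earlier blocks this PE holds
    offset = local_block * block_size + index % block_size
    return pe, offset
```

For example, with `block_size=2` and `num_pes=3`, PE 0 holds elements 0, 1, 6, and 7 at offsets 0 through 3, so `locate(7, 2, 3)` returns `(0, 3)`.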
Abstract:
An improved solid state storage device (SSD) with memory organized into a plurality of groups, each group including a plurality of ranks, and each rank having at least two banks sharing a bidirectional data bus. A matrix reorder circuit is used to distribute data across individual memory components in a way that prevents multibit uncorrectable or undetectable errors due to the failure of a single memory component. The matrix reorder circuit is used for both reading and writing data, and operates on a stream of pipelined data of arbitrary length. According to another aspect of this invention, a flaw map and additional hot spare memory are used to electrically replace failing memory components. According to another aspect of this invention, memory in a bank is accessed during one half of a reference cycle and refreshed during the second half of the reference cycle, each bank being 180 degrees out of phase with the other so that a read or write is performed on one bank while a memory refresh is performed on the other bank.
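One way to picture the matrix reorder is as a transpose of a square bit matrix, so that each memory component holds exactly one bit of each word. The abstract does not state the circuit's dimensions or internal structure; the square layout, the function names, and the failure model below are assumptions for illustration.

```python
def reorder(words):
    """Transpose a square bit matrix; row i of the result goes to component i."""
    n = len(words)
    if any(len(w) != n for w in words):
        raise ValueError("expected a square bit matrix")
    return [[words[r][c] for r in range(n)] for c in range(n)]


def errors_per_word(original, recovered):
    """Count the flipped bits in each word after a round trip through memory."""
    return [sum(a != b for a, b in zip(w0, w1))
            for w0, w1 in zip(original, recovered)]


def demo(failed_component=2):
    words = [[1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 0], [0, 1, 1, 1]]
    stored = reorder(words)                       # write path
    # Model a failed component: every bit it returns is inverted.
    stored[failed_component] = [1 - b for b in stored[failed_component]]
    recovered = reorder(stored)                   # read path, same operation
    return errors_per_word(words, recovered)
```

In this model a failed component damages one bit in every word instead of every bit in one word, which keeps each word within reach of a single-bit error correcting code. Because a transpose is its own inverse, the same operation serves both the write and the read path.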
Abstract:
A nibble-mode DRAM solid state storage device is organized into a plurality of sections each including a plurality of groups, each including a plurality of ranks of DRAM memory chips. A pipeline data path is provided into and out of each group and nibble-mode access is facilitated by simultaneous pipelining of data into and out of the memory while memory reference operations are accomplished.