Abstract:
A system and method of accessing a memory location within a system having a processor and a plurality of memory locations separate from the processor. The system includes a plurality of external registers which are connected to the processor over a data bus, address translation means, connected to the processor over the data bus and an address bus, for calculating, based on an index written to the data bus, an address associated with one of the memory locations, and transfer means, connected to the plurality of external registers, for transferring data between the addressed memory location and one of the external registers.
Abstract:
A messaging facility is described that enables the passing of packets of data from one processing element to another in a globally addressable, distributed memory multiprocessor without having an explicit destination address in the target processing element's memory. The messaging facility can be used to accomplish a remote action by defining an opcode convention that permits one processor to send a message containing opcode, address and arguments to another. The destination processor, upon receiving the message after the arrival interrupt, can decode the opcode and perform the indicated action using the argument address and data. The messaging facility provides the primitives for the construction of an interprocessor communication protocol. Operating system communication and message-passing programming models can be accomplished using the messaging facility.
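The opcode convention described above can be illustrated with a minimal sketch. It assumes a hypothetical message layout (opcode, address, arguments) and uses an in-memory queue to stand in for the hardware message queue. The names `Message`, `ProcessingElement`, `OP_STORE`, and `OP_ADD` are illustrative and do not come from the source.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical opcodes; the source only says that an opcode convention is defined.
OP_STORE = 1  # store args[0] at the given address
OP_ADD = 2    # add args[0] to the value at the given address


@dataclass
class Message:
    opcode: int
    address: int
    args: tuple


@dataclass
class ProcessingElement:
    memory: dict = field(default_factory=dict)
    queue: deque = field(default_factory=deque)  # stands in for the hardware message queue

    def send(self, target: "ProcessingElement", msg: Message) -> None:
        # The sender names only the target PE; it supplies no destination
        # address for the packet itself in the target's memory.
        target.queue.append(msg)

    def on_arrival_interrupt(self) -> None:
        # Decode each queued message and perform the indicated action.
        while self.queue:
            msg = self.queue.popleft()
            if msg.opcode == OP_STORE:
                self.memory[msg.address] = msg.args[0]
            elif msg.opcode == OP_ADD:
                self.memory[msg.address] = self.memory.get(msg.address, 0) + msg.args[0]
            else:
                raise ValueError(f"unknown opcode {msg.opcode}")


pe0, pe1 = ProcessingElement(), ProcessingElement()
pe0.send(pe1, Message(OP_STORE, 0x10, (5,)))
pe0.send(pe1, Message(OP_ADD, 0x10, (3,)))
pe1.on_arrival_interrupt()
print(pe1.memory[0x10])  # 8
```

The address carried in the message is an argument to the remote action, which is distinct from the location where the packet itself is queued.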
Abstract:
A barrier mechanism provides a low-latency method of synchronizing all or some of the processing elements (PEs) in a massively parallel processing system. The barrier mechanism is supported by several physical barrier synchronization circuits, each receiving an input from every PE in the processing system. Each PE has two associated barrier synchronization registers, in which each bit is used as an input to one of several logical barrier synchronization circuits. The hardware supports both a conventional barrier function and an alternative eureka function. Each bit in each of the barrier synchronization registers can be programmed to perform either a barrier or a eureka function, and all bits of the registers and all barrier synchronization circuits function independently. Partitioning among PEs is accomplished by a barrier mask and interrupt register, which enables certain of the bits in the barrier synchronization registers for a defined group of PEs. Further partitioning is accomplished by providing bypass points in the physical barrier synchronization circuits to subdivide the physical barrier synchronization circuits into several types of PE barrier partitions of varying size and shape. The barrier mask and interrupt register and the bypass points are used in concert to accomplish flexible and scalable partitions corresponding to user-desired sizes and shapes, with a latency several orders of magnitude lower than that of existing software implementations.
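The difference between the barrier and eureka functions can be sketched in software. This example assumes the usual semantics: a barrier completes when every participating PE has signaled, and a eureka completes when any one has. The function `circuit_output` and its `participants` argument, a simplified stand-in for the mask register, are hypothetical.

```python
def circuit_output(pe_bits, participants, mode):
    """Evaluate one logical synchronization circuit.

    pe_bits: list of 0/1 values, one input bit per PE.
    participants: set of PE indices in the partition (simplified model of the mask).
    mode: "barrier" (all participants must signal) or "eureka" (any one suffices).
    """
    selected = [pe_bits[i] for i in sorted(participants)]
    if mode == "barrier":
        return int(all(selected))
    if mode == "eureka":
        return int(any(selected))
    raise ValueError(f"unknown mode {mode!r}")


bits = [1, 0, 1, 1]
print(circuit_output(bits, {0, 2, 3}, "barrier"))  # 1: PE 1 is outside the partition
print(circuit_output(bits, {0, 1}, "barrier"))     # 0: PE 1 has not arrived
print(circuit_output([0, 0, 1, 0], {0, 1, 2, 3}, "eureka"))  # 1: PE 2 signaled
```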
Abstract:
Address translation means for distributed memory massively parallel processing (MPP) systems include means for defining virtual addresses for processing elements (PEs) and memory relative to a partition of PEs under program control, means for defining logical addresses for PEs and memory within a three-dimensional interconnected network of PEs in the MPP, and means for defining physical addresses for PEs and memory corresponding to identities and locations of PE modules within computer cabinetry. As physical PEs are mapped into or out of the logical MPP, as spares are needed, logical addresses are updated. Address references generated by a PE within a partition in virtual address mode are converted to logical addresses and physical addresses for routing on the network.
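The three address levels can be pictured as two lookups in sequence. The sketch below is one possible reading, not the patented design: it assumes that a partition is a list mapping virtual PE numbers to logical PE numbers, and that swapping in a spare changes only the logical-to-physical table. The class `PartitionTranslator`, its methods, and the module names are all hypothetical.

```python
class PartitionTranslator:
    def __init__(self, partition_logical_pes, logical_to_physical):
        # Virtual PE n within the partition maps to partition_logical_pes[n].
        self.partition = list(partition_logical_pes)
        # Logical PE number -> physical module identifier (illustrative strings).
        self.l2p = dict(logical_to_physical)

    def translate(self, virtual_pe):
        """Return (logical PE, physical PE) for a virtual PE number."""
        logical = self.partition[virtual_pe]
        return logical, self.l2p[logical]

    def replace_with_spare(self, logical_pe, spare_physical):
        # Assumed behavior: only the logical-to-physical mapping changes,
        # so programs using virtual addresses are unaffected.
        self.l2p[logical_pe] = spare_physical


t = PartitionTranslator([4, 5, 6, 7], {4: "cab0-mod4", 5: "cab0-mod5",
                                       6: "cab0-mod6", 7: "cab0-mod7"})
print(t.translate(2))  # (6, 'cab0-mod6')
t.replace_with_spare(6, "cab1-spare0")
print(t.translate(2))  # (6, 'cab1-spare0')
```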
Abstract:
A system and method for extracting a PE number and offset from an array index. According to one aspect of the present invention, a processing element number is assigned to each processing element, a local memory address is assigned to each memory location and a linearized index is assigned to each array element in an array. The processing element number of the processing element in which a particular array element is stored is computed as a function of a linearized index associated with the array element and a distribution specification associated with the array. In addition, a local memory address associated with the array element is computed as a function of the linearized index and the distribution specification.
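As a concrete instance of computing a PE number and local offset from a linearized index, the sketch below assumes a block-cyclic distribution. The abstract does not say which distribution specifications are supported, so `block_size`, `num_pes`, and the function `locate` are illustrative assumptions.

```python
def locate(index, block_size, num_pes):
    """Map a linearized array index to (PE number, local offset).

    Assumes a block-cyclic distribution: consecutive blocks of
    block_size elements are dealt out to the PEs in round-robin order.
    """
    block = index // block_size
    pe = block % num_pes
    offset = (block // num_pes) * block_size + index % block_size
    return pe, offset


# With blocks of 2 elements over 4 PEs, index 11 lies in block 5,
# which is the second block held by PE 1.
print(locate(11, block_size=2, num_pes=4))  # (1, 3)
```

Both results depend only on the index and the distribution parameters, so any PE can locate an element without consulting a table.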
Abstract:
A method of performing remote address translation in a multiprocessor system includes determining a connection descriptor and a virtual address at a local node, accessing a local connection table at the local node using the connection descriptor to produce a system node identifier for a remote node and a remote address space number, communicating the virtual address and remote address space number to the remote node, and translating the virtual address to a physical address at the remote node (qualified by the remote address space number). A user process running at the local node provides the connection descriptor and virtual address. The translation is performed by matching the virtual address and remote address space number with an entry of a translation-lookaside buffer (TLB) at the remote node. Performing the translation at the remote node reduces the amount of translation information needed at the local node for remote memory accesses. The method supports communication within a scalable multiprocessor, and across the machine boundaries in a cluster.
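The translation flow described above can be sketched as follows. The sketch assumes a 4096-byte page size and models the connection table and TLB as dictionaries. `Node`, `remote_access`, and `PAGE_SIZE` are hypothetical names, and a `KeyError` stands in for a TLB miss.

```python
PAGE_SIZE = 4096  # assumed page size, not given in the source


class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        # Connection descriptor -> (remote node id, remote address space number).
        self.connection_table = {}
        # (address space number, virtual page number) -> physical page number.
        self.tlb = {}

    def translate(self, asn, vaddr):
        """Translate at the remote node by matching (ASN, virtual page) in the TLB."""
        vpn, page_offset = divmod(vaddr, PAGE_SIZE)
        ppn = self.tlb[(asn, vpn)]  # a KeyError models a TLB miss
        return ppn * PAGE_SIZE + page_offset


def remote_access(local, nodes, descriptor, vaddr):
    # The local node only resolves the descriptor; it holds no page mappings
    # for the remote memory.
    node_id, asn = local.connection_table[descriptor]
    return node_id, nodes[node_id].translate(asn, vaddr)


local, remote = Node(0), Node(7)
local.connection_table[3] = (7, 42)
remote.tlb[(42, 2)] = 9
print(remote_access(local, {0: local, 7: remote}, 3, 2 * PAGE_SIZE + 0x18))
```

The local node stores one small entry per connection, while the page-level mappings stay on the node that owns the memory.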
Abstract:
A method and apparatus for deallocating memory in a multi-processor, shared memory system. In one aspect, a node in the system has a node controller that contains sequencing logic. The sequencing logic receives a command across a network. The sequencing logic translates the received command into a Purge Translation Cache (PTC) instruction and sends the PTC instruction across a bus to a processor. The processor contains bus control logic that receives the PTC instruction and purges a virtual address specified in the PTC instruction from the processor's translation lookaside buffer. By purging the virtual address, the memory is deallocated.
Abstract:
A method for extracting a PE number and offset from an array index by recursive centrifuging. According to one aspect of the present invention, a processing element number is assigned to each processing element, a local memory address is assigned to each memory location and a linearized index is assigned to each array element in a multidimensional array. The processing element number of the processing element in which a particular array element is stored is computed as a function of a linearized index associated with the array element and a mask word determined from the distribution specification associated with the array. The mask word is generated from the distribution specification and applied to a linearized index associated with a particular array element to obtain processing element number bits and local offset bits. The processing element number bits and local offset bits are then accumulated to create the processing element number and local offset for the memory location associated with the array element.
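A single centrifuge step can be sketched as a bit-separation routine. This assumes that a 1 in the mask word marks a PE-number bit and a 0 marks a local-offset bit, a convention the abstract does not state. The function `centrifuge` and its `width` parameter are illustrative, and the recursive accumulation across array dimensions is not shown.

```python
def centrifuge(index, mask, width):
    """Split the low `width` bits of index into (PE number, local offset).

    Bits of index under a 1 in mask are packed, in order, into the PE number;
    bits under a 0 are packed, in order, into the local offset.
    """
    pe = offset = 0
    pe_pos = off_pos = 0
    for bit in range(width):
        b = (index >> bit) & 1
        if (mask >> bit) & 1:
            pe |= b << pe_pos
            pe_pos += 1
        else:
            offset |= b << off_pos
            off_pos += 1
    return pe, offset


# Mask 0b001100 selects bits 2 and 3 as the PE number, which corresponds to
# blocks of 4 elements dealt cyclically over 4 PEs.
print(centrifuge(0b101101, 0b001100, width=6))  # (3, 9)
```

Because the split uses only shifts and masks, it avoids the division and modulo operations that a direct block-cyclic formula would need when the sizes are powers of two.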
Abstract:
Processing transaction requests in a shared memory multi-processor computer network is described. A transaction request is received at a servicing agent from a requesting agent. The transaction request includes a request priority associated with a transaction urgency generated by the requesting agent. The servicing agent provides an assigned priority to the transaction request based on the request priority, and then compares the assigned priority to an existing service level at the servicing agent to determine whether to complete or reject the transaction request. A reply message from the servicing agent to the requesting agent is generated to indicate whether the transaction request was completed or rejected, and to provide reply fairness state data for rejected transaction requests.
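The accept-or-reject decision can be sketched in a few lines. The abstract does not specify how the assigned priority is derived or what the fairness state contains. This sketch therefore assumes the assigned priority equals the request priority and uses a running reject count as a placeholder for the fairness state. `ServicingAgent`, `Reply`, and their fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reply:
    completed: bool
    fairness_state: Optional[int] = None  # set only for rejected requests


class ServicingAgent:
    def __init__(self, service_level):
        self.service_level = service_level
        self.reject_count = 0  # placeholder for the reply fairness state

    def handle(self, request_priority):
        # Assumption: the assigned priority is simply the request priority.
        assigned = request_priority
        if assigned >= self.service_level:
            return Reply(completed=True)
        self.reject_count += 1
        return Reply(completed=False, fairness_state=self.reject_count)


agent = ServicingAgent(service_level=2)
print(agent.handle(3))  # completed
print(agent.handle(1))  # rejected, with fairness state attached
```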