Abstract:
A method for extracting a PE number and offset from an array index by recursive centrifuging. According to one aspect of the present invention, a processing element number is assigned to each processing element, a local memory address is assigned to each memory location and a linearized index is assigned to each array element in a multidimensional array. The processing element number of the processing element in which a particular array element is stored is computed as a function of a linearized index associated with the array element and a mask word determined from the distribution specification associated with the array. The mask word is generated from the distribution specification and applied to a linearized index associated with a particular array element to obtain processing element number bits and local offset bits. The processing element number bits and local offset bits are then accumulated to create the processing element number and local offset for the memory location associated with the array element.
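The mask-driven split described above can be sketched in software. This is a minimal illustration, not the patented recursive centrifuge: it walks the bits one at a time, and it assumes (the abstract does not say) that a 1 bit in the mask marks a processing element number bit and a 0 bit marks a local offset bit. The function name `centrifuge` and the `width` parameter are hypothetical.

```python
def centrifuge(index: int, mask: int, width: int = 32) -> tuple:
    """Split a linearized index into (pe_number, local_offset).

    Assumption: mask bit 1 selects a PE-number bit, mask bit 0 selects
    a local-offset bit. Selected bits are packed toward the low end,
    keeping their original relative order.
    """
    pe_number = 0
    local_offset = 0
    pe_pos = 0       # next free bit position in pe_number
    offset_pos = 0   # next free bit position in local_offset
    for bit in range(width):
        value = (index >> bit) & 1
        if (mask >> bit) & 1:
            pe_number |= value << pe_pos
            pe_pos += 1
        else:
            local_offset |= value << offset_pos
            offset_pos += 1
    return pe_number, local_offset


# Index 0b101101 with mask 0b001100: bits 2 and 3 form the PE number,
# bits 0, 1, 4 and 5 form the local offset.
print(centrifuge(0b101101, 0b001100, width=6))  # (3, 9)
```

A hardware centrifuge would separate the two bit groups in a logarithmic number of stages rather than one bit per step; the loop here only shows what the result should be.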
Abstract:
An input cell to the core logic on an electrical component and an output cell from the core logic on an electrical component are each provided with a first signal path for data, a second signal path for scan data, and a flip-flop positioned near the pad of the core logic for selecting between the first signal path for data and the second signal path for scan data. The scan data is used to input special signals or vectors to the core logic and to read the results of the scan data after it has passed through the core logic and has been manipulated thereby. Several of the electrical components can be electrically connected to one another. The output cell of a first electrical component is electrically attached to the input cell of a second electrical component. The individual electrical components are connected on a printed circuit board, and typically electrical conductors associated with the printed circuit board are used to electrically connect the first chip or electrical component and the second chip or electrical component.
Abstract:
Improved method and apparatus for facilitating barrier and eureka synchronization in a massively parallel processing system. The present barrier/eureka synchronization mechanism provides a partitionable, low-latency, immediately reusable, robust mechanism which can operate on a physical data-communications network and can be used to alert all processor entities (PEs) in a partition when all of the PEs in that partition have reached a designated barrier point in their individual program code, or when any one of the PEs in that partition has reached a designated eureka point in its individual program code, or when either the barrier or eureka requirements have been satisfied, whichever comes first. Multiple overlapping synchronization partitions are available simultaneously through the use of a plurality of parallel synchronization contexts. The present synchronization mechanism may be implemented either on a dedicated barrier network or superimposed as a virtual barrier/eureka network operating on a physical data-communications network which is also used for data interchange, operating system functions, and other purposes. The present barrier/eureka mechanism also supports zero to N processor entities at each router node ("leaves" on the barrier tree), and provides a barrier sequence counter for each barrier context in order to resolve potential race conflicts that might otherwise arise.
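The completion rule for one synchronization context (all PEs for a barrier, any PE for a eureka) can be modeled in a few lines. This is a single-threaded toy model under stated assumptions, not the network mechanism itself: the class `SyncContext` and its methods are hypothetical names, and the sequence counter simply increments on each completed synchronization.

```python
class SyncContext:
    """Toy model of one barrier/eureka context for a partition of PEs."""

    def __init__(self, num_pes: int):
        self.arrived = [False] * num_pes
        self.sequence = 0  # counts completed synchronizations

    def _complete(self) -> bool:
        # Reset state so the context is immediately reusable.
        self.arrived = [False] * len(self.arrived)
        self.sequence += 1
        return True

    def arrive(self, pe: int) -> bool:
        """PE reaches its barrier point; True once every PE has arrived."""
        self.arrived[pe] = True
        return self._complete() if all(self.arrived) else False

    def eureka(self, pe: int) -> bool:
        """PE reaches its eureka point; a single PE is enough to complete."""
        return self._complete()


ctx = SyncContext(3)
print(ctx.arrive(0), ctx.arrive(1), ctx.arrive(2))  # False False True
print(ctx.eureka(1), ctx.sequence)                  # True 2
```

In this model a late message can be matched against `sequence` to tell which synchronization it belongs to, which is one plausible reading of how a per-context counter helps with races.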
Abstract:
A vector processing system which uses vector masks to determine whether or not to perform operations on operands corresponding to bit positions within the mask is disclosed. An approximation of the number of no-operation representative bits in a vector mask register is made, and such bits are skipped to improve the performance of vector mask based operations. The number of consecutive no-op representative bits, as represented by zero values, skipped is a power of two to simplify the circuitry and logic involved in skipping such operations.
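The effect of skipping zero runs in power-of-two steps can be illustrated with a short simulation. It is a sketch under assumptions: the abstract describes an approximation made in hardware, whereas this code counts the zero run exactly and then rounds it down to a power of two. The function name `process_with_skips` is hypothetical.

```python
def process_with_skips(mask_bits):
    """Return (element indices operated on, number of steps taken).

    A 1 in mask_bits means "operate on this element"; a 0 means no-op.
    Each run of zeros is skipped in chunks whose length is the largest
    power of two not exceeding the remaining run length.
    """
    operated = []
    steps = 0
    i = 0
    n = len(mask_bits)
    while i < n:
        if mask_bits[i]:
            operated.append(i)
            i += 1
        else:
            run = 0
            while i + run < n and not mask_bits[i + run]:
                run += 1
            i += 1 << (run.bit_length() - 1)  # largest power of two <= run
        steps += 1
    return operated, steps


# Nine mask positions are covered in six steps instead of nine.
print(process_with_skips([1, 0, 0, 0, 0, 0, 1, 0, 1]))  # ([0, 6, 8], 6)
```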
Abstract:
A vector/scalar computer system has nodes interconnected by an interconnect network. Each node includes a vector execution unit, a scalar execution unit, physical vector registers, and a memory. The physical vector registers from the nodes together form an architectural vector register, which is referenced by vector applications. The memories from the nodes together form an aggregate memory. The vector applications load memory vector elements from the memories to the physical vector registers, and store physical vector elements from the physical vector registers to the memories. The memory vector elements are interleaved among the memories of the nodes to reduce inter-node traffic during the loads and the stores.
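One simple interleaving that fits this description is element-by-element round-robin across the nodes. The abstract does not specify the granularity or the mapping, so the scheme below is an assumption, and `node_for_element` and `partition` are hypothetical names.

```python
def node_for_element(i: int, num_nodes: int) -> tuple:
    """Map vector element i to (node, local slot), assuming round-robin."""
    return i % num_nodes, i // num_nodes


def partition(vector_length: int, num_nodes: int) -> list:
    """List the element indices held by each node's memory."""
    held = [[] for _ in range(num_nodes)]
    for i in range(vector_length):
        node, _slot = node_for_element(i, num_nodes)
        held[node].append(i)
    return held


print(partition(10, 4))  # [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
```

If the physical vector registers are distributed with the same mapping, each node can load and store its share of a unit-stride vector from its own memory, which is the traffic reduction the abstract points to.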
Abstract:
A system and method for implementing a serial RAID system. Data is striped for the array of disk drives, parity for the striped data is calculated, and the resulting data and parity are written serially to a RAID system over a Fibre Channel or other type of network. The system also allows reading of the striped data and parity serially from the disk array.
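The stripe-then-parity step can be sketched as follows. The abstract does not name a parity scheme or a wire format, so this assumes byte-wise XOR parity over one stripe and plain concatenation for the serial stream; `xor_blocks`, `make_stripe` and `serialize` are hypothetical helper names.

```python
def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def make_stripe(data: bytes, num_data_drives: int):
    """Split data into one chunk per data drive and append XOR parity."""
    chunk = -(-len(data) // num_data_drives)  # ceiling division
    padded = data.ljust(chunk * num_data_drives, b"\0")
    chunks = [padded[i * chunk:(i + 1) * chunk] for i in range(num_data_drives)]
    return chunks + [xor_blocks(chunks)]


def serialize(stripe) -> bytes:
    """Assumed serial format: data chunks followed by the parity chunk."""
    return b"".join(stripe)


stripe = make_stripe(b"ABCDEFGH", 4)
# Any single lost chunk equals the XOR of the remaining chunks.
recovered = xor_blocks(stripe[:2] + stripe[3:])
print(recovered == stripe[2], len(serialize(stripe)))  # True 10
```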
Abstract:
A system and method for virtual memory management. A plurality of virtual memory pages having selectable page sizes are used to tailor memory allocations in a way which balances overallocation of memory against the number of entries saved in accessing that memory through the translation buffer. A library routine can act on the overallocated memory to hide memory requests from the operating system.
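The trade-off between overallocation and translation buffer entries can be made concrete with a small selection routine. The policy, the candidate page sizes and the waste threshold below are all assumptions for illustration; the abstract does not state how the balance is struck, and `choose_page_size` is a hypothetical name.

```python
KIB = 1024
MIB = 1024 * KIB


def choose_page_size(request: int, page_sizes, max_waste_fraction: float = 0.25):
    """Return (page_size, pages, wasted_bytes) for a request of `request` bytes.

    Assumed policy: take the largest page size whose overallocation stays
    within max_waste_fraction of the request, since larger pages need fewer
    translation entries. Fall back to the smallest size if none qualifies.
    """
    sizes = sorted(page_sizes)
    best = None
    for size in sizes:
        pages = -(-request // size)        # ceiling division
        waste = pages * size - request     # overallocated bytes
        if waste <= max_waste_fraction * request:
            best = (size, pages, waste)    # later (larger) sizes win
    if best is None:
        size = sizes[0]
        pages = -(-request // size)
        best = (size, pages, pages * size - request)
    return best


# A 5 MiB request: 64 KiB pages need 80 entries, 1 MiB pages need 5,
# and 4 MiB pages would waste 3 MiB, which exceeds the threshold.
print(choose_page_size(5 * MIB, [64 * KIB, 1 * MIB, 4 * MIB]))  # (1048576, 5, 0)
```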
Abstract:
A method and apparatus for conductively cooling daughter card assemblies mounted to either an air or liquid cooled computer circuit module wherein the module has a cold plate and at least one mother board adjacent the cold plate. The module carries a number of daughter assemblies thereon adjacent the mother board. Each daughter card assembly has at least one daughter board which carries a number of electronic components on an element side of the board. A cooling side is disposed opposite the element side on the board. A thermally conductive plate has an inner side facing the mother board and an outer side opposite the inner side. The inner side has one or more projecting members extending perpendicularly towards the mother board. The daughter board is parallel to and in conductive contact with one side of the conductive plate. The module cold plate has a number of upstanding spacers projecting toward the mother board. A portion of the top of each spacer is exposed for receiving thereon one of the projecting members of the conductive plate in intimate conductive contact. Preferably, the conductive plate is sandwiched between the cooling sides of a pair of daughter boards. Heat generated by the electronic components on the daughter boards is transferred by conduction through the boards into the conductive plate. The heat conducts through the projecting members into the spacers and into the module cold plate. The heat is then carried away by the cooling medium flowing through the cold plate.
Abstract:
An apparatus and method for mounting an edge connector assembly within a circuit module. Connector mounting rails are attached to the sides of a printed circuit board and the circuit board is then joined with a cold plate in order to form a circuit module. The mounting rail is an elongate strip of a substantially rigid material for attachment to the circuit board along one of its edges. The strip has an upper planar surface and inner and outer sides. The inner side is for attaching the strip to the edge of the circuit board. The outer side extends beyond the edge of the circuit board and is adapted to carry thereon a female block of the edge connector assembly. The strip also has a plurality of primary mounting openings formed in a predetermined pattern through the outer side of the strip for attaching the circuit board to a circuit module.
Abstract:
An improved high performance hardwired supercomputer data processing apparatus includes instruction means adapted to issue one and two parcel instructions. Instruction fetch means provides an instruction stream of two parcel items in sequence. Instruction decode means is responsive to each two parcel item for determining in one clock cycle whether the two parcel item is a single two parcel instruction or two one parcel instructions, for issuing each two parcel instruction for execution during the one clock cycle, and for issuing one then the other of the two one parcel instructions for execution in sequence during the one clock cycle and the next succeeding clock cycle.
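The issue timing described above can be traced with a short scheduling sketch. How the decoder tells the two cases apart is not given in the abstract, so the test is passed in as a caller-supplied predicate; the top-bit rule in the usage example is purely hypothetical, as is the function name `issue_schedule`.

```python
def issue_schedule(items, is_two_parcel):
    """Return a list of (clock cycle, issued instruction) pairs.

    items: sequence of (first_parcel, second_parcel) fetched in order.
    is_two_parcel: predicate on the first parcel (assumed decode rule).
    """
    schedule = []
    cycle = 0
    for first, second in items:
        if is_two_parcel(first):
            # One two-parcel instruction issues in a single cycle.
            schedule.append((cycle, (first, second)))
            cycle += 1
        else:
            # Two one-parcel instructions issue in consecutive cycles.
            schedule.append((cycle, (first,)))
            schedule.append((cycle + 1, (second,)))
            cycle += 2
    return schedule


def top_bit_set(parcel: int) -> bool:
    """Hypothetical encoding: top bit of a 16-bit parcel marks two-parcel."""
    return bool(parcel & 0x8000)


print(issue_schedule([(0x8001, 0x1234), (0x0001, 0x0002)], top_bit_set))
# [(0, (32769, 4660)), (1, (1,)), (2, (2,))]
```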