Abstract:
A processor-inclusive memory module (PIMM) is disclosed. In one embodiment of the present invention, the PIMM includes a printed circuit board having first and second opposing surfaces. The printed circuit board also has an address line formed therein. A first SRAM is mounted on the first surface of the printed circuit board. The present PIMM is further comprised of a second SRAM mounted on the second surface of the printed circuit board. The second SRAM is mounted on the second surface of the printed circuit board directly opposite the first SRAM mounted on the first surface of the printed circuit board. The first and second SRAMs are coupled to the address line by respective cache buses. A processor is also mounted on the first surface of the printed circuit board, and is coupled to the address line. In one embodiment of the invention, a heat sink is thermally coupled to the processor. The processor has a plurality of contact pads disposed thereon. Electrical connectors extend from the second surface of the printed circuit board. The electrical connectors are electrically coupled to respective contact pads of the processor. In the present PIMM, the electrical connectors are adapted to be removably attached to a mother board. In so doing, the present PIMM is removably attachable to a mother board.
Abstract:
An optical clock distribution method and apparatus is disclosed that minimizes clock skew in the distribution of clock signals to logic assemblies in a computer system. The logic assemblies convert the optical signals into equivalent electrical signals.
Abstract:
Utilization circuits, such as logic chip circuits, are prevented from receiving the initial one or more pulses of a train of clock pulses produced after the master system clock is started, while the pulses of that train occurring thereafter are coupled to the utilization circuit. This prevents the skew usually present between the initial pulses of the train relative to the subsequent train pulses from adversely effecting operation of the utilization circuits. This clock swallowing preferably blocks a certain predetermined number of initial clock pulses from reaching the rest of the circuitry, although the system is adaptable to allow preselection of the number of such swallowed pulses.
Abstract:
Clock pulses from a master oscillator are distributed in a multiprocessor computer system so that they arrive at a large number of utilization points located in operating clusters of modules within extremely tight time tolerances of each other. The delays associated with each component, electrical or optical connection, cable or the like are determined by direct measurement or by using known standard characteristics. A time delay budget for each complete clock pulse path from the point of initial divergence from the master clock source to the final chip delivery point is logged and summed. Components capable of introducing predetermined amounts of time delay are incorporated in some or all clock pulse paths. These components are adjusted so as to balance out the differences determined from the clock path budgets. The clock paths are implemented in electrical components either alone or in combination with optical components, or in substantially all optical configurations. One arrangement for controlling optical skew includes an arrangement of optical elements physically displaceable in a coaxial direction relative to one another. Skew adjustment networks employ a unique composition of coarse and fine selectable delay arrays implemented either by electrical components, optical components, or a combination thereof.
Abstract:
The present invention includes methods and apparatus for creating a packaging architecture for a highly parallel multiprocessor system. The packaging architecture of the present invention can provide for distribution of power, cooling and interconnections at all levels of components in a highly parallel multiprocessor system, while maximizing the number of circuits per unit time within certain operational constraints of such a multiprocessor system.
Abstract:
A unified parallel processing architecture connects together an extendible number of clusters of multiple numbers of processors to create a high performance parallel processing computer system. Multiple processors are grouped together into four or more physically separable clusters, each cluster having a common cluster shared memory that is symmetrically accessible by all of the processors in that cluster; however, only some of the clusters are adjacently interconnected. Clusters are adjacently interconnected to form a floating shared memory if certain memory access conditions relating to relative memory latency and relative data locality can create an effective shared memory parallel programming environment. A shared memory model can be used with programs that can be executed in the cluster shared memory of a single cluster, or in the floating shared memory that is defined across an extended shared memory space comprised of the cluster shared memories of any set of adjacently interconnected clusters. A distributed memory model can be used with any programs that are to be executed in the cluster shared memories of any non-adjacently interconnected clusters. The adjacent interconnection of multiple clusters of processors to a create a floating shared memory effectively combines all three type of memory models, pure shared memory, extended shared memory and distributed shared memory, into a unified parallel processing architecture.
Abstract:
A processor-inclusive memory module (PIMM) is disclosed. In one embodiment of the present invention, the PIMM includes a printed circuit board having first and second opposing surfaces. The printed circuit board also has an address line formed therein. A first SRAM is mounted on the first surface of the printed circuit board. The present PIMM is further comprised of a second SRAM mounted on the second surface of the printed circuit board. The second SRAM is mounted on the second surface of the printed circuit board directly opposite the first SRAM mounted on the first surface of the printed circuit board. The first and second SRAMs are coupled to the address line by respective cache buses. A processor is also mounted on the first surface of the printed circuit board, and is coupled to the address line. In one embodiment of the invention, a heat sink is thermally coupled to the processor. The processor has a plurality of contact pads disposed thereon. Electrical connectors extend from the second surface of the printed circuit board. The electrical connectors are electrically coupled to respective contact pads of the processor. In the present PIMM, the electrical connectors are adapted to be removably attached to a mother board. In so doing, the present PIMM is removably attachable to a mother board.
Abstract:
A unified parallel processing architecture connects together an extendible number of clusters of multiple numbers of processors to create a high performance parallel processing computer system. Multiple processors are grouped together into four or more physically separable clusters, each cluster having a common cluster shared memory that is symmetrically accessible by all of the processors in that cluster; however, only some of the clusters are adjacently interconnected. Clusters are adjacently interconnected to form a floating shared memory if certain memory access conditions relating to relative memory latency and relative data locality can create an effective shared memory parallel programming environment. A shared memory model can be used with programs that can be executed in the cluster shared memory of a single cluster, or in the floating shared memory that is defined across an extended shared memory space comprised of the cluster shared memories of any set of adjacently interconnected clusters. A distributed memory model can be used with any programs that are to be executed in the cluster shared memories of any non-adjacently interconnected clusters. The adjacent interconnection of multiple clusters of processors to a create a floating shared memory effectively combines all three type of memory models, pure shared memory, extended shared memory and distributed shared memory, into a unified parallel processing architecture.
Abstract:
A processor-inclusive memory module (PIMM) is disclosed. In one embodiment of the present invention, the PIMM includes a printed circuit board having first and second opposing surfaces. The printed circuit board also has an address line formed therein. A first SRAM is mounted on the first surface of the printed circuit board. The present PIMM is further comprised of a second SRAM mounted on the second surface of the printed circuit board. The second SRAM is mounted on the second surface of the printed circuit board directly opposite the first SRAM mounted on the first surface of the printed circuit board. The first and second SRAMs are coupled to the address line by respective cache buses. A processor is also mounted on the first surface of the printed circuit board, and is coupled to the address line. In one embodiment of the invention, a heat sink is thermally coupled to the processor. The processor has a plurality of contact pads disposed thereon. Electrical connectors extend from the second surface of the printed circuit board. The electrical connectors are electrically coupled to respective contact pads of the processor. In the present PIMM, the electrical connectors are adapted to be removably attached to a mother board. In so doing, the present PIMM is removably attachable to a mother board.
Abstract:
The present invention incorporates a variable delay circuit to add delay to a clock signal. In a preferred embodiment of the present invention, the delay is determined and fixed by a circuit employing the concept of a lock-and-leave circuit. This has the effect of fine tuning the delay determined by the lock-and-leave circuit. Mode bits allow a user to control the rate at which fine tuning occurs. Three update rates are provided in a preferred embodiment of the present invention. They are slow, medium, and fast.