Abstract:
A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, and that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enables a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, with each having full access to all system resources and enabling adaptive partitioning of the processors to functions such as compute or messaging I/O on an application by application basis, and preferably, enable adaptive partitioning of functions in accordance with various algorithmic phases within an application, or if I/O or other processors are underutilized, then can participate in computation or communication nodes are interconnected by a five dimensional torus network with DMA that optimally maximize the throughput of packet communications between nodes and minimize latency.
Abstract:
A translation table has entries that each include a share bit and a delta bit, with pointers that point to a memory block that includes reuse bits. When two translation table entries reference identical fragments in a memory block, one of the translation table entries is changed to refer to the same memory block referenced in the other translation table entry, which frees up a memory block. The share bit is set to indicate a translation table entry is sharing its memory block with another translation table entry. In addition, a translation table entry may include a private delta in the form of a pointer that references a memory fragment in the memory block that is not shared with other translation table entries. When a translation table has a private delta, its delta bit is set.
Abstract:
A method and circuit for implementing a cache directory and efficient cache tag lookup in very large cache systems, and a design structure on which the subject circuit resides are provided. A tag cache includes a fast partial large (LX) cache directory maintained separately on chip apart from a main LX cache directory (LXDIR) stored off chip in dynamic random access memory (DRAM) with large cache data (LXDATA). The tag cache stores most frequently accessed LXDIR tags. The tag cache contains predefined information enabling access to LXDATA directly on tag cache hit with matching address and data present in the LX cache. Only on tag cache misses the LXDIR is accessed to reach LXDATA.
Abstract:
An apparatus and method for providing a data eye monitor. The data eye monitor apparatus utilizes an inverter/latch string circuit and a set of latches to save the data eye for providing an infinite persistent data eye. In operation, incoming read data signals are adjusted in the first stage individually and latched to provide the read data to the requesting unit. The data is also simultaneously fed into a balanced XOR tree to combine the transitions of all incoming read data signals into a single signal. This signal is passed along a delay chain and tapped at constant intervals. The tap points are fed into latches, capturing the transitions at a delay element interval resolution. Using XORs, differences between adjacent taps and therefore transitions are detected. The eye is defined by segments that show no transitions over a series of samples. The eye size and position can be used to readjust the delay of incoming signals and/or to control environment parameters like voltage, clock speed and temperature.
Abstract:
A computing system with physically remote shared computer memory, the computing system including: a remote memory management module, a plurality of computing devices, a plurality of remote memory modules that are external to the plurality of computing devices, and a remote memory controller, the remote memory management module configured to partition the physically remote shared computer memory amongst a plurality of computing devices; each computing device including a computer processor and a local memory controller, the local memory controller including: a processor interface, a local memory interface, and a local interconnect interface; each remote memory controller including: a remote memory interface and a remote interconnect interface, wherein the remote memory controller is operatively coupled to the data communications interconnect via the remote interconnect interface such that the remote memory controller is coupled for data communications with the local memory controller over the data communications interconnect.
Abstract:
A computing system with physically remote shared computer memory, the computing system including: a remote memory management module, a plurality of computing devices, a plurality of remote memory modules that are external to the plurality of computing devices, and a remote memory controller, the remote memory management module configured to partition the physically remote shared computer memory amongst a plurality of computing devices; each computing device including a computer processor and a local memory controller, the local memory controller including: a processor interface, a local memory interface, and a local interconnect interface; each remote memory controller including: a remote memory interface and a remote interconnect interface, wherein the remote memory controller is operatively coupled to the data communications interconnect via the remote interconnect interface such that the remote memory controller is coupled for data communications with the local memory controller over the data communications interconnect.
Abstract:
An apparatus and method for providing a data eye monitor. The data eye monitor apparatus utilizes an inverter/latch string circuit and a set of latches to save the data eye for providing an infinite persistent data eye. In operation, incoming read data signals are adjusted in the first stage individually and latched to provide the read data to the requesting unit. The data is also simultaneously fed into a balanced XOR tree to combine the transitions of all incoming read data signals into a single signal. This signal is passed along a delay chain and tapped at constant intervals. The tap points are fed into latches, capturing the transitions at a delay element interval resolution. Using XORs, differences between adjacent taps and therefore transitions are detected. The eye is defined by segments that show no transitions over a series of samples. The eye size and position can be used to readjust the delay of incoming signals and/or to control environment parameters like voltage, clock speed and temperature.
Abstract:
A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, and that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enables a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, with each having full access to all system resources and enabling adaptive partitioning of the processors to functions such as compute or messaging I/O on an application by application basis, and preferably, enable adaptive partitioning of functions in accordance with various algorithmic phases within an application, or if I/O or other processors are underutilized, then can participate in computation or communication nodes are interconnected by a five dimensional torus network with DMA that optimally maximize the throughput of packet communications between nodes and minimize latency.
Abstract:
A method and system are disclosed for detecting memory chip failure in a computer memory system. The method comprises the steps of accessing user data from a set of user data chips, and testing the user data for errors using data from a set of system data chips. This testing is done by generating a sequence of check symbols from the user data, grouping the user data into a sequence of data symbols, and computing a specified sequence of syndromes. If all the syndromes are zero, the user data has no errors. If one of the syndromes is non-zero, then a set of discriminator expressions are computed, and used to determine whether a single or double symbol error has occurred. In the preferred embodiment, less than two full system data chips are used for testing and correcting the user data.
Abstract:
A translation table has entries that each include a share bit and a delta bit, with pointers that point to a memory block that includes reuse bits. The share bit is set to indicate a translation table entry is sharing its memory block with another translation table entry. In addition, a translation table entry may include a private delta in the form of a pointer that references a memory fragment in the memory block that is not shared with other translation table entries, wherein the private delta references previously-stored content. When a translation table has a private delta, its delta bit is set. The private delta is generated by analyzing a data buffer for content that is similar to previously-stored content.