摘要:
Persistent data storage with low latency is provided by a method that includes receiving a low latency store command that includes write data. The write data is written to a first memory device that is implemented by a nonvolatile solid-state memory technology characterized by a first access speed. It is acknowledged that the write data has been successfully written to the first memory device. The write data is written to a second memory device that is implemented by a volatile memory technology. At least a portion of the data in the first memory device is written to a third memory device when a predetermined amount of data has been accumulated in the first memory device. The third memory device is implemented by a nonvolatile solid-state memory technology characterized by a second access speed that is slower than the first access speed.
摘要:
Persistent data storage is provided by a method that includes receiving a low latency store command that includes write data. The write data is written to a first memory device that is implemented by a nonvolatile solid-state memory technology characterized by a first access speed. It is acknowledged that the write data has been successfully written to the first memory device. The write data is written to a second memory device that is implemented by a volatile memory technology. At least a portion of the data in the first memory device is written to a third memory device when a predetermined amount of data has been accumulated in the first memory device. The third memory device is implemented by a nonvolatile solid-state memory technology characterized by a second access speed that is slower than the first access speed.
摘要:
A data capture technique for high speed signaling to allow for optimal sampling of an asynchronous data stream. This technique allows for extremely high data rates and does not require that a clock be sent with the data as is done in source synchronous systems. The present invention also provides a hardware mechanism for automatically adjusting transmission delays for optimal two-bit simultaneous bi-directional (SiBiDi) signaling.
摘要:
Automatic selection and establishment of a communications connection between a telephone device to a receiver device, including entering an address of a receiver device into the telephone device for initiating the communications connection to the receiver device, and automatically selecting a communications network for establishing the communications connection to the receiver device, and selecting the communications network from an internet-based network, a hybrid telephone/internet network, and a telephone network. Automatically determine network access capabilities of the receiver device based on the address of the receiver device, and automatically evaluate the cost of establishing a communications connection for each of the communications networks which the receiver device is capable of accessing. The communications network with the lowest cost is selected.
摘要:
A novel massively parallel supercomputer of hundreds of teraOPS-scale includes node architectures based upon System-On-a-Chip technology, i.e., each processing node comprises a single Application Specific Integrated Circuit (ASIC). Within each ASIC node is a plurality of processing elements each of which consists of a central processing unit (CPU) and plurality of floating point processors to enable optimal balance of computational performance, packaging density, low cost, and power and cooling requirements. The plurality of processors within a single node individually or simultaneously work on any combination of computation or communication as required by the particular algorithm being solved. The system-on-a-chip ASIC nodes are interconnected by multiple independent networks that optimally maximizes packet communications throughput and minimizes latency. The multiple networks include three high-speed networks for parallel algorithm message passing including a Torus, Global Tree, and a Global Asynchronous network that provides global barrier and notification functions.
摘要:
A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network. The global collective network may be configured to provide global barrier and interrupt functionality in asynchronous or synchronized manner. When implemented in a massively-parallel supercomputing structure, the global collective network is physically and logically partitionable according to needs of a processing algorithm.
摘要:
A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices ate included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network and class structures. The global collective network may be configured to provide global barrier and interrupt functionality in asynchronous or synchronized manner. When implemented in a massively-parallel supercomputing structure, the global collective network is physically and logically partitionable according to needs of a processing algorithm.
摘要:
A system, method and computer program product for supporting system initiated checkpoints in high performance parallel computing systems and storing of checkpoint data to a non-volatile memory storage device. The system and method generates selective control signals to perform checkpointing of system related data in presence of messaging activity associated with a user application running at the node. The checkpointing is initiated by the system such that checkpoint data of a plurality of network nodes may be obtained even in the presence of user applications running on highly parallel computers that include ongoing user messaging activity. In one embodiment, the non-volatile memory is a pluggable flash memory card.
摘要:
A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, and that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enables a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, with each having full access to all system resources and enabling adaptive partitioning of the processors to functions such as compute or messaging I/O on an application by application basis, and preferably, enable adaptive partitioning of functions in accordance with various algorithmic phases within an application, or if I/O or other processors are underutilized, then can participate in computation or communication nodes are interconnected by a five dimensional torus network with DMA that optimally maximize the throughput of packet communications between nodes and minimize latency.
摘要翻译:具有100 petaOPS规模计算的多Petascale高效并行超级计算机,其成本,功耗和占地面积都在降低,并且允许从互连角度来看处理节点的最大封装密度。 超级计算机利用了VLSI的技术进步,实现了许多处理器可以集成到单个专用集成电路(ASIC)中的计算模型。 每个ASIC计算节点包括利用集成到一个管芯中的四个或更多个处理器的片上系统ASIC,每个处理器具有对所有系统资源的完全访问,并且使得处理器能够对诸如计算或消息传递I / O 并且优选地,根据应用内的各种算法阶段实现功能的自适应分割,或者如果I / O或其他处理器未被充分利用,则可以参与计算或通信节点通过五维环面网络互连 使用DMA来最大限度地最大化节点之间的分组通信的吞吐量并最小化等待时间。
摘要:
Embodiments of the invention provide a method, system and computer program product for embedding a global barrier and global interrupt network in a parallel computer system organized as a torus network. The computer system includes a multitude of nodes. In one embodiment, the method comprises taking inputs from a set of receivers of the nodes, dividing the inputs from the receivers into a plurality of classes, combining the inputs of each of the classes to obtain a result, and sending said result to a set of senders of the nodes. Embodiments of the invention provide a method, system and computer program product for embedding a collective network in a parallel computer system organized as a torus network. In one embodiment, the method comprises adding to a torus network a central collective logic to route messages among at least a group of nodes in a tree structure.