Abstract:
A system and method for accessing memory are provided. The system comprises a lookup buffer for storing one or more page table entries, each of which comprises at least a virtual page number and a physical page number, and a logic circuit that receives a virtual address from a processor and matches it to the virtual page number in one of the page table entries in order to select the physical page number in that same entry. The page table entry has one or more bits set to exclude a memory range from a page.
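The abstract does not say how the exclusion bits are encoded or what happens on a miss. The Python sketch below makes several illustrative assumptions: a 1 MiB page, a fully associative lookup buffer searched entry by entry, and a hypothetical pair of fields (`exclude_start`, `exclude_end`) that names a range of page offsets the entry does not map. An address that falls in that range is treated as a miss for that entry.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical page geometry; the abstract does not fix a page size.
PAGE_SHIFT = 20                  # 1 MiB pages (assumption)
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


@dataclass
class PageTableEntry:
    vpn: int  # virtual page number
    ppn: int  # physical page number
    # Hypothetical encoding of the exclusion bits: page offsets in the
    # half-open range [exclude_start, exclude_end) are NOT mapped by this entry.
    exclude_start: int = 0
    exclude_end: int = 0

    def excludes(self, offset: int) -> bool:
        return self.exclude_start <= offset < self.exclude_end


class LookupBuffer:
    """Fully associative lookup buffer: every entry is compared against the VPN."""

    def __init__(self, entries: List[PageTableEntry]):
        self.entries = list(entries)

    def translate(self, vaddr: int) -> Optional[int]:
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & OFFSET_MASK
        for pte in self.entries:
            # A hit requires a VPN match AND an offset outside the excluded range.
            if pte.vpn == vpn and not pte.excludes(offset):
                return (pte.ppn << PAGE_SHIFT) | offset
        return None  # miss: a real system would walk the page table or fault


# Map virtual page 0x3 to physical page 0x7, but carve out its first 4 KiB.
tlb = LookupBuffer([PageTableEntry(vpn=0x3, ppn=0x7, exclude_start=0x0, exclude_end=0x1000)])
print(hex(tlb.translate(0x302000)))  # 0x702000
print(tlb.translate(0x300800))       # None: offset 0x800 falls in the excluded range
```

Real hardware would compare all entries in parallel. The sequential loop here only models the matching rule.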
Abstract:
A Multi-Petascale Highly Efficient Parallel Supercomputer provides 100-petaOPS-scale computing at decreased cost, power, and footprint, and allows for maximum packaging density of processing nodes from an interconnect point of view. The supercomputer exploits advances in VLSI technology that enable a computing model in which many processors are integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC with four or more processors integrated on one die, each having full access to all system resources. This enables adaptive partitioning of the processors to functions such as compute or messaging I/O on an application-by-application basis and, preferably, adaptive partitioning of functions according to the various algorithmic phases within an application; if I/O or other processors are underutilized, they can participate in computation or communication. Nodes are interconnected by a five-dimensional torus network with DMA that maximizes the throughput of packet communications between nodes and minimizes latency.
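The sketch below illustrates the five-dimensional torus interconnect. It computes a node's nearest neighbours and the minimal hop count between two nodes, with wraparound in every dimension. The extents in `DIMS` are hypothetical because the abstract gives no machine dimensions. The sketch models only the topology, not the DMA engine or packet routing.

```python
import math
from typing import List, Tuple

Coord = Tuple[int, ...]

# Hypothetical extents for the five torus dimensions; not taken from the source.
DIMS: Coord = (4, 4, 4, 4, 2)


def neighbors(coord: Coord, dims: Coord = DIMS) -> List[Coord]:
    """Nearest neighbours: one hop in each direction along each dimension, with wraparound."""
    result = []
    for axis, extent in enumerate(dims):
        for step in (+1, -1):
            n = list(coord)
            n[axis] = (n[axis] + step) % extent
            result.append(tuple(n))
    # With an extent of 2, the +1 and -1 hops reach the same node; keep it once.
    return list(dict.fromkeys(result))


def hop_distance(a: Coord, b: Coord, dims: Coord = DIMS) -> int:
    """Minimal hop count: along each dimension, take the shorter way around the ring."""
    return sum(min(abs(x - y), n - abs(x - y)) for x, y, n in zip(a, b, dims))


origin = (0, 0, 0, 0, 0)
print(math.prod(DIMS))                         # 512 nodes
print(len(neighbors(origin)))                  # 9
print(hop_distance(origin, (3, 0, 0, 0, 1)))   # 2: 0 -> 3 wraps around in one hop
```

Without the wraparound links, going from coordinate 0 to 3 in a dimension of extent 4 would take three hops instead of one. That shortening of worst-case paths is what a torus buys over a mesh.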
Abstract:
A control logic device performs a local rollback in a parallel supercomputing system that includes at least one cache memory device. The control logic device determines a local rollback interval and runs at least one instruction in that interval. While running the at least one instruction, it evaluates whether an unrecoverable condition occurs and checks whether an error occurs. If an error occurs and no unrecoverable condition has occurred during the local rollback interval, the control logic device restarts the local rollback interval.
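The following Python code is a software model of the local rollback control flow, not the patented control logic. It makes these assumptions:

- The caller supplies the interval as a list of instructions.
- Each instruction reports hypothetical `error` and `unrecoverable` flags.
- A deep copy of the state stands in for whatever the hardware preserves at the start of the interval.
- `max_restarts` is an invented retry limit.

```python
import copy
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Outcome:
    error: bool          # e.g. a soft error detected while the instruction ran
    unrecoverable: bool  # e.g. an irreversible side effect that cannot be undone


Instruction = Callable[[Dict[str, int]], Outcome]


def run_interval(state: Dict[str, int], interval: List[Instruction],
                 max_restarts: int = 3) -> Dict[str, int]:
    """Run one local rollback interval, restarting it on recoverable errors."""
    checkpoint = copy.deepcopy(state)  # stand-in for state saved at the interval start
    for _ in range(max_restarts + 1):
        error = unrecoverable = False
        for instr in interval:
            outcome = instr(state)
            error = error or outcome.error
            unrecoverable = unrecoverable or outcome.unrecoverable
        if not error:
            return state  # clean interval: commit
        if unrecoverable:
            # Rolling back would not undo the side effect, so escalate instead.
            raise RuntimeError("error after an unrecoverable condition")
        # Error without an unrecoverable condition: roll back and restart.
        state.clear()
        state.update(copy.deepcopy(checkpoint))
    raise RuntimeError("restart budget exhausted")


faults = iter([True, False])  # first run hits a transient error, the retry is clean


def add_one(state: Dict[str, int]) -> Outcome:
    state["x"] += 1
    return Outcome(error=next(faults), unrecoverable=False)


print(run_interval({"x": 0}, [add_one]))  # {'x': 1}
```

The rollback restores `x` to 0 before the restart, so the increment is applied exactly once in the committed result.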
Abstract:
In a method for passing remote messages in a parallel computer system formed as a network of interconnected compute nodes, a first compute node (A) sends a single remote message to a remote second compute node (B) in order to control node B to send at least one remote message. The method includes controlling a DMA engine at the first compute node (A) to prepare the single remote message so that it includes a first message descriptor and at least one remote message descriptor for controlling the remote second compute node (B) to send at least one remote message. It also includes putting the first message descriptor into an injection FIFO at the first compute node (A) and sending the single remote message and the at least one remote message descriptor to the second compute node (B).
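A toy Python model of the message flow follows. Several parts are assumptions made for illustration:

- the descriptor fields;
- the `Node` and `run` helpers;
- the behaviour at node B, which queues the carried remote descriptors into its own injection FIFO.

The abstract does not specify the descriptor format or how B's DMA engine consumes the remote descriptors.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Tuple


@dataclass
class Descriptor:
    dest: str                 # destination node name (hypothetical addressing)
    payload: str = ""
    # Descriptors the receiver should inject on its own (assumed encoding).
    remote: List["Descriptor"] = field(default_factory=list)


class Node:
    def __init__(self, name: str, network: Dict[str, "Node"]):
        self.name = name
        self.network = network
        self.injection_fifo: Deque[Descriptor] = deque()
        self.received: List[Tuple[str, str]] = []
        network[name] = self

    def dma_step(self) -> None:
        """One DMA action: pop a descriptor from the injection FIFO and send it."""
        desc = self.injection_fifo.popleft()
        self.network[desc.dest].receive(self.name, desc)

    def receive(self, sender: str, desc: Descriptor) -> None:
        self.received.append((sender, desc.payload))
        # Carried remote descriptors go into this node's own injection FIFO.
        self.injection_fifo.extend(desc.remote)


def run(network: Dict[str, Node]) -> None:
    """Step every node's DMA engine until all injection FIFOs are empty."""
    while any(n.injection_fifo for n in network.values()):
        for node in list(network.values()):
            if node.injection_fifo:
                node.dma_step()


net: Dict[str, Node] = {}
a, b, c = Node("A", net), Node("B", net), Node("C", net)
# Node A's single message to B carries two descriptors for B to execute.
single = Descriptor(dest="B", payload="ctl",
                    remote=[Descriptor("C", "data-1"), Descriptor("A", "data-2")])
a.injection_fifo.append(single)
run(net)
print(c.received)  # [('B', 'data-1')]
print(a.received)  # [('B', 'data-2')]
```

Node A injects only one descriptor, yet two further messages leave node B. That is the effect the method aims for: one message from A drives sends at B.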