-
Publication number: US10521260B2
Publication date: 2019-12-31
Application number: US15650357
Filing date: 2017-07-14
Applicant: Hewlett Packard Enterprise Development LP
Inventors: Steven J. Dean, Michael Woodacre, Randal S. Passint, Eric C. Fromm, Thomas E. McGee, Michael E. Malewicki, Kirill Malkin
Abstract: A high performance computing (HPC) system has an architecture that separates data paths used by compute nodes exchanging computational data from the data paths used by compute nodes to obtain computational work units and save completed computations. The system enables an improved method of saving checkpoint data, and an improved method of using an analysis of the saved data to assign particular computational work units to particular compute nodes. The system includes a compute fabric and compute nodes that cooperatively perform a computation by mutual communication using the compute fabric. The system also includes a local data fabric that is coupled to the compute nodes, a memory, and a data node. The data node is configured to retrieve data for the computation from an external bulk data storage, and to store its work units in the memory for access by the compute nodes.
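The architecture in this abstract keeps two communication paths distinct: compute nodes exchange computational data with each other over the compute fabric, while work units and results move between shared memory and the compute nodes over a local data fabric fed by a data node. A minimal Python sketch of that separation follows; all class and method names (`DataNode`, `ComputeNode`, `stage_work`, `run_one`) are hypothetical illustrations, not names from the patent, and the two fabrics are stood in for by a queue and a list.

```python
from queue import Queue


class DataNode:
    """Stages work units from external bulk storage into shared memory,
    where compute nodes fetch them over the local data fabric."""

    def __init__(self, bulk_storage):
        self.bulk_storage = bulk_storage
        self.shared_memory = Queue()  # stand-in for the shared memory pool

    def stage_work(self):
        # Retrieve data for the computation and store work units for the
        # compute nodes, as the abstract describes.
        for unit in self.bulk_storage:
            self.shared_memory.put(unit)


class ComputeNode:
    """Pulls work over the data fabric; exchanges results with peers
    over the separate compute fabric."""

    def __init__(self, node_id, data_fabric, compute_fabric):
        self.node_id = node_id
        self.data_fabric = data_fabric        # path for work units / checkpoints
        self.compute_fabric = compute_fabric  # path for peer-to-peer data exchange

    def run_one(self):
        unit = self.data_fabric.get()
        result = unit * 2  # stand-in for the actual computation
        # Share the intermediate result with peer compute nodes on the
        # compute fabric, never on the data path.
        self.compute_fabric.append((self.node_id, result))
        return result
```

The point of the sketch is only that work-unit traffic and peer-exchange traffic touch different objects, mirroring the separated data paths the abstract claims.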
-
Publication number: US10331581B2
Publication date: 2019-06-25
Application number: US15483880
Filing date: 2017-04-10
Applicant: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP
Inventors: Frank R. Dropps, Michael E. Malewicki
Abstract: A high-performance computing system, method, and storage medium manage accesses to multiple memory modules of a computing node, the modules having different access latencies. The node allocates its resources into pools according to pre-determined memory access criteria. When another computing node requests a memory access, the node determines whether the request satisfies any of the criteria. If so, the associated pool of resources is selected for servicing the request; if not, a default pool is selected. The node then services the request if the pool of resources is sufficient. Otherwise, various error handling processes are performed. Each memory access criterion may relate to a memory address range assigned to a memory module, a type of request, a relationship between the nodes, a configuration of the requesting node, or a combination of these.
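The selection flow in this abstract (match the incoming request against pre-determined criteria, pick the matching pool or fall back to a default, then service only if the pool has resources) can be sketched in a few lines of Python. Every name here (`Request`, `Pool`, `Criterion`, `select_pool`, `service`) is a hypothetical illustration of the described flow, not terminology from the patent.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Request:
    address: int       # target memory address
    req_type: str      # e.g. "read" or "write"
    requester_id: int  # identifies the requesting node


@dataclass
class Pool:
    name: str
    available: int  # resources currently free in this pool

    def sufficient(self) -> bool:
        return self.available > 0

    def acquire(self) -> None:
        self.available -= 1


@dataclass
class Criterion:
    # A predicate over the request: an address range assigned to a memory
    # module, a request type, a node relationship, etc.
    matches: Callable[[Request], bool]
    pool: Pool


def select_pool(request: Request, criteria: List[Criterion],
                default_pool: Pool) -> Pool:
    # First matching criterion selects its associated pool;
    # otherwise the default pool is selected.
    for criterion in criteria:
        if criterion.matches(request):
            return criterion.pool
    return default_pool


def service(request: Request, criteria: List[Criterion],
            default_pool: Pool) -> str:
    pool = select_pool(request, criteria, default_pool)
    if not pool.sufficient():
        # Stand-in for the various error handling processes the
        # abstract mentions.
        raise RuntimeError(f"pool {pool.name!r} exhausted")
    pool.acquire()
    return pool.name
```

A criterion keyed to a low-latency module's address range would, for example, steer requests in that range to a dedicated pool while everything else falls through to the default.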
-
Publication number: US20180293184A1
Publication date: 2018-10-11
Application number: US15483880
Filing date: 2017-04-10
Applicant: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP
Inventors: Frank R. Dropps, Michael E. Malewicki
CPC classification number: G06F13/161, G06F13/1663, G06F13/1673, G06F13/4022, G06F13/4068
Abstract: A high-performance computing system, method, and storage medium manage accesses to multiple memory modules of a computing node, the modules having different access latencies. The node allocates its resources into pools according to pre-determined memory access criteria. When another computing node requests a memory access, the node determines whether the request satisfies any of the criteria. If so, the associated pool of resources is selected for servicing the request; if not, a default pool is selected. The node then services the request if the pool of resources is sufficient. Otherwise, various error handling processes are performed. Each memory access criterion may relate to a memory address range assigned to a memory module, a type of request, a relationship between the nodes, a configuration of the requesting node, or a combination of these.
-
Publication number: US20180018196A1
Publication date: 2018-01-18
Application number: US15650357
Filing date: 2017-07-14
Applicant: Hewlett Packard Enterprise Development LP
Inventors: Steven J. Dean, Michael Woodacre, Randal S. Passint, Eric C. Fromm, Thomas E. McGee, Michael E. Malewicki, Kirill Malkin
CPC classification number: G06F9/45558, G06F9/50, G06F9/5077, G06F9/54, G06F2009/45595, G06Q10/06, H04L29/08315, H04L67/1042
Abstract: A high performance computing (HPC) system has an architecture that separates data paths used by compute nodes exchanging computational data from the data paths used by compute nodes to obtain computational work units and save completed computations. The system enables an improved method of saving checkpoint data, and an improved method of using an analysis of the saved data to assign particular computational work units to particular compute nodes. The system includes a compute fabric and compute nodes that cooperatively perform a computation by mutual communication using the compute fabric. The system also includes a local data fabric that is coupled to the compute nodes, a memory, and a data node. The data node is configured to retrieve data for the computation from an external bulk data storage, and to store its work units in the memory for access by the compute nodes.
-