Workload management system and process

    公开(公告)号:US10521260B2

    公开(公告)日:2019-12-31

    申请号:US15650357

    申请日:2017-07-14

    Abstract: A high performance computing (HPC) system has an architecture that separates data paths used by compute nodes exchanging computational data from the data paths used by compute nodes to obtain computational work units and save completed computations. The system enables an improved method of saving checkpoint data, and an improved method of using an analysis of the saved data to assign particular computational work units to particular compute nodes. The system includes a compute fabric and compute nodes that cooperatively perform a computation by mutual communication using the compute fabric. The system also includes a local data fabric that is coupled to the compute nodes, a memory, and a data node. The data node is configured to retrieve data for the computation from an external bulk data storage, and to store its work units in the memory for access by the compute nodes.

    Virtual channel and resource assignment

    公开(公告)号:US10331581B2

    公开(公告)日:2019-06-25

    申请号:US15483880

    申请日:2017-04-10

    Abstract: A high-performance computing system, method, and storage medium manage accesses to multiple memory modules of a computing node, the modules having different access latencies. The node allocates its resources into pools according to pre-determined memory access criteria. When another computing node requests a memory access, the node determines whether the request satisfies any of the criteria. If so, the associated pool of resources is selected for servicing the request; if not, a default pool is selected. The node then services the request if the pool of resources is sufficient. Otherwise, various error handling processes are performed. Each memory access criterion may relate to a memory address range assigned to a memory module, a type of request, a relationship between the nodes, a configuration of the requesting node, or a combination of these.

    Virtual Channel and Resource Assignment
    3.
    发明申请

    公开(公告)号:US20180293184A1

    公开(公告)日:2018-10-11

    申请号:US15483880

    申请日:2017-04-10

    Abstract: A high-performance computing system, method, and storage medium manage accesses to multiple memory modules of a computing node, the modules having different access latencies. The node allocates its resources into pools according to pre-determined memory access criteria. When another computing node requests a memory access, the node determines whether the request satisfies any of the criteria. If so, the associated pool of resources is selected for servicing the request; if not, a default pool is selected. The node then services the request if the pool of resources is sufficient. Otherwise, various error handling processes are performed. Each memory access criterion may relate to a memory address range assigned to a memory module, a type of request, a relationship between the nodes, a configuration of the requesting node, or a combination of these.

Patent Agency Ranking