Parallel processing apparatus, and method of maintaining parallel processing apparatus

    Publication number: US11036549B2

    Publication date: 2021-06-15

    Application number: US16009274

    Filing date: 2018-06-15

    Abstract: A parallel processing apparatus includes: a memory; and a processor coupled to the memory, the processor being configured to: acquire a first time and a second time; divide a plurality of nodes into a plurality of groups; generate a plurality of schedule candidates, each of which assigns time zones corresponding to a length of time used to perform a maintenance operation at one or more nodes included in the plurality of groups in a time period from the first time to the second time to the plurality of groups such that no overlap occurs among the plurality of groups; evaluate the plurality of schedule candidates based on one or more process execution schedules of the one or more nodes in the time period; and output one schedule candidate of the plurality of schedule candidates based on a result of the evaluation.
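
The steps in this abstract (group the nodes, generate non-overlapping maintenance windows, score each candidate against the job schedule, output one) can be illustrated with a minimal sketch. It is not the patented method: the fixed-length slots, the brute-force enumeration, the overlap-based cost, and the names `overlap` and `best_schedule` are all assumptions made for illustration.

```python
from itertools import permutations


def overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def best_schedule(first_time, second_time, groups, slot_len, jobs):
    """Pick the maintenance schedule that collides least with planned jobs.

    groups: list of lists of node ids (the node groups).
    jobs:   dict mapping node id -> list of (start, end) planned job intervals.
    Returns (cost, {group_index: (start, end)}), or None if the period
    is too short to give every group its own slot.
    """
    # Assumption: the period is cut into equal, disjoint slots.
    n_slots = (second_time - first_time) // slot_len
    slots = [(first_time + i * slot_len, first_time + (i + 1) * slot_len)
             for i in range(n_slots)]

    best = None
    # Each permutation gives every group a distinct slot, so no two
    # groups are ever under maintenance at the same time.
    for assignment in permutations(range(n_slots), len(groups)):
        cost = 0
        for g, s in enumerate(assignment):
            s_start, s_end = slots[s]
            for node in groups[g]:
                for j_start, j_end in jobs.get(node, []):
                    cost += overlap(s_start, s_end, j_start, j_end)
        if best is None or cost < best[0]:
            best = (cost, {g: slots[s] for g, s in enumerate(assignment)})
    return best
```

Enumerating every permutation is only practical for a handful of groups and slots; a real scheduler would need a cheaper search.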

    Information processing apparatus, stage-out processing method and recording medium recording job management program

    Publication number: US10599472B2

    Publication date: 2020-03-24

    Application number: US15898274

    Filing date: 2018-02-16

    Abstract: An information processing apparatus includes: a processor that performs a scheduling process of scheduling a job for nodes, the process including: calculating, when one node executes a first job, a job execution end time when execution of the first job is completed by referring to an execution history in which an execution time of a job is recorded; acquiring, from a load management node that manages a load of a metadata-process execution node which performs metadata processing to access metadata of a file among the nodes, the load of the metadata-process execution node at the job execution end time; and generating, when the load is equal to or more than a threshold, schedule data to cause a staging execution node which performs the metadata processing produced by staging to perform, at the job execution end time, the metadata processing based on staging to a file having an execution result of the first job.

    Information processing apparatus, parallel computer system, and file server communication program

    Publication number: US10367886B2

    Publication date: 2019-07-30

    Application number: US15191682

    Filing date: 2016-06-24

    Abstract: An information processing apparatus, among a plurality of information processing apparatuses that perform parallel computing processing in a parallel computer system, including a memory and a processor coupled to the memory and configured to execute a process including: calculating a centroid position of the information processing apparatuses based on a data length of data for which subsequent reading or writing from or to a file server is requested by the information processing apparatuses and position information on each of the information processing apparatuses; determining a first information processing apparatus that performs data relay according to the calculated centroid position; and collectively receiving or transmitting, when the determined first information processing apparatus that performs data relay is the information processing apparatus, the data for two or more of the information processing apparatuses.
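
The centroid calculation described here can be sketched as a weighted average of node coordinates, with each node weighted by the length of the data it asks to read or write. The abstract only says the relay node is determined "according to" the centroid; choosing the node nearest to it is an assumption of this sketch, as are the names `weighted_centroid` and `pick_relay`.

```python
def weighted_centroid(positions, data_lengths):
    """Centroid of node positions, weighted by requested data length.

    positions:    list of coordinate tuples, one per node.
    data_lengths: list of non-negative data lengths, one per node.
    """
    total = sum(data_lengths)
    if total == 0:
        raise ValueError("no data requested by any node")
    dims = len(positions[0])
    return tuple(
        sum(p[d] * w for p, w in zip(positions, data_lengths)) / total
        for d in range(dims)
    )


def pick_relay(positions, data_lengths):
    """Index of the node closest to the weighted centroid.

    Assumption: 'closest' means smallest squared Euclidean distance.
    """
    c = weighted_centroid(positions, data_lengths)
    return min(
        range(len(positions)),
        key=lambda i: sum((positions[i][d] - c[d]) ** 2 for d in range(len(c))),
    )
```

Weighting by data length pulls the relay toward the nodes that move the most data, which shortens the paths that carry the heaviest traffic.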

    Apparatus and method to collect memory dump information of a faulty node in a parallel computer system

    Publication number: US10140192B2

    Publication date: 2018-11-27

    Application number: US15433405

    Filing date: 2017-02-15

    Abstract: An apparatus includes nodes each configured to relay data between the nodes. When a failure occurs in a first-node, a management-node determines, based on power consumption and/or memory usage of the nodes, a collection-node that transmits an instruction in a first direction approaching the first-node and a second direction approaching a storage-node, respectively. A second-node that is neither an adjacent-node adjacent to the first-node nor the storage-node, upon receiving data including the instruction, transmits data obtained by adding an evaluation value for the second-node to the received data, in the first or second direction. Each of the adjacent-node and the storage-node, upon receiving data including the instruction, transmits data including an evaluation value for another node included in the received data and an evaluation value for that node, to the collection-node which determines transmission routes between the collection-node and the first-node, and between the collection-node and the storage-node.

    Data transfer control apparatus that control transfer of data between nodes and parallel computing system

    Publication number: US10091280B2

    Publication date: 2018-10-02

    Application number: US15007366

    Filing date: 2016-01-27

    Abstract: A data transfer control apparatus controls transfer of data from a plurality of first nodes included in a first region in a network to a plurality of second nodes included in a second region in the network. A control unit of the data transfer control apparatus generates an n-dimensional Latin hypercube in which the number of symbols in each dimension is a value in keeping with a size of the first region. The control unit then associates, in accordance with respective positions of the first nodes in the first region, each first node with a symbol at a corresponding position in the Latin hypercube. The control unit then instructs the first nodes so that parallel data transfers by a plurality of first node sets, where first nodes associated with a same symbol in the Latin hypercube are grouped, are executed in order in first node set units.
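
The grouping step can be illustrated with a small sketch. It builds an n-dimensional Latin hypercube with the cyclic rule `sum(coordinates) mod size`, which is one standard construction and not necessarily the one used in the patent, and then groups node coordinates by symbol so that each group transfers in its own phase. The function name `transfer_phases` and the assumption that the region has the same size in every dimension are illustrative.

```python
from collections import defaultdict
from itertools import product


def transfer_phases(size, dims):
    """Group node coordinates of a size^dims region by Latin hypercube symbol.

    The symbol at a coordinate is sum(coordinate) mod size, so every symbol
    appears exactly once along any axis-parallel line of the region.
    Returns a list of phases; phase s holds the nodes that share symbol s.
    """
    phases = defaultdict(list)
    for coord in product(range(size), repeat=dims):
        phases[sum(coord) % size].append(coord)
    return [phases[s] for s in range(size)]
```

For a 3 x 3 region, `transfer_phases(3, 2)` yields three phases of three nodes each, and no two nodes in the same phase share a row or a column, so nodes that transfer at the same time never sit on the same axis-parallel line of the region.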

    START TEST METHOD, SYSTEM, AND RECORDING MEDIUM

    Publication number: US20180157566A1

    Publication date: 2018-06-07

    Application number: US15805186

    Filing date: 2017-11-07

    CPC classification number: G06F11/263 G06F11/2247 G06F11/2284

    Abstract: A start test method executed by a system including a calculation device and a management device that manages failure information on the calculation device, the start test method includes storing, by a first processor included in the management device, a failure rate that has been calculated for each of parts of the calculation device based on the failure information received from the calculation device as performance information, associating with time information and a part of the calculation device; obtaining a failure rate of each of the parts at a time of start of the calculation device based on the performance information and a time when the calculation device is to be started; notifying the calculation device of the obtained failure rate; and executing, by a second processor included in the calculation device, a start test of the calculation device in accordance with the notified failure rate.
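
A rough sketch of the flow in this abstract: the management side keeps a per-part history of failure rates, looks up the rate that applies at the planned start time, and the calculation side chooses how thoroughly to test each part. The lookup rule (latest record at or before the start time), the `threshold` value, the two test levels, and the names `rate_at` and `plan_start_test` are assumptions for illustration and do not come from the patent.

```python
from bisect import bisect_right


def rate_at(history, start_time):
    """Latest recorded failure rate at or before start_time.

    history: list of (time, failure_rate) tuples sorted by time.
    Returns None when no record precedes start_time.
    """
    times = [t for t, _ in history]
    i = bisect_right(times, start_time)
    return history[i - 1][1] if i else None


def plan_start_test(performance, start_time, threshold=0.01):
    """Choose a test level per part from its failure rate at start time.

    performance: dict mapping part name -> failure-rate history.
    Parts at or above the (hypothetical) threshold get a 'full' test,
    all others a 'quick' one.
    """
    plan = {}
    for part, history in performance.items():
        rate = rate_at(history, start_time)
        is_risky = rate is not None and rate >= threshold
        plan[part] = "full" if is_risky else "quick"
    return plan
```

Spending test time only on parts with an elevated failure rate is one way to shorten start-up without skipping the checks most likely to find a fault.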

    APPARATUS AND METHOD TO COLLECT MEMORY DUMP INFORMATION OF A FAULTY NODE IN A PARALLEL COMPUTER SYSTEM

    Publication number: US20170242766A1

    Publication date: 2017-08-24

    Application number: US15433405

    Filing date: 2017-02-15

    Abstract: An apparatus includes nodes each configured to relay data between the nodes. When a failure occurs in a first-node, a management-node determines, based on power consumption and/or memory usage of the nodes, a collection-node that transmits an instruction in a first direction approaching the first-node and a second direction approaching a storage-node, respectively. A second-node that is neither an adjacent-node adjacent to the first-node nor the storage-node, upon receiving data including the instruction, transmits data obtained by adding an evaluation value for the second-node to the received data, in the first or second direction. Each of the adjacent-node and the storage-node, upon receiving data including the instruction, transmits data including an evaluation value for another node included in the received data and an evaluation value for that node, to the collection-node which determines transmission routes between the collection-node and the first-node, and between the collection-node and the storage-node.

    JOB MANAGEMENT METHOD AND JOB MANAGING DEVICE
    Type: invention application (status: in force)

    Publication number: US20160217007A1

    Publication date: 2016-07-28

    Application number: US14976391

    Filing date: 2015-12-21

    Abstract: A device includes: a memory; and a processor coupled to the memory and configured to execute a process of managing data on a first subgraph that is included in a graph including vertices indicating computing resources of a system and edges indicating links between the computing resources and is provided for a first computing resource to which a first job is assigned, or data on a second subgraph that is included in the graph and connected to the first subgraph through a vertex indicating a computing resource to which none of the first job is assigned in the graph and that is provided for a second computing resource to which a second job is assigned, and a process of using the data to determine, based on the first subgraph, whether a third computing resource to which a third job is to be assigned exists.

