Abstract:
A parallel processing apparatus includes a memory that stores a program and a processor coupled to the memory. The processor calculates an execution scale for each job that is waiting to be executed, based on the number of nodes the job will use and the scheduled execution time period of the job. The processor allocates each job to an area in which the number of problem nodes (nodes that have a high failure possibility) is small, selected from among a plurality of areas into which the region where the nodes are disposed is partitioned. The jobs are allocated in descending order of execution scale, beginning with the job whose execution scale is the largest.
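The allocation order described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the execution scale is the product of node count and scheduled time (the abstract only says the scale is "based on" both) and that each area tracks a free-node count and a problem-node count. The names `Job`, `execution_scale`, and `allocate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int    # number of nodes the job will use
    hours: float  # scheduled execution time period


def execution_scale(job):
    # Assumption: scale = nodes x time. The abstract does not give the formula.
    return job.nodes * job.hours


def allocate(jobs, areas):
    """Place jobs, largest execution scale first, in the area with the fewest problem nodes.

    areas maps an area name to {"free": int, "problem": int}.
    Returns a dict mapping job name to area name, or None if no area fits.
    """
    free = {name: info["free"] for name, info in areas.items()}
    placement = {}
    for job in sorted(jobs, key=execution_scale, reverse=True):
        candidates = [name for name in areas if free[name] >= job.nodes]
        if not candidates:
            placement[job.name] = None
            continue
        # Fewest problem nodes wins; the area name breaks ties deterministically.
        best = min(candidates, key=lambda name: (areas[name]["problem"], name))
        free[best] -= job.nodes
        placement[job.name] = best
    return placement


jobs = [Job("small", 2, 1.0), Job("big", 8, 10.0), Job("mid", 4, 2.0)]
areas = {"A": {"free": 8, "problem": 0}, "B": {"free": 8, "problem": 3}}
placement = allocate(jobs, areas)
```

In this example the largest job claims the healthiest area first, so the smaller jobs fall back to the area with more problem nodes.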
Abstract:
Computers are connected in a network via a multidimensional mesh or torus connection. In response to a request to execute maintenance processing on computers in the network, an apparatus detects execution-scheduled jobs that are to be executed after the execution start time of the maintenance processing, based on the execution-scheduled start times of jobs to be executed by the computers. For each execution-scheduled job, the apparatus calculates a characteristic value of the axial length of an execution-scheduled job area in each axial direction of the multidimensional axes in the network, where the execution-scheduled job area includes the group of computers that is to execute that job. The apparatus determines a maintenance area in the network on which the maintenance processing is to be executed, based on the characteristic values of the axial lengths of the execution-scheduled job areas, and executes the maintenance processing on the computers in the maintenance area.
Abstract:
A first node determines a second node belonging to the same first group as the first node, and creates a first receive buffer corresponding to the second node in a memory. The first node determines a third and a fourth node belonging to a second group, and creates a second receive buffer corresponding to the third node in the memory, without creating a receive buffer corresponding to the fourth node. The first node uses the first receive buffer to receive messages when communicating with the second node, uses the second receive buffer to receive messages when communicating with the third node, and uses the first receive buffer or the second receive buffer to receive messages when communicating with the fourth node.
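The buffer-sharing scheme above can be sketched as follows. This is an illustrative model only: it assumes that one node per remote group (here, the lowest-numbered one) gets a dedicated buffer and that the other nodes of that group reuse it. The abstract allows the first or the second receive buffer to be reused and does not say how the choice is made. The names `build_receive_buffers` and `buffer_for` are hypothetical.

```python
def build_receive_buffers(self_id, group_of):
    """Create receive buffers for one node.

    group_of maps a node id to its group name.
    Returns (buffers, representative), where buffers maps a peer id to a list
    used as its receive buffer and representative maps a remote group to the
    peer whose buffer that group shares.
    """
    my_group = group_of[self_id]
    buffers = {}
    representative = {}
    for peer in sorted(group_of):
        if peer == self_id:
            continue
        group = group_of[peer]
        if group == my_group:
            buffers[peer] = []            # one buffer per same-group peer
        elif group not in representative:
            representative[group] = peer  # assumption: first peer seen represents the group
            buffers[peer] = []
        # Other remote peers get no buffer of their own.
    return buffers, representative


def buffer_for(sender, group_of, buffers, representative):
    """Return the buffer used to receive a message from sender."""
    if sender in buffers:
        return buffers[sender]
    return buffers[representative[group_of[sender]]]


group_of = {0: "g1", 1: "g1", 2: "g2", 3: "g2"}
buffers, representative = build_receive_buffers(0, group_of)
buffer_for(3, group_of, buffers, representative).append("hello from node 3")
```

Node 0 ends up with two buffers instead of three, which is the memory saving the abstract is after.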
Abstract:
An information gathering system includes: an ID gathering mechanism, provided at a transmitter side node, that generates a collective identifier from one or a plurality of individual identifiers of respective management targets, each of the individual identifiers being generated according to a state of a corresponding management target; and an ID analysis mechanism, provided at a receiver side node, that restores an individual identifier from the collective identifier and specifies a management target based on the restored individual identifier.
Abstract:
An apparatus includes a shared cache memory and a controller. The shared cache memory is configured to be divided into sectors by assigning one or more ways to each sector in accordance with a reusability level of data. The controller changes a sector division ratio indicating a ratio between way counts of the divided sectors of the shared cache memory, where the way count is the number of ways assigned to each sector. When first and second jobs are being executed in parallel, in response to a designation of a program of the second job, the controller calculates the sector division ratio based on a data access amount, which includes a size and an access count of the data accessed by the first and second jobs, and on a volume of the shared cache memory, and changes the sector division ratio of the shared cache memory to the calculated sector division ratio.
Abstract:
A compiler method performs parallel processing on a data set using multithreading. The method includes calculating a divisor for dividing the data set, such that the data set is divided into a number of subsets greater than the number of threads. The method generates a plurality of data subsets and executable code. The code includes the processing operations and an instruction that is executed by the first thread to reach the code. After the threads complete the processing operations for the subsets that have been assigned to them, the next subsets are assigned to the threads. When the next subsets are assigned, synchronous processing is performed to determine whether the state of each subset is “unprocessed”, “processed”, or “assigned to a different thread”.
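The runtime behaviour that such generated code might exhibit can be sketched as below. This is not the compiler method itself. It is a hand-written illustration that assumes a fixed divisor of four subsets per thread and uses a lock as the synchronous processing step. The function name, the `subsets_per_thread` parameter, and the simplified state label `"assigned"` are all hypothetical.

```python
import threading


def process_in_subsets(data, num_threads, subsets_per_thread=4, work=lambda x: x * x):
    # Hypothetical divisor: aim for more subsets than threads.
    n_subsets = num_threads * subsets_per_thread
    size = max(1, -(-len(data) // n_subsets))  # ceiling division, at least 1
    subsets = [data[i:i + size] for i in range(0, len(data), size)]

    state = ["unprocessed"] * len(subsets)
    results = [None] * len(subsets)
    lock = threading.Lock()

    def claim_next():
        # Synchronized check of subset states: take the first unprocessed one.
        with lock:
            for i, s in enumerate(state):
                if s == "unprocessed":
                    state[i] = "assigned"
                    return i
        return None

    def worker():
        # A thread that finishes its subset asks for the next one.
        while (i := claim_next()) is not None:
            results[i] = [work(x) for x in subsets[i]]
            with lock:
                state[i] = "processed"

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    flat = [y for part in results for y in part]
    return flat, state


flat, state = process_in_subsets(list(range(10)), num_threads=2)
```

Because there are more subsets than threads, a thread that finishes early picks up further work instead of idling, which is the load-balancing effect the abstract describes.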
Abstract:
Each of a plurality of information processing apparatuses includes a processor, a storage unit, and a communication unit that accesses the storage unit without the intermediary of the processor and accesses a second apparatus of the plurality of information processing apparatuses via a communication unit of the second apparatus. The communication unit of a first apparatus of the plurality of information processing apparatuses executes at least one of: a process of storing redundant data, generated by making the data stored in the storage unit of the first apparatus redundant, in the storage unit of the second apparatus via the communication unit of the second apparatus; and a process of acquiring redundant data, generated by making the data stored in the storage unit of the second apparatus redundant, via the communication unit of the second apparatus, and storing the acquired data in the storage unit of the first apparatus.
Abstract:
An information processing apparatus includes a processor and a memory storing instructions that cause the processor to allocate a node in an information processing system including the information processing apparatus as an investigation node configured to investigate data stored in the memory of a node in which an error is detected. The processor further instructs the investigation node to acquire the data to be investigated from the node in which the error is detected, instructs the investigation node to perform an operation for determining whether a predetermined value in the acquired data is a normal value, and determines that a failure has occurred in the node in which the error is detected when the predetermined value is not a normal value.
Abstract:
A system to which the present invention has been applied includes a plurality of information processing apparatuses connected to each other and a management device that divides a first number of pieces of management data needed for management of the plurality of information processing apparatuses into a second number of pieces of management data, the second number being equal to or greater than the first number, and that transmits the second number of pieces of management data obtained by the division respectively to the plurality of information processing apparatuses.
Abstract:
An information processing apparatus is configured to: receive a request for communication between a first node and a second node included in a parallel calculation system having a plurality of nodes; acquire job execution information relating to a job to be executed by the parallel calculation system; generate connected graph information based on first information on the first node, second information on the second node, the job execution information, and topology information indicating a topology of the plurality of nodes; generate, based on the connected graph information, route information indicating a plurality of routes usable for the communication between the first node and the second node; specify, based on the route information, the route having the lowest passing cost among the plurality of routes; and specify a node included in the specified route as a relay node, based on the positions of the nodes in the specified route.
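The last two steps, choosing the lowest-cost route and then a relay node on it, can be illustrated with a small sketch. It assumes that the candidate routes are already given as node lists, that the passing cost of a route is the sum of its per-link costs, and that the relay node is the interior node nearest the middle of the route. The abstract only says the relay is chosen "based on positions", so the midpoint rule and the names `route_cost` and `pick_route_and_relay` are hypothetical.

```python
def route_cost(route, link_cost):
    """Sum the cost of each link along a route given as a list of node names."""
    return sum(link_cost[frozenset((a, b))] for a, b in zip(route, route[1:]))


def pick_route_and_relay(routes, link_cost):
    """Return (lowest-cost route, relay node), or None as relay for a direct link."""
    best = min(routes, key=lambda r: route_cost(r, link_cost))
    interior = best[1:-1]  # exclude the two endpoint nodes
    # Assumption: the relay is the interior node closest to the middle.
    relay = interior[len(interior) // 2] if interior else None
    return best, relay


link_cost = {
    frozenset(("A", "B")): 1,
    frozenset(("B", "C")): 1,
    frozenset(("C", "D")): 1,
    frozenset(("D", "E")): 1,
    frozenset(("A", "X")): 3,
    frozenset(("X", "E")): 3,
}
routes = [["A", "B", "C", "D", "E"], ["A", "X", "E"]]
best, relay = pick_route_and_relay(routes, link_cost)
```

Here the four-hop route costs 4 against 6 for the two-hop route, so it is selected and its middle node serves as the relay.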