Abstract:
According to the present invention, the flow of a series of distributed processes can be traced even when any one of the computers constituting a computer system becomes inoperable. The application program 120 in computer 100A requests computer 100B to perform a part of the series of processes (S106). Log information items in the computer 100A, to which an ID issued to the series of processes is attached, are transmitted to the computer 100B by the log transmission program 152 in the computer 100A (S110). The log information receiving program 153 of the computer 100B receives these log information items and stores them in the log buffer 160 (S221, S222). The application program 120 in the computer 100B executes the requested processing and returns the process result to the computer 100A (S206). The log information items of the computer 100B that carry the above ID, generated by executing the processing, are transmitted to the computer 100A by the log transmission program 152 of the computer 100B (S210), and the log information receiving program 153 of the computer 100A receives these log information items and stores them in the log buffer 160 (S121, S222).
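The mutual log exchange described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the `Computer` class, its `own_log` and `log_buffer` fields, and the `run_distributed` helper are hypothetical names, and network transmission is replaced by a direct list append.

```python
from dataclasses import dataclass, field


@dataclass
class Computer:
    """Hypothetical stand-in for computers 100A / 100B."""
    name: str
    own_log: list = field(default_factory=list)     # entries generated locally
    log_buffer: list = field(default_factory=list)  # entries received from the peer

    def record(self, trace_id, message):
        # Attach the ID issued to the series of processes to each log entry.
        self.own_log.append((trace_id, self.name, message))

    def send_logs(self, trace_id, peer):
        # Transmit only the entries carrying this ID (assumed direct call, no network).
        for entry in self.own_log:
            if entry[0] == trace_id:
                peer.log_buffer.append(entry)

    def trace(self, trace_id):
        # Everything this computer knows about the series of processes.
        return [e for e in self.own_log + self.log_buffer if e[0] == trace_id]


def run_distributed(a, b, trace_id):
    a.record(trace_id, "request sent to peer")
    a.send_logs(trace_id, b)   # requester's logs travel with the request
    b.record(trace_id, "request processed")
    b.send_logs(trace_id, a)   # responder's logs travel back with the result
    a.record(trace_id, "result received")
```

After the exchange, either computer can reconstruct the part of the flow that ran on the other one, which is the property the abstract relies on when one computer becomes inoperable.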
Abstract:
A storage system is disclosed comprising a management computer and multiple storage subsystems with multiple volumes accessible to a host computer. The host computer sends volume configuration information to the management computer. Each storage subsystem retrieves information from the volume directory corresponding to a requested volume and sends it to the management computer. The management computer comprises a volume information table holding the volume configuration information obtained from the host computer, information on storage subsystems, and use-related group IDs; a unit which identifies the volumes belonging to a specified group ID, locates the storage subsystems these volumes belong to, and obtains information on the desired volumes from their corresponding volume directories; and a display device which displays the volume usage information obtained from the storage subsystems. Thus, volume information in multiple storage subsystems can be collected in an integrated manner, thereby reducing management overhead.
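The group-ID lookup performed by the management computer might look like the sketch below. The table layout, the field names (`volume`, `subsystem`, `group_id`) and the `collect_group_usage` function are assumptions made for illustration; the abstract does not specify them.

```python
from collections import defaultdict


def collect_group_usage(group_id, volume_table, volume_directories):
    """Gather directory information for every volume in one group.

    volume_table: rows of the (assumed) volume information table.
    volume_directories: per-subsystem mapping of volume name -> directory info.
    """
    # Step 1: identify the group's volumes and the subsystems holding them.
    by_subsystem = defaultdict(list)
    for row in volume_table:
        if row["group_id"] == group_id:
            by_subsystem[row["subsystem"]].append(row["volume"])

    # Step 2: ask each subsystem's volume directory only for those volumes.
    report = {}
    for subsystem, volumes in by_subsystem.items():
        directory = volume_directories[subsystem]
        for volume in volumes:
            report[volume] = directory[volume]
    return report
```

Grouping the volumes by subsystem first means each subsystem is queried once per request rather than once per volume.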
Abstract:
A dump information acquiring method and a system recovery method for a computer system using a virtual memory, the methods being used when a fault occurs in the computer system. The computer system has a plurality of external storage units used as a paging device unit for holding the contents of the virtual memory to be paged out from a main memory. During normal operation, an external storage unit selected from the plurality of external storage units is used as the paging device unit. When a fault occurs, a program for acquiring dump information outputs the contents of the main storage at the time of the fault occurrence to a dump file. The identifier set in the definition information that defines the external storage unit used as the paging device unit is changed from the identifier of the external storage unit currently used as the paging device unit to the identifier of another external storage unit. Thereafter, the computer system is restarted. The contents of the virtual memory held in the paging device unit at the time of the fault occurrence are output to the dump file by a program executed in the background of a job process of the current computer system or by a program executed by another computer system.
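The recovery sequence above can be sketched as two steps: a fast step before restart and a deferred step afterwards. The names `handle_fault`, `dump_paged_out_memory`, and the dictionary-based `definition` are hypothetical; real definition information and device access would look quite different.

```python
def handle_fault(definition, device_ids, main_storage, dump_file):
    """Before restart: dump main storage and switch the paging device."""
    dump_file.append(("main_storage", bytes(main_storage)))
    faulted_device = definition["paging_device"]
    # Point the definition at another external storage unit so the restarted
    # system does not overwrite the pages held on the old one.
    definition["paging_device"] = next(d for d in device_ids if d != faulted_device)
    return faulted_device


def dump_paged_out_memory(faulted_device, device_contents, dump_file):
    """After restart: copy the preserved paging contents in the background."""
    dump_file.append(("virtual_memory", device_contents[faulted_device]))
```

Only the main-storage dump sits on the critical path to restart; the larger paging-device contents are copied later, which is where the shortened recovery time would come from.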
Abstract:
According to the present invention, in a computer system that uses virtual storage management, the content of the auxiliary storage utilized by paging is used, so dump information can be obtained by outputting only a part of the content of the auxiliary storage when an abnormal system condition occurs. As a result, the time required to restart the computer system and resume business can be shortened. Moreover, because it is not necessary to add a special external storage for obtaining dump information, computer resources can be utilized more efficiently.
Abstract:
Dump information acquiring and system recovery methods for a computer system using a virtual memory. The computer system has a plurality of external storage units normally used as a paging device unit for holding the contents of the virtual memory to be paged out. When a fault occurs, a program for acquiring dump information outputs the contents of the main storage to a dump file. The identifier set in the definition information defining the external storage unit used as the paging device unit is changed from the identifier of that external storage unit to the identifier of another external storage unit. Thereafter, the computer system is restarted. The contents of the virtual memory held in the paging device unit at the time of the fault are output to the dump file.
Abstract:
A load sharing method for a parallel computer system having a computer group including a plurality of computers and an operation management mechanism, which is a computer for managing the operation of the computer group. The method distributes the load of executing a plurality of kinds of work processes among the plurality of computers in the computer group, and includes the steps of setting resource utilization target values by work for the plurality of computers in the computer group; collecting resource utilization states by work for the plurality of computers in the computer group and informing the operation management mechanism of the resource utilization states; selecting a computer to execute a newly requested work process from the plurality of computers in the computer group on the basis of the differences between the resource utilization target parameter values by work in the plurality of computers and the current values of a parameter indicating the reported resource utilization states by work; and executing, in the selected computer, the newly requested work process.
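The selection step can be illustrated with a minimal sketch. It assumes one plausible reading of the abstract, namely that the computer with the largest gap between its target and its current utilization for the requested kind of work is chosen; the abstract itself only says the selection is based on those differences. The function name and the nested-dictionary layout are hypothetical.

```python
def select_computer(work, targets, current):
    """Pick the computer with the most headroom for this kind of work.

    targets[c][work]: resource utilization target value set for computer c.
    current[c][work]: most recently reported utilization for computer c.
    """
    # Headroom is target minus current; the largest headroom wins (assumed policy).
    return max(targets, key=lambda c: targets[c][work] - current[c][work])
```

For example, with targets of 0.5 and 0.4 for a work kind and reported utilizations of 0.45 and 0.1, the second computer is selected because it is furthest below its target.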
Abstract:
A job scheduling analysis method and system are disclosed in which a job schedule is analyzed by use of historical job execution data in a computer system in which a plurality of jobs are executed in parallel. Historical execution data of a plurality of jobs and the file names of files accessed by the jobs are collected. The maximum multiplicity of jobs capable of operating in parallel on the computer system is inputted. When the file name of a file accessed by one job and the file name of a file accessed by another job coincide with each other, an execution start condition of the plurality of jobs is determined so as to execute the one job and the other job at the earliest instants within the maximum job multiplicity, such that the sequence of execution of processing by the one job and the other job is maintained and the execution time of the one job and the execution time of the other job do not overlap each other. Thereby, it is possible to simulate the influence of a change in system construction and to search for an effective batch processing with a reduced number of idle spaces.
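The scheduling rule in this abstract can be approximated by a small simulation. This is a sketch under assumptions, not the disclosed method: jobs are given as hypothetical `(name, duration, files)` tuples in their historical order, durations are positive, and a greedy earliest-start placement is used.

```python
def _fits(placed, start, end, limit):
    # Concurrency can only peak at `start` or where an already placed job begins.
    points = [start] + [s for _, s, _, _ in placed if start < s < end]
    return all(
        sum(1 for _, s, e, _ in placed if s <= p < e) < limit for p in points
    )


def schedule(jobs, max_multiplicity):
    """Return {job name: (start, end)} for jobs listed in historical order."""
    placed = []  # (name, start, end, files)
    for name, duration, files in jobs:
        # Jobs sharing a file keep their order and must not overlap.
        ready = max((end for _, _, end, f in placed if f & files), default=0)
        # Candidate start times: when ready, or when a running job finishes.
        candidates = sorted({ready} | {e for _, _, e, _ in placed if e > ready})
        for start in candidates:
            if _fits(placed, start, start + duration, max_multiplicity):
                placed.append((name, start, start + duration, files))
                break
    return {name: (start, end) for name, start, end, _ in placed}
```

Re-running `schedule` with a different `max_multiplicity` gives a rough picture of how a change in system configuration would shift the batch window.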
Abstract:
A technique for identifying a fundamental cause of a system fault in a system in which a plurality of OSs run on one computer. In a system in which each of a plurality of operating systems executes a process by time-sharing the hardware of one computer, a plurality of related OSs are stored in advance in a storage means in association with each other. When a fault is detected in an OS and the storage means stores an OS associated with the OS in which the fault has occurred, a memory dump is performed for both the OS in which the fault has occurred and the OS associated with that OS.
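As a rough illustration of the dump-target decision, the association between OSs can be modeled as a lookup table. The `related` mapping and the `oses_to_dump` function are hypothetical names; the abstract only states that related OSs are stored in association with each other in advance.

```python
def oses_to_dump(faulted_os, related):
    """Return the OSs whose memory should be dumped after a fault.

    related: mapping of OS name -> list of associated OS names, stored in advance.
    """
    targets = [faulted_os]
    # Add any associated OS, since the root cause may lie outside the faulted OS.
    targets += [o for o in related.get(faulted_os, []) if o not in targets]
    return targets
```

An OS with no stored association yields only itself, so the extra dump cost is paid only where a relationship was registered.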
Abstract:
A computer system supports the application of a function supplied by an OS or utility program. A generating section generates system operation information including job execution history information and file access history information. In accordance with the system operation information, a determining section determines a job or job step to which the function can be applied. In response to a notice from the determining section, a converting section converts an original job control program or job into new job control programs or jobs, and outputs the new job control programs or jobs.