Abstract:
A system and method are provided for generating a summary dump file from a system or application crash dump or core dump file without the need to reference a large symbol table file. A crash dump file is provided with a referencing portion containing references to certain pertinent information (e.g., data structures), including references conventionally not found in crash dump files. The data structures referenced in the referencing portion have been found to be optimal for analyzing faults residing in a crash dump file. The crash dump file may be a complete crash dump file of an operating system or a kernel memory dump. Alternatively, the crash dump file may be a crash dump file of an application program. A standalone extraction tool is also provided for extracting pertinent information from the crash dump or core dump file by utilizing information in the referencing portion. The standalone tool generates a summary or mini dump file of the crash dump file.
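The extraction idea can be illustrated with a minimal sketch. It assumes a toy dump layout invented for this example (a count, then fixed-size entries of name, offset, and length, then the memory contents); the abstract does not specify any such format, and the names `HEADER`, `ENTRY`, `make_dump`, and `extract_summary` are hypothetical.

```python
import struct

# Hypothetical layout of the referencing portion (not taken from the abstract):
HEADER = struct.Struct("<I")       # number of reference entries
ENTRY = struct.Struct("<16sII")    # structure name, absolute offset, length


def make_dump(structures, padding=1024):
    """Build a toy dump: referencing portion first, then padded structures."""
    table_size = HEADER.size + ENTRY.size * len(structures)
    entries, body, offset = [], b"", table_size
    for name, payload in structures.items():
        body += bytes(padding)     # stands in for unrelated memory contents
        offset += padding
        entries.append(ENTRY.pack(name.encode(), offset, len(payload)))
        body += payload
        offset += len(payload)
    return HEADER.pack(len(structures)) + b"".join(entries) + body


def extract_summary(dump):
    """Copy out only the referenced structures; no symbol table is consulted."""
    (count,) = HEADER.unpack_from(dump, 0)
    summary = {}
    for i in range(count):
        raw_name, offset, length = ENTRY.unpack_from(
            dump, HEADER.size + i * ENTRY.size
        )
        summary[raw_name.rstrip(b"\x00").decode()] = dump[offset:offset + length]
    return summary
```

Because the tool in this sketch follows only the offsets recorded in the referencing portion, the extracted summary is much smaller than the dump it was taken from.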
Abstract:
A storage system includes a storage controller (20) connected to higher-level devices (40, 30, 10) and a plurality of storages (50-53) connected to the storage controller for storing data from the higher-level devices. The storage controller (20) includes a channel controller (21) for establishing an interface for the higher-level devices, the channel controller including trace information representing details of the interface, and storages (23, 50-53) for storing the trace information from the channel controller in a format which can be accessed by the higher-level devices. In this configuration, when the channel controller receives a trace information fetching indication from one of the higher-level devices, the channel controller transfers the trace information to both the cache memory (23) and the storages (50-53), or to either the cache memory or the storages.
Abstract:
The present invention discloses a system and method for analyzing continuous parameter data from a malfunctioning locomotive or other large land-based, self-powered transport equipment. The method allows for receiving new continuous parameter data (232) comprising a plurality of anomaly definitions from the malfunctioning equipment. The method further allows for selecting a plurality of distinct anomaly definitions (233) from the new continuous parameter data. Respective generating steps allow for generating at least one distinct anomaly definition cluster (236) from the plurality of distinct anomaly definitions and for generating a plurality of weighted repair and distinct anomaly definition cluster combinations. An identifying step allows for identifying at least one repair (238) for the at least one distinct anomaly definition cluster using the plurality of weighted repair and distinct anomaly definition cluster combinations.
Abstract:
A circuit pack self-testing system adapted to carry out tests on circuit pack electronic devices is disclosed. The self-testing system executes various test programs in a test suite, and keeps a historical record of test results from previous test suites in non-volatile memory. The historical record is updated only when the most recent results differ from the last-recorded results, and is easily accessible for circuit pack failure analysis and repair. At the beginning of each test suite, a temporary record for containing test results is initialized. As the system progresses through the various test programs, the test programs update the temporary record with test results. In a preferred embodiment, this temporary record is created by utilizing two registers, a start register and an end register. The start register stores the beginning of each test program (test portion) in a test suite and the end register stores the ending of each test program. If a test suite runs to completion, the start and end registers will contain the same value. However, if a test program in a test suite cannot run to completion, the value stored in the start register will be one greater than the value stored in the end register. Should a fault, possibly intermittent, on the circuit pack under test cause a test program to halt or hang, a sanity-recovery mechanism causes the self-testing system to be restarted. Prior to initializing the temporary record at the beginning of the next test suite, the temporary record is examined to determine whether a previously executed test suite failed to run to completion. If the examination shows that it failed to run to completion, and the failure information differs from the last-recorded results in the historical record, then information from the temporary record indicating this failure is stored in the historical record.
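The start and end register scheme can be sketched as follows. This is only an illustration under assumptions: the registers are modeled as integers holding the 1-based index of the last test program started and finished, a hang is modeled as a raised `RuntimeError`, and the names `TempRecord`, `run_suite`, and `examine_and_log` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TempRecord:
    start: int = 0  # index of the last test program that began
    end: int = 0    # index of the last test program that finished

    def incomplete(self) -> bool:
        # A program that started but never finished leaves start == end + 1.
        return self.start == self.end + 1


def run_suite(tests, record):
    """Run test programs in order, updating the two registers around each."""
    for index, test in enumerate(tests, start=1):
        record.start = index
        test()              # a hang is modeled here as a raised RuntimeError
        record.end = index


def examine_and_log(record, history):
    """Before re-initializing, store the failure only if it is new."""
    if record.incomplete():
        failure = ("incomplete", record.start)
        if not history or history[-1] != failure:
            history.append(failure)
            return True
    return False
```

In this sketch `history` stands in for the non-volatile historical record; a repeated failure of the same test program leaves it unchanged, matching the update-only-on-difference rule described above.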
Abstract:
A fault tolerant computer system includes at least two central processing units, each having a cache memory and a parity error detector adapted to sense parity errors in blocks of information read from and written to cache and to issue a cache parity read or write error flag if a parity error is sensed. A system bus couples the CPUs to a System Control Unit having a parity error correction facility, and a memory bus couples the SCU to a main memory. An error recovery control feature distributed across the CPU, a Service Processor and the operating system software is responsive to the sensing of a read parity error flag in a sending CPU and a write parity error flag in a receiving CPU in conjunction with a siphon operation for transferring the faulting block from the sending CPU to main memory via the SCU (in which the given faulting block is corrected) and for subsequently transferring the corrected memory block from main memory to the receiving CPU when a retry is instituted.
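As a small illustration of the detection step only, the sketch below computes an even parity bit over a block and raises a flag on mismatch. The abstract does not state the parity scheme or block size, so even parity over the whole block is an assumption, and `parity_bit` and `read_error_flag` are hypothetical names. The correction performed in the SCU is not modeled.

```python
def parity_bit(block: bytes) -> int:
    """Even parity: 1 if the block holds an odd number of set bits, else 0."""
    return sum(bin(byte).count("1") for byte in block) % 2


def read_error_flag(block: bytes, stored_parity: int) -> bool:
    """Return True (flag raised) when the block no longer matches its parity."""
    return parity_bit(block) != stored_parity
```

A single parity bit detects any odd number of flipped bits but cannot locate or repair them, which is consistent with correction being delegated to a separate facility.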
Abstract:
In a logical unit provided with a plurality of internal registers (12 to 15; 20), an internal memory (11-1, 11-2; 21) and a combinational circuit (16), such as an arithmetic unit, at least one (13 or 14; 20) of the plurality of internal registers is arranged to be scanned in and out. During diagnosis, when executing an instruction which makes reference to the internal memory (11-1, 11-2; 21), the register that can be scanned in and scanned out is used in place of the internal memory (11-1, 11-2; 21) for diagnosing the combinational circuit (16).
Abstract:
A mechanism is provided for continually testing a floating point accelerator processor (FPAP) element or other processor element in a suitable multiprocessor system. At least two processors, such as an instruction execution processor (EU) and an FPAP, are connected to a common input bus to concurrently receive the same information (opcodes and operands). Both the EU and the FPAP decode the opcodes. When the FPAP decodes an opcode for an operation to be performed by the EU, the FPAP, instead of remaining idle while the EU operates, executes a diagnostic operation. The FPAP selects the particular diagnostic operation to perform in each instance from among a multiplicity of available diagnostic operations. The selection of a diagnostic operation is dependent on the instruction to be executed by the EU; in order not to slow down the overall execution rate of the system, a diagnostic operation is chosen whose execution time is matched to the execution time of the instruction being performed by the EU; that is, a diagnostic operation is selected such that the FPAP will finish the operation before the EU finishes executing its instruction. Operand data supplied to the EU on the input bus is used by the diagnostic operations, to add a degree of randomness to the test signals and permit detection of bits forced to a steady value of zero or one. For some diagnostic operations, one or more variables may be obtained from general purpose registers.
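The time-matched selection can be sketched as a table lookup. The opcode names, diagnostic names, and cycle counts below are invented for illustration, and picking the longest diagnostic that still fits is an assumption of this sketch; the abstract only requires that the FPAP finish before the EU does.

```python
# Hypothetical EU instruction timings, in cycles.
EU_CYCLES = {"NOP": 1, "BRANCH": 2, "LOAD": 4, "INT_MUL": 12}

# Hypothetical FPAP diagnostics as (name, cycles).
DIAGNOSTICS = [
    ("adder_check", 1),
    ("shifter_check", 3),
    ("multiplier_check", 10),
]


def select_diagnostic(opcode):
    """Pick the longest diagnostic that finishes strictly before the EU does."""
    budget = EU_CYCLES[opcode]
    fitting = [diag for diag in DIAGNOSTICS if diag[1] < budget]
    if not fitting:
        return None  # no diagnostic fits, so the FPAP stays idle
    return max(fitting, key=lambda diag: diag[1])[0]
```

With these made-up timings, a `LOAD` leaves room for `shifter_check`, while a `NOP` leaves no time for any diagnostic.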
Abstract:
An information processing system that processes received commands and data includes: an internal circuit that processes the received commands and data; a memory that stores the received commands and data as history; and a control circuit that reads the commands and data in the memory and outputs the read commands and data to the internal circuit, in response to detection of a failure in the internal circuit.
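The history-and-replay arrangement can be sketched in software as follows. This is a loose analogy under assumptions: the internal circuit is modeled as a stateful accumulator that loses its state when it fails, failure detection is modeled as a caught `RuntimeError`, and the names `Accumulator` and `ReplayingSystem` are hypothetical.

```python
from collections import deque


class Accumulator:
    """Hypothetical internal circuit: sums data and can be made to fail once."""

    def __init__(self):
        self.total = 0
        self.fail_next = False

    def process(self, command, data):
        if self.fail_next:
            self.fail_next = False
            self.total = 0  # internal state is lost on failure
            raise RuntimeError("internal failure")
        if command == "ADD":
            self.total += data
        return self.total


class ReplayingSystem:
    """Control logic: record every command, replay the history on failure."""

    def __init__(self, internal, history_size=16):
        self.internal = internal
        self.history = deque(maxlen=history_size)  # bounded history memory

    def receive(self, command, data):
        self.history.append((command, data))
        try:
            return self.internal(command, data)
        except RuntimeError:
            return self.replay()

    def replay(self):
        result = None
        for command, data in list(self.history):
            result = self.internal(command, data)
        return result
```

Because the history is bounded, a replay in this sketch restores the full state only while every command received so far still fits in the history memory.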