Abstract:
Efficient application checkpointing uses the checkpointing characteristics of a job to determine how to schedule jobs for execution on a multi-node computer system. A checkpoint profile in the job description includes information on the expected frequency and duration of a checkpoint cycle for the application. The checkpoint profile may be based on user or administrator input as well as historical information. The job scheduler will attempt to group applications (jobs) that have the same checkpoint profile on the same node or group of nodes. Additionally, the job scheduler may control when new jobs start based on when the next checkpoint cycles are expected. The checkpoint monitor will monitor the checkpoint cycles, updating the checkpoint profiles of running jobs. The checkpoint monitor will also keep track of an overall system checkpoint profile to determine the available checkpointing capacity before scheduling jobs on the cluster.
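The grouping step described in this abstract can be illustrated with a minimal sketch. The names CheckpointProfile, Job, group_by_profile and assign_node_groups, the two profile fields, and the round-robin placement are all assumptions made for illustration; the abstract does not specify a data model or a placement policy.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckpointProfile:
    # Hypothetical fields: expected time between checkpoints and the
    # expected duration of one checkpoint cycle, both in seconds.
    interval_s: int
    duration_s: int


@dataclass
class Job:
    name: str
    profile: CheckpointProfile


def group_by_profile(jobs):
    """Return {profile: [job names]} so jobs with equal profiles stay together."""
    groups = defaultdict(list)
    for job in jobs:
        groups[job.profile].append(job.name)
    return dict(groups)


def assign_node_groups(jobs, node_groups):
    """Map every distinct profile to one node group (assumed round-robin)."""
    placement = {}
    for i, names in enumerate(group_by_profile(jobs).values()):
        target = node_groups[i % len(node_groups)]
        for name in names:
            placement[name] = target
    return placement


jobs = [
    Job("a", CheckpointProfile(600, 30)),
    Job("b", CheckpointProfile(3600, 120)),
    Job("c", CheckpointProfile(600, 30)),
]
placement = assign_node_groups(jobs, ["rack0", "rack1"])
```

Here jobs "a" and "c" share a profile and therefore land on the same node group, while "b" is placed elsewhere. A real scheduler would also weigh available checkpointing capacity, which this sketch leaves out.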
Abstract:
Embodiments are directed to backing up a virtual machine cluster and to determining virtual machine node ownership prior to backing up a virtual machine cluster. In one scenario, a computer system determines which virtual machine nodes are part of the virtual machine cluster, determines which shared storage resources are part of the virtual machine cluster, and determines which virtual machine nodes own the shared storage resources. The computer system then indicates to the virtual machine node owners that at least one specified application is to be quiesced over the nodes of the virtual machine cluster, such that a consistent, cluster-wide checkpoint can be created. The computer system further creates a cluster-wide checkpoint which includes a checkpoint for each virtual machine in the virtual machine cluster.
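The ordering described above (determine owners, ask owners to quiesce, then checkpoint every virtual machine) can be sketched as a simple planning function. The function name plan_cluster_checkpoint, the step tuples, and the input shapes are hypothetical; the abstract does not define an interface.

```python
def plan_cluster_checkpoint(vm_nodes, shared_storage_owner):
    """Build an ordered list of steps for a cluster-wide checkpoint.

    vm_nodes: list of node names in the cluster (assumed input shape).
    shared_storage_owner: {shared resource: owning node} (assumed input shape).
    """
    owners = sorted(set(shared_storage_owner.values()))
    unknown = [n for n in owners if n not in vm_nodes]
    if unknown:
        raise ValueError(f"owner not in cluster: {unknown}")
    # Owners are told to quiesce first so the checkpoint is consistent.
    steps = [("quiesce", node) for node in owners]
    # The cluster-wide checkpoint includes one checkpoint per virtual machine.
    steps += [("checkpoint", node) for node in vm_nodes]
    return steps


steps = plan_cluster_checkpoint(
    ["vm1", "vm2", "vm3"],
    {"disk-a": "vm2", "disk-b": "vm2", "disk-c": "vm1"},
)
```

The plan quiesces only the nodes that own shared storage and then checkpoints all three nodes.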
Abstract:
A distributed system according to an exemplary embodiment includes first and second servers capable of executing the same application, wherein when a failure occurs in the application in the first server, the first server generates failure information identifying a cause of the failure in the application, and the second server performs failure prevention processing which is determined based on the failure information and intended to prevent a failure in the application.
Abstract:
A method for enhanced restart of a core dumping application is provided. The method includes stopping a plurality of threads in an address space, except for the thread performing the core dump. Computational segments are remapped to client segments. Each open file descriptor in the address space is closed. The application is terminated and the client segments are flushed to external storage.
Abstract:
A method of recovering batch-based processes may include providing an interface for receiving process recoverability information. The recoverability information may include (i) information describing a mutual exclusivity of data affected by a process, (ii) information describing sub-processes associated with the process, and/or (iii) information describing scope cleanup procedures associated with the process. The method may also include receiving the recoverability information through the interface, and receiving an indication that the process experienced an error while being executed on a client system. The method may additionally include providing the process recoverability information to make a recoverability determination for the process.
Abstract:
Design and operation of a processing device are configurable to optimize wake-up time and peak power cost during restoration of a machine state from non-volatile storage. The processing device includes a plurality of non-volatile logic element arrays configured to store a machine state represented by a plurality of volatile storage elements of the processing device. A stored machine state is read out from the plurality of non-volatile logic element arrays to the plurality of volatile storage elements. During manufacturing, a number of rows and a number of bits per row in non-volatile logic element arrays are based on a target wake-up time and a peak power cost. In another approach, writing data to or reading data from the plurality of non-volatile arrays can be done in parallel, sequentially, or in any combination to optimize operation characteristics.
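The trade-off between wake-up time and peak power can be made concrete with a toy cost model. Everything in it is an assumption for illustration: the function restore_cost, the per-row read time, the per-bit power figure, and the idea that arrays are read in equal-sized parallel batches are not taken from the abstract.

```python
import math


def restore_cost(num_arrays, rows, bits_per_row, parallel, t_row_ns, p_bit_uw):
    """Toy model: return (wake-up time in ns, peak power in uW).

    Assumes `parallel` arrays are read at once, one row per step, and that
    peak power scales with the number of bits read simultaneously.
    """
    if not 1 <= parallel <= num_arrays:
        raise ValueError("parallel must be between 1 and num_arrays")
    batches = math.ceil(num_arrays / parallel)
    wake_ns = batches * rows * t_row_ns
    peak_uw = parallel * bits_per_row * p_bit_uw
    return wake_ns, peak_uw


sequential = restore_cost(8, 32, 64, parallel=1, t_row_ns=10, p_bit_uw=2)
mixed = restore_cost(8, 32, 64, parallel=4, t_row_ns=10, p_bit_uw=2)
fully_parallel = restore_cost(8, 32, 64, parallel=8, t_row_ns=10, p_bit_uw=2)
```

Under these assumed numbers, fully sequential readout is the slowest with the lowest peak power, fully parallel readout is the fastest with the highest peak power, and a mixed schedule falls in between.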
Abstract:
A computer system comprises a processor unit arranged to run a hypervisor running one or more virtual machines; a cache connected to the processor unit and comprising a plurality of cache rows, each cache row comprising a memory address, a cache line and an image modification flag; and a memory connected to the cache and arranged to store an image of at least one virtual machine. The processor unit is arranged to define a log in the memory and the cache further comprises a cache controller arranged to set the image modification flag for a cache line modified by a virtual machine being backed up, but not for a cache line modified by the hypervisor operating in privilege mode; periodically check the image modification flags; and write only the memory address of the flagged cache rows in the defined log.
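The cache controller behaviour described above can be modelled in a few lines. This is a software sketch of what the abstract describes as hardware; the class CacheModel, its method names, and the choice to clear a flag once its address has been logged are assumptions.

```python
class CacheModel:
    """Toy model of cache rows carrying an image modification flag."""

    def __init__(self):
        self.rows = {}  # memory address -> [cache line data, flag]
        self.log = []   # the log defined in memory: addresses only

    def write(self, address, data, by_hypervisor=False):
        # Keep an existing flag; set it only when a virtual machine writes.
        flagged = self.rows.get(address, [None, False])[1]
        self.rows[address] = [data, flagged or not by_hypervisor]

    def flush_flags(self):
        """Periodic check: log flagged addresses, then clear the flags.

        Clearing after logging is an assumption of this sketch.
        """
        for address, row in self.rows.items():
            if row[1]:
                self.log.append(address)
                row[1] = False


cache = CacheModel()
cache.write(0x10, b"vm data")
cache.write(0x20, b"hypervisor data", by_hypervisor=True)
cache.flush_flags()
```

After the flush, the log holds only address 0x10, because the write to 0x20 came from the hypervisor and was never flagged. Only addresses are logged, not the cache line contents.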
Abstract:
Embodiments described herein provide recovery placeholders within an application. Specifically, one approach includes providing an application operating on a client device, and generating a recovery placeholder that defines a current state of the application by analyzing a queue containing a set of messages, and identifying one or more selected events corresponding to the application from the queue. In one approach, the current state defines, at the time the recovery placeholder is generated, at least one of: a position within a window of the application, a current activity of the application, a position of the window within a display of a display device, and a placement order of the window of the application in relation to a stack of other cascaded windows. At a later point in time, the application may then be restored to the current state by accessing the recovery placeholder to replay the one or more selected events.
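The placeholder mechanism described above (select events from a message queue, then replay them later) might look like the following sketch. The message format (dictionaries with "app", "type" and "value" keys), the function names, and the rule that a later event of the same type overrides an earlier one are assumptions for illustration.

```python
def make_placeholder(queue, app_id, selected_types):
    """Keep only the selected events that belong to the given application."""
    return [
        msg for msg in queue
        if msg["app"] == app_id and msg["type"] in selected_types
    ]


def restore(placeholder):
    """Replay events in order; later events of a type override earlier ones."""
    state = {}
    for event in placeholder:
        state[event["type"]] = event["value"]
    return state


queue = [
    {"app": "editor", "type": "window_position", "value": (0, 0)},
    {"app": "mail", "type": "window_position", "value": (50, 50)},
    {"app": "editor", "type": "scroll", "value": 120},
    {"app": "editor", "type": "keypress", "value": "x"},
    {"app": "editor", "type": "window_position", "value": (200, 80)},
]
placeholder = make_placeholder(queue, "editor", {"window_position", "scroll"})
state = restore(placeholder)
```

Replaying the placeholder reproduces the editor's last known window position and scroll offset while ignoring events from other applications and unselected event types.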
Abstract:
An electronic device is provided. The electronic device includes a processor configured to record exercise history information relating to an exercise performed by a user of the electronic device, and a memory configured to store the exercise history information, wherein the processor is configured to record the exercise history information continuously if an exercise application of the electronic device is unintentionally terminated, and retrieve the stored exercise history information when the exercise application is restarted.