Abstract:
Techniques for implementing high availability for persistent memory are provided. In one embodiment, a first computer system can detect an alternating current (AC) power loss/cycle event and, in response to the event, can save data in a persistent memory of the first computer system to a memory or storage device that is remote from the first computer system and is accessible by a second computer system. The first computer system can then generate a signal for the second computer system after initiating or completing the save process, thereby allowing the second computer system to restore the saved data from the memory or storage device into its own persistent memory.
Abstract:
Techniques for achieving application high availability via application-transparent battery-backed replication of persistent data are provided. In one set of embodiments, a computer system can detect a failure that causes an application of the computer system to stop running. In response to detecting the failure, the computer system can copy persistent data written by the application and maintained locally at the computer system to one or more remote destinations, where the copying is performed in a manner that is transparent to the application and while the computer system runs on battery power. The application can then be restarted on another computer system using the copied data.
Abstract:
A journal-based process to achieve atomicity in a device driver write operation includes committing a transaction associated with the operation to a journal that includes a status indicating the target block is corrupted. Subsequent to committing the transaction, the data is written to the target block. If the write operation is successfully committed, the transaction can be deleted from the journal. If a system crash (e.g., a power failure) occurs before the write operation is successfully committed, the transaction remains in the journal and can be used, when the system reboots, to update block metadata associated with the target block to indicate that the block is corrupted (e.g., that it holds a torn write).
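The write path described above can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions: the class name JournaledBlockDevice, the crash_before_commit flag, and the dictionary-based journal are hypothetical stand-ins, not the driver's actual data structures.

```python
class JournaledBlockDevice:
    """In-memory stand-in for a block device with a write journal (hypothetical)."""

    def __init__(self, num_blocks):
        self.blocks = [b""] * num_blocks
        self.corrupted = [False] * num_blocks  # per-block metadata
        self.journal = {}                      # txn id -> journal entry
        self._next_txn = 0

    def write(self, index, data, crash_before_commit=False):
        # 1. Commit a transaction that pessimistically marks the target as corrupted.
        txn = self._next_txn
        self._next_txn += 1
        self.journal[txn] = {"block": index, "status": "corrupted"}

        # 2. Write the data to the target block.
        if crash_before_commit:
            # Simulate a torn write: only half the data lands before the crash.
            self.blocks[index] = data[: len(data) // 2]
            raise RuntimeError("simulated crash before commit")
        self.blocks[index] = data

        # 3. The write succeeded, so the transaction can be deleted.
        del self.journal[txn]

    def recover(self):
        """On reboot, flag every block named by a leftover transaction."""
        for entry in self.journal.values():
            self.corrupted[entry["block"]] = True


dev = JournaledBlockDevice(4)
dev.write(0, b"abcd")                 # completes; journal entry removed
try:
    dev.write(1, b"wxyz", crash_before_commit=True)
except RuntimeError:
    pass                              # the "system" crashed mid-write
dev.recover()                         # block 1 is now flagged as corrupted
```

Because the journal entry is written before the data, any crash between the two steps leaves evidence behind, and recovery never needs to inspect block contents to find a torn write.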
Abstract:
Techniques for efficiently purging non-active blocks in an NVM region of an NVM device while preserving large pages are provided. In one set of embodiments, a host system can receive a write request with respect to a data block of the NVM region, where the data block is referred to by a snapshot of the NVM region and was originally allocated as part of a large page. The host system can further allocate a new data block in the NVM region, copy contents of the data block to the new data block, and update the data block with write data associated with the write request. The host system can then update a level 1 (L1) page table entry of the NVM region's running point to point to the original data block.
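A toy model may help make the block movement concrete. The sketch below assumes dictionary-based page tables and a trivial allocator, all hypothetical; it also assumes that the snapshot's entry is redirected to the new block so the snapshot keeps seeing the old contents, a step the abstract does not spell out.

```python
def write_preserving_large_page(blocks, snapshot_l1, running_l1, page, write_data):
    """Handle a write to a block that a snapshot still references.

    blocks:      dict mapping block id -> contents (stand-in for the NVM region)
    snapshot_l1: dict mapping page number -> block id for the snapshot
    running_l1:  dict mapping page number -> block id for the running point
    """
    original = snapshot_l1[page]

    # Allocate a new block and copy the old contents into it.
    new_block = max(blocks) + 1            # trivial allocator, for illustration only
    blocks[new_block] = blocks[original]

    # Apply the write to the ORIGINAL block, which stays inside its large page.
    blocks[original] = write_data

    # The running point's L1 entry points at the original block.
    running_l1[page] = original

    # Assumption: the snapshot is redirected to the copy of the old contents.
    snapshot_l1[page] = new_block
    return new_block


blocks = {0: b"old"}
snapshot_l1 = {0: 0}
running_l1 = {0: 0}
write_preserving_large_page(blocks, snapshot_l1, running_l1, 0, b"new")
```

The point of writing in place is that the running point's active data stays in the blocks originally allocated as a large page, while the displaced snapshot data moves elsewhere.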
Abstract:
Examples provide a page-fault latency feedback metric to determine performance of workloads or virtual machines (VMs) running on a VM host in a cluster. A hypervisor induces page-faults by varying a memory limit associated with a VM. Page-fault latencies are measured at each of the varying memory limits. A performance loss occurring at each page-fault latency is measured and converted to a performance score. A page-fault translation table is constructed based on the page-fault latencies and assigned performance scores. When a page-fault occurs during execution of a workload on a VM host in the cluster, a cluster manager maps the page-fault latency associated with the page-fault to a performance score in the page-fault translation table. The cluster manager computes a current workload performance or VM performance based on the page-fault latency and the performance score.
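The translation-table lookup can be sketched as follows. The function names, the 0 to 100 score scale, the sample numbers, and the rule of picking the nearest measured latency that does not exceed the observed one are all assumptions made for illustration.

```python
import bisect


def build_translation_table(samples):
    """Build a sorted table of (latency_us, score) pairs.

    samples: iterable of (page_fault_latency_us, performance_loss_fraction),
    as measured while the memory limit was varied. The score scale
    (100 = no loss) is a hypothetical choice.
    """
    return sorted((lat, round(100 * (1.0 - loss))) for lat, loss in samples)


def score_for_latency(table, latency_us):
    """Map an observed page-fault latency to a performance score."""
    latencies = [lat for lat, _ in table]
    # Index of the last table entry whose latency is <= the observed latency;
    # latencies below the smallest entry fall back to the first entry.
    i = bisect.bisect_right(latencies, latency_us) - 1
    return table[max(i, 0)][1]


# Made-up measurements: (latency in microseconds, fractional performance loss)
table = build_translation_table([(10, 0.0), (100, 0.25), (1000, 0.5)])
score = score_for_latency(table, 150)   # falls in the 100 us bucket
```

Once the table exists, scoring a fault is a cheap lookup, so a cluster manager can evaluate workloads without re-measuring performance loss.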
Abstract:
A system and method are disclosed for improving operation of a memory scheduler operating on a host machine supporting virtual machines (VMs) in which guest operating systems and guest applications run. For each virtual machine, the host machine hypervisor categorizes memory pages into memory usage classes and estimates the total number of pages for each memory usage class. The memory scheduler uses this information to perform memory reclamation and allocation operations for each virtual machine. The memory scheduler further selects between ballooning reclamation and swapping reclamation operations based in part on the numbers of pages in each memory usage class for the virtual machine. Calls to the guest operating system provide the memory usage class information. Memory reclamation not only can improve the performance of existing VMs, but can also permit the addition of a VM on the host machine without substantially impacting the performance of the existing and new VMs.
Abstract:
A technique for efficient swap space management creates a swap reservation file using thick provisioning to accommodate a maximum amount of memory reclamation from a set of one or more associated virtual machines (VMs). A VM swap file is created for each VM using thin provisioning. When a new block is needed to accommodate page swaps to a given VM swap file, a block is removed from the swap reservation file and a block is added to the VM swap file, thereby maintaining a net zero difference in overall swap storage. The removed block and the added block may be the same storage block if a block move operation is supported by a file system implementing the swap reservation file and VM swap files. The technique also accommodates swap space management of resource pools.
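The net-zero accounting can be sketched with simple block counters. The SwapPool class and its method names are hypothetical; a real implementation would move storage blocks between files rather than adjust integers.

```python
class SwapPool:
    """Tracks block counts for one reservation file and per-VM swap files."""

    def __init__(self, reserved_blocks):
        self.reservation = reserved_blocks  # thick-provisioned up front
        self.vm_swap = {}                   # VM name -> blocks in its thin swap file

    def add_vm(self, vm):
        self.vm_swap[vm] = 0                # thin file starts empty

    def grow(self, vm):
        """Give one more block to a VM swap file, taken from the reservation."""
        if self.reservation == 0:
            raise RuntimeError("swap reservation exhausted")
        self.reservation -= 1               # block removed from the reservation file
        self.vm_swap[vm] += 1               # block added to the VM swap file

    def total(self):
        return self.reservation + sum(self.vm_swap.values())


pool = SwapPool(reserved_blocks=4)
pool.add_vm("vm1")
pool.grow("vm1")
pool.grow("vm1")
# pool.total() is still 4: two blocks moved, none created
```

Because every growth step is paired with a matching shrink, total swap storage stays fixed at the size reserved up front.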
Abstract:
Memory performance in a computer system that implements large page mapping is improved even when memory is scarce by identifying page sharing opportunities within the large pages at the granularity of small pages and breaking up the large pages so that small pages within the large page can be freed up through page sharing. In addition, the number of small page sharing opportunities within the large pages can be used to estimate the total amount of memory that could be reclaimed through page sharing.
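Counting sharing opportunities at small-page granularity might look like the sketch below. It assumes 4 KiB small pages and a set of content hashes for pages already known to the system; both the function name and the hash-only comparison are illustrative choices (a production system would normally confirm a hash match with a full comparison before sharing).

```python
import hashlib

SMALL_PAGE = 4096  # assumed small-page size in bytes


def sharing_opportunities(large_page, known_hashes):
    """Count small pages inside a large page whose content matches a known page.

    large_page:   bytes whose length is a multiple of SMALL_PAGE
    known_hashes: set of SHA-256 digests of pages already present elsewhere
    """
    count = 0
    for offset in range(0, len(large_page), SMALL_PAGE):
        digest = hashlib.sha256(large_page[offset:offset + SMALL_PAGE]).digest()
        if digest in known_hashes:
            count += 1
    return count


zero_page = bytes(SMALL_PAGE)
known = {hashlib.sha256(zero_page).digest()}
large_page = zero_page * 2 + b"\x01" * SMALL_PAGE   # tiny 3-page "large page"
matches = sharing_opportunities(large_page, known)
reclaimable_bytes = matches * SMALL_PAGE            # estimate of memory to reclaim
```

The count serves two purposes: it signals whether breaking up a given large page is worthwhile, and summed across large pages it estimates how much memory page sharing could reclaim.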
Abstract:
Techniques for implementing RDMA-based recovery of dirty data in remote memory are provided. In one set of embodiments, upon occurrence of a failure at a first (i.e., source) host system, a second (i.e., failover) host system can allocate a new memory region corresponding to a memory region of the source host system and retrieve a baseline copy of the memory region from a storage backend shared by the source and failover host systems. The failover host system can further populate the new memory region with the baseline copy and retrieve one or more dirty page lists for the memory region from the source host system via RDMA, where the one or more dirty page lists identify memory pages in the memory region that include data updates not present in the baseline copy. For each memory page identified in the one or more dirty page lists, the failover host system can then copy the content of that memory page from the memory region of the source host system to the new memory region via RDMA.
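The recovery sequence can be modeled with plain lists standing in for memory regions. In this sketch, list indexing is a placeholder for an RDMA read and the function name is hypothetical; no real RDMA library is used.

```python
def recover_region(baseline, source_region, dirty_page_lists):
    """Rebuild a memory region on the failover host.

    baseline:         list of page contents fetched from the shared storage backend
    source_region:    list of page contents in the failed source host's memory
                      (indexing it stands in for an RDMA read)
    dirty_page_lists: iterable of lists of page numbers updated since the baseline
    """
    # Allocate the new region and populate it with the baseline copy.
    new_region = list(baseline)

    # Overwrite only the pages named in the dirty page lists.
    for dirty_list in dirty_page_lists:
        for page_no in dirty_list:
            new_region[page_no] = source_region[page_no]
    return new_region


baseline = [b"a", b"b", b"c"]
source = [b"a", b"B", b"C"]          # pages 1 and 2 changed after the baseline
recovered = recover_region(baseline, source, [[1], [2]])
```

Only the dirty pages cross the network from the source host, so recovery cost scales with the amount of unflushed data rather than with the size of the region.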