Abstract:
Techniques for achieving application high availability via crash-consistent asynchronous replication of persistent data are provided. In one set of embodiments, an application running on a computer system can, during runtime of the application: write persistent data to a local nonvolatile data store of the computer system, write one or more log entries comprising the persistent data to a local log region of the computer system, and asynchronously copy the one or more log entries to one or more remote destinations. Then, upon detecting a failure that prevents the application from continuing execution, the computer system can copy the local log region or a remaining portion thereof to the one or more remote destinations, where the copying is performed while the computer system runs on battery power and where the application is restarted on another computer system using a persistent state derived from the copied log entries.
Abstract:
Techniques for implementing application fault tolerance via battery-backed replication of volatile state are provided. In one set of embodiments, a primary host system can detect a failure that causes an application of the primary host system to stop running. In response to detecting the failure, the primary host system can replicate volatile state that is used by the application to a secondary host system, where the secondary host system maintains a copy of the application, and where execution of the application is failed over to the copy on the secondary host system using the replicated volatile state.
Abstract:
Techniques for efficiently purging non-active blocks in an NVM region of an NVM device using pointer elimination are provided. In one set of embodiments, a host system can, for each level 1 (L1) page table entry of each snapshot of the NVM region, determine whether a data block of the NVM region that is pointed to by the L1 page table entry is a non-active block, and if the data block is a non-active block, remove a pointer to the data block in the L1 page table entry and reduce a reference count parameter associated with the data block by 1. If the reference count parameter has reached zero at this point, the host system purge the data block from the NVM device to the mass storage device.
Abstract:
Hierarchical resource tree memory operations can include receiving, at a memory scheduler, an indication of a proposed modification to a value of a memory parameter of an object represented by a node of a hierarchical resource tree, wherein the proposed modification is made by a modifying entity, locking the node of the hierarchical resource tree by the memory scheduler, performing the proposed modification by the memory scheduler, wherein performing the proposed modification includes creating a working value of the memory parameter according to the proposed modification, determining whether the proposed modification violates a structural consistency of the hierarchical resource tree based on the working value, and replacing the value of the memory parameter with the working value of the memory parameter in response to determining that the proposed modification does not violate a structural consistency of the hierarchical resource tree based on the working value, and unlocking the node of the hierarchical resource tree by the memory scheduler.
Abstract:
Techniques for implementing high availability for persistent memory are provided. In one embodiment, a first computer system can detect an alternating current (AC) power loss/cycle event and, in response to the event, can save data in a persistent memory of the first computer system to a memory or storage device that is remote from the first computer system and is accessible by a second computer system. The first computer system can then generate a signal for the second computer system subsequently to initiating or completing the save process, thereby allowing the second computer system to restore the saved data from the memory or storage device into its own persistent memory.
Abstract:
Techniques for implementing OS/hypervisor-based persistent memory are provided. In one embodiment, an OS or hypervisor running on a computer system can allocate a portion of the volatile memory of the computer system as a persistent memory allocation. The OS/hypervisor can further receive a signal from the computer system's BIOS indicating an AC power loss or cycle event and, in response to the signal, can save data in the persistent memory allocation to a nonvolatile backing store. Then, upon restoration of AC power to the computer system, the OS/hypervisor can restore the saved data from the nonvolatile backing store to the persistent memory allocation.
Abstract:
Memory performance in a computer system that implements large page mapping is improved even when memory is scarce by identifying page sharing opportunities within the large pages at the granularity of small pages and breaking up the large pages so that small pages within the large page can be freed up through page sharing. In addition, the number of small page sharing opportunities within the large pages can be used to estimate the total amount of memory that could be reclaimed through page sharing.
Abstract:
The prioritization of large memory page mapping is a function of the access bits in the L1 page table. In a first phase of operation, the number of set access bits in each of the L1 page tables is counted periodically and a current count value is calculated therefrom. During the first phase, no pages are mapped large even if identified as such. After the first phase, the current count value is used to prioritize among potential large memory pages to determine which pages to map large. The system continues to calculate the current count value even after the first phase ends. When using hardware assist, the access bits in the nested page tables are used and when using software MMU, the access bits in the shadow page tables are used for large page prioritization.
Abstract:
A system and method are disclosed for improving operation of a memory scheduler operating on a host machine supporting virtual machines (VMs) in which guest operating systems and guest applications run. For each virtual machine, the host machine hypervisor categorizes memory pages into memory usage classes and estimates the total number of pages for each memory usage class. The memory scheduler uses this information to perform memory reclamation and allocation operations for each virtual machine. The memory scheduler further selects between ballooning reclamation and swapping reclamation operations based in part on the numbers of pages in each memory usage class for the virtual machine. Calls to the guest operating system provide the memory usage class information. Memory reclamation not only can improve the performance of existing VMs, but can also permit the addition of a VM on the host machine without substantially impacting the performance of the existing and new VMs.
Abstract:
Techniques for implementing RDMA-based recovery of dirty data in remote memory are provided. In one set of embodiments, upon occurrence of a failure at a first (i.e., source) host system, a second (i.e., failover) host system can allocate a new memory region corresponding to a memory region of the source host system and retrieve a baseline copy of the memory region from a storage backend shared by the source and failover host systems. The failover host system can further populate the new memory region with the baseline copy and retrieve one or more dirty page lists for the memory region from the source host system via RDMA, where the one or more dirty page lists identify memory pages in the memory region that include data updates not present in the baseline copy. For each memory page identified in the one or more dirty page lists, the failover host system can then copy the content of that memory page from the memory region of the source host system to the new memory region via RDMA.