Abstract:
Techniques for implementing high availability for persistent memory are provided. In one embodiment, a first computer system can detect an alternating current (AC) power loss/cycle event and, in response to the event, can save data in a persistent memory of the first computer system to a memory or storage device that is remote from the first computer system and is accessible by a second computer system. The first computer system can then generate a signal for the second computer system subsequently to initiating or completing the save process, thereby allowing the second computer system to restore the saved data from the memory or storage device into its own persistent memory.
Abstract:
In a virtualized computer system, guest memory pages are mapped to disk blocks that contain identical contents and the mapping is used to improve management processes performed on virtual machines, such as live migration and snapshots. These processes are performed with less data being transferred because the mapping data of those guest memory pages that have identical content stored on disk are transmitted instead of the their contents. As a result, live migration and snapshots can be carried out more quickly. The mapping of the guest memory pages to disk blocks can also be used to optimize other tasks, such as page swaps and memory error corrections.
Abstract:
Exemplary methods, apparatuses, and systems include a distributed memory agent within a first node intercepting an operating system request to open a file from an application running on the first node. The request includes a file identifier, which the distributed memory agent transmits to a remote memory manager. The distributed memory agent receives, from the remote memory manager, a memory location within a second node for the file identifier and information to establish a remote direct memory access channel between the first node and the second node. In response to the request to open the file, the distributed memory agent establishes the remote direct memory access channel between the first node and the second node. The remote direct memory access channel allows the first node to read directly from or write directly to the memory location within the second node while bypassing an operating system of the second node.
Abstract:
Updates to nonvolatile memory pages are mirrored so that certain features of a computer system, such as live migration of applications, fault tolerance, and high availability, will be available even when nonvolatile memory is local to the computer system. Mirroring may be carried out when a cache flush instruction is executed to flush contents of the cache into nonvolatile memory. In addition, mirroring may be carried out asynchronously with respect to execution of the cache flush instruction by retrieving content that is to be mirrored from the nonvolatile memory using memory addresses of the nonvolatile memory corresponding to target memory addresses of the cache flush instruction.
Abstract:
Memory performance in a computer system that implements large page mapping is improved even when memory is scarce by identifying page sharing opportunities within the large pages at the granularity of small pages and breaking up the large pages so that small pages within the large page can be freed up through page sharing. In addition, the number of small page sharing opportunities within the large pages can be used to estimate the total amount of memory that could be reclaimed through page sharing.
Abstract:
Techniques for efficiently swizzling pointers in persistent objects are provided. In one embodiment, a computer system can allocate slabs in a persistent heap, where the persistent heap resides on a byte-addressable persistent memory of the system, and where each slab is a continuous memory segment of the persistent heap that is configured to store instances of an object type used by an application. The system can further store associations between the slabs and their respective object types, and information indicating the locations of pointers in each object type. At the time of a system restart or crash recovery, the system can iterate through each slab and determine, based on the stored associations, the slab's object type. The system can then scan though the allocated objects in the slab and, if the system determines that the object includes any pointers based on the stored pointer location information, can swizzle each pointer.
Abstract:
Examples perform selection of non-uniform memory access (NUMA) nodes for mapping of virtual central processing unit (vCPU) operations to physical processors. A CPU scheduler evaluates the latency between various candidate processors and the memory associated with the vCPU, and the size of the working set of the associated memory, and the vCPU scheduler selects an optimal processor for execution of a vCPU based on the expected memory access latency and the characteristics of the vCPU and the processors. Some examples contemplate monitoring system characteristics and rescheduling the vCPUs when other placements may provide improved performance and/or efficiency.
Abstract:
A system and related method of operation for migrating the memory of a virtual machine from one NUMA node to another. Once the VM is migrated to a new node, migration of memory pages is performed while giving priority to the most utilized pages, so that access to these pages becomes local as soon as possible. Various heuristics are described to enable different implementations for different situations or scenarios.
Abstract:
Techniques for implementing RDMA-based recovery of dirty data in remote memory are provided. In one set of embodiments, upon occurrence of a failure at a first (i.e., source) host system, a second (i.e., failover) host system can allocate a new memory region corresponding to a memory region of the source host system and retrieve a baseline copy of the memory region from a storage backend shared by the source and failover host systems. The failover host system can further populate the new memory region with the baseline copy and retrieve one or more dirty page lists for the memory region from the source host system via RDMA, where the one or more dirty page lists identify memory pages in the memory region that include data updates not present in the baseline copy. For each memory page identified in the one or more dirty page lists, the failover host system can then copy the content of that memory page from the memory region of the source host system to the new memory region via RDMA.
Abstract:
Various aspects are disclosed for unified resource management of containers and virtual machines. A podVM resource configuration for a pod virtual machine (podVM) is determined using container configurations. The podVM comprising a virtual machine (VM) that provides resource isolation for a pod based on the podVM resource configuration. A host selection for the podVM is received from a VM scheduler. The host selection identifies hardware resources for the podVM. A container scheduler is limited to bind the podVM to a node corresponding to the hardware resources of the host selection from the VM scheduler. The podVM is created in a host corresponding to the host selection. Containers are started within the podVM. The containers correspond to the container configurations.