Abstract:
The disclosure herein describes providing and accessing data on an object storage platform using a log-structured merge (LSM) tree file system. The LSM tree file system on the object storage platform includes sorted data tables, each sorted data table including a payload portion and an index portion. Data is written to the LSM tree file system in at least one new sorted data table. Data is read by identifying a data location of the data based on index portions of the sorted data tables and reading the data from a sorted data table associated with the identified data location. The use of the LSM tree file system on the object storage platform provides an efficient means for interacting with the data stored thereon.
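The write and read paths described above can be illustrated with a minimal in-memory sketch. It is not the disclosed implementation: the names `SortedTable` and `LsmStore` are hypothetical, the index portion is assumed to be a sorted key list, and the newest table is assumed to win when a key appears in several tables.

```python
import bisect


class SortedTable:
    """One immutable sorted data table: an index portion plus a payload portion."""

    def __init__(self, items):
        pairs = sorted(items.items())
        self.index = [key for key, _ in pairs]        # index portion: sorted keys
        self.payload = [value for _, value in pairs]  # payload portion: values in key order

    def lookup(self, key):
        # Binary search in the index portion yields the data location in the payload.
        i = bisect.bisect_left(self.index, key)
        if i < len(self.index) and self.index[i] == key:
            return True, self.payload[i]
        return False, None


class LsmStore:
    def __init__(self):
        self.tables = []  # oldest table first

    def write(self, items):
        # Every write produces at least one new sorted data table.
        self.tables.append(SortedTable(items))

    def read(self, key):
        # Assumption: consult the newest table first so later writes shadow older ones.
        for table in reversed(self.tables):
            found, value = table.lookup(key)
            if found:
                return value
        raise KeyError(key)
```

Because tables are never modified after creation, each one maps naturally onto a single immutable object on an object storage platform.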
Abstract:
The present disclosure provides techniques for managing a cache of a computer system using a cache management data structure. The cache management data structure includes a cold queue, a ghost queue, and a hot queue. The techniques herein improve the functioning of the computer because management of the cache management data structure can be performed in parallel with multiple cores or multiple processors, because a sequential scan will only pollute (i.e., add unimportant memory pages to) the cold queue and, to an extent, the ghost queue, but not the hot queue, and also because the cache management data structure has lower memory requirements and lower CPU overhead on a cache hit than some prior art algorithms.
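The scan-resistance property can be illustrated with a small sketch. The abstract does not state the promotion rules, so the policy below is an assumption borrowed from 2Q-style caches: new pages enter the cold queue, pages evicted from the cold queue leave only their key in the ghost queue, and a miss on a key found in the ghost queue is admitted to the hot queue. The class name `ThreeQueueCache` and the `load` callback are hypothetical, and the sketch is single-threaded.

```python
from collections import OrderedDict


class ThreeQueueCache:
    def __init__(self, cold_size, ghost_size, hot_size):
        self.cold = OrderedDict()   # key -> page, FIFO order
        self.ghost = OrderedDict()  # key only, remembers recent cold evictions
        self.hot = OrderedDict()    # key -> page, LRU order
        self.cold_size = cold_size
        self.ghost_size = ghost_size
        self.hot_size = hot_size

    def get(self, key, load):
        if key in self.hot:
            self.hot.move_to_end(key)  # refresh recency on a hot hit
            return self.hot[key]
        if key in self.cold:
            return self.cold[key]      # cold hit: no queue manipulation
        page = load(key)               # cache miss
        if key in self.ghost:
            # Seen recently: assumed to be worth keeping, so admit to the hot queue.
            del self.ghost[key]
            self.hot[key] = page
            if len(self.hot) > self.hot_size:
                self.hot.popitem(last=False)
        else:
            self.cold[key] = page
            if len(self.cold) > self.cold_size:
                old_key, _ = self.cold.popitem(last=False)
                self.ghost[old_key] = None
                if len(self.ghost) > self.ghost_size:
                    self.ghost.popitem(last=False)
        return page
```

Under this policy a one-pass sequential scan touches each key once, so its pages cycle through the cold and ghost queues and never displace entries in the hot queue.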
Abstract:
Techniques are disclosed for maintaining high availability (HA) for virtual machines (VMs) running on host systems of a host cluster, where each host system executes a HA module in a plurality of HA modules and a storage module in a plurality of storage modules, where the host cluster aggregates, via the plurality of storage modules, locally-attached storage resources of the host systems to provide an object store, where persistent data for the VMs is stored as per-VM storage objects across the locally-attached storage resources comprising the object store, and where a failure causes the plurality of storage modules to observe a network partition in the host cluster that the plurality of HA modules do not. In one embodiment, a host system in the host cluster executing a first HA module invokes an API exposed by the plurality of storage modules for persisting metadata for a VM to the object store. If the API is not processed successfully, the host system: (1) identifies a subset of second HA modules in the plurality of HA modules; (2) issues an accessibility query for the VM to the subset of second HA modules in parallel, the accessibility query being configured to determine whether the VM is accessible to the respective host systems of the subset of second HA modules; and (3) if at least one second HA module in the subset indicates that the VM is accessible to its respective host system, transmits a command to the at least one second HA module to invoke the API on its respective host system.
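The three-step fallback in the embodiment can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: the `Host` class, its `can_access` and `persist_metadata` methods, and the `persist_with_fallback` function are all hypothetical stand-ins for the HA module, the accessibility query, and the storage-module API, and a thread pool stands in for the parallel queries.

```python
from concurrent.futures import ThreadPoolExecutor


class Host:
    """Hypothetical host: an HA module plus a view of which VMs its storage module can reach."""

    def __init__(self, name, accessible_vms):
        self.name = name
        self.accessible_vms = set(accessible_vms)
        self.persisted = {}

    def can_access(self, vm):
        # Accessibility query answered by this host's HA module.
        return vm in self.accessible_vms

    def persist_metadata(self, vm, metadata):
        # Stand-in for the storage API; fails if the VM's objects are unreachable here.
        if vm not in self.accessible_vms:
            return False
        self.persisted[vm] = metadata
        return True


def persist_with_fallback(local, peers, vm, metadata):
    """Return the name of the host that persisted the metadata, or None."""
    if local.persist_metadata(vm, metadata):
        return local.name
    if not peers:
        return None
    # Steps (1) and (2): query the chosen subset of peer HA modules in parallel.
    with ThreadPoolExecutor(max_workers=len(peers)) as pool:
        answers = list(pool.map(lambda peer: peer.can_access(vm), peers))
    # Step (3): ask a peer that reported access to invoke the API on its own host.
    for peer, accessible in zip(peers, answers):
        if accessible and peer.persist_metadata(vm, metadata):
            return peer.name
    return None
```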
Abstract:
The disclosure provides an approach for performing an operation by a first process on behalf of a second process, the method comprising: obtaining, by the first process, a memory handle from the second process, wherein the memory handle allows access, by the first process, to at least some of the address space of the second process; dividing the address space of the memory handle into a plurality of sections; receiving, by the first process, a request from the second process to perform an operation; determining, by the first process, a section of the plurality of sections that is to be mapped from the address space of the memory handle to the address space of the first process for the performance of the operation by the first process; mapping the section from the address space of the memory handle to the address space of the first process; and performing the operation by the first process on behalf of the second process.
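A minimal single-process simulation of the section-based mapping is sketched below. It assumes fixed-size sections and uses a `bytearray` to stand in for the memory handle and `memoryview` slices to stand in for mapped sections; real cross-process mapping would rely on operating-system facilities that the abstract does not name. The class `SectionMapper` and its `write` operation are hypothetical.

```python
class SectionMapper:
    """Maps only the sections of a large address space that an operation touches."""

    def __init__(self, handle, section_size):
        self.handle = memoryview(handle)  # stand-in for the second process's memory handle
        self.section_size = section_size
        self.mapped = {}                  # section index -> mapped view

    def _map(self, index):
        # Map a section lazily, the first time an operation needs it.
        if index not in self.mapped:
            start = index * self.section_size
            self.mapped[index] = self.handle[start:start + self.section_size]
        return self.mapped[index]

    def write(self, offset, data):
        """Perform a write on behalf of the owner of the handle."""
        if offset < 0 or offset + len(data) > len(self.handle):
            raise ValueError("request falls outside the handle's address space")
        pos = 0
        while pos < len(data):
            # Determine which section holds the next byte and where inside it.
            index, within = divmod(offset + pos, self.section_size)
            view = self._map(index)
            n = min(len(data) - pos, len(view) - within)
            view[within:within + n] = data[pos:pos + n]
            pos += n
```

Mapping per section keeps the first process's own address space consumption bounded by the sections actually in use rather than by the full size of the handle.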
Abstract:
A method includes obtaining a plurality of representations corresponding respectively to a plurality of blocks of data stored on a source node. A plurality of data pairs are sent to a destination node, where each data pair includes a logical address associated with a block of data from the plurality of blocks of data and the corresponding representation of the block of data. A determination is made whether the blocks of data associated with the respective logical addresses are duplicates of data stored on the destination node. In accordance with an affirmative determination, a reference to a physical address of the block of data stored on the destination node is stored. In accordance with a negative determination, an indication that the data corresponding to the respective logical address is not a duplicate is stored. The data indicated as not being a duplicate is written to the destination node.
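The exchange between source and destination can be sketched as below. The abstract does not say what the "representation" is, so a SHA-256 digest is assumed here; the `Destination` class, its methods, and the `replicate` function are hypothetical names, and `None` is used as the assumed "not a duplicate" indication.

```python
import hashlib


def fingerprint(block):
    # Assumed representation of a block: its SHA-256 digest.
    return hashlib.sha256(block).hexdigest()


class Destination:
    def __init__(self):
        self.physical = []  # physical address -> block contents
        self.by_fp = {}     # fingerprint -> physical address
        self.logical = {}   # logical address -> physical address, or None if data is still needed

    def receive_pairs(self, pairs):
        """Process (logical address, fingerprint) pairs; return addresses whose data must be sent."""
        needed = []
        for laddr, fp in pairs:
            if fp in self.by_fp:
                self.logical[laddr] = self.by_fp[fp]  # duplicate: store a reference only
            else:
                self.logical[laddr] = None            # indication: not a duplicate
                needed.append(laddr)
        return needed

    def write_block(self, laddr, block):
        fp = fingerprint(block)
        paddr = self.by_fp.get(fp)
        if paddr is None:
            paddr = len(self.physical)
            self.physical.append(block)
            self.by_fp[fp] = paddr
        self.logical[laddr] = paddr


def replicate(source_blocks, dest):
    # Send only the pairs first, then only the blocks the destination lacks.
    pairs = [(laddr, fingerprint(block)) for laddr, block in source_blocks.items()]
    for laddr in dest.receive_pairs(pairs):
        dest.write_block(laddr, source_blocks[laddr])
```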
Abstract:
A method for restoring a data volume using incremental snapshots of the data volume includes creating a first series of incremental snapshots according to a first predefined interval. The method further includes creating a second series of incremental snapshots according to a second predefined interval that is an integer multiple of the first predefined interval. The method also includes receiving a request to restore the data volume to a point-in-time. The method further includes restoring the data volume to the point-in-time using none or some of the snapshots in the first series that were created at or prior to the point-in-time, and all of the snapshots in the second series that were created at or prior to the point-in-time.
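One way to read the restore rule is sketched below, under the assumption that each snapshot records the blocks changed since the previous snapshot in the same series. The restore then applies every coarse (second-series) snapshot up to the point-in-time, followed by only those fine (first-series) snapshots taken after the last coarse one. The function names and the integer timestamps are illustrative assumptions.

```python
def build_series(writes, interval, end):
    """Build incremental snapshots at times interval, 2*interval, ... up to end.

    writes is a list of (time, block, data) tuples; each snapshot holds the
    blocks changed in the window (t - interval, t].
    """
    ordered = sorted(writes, key=lambda w: w[0])
    series = []
    for t in range(interval, end + 1, interval):
        delta = {blk: data for (wt, blk, data) in ordered if t - interval < wt <= t}
        series.append((t, delta))
    return series


def restore(fine, coarse, point_in_time):
    volume = {}
    last_coarse = 0
    # Apply all second-series snapshots created at or prior to the point-in-time.
    for t, delta in coarse:
        if t <= point_in_time:
            volume.update(delta)
            last_coarse = t
    # Apply none or some first-series snapshots: only those after the last coarse one.
    for t, delta in fine:
        if last_coarse < t <= point_in_time:
            volume.update(delta)
    return volume
```

With a first interval of 1 and a second interval of 3, restoring to time 5 applies one coarse snapshot and two fine ones, while restoring to time 6 needs no fine snapshots at all.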
Abstract:
Embodiments of the disclosure provide techniques for updating a distributed transaction log on a previously offline resource object component using distributed transaction logs from active host computer nodes from separate RAID mirror configurations. Each component object maintains a journal (log) where distributed transactions are recorded. If a component object goes offline and subsequently returns (e.g., if the node hosting the component object reboots), the component object is marked as stale. To return the component object to an active state, a distributed resources module retrieves the journals from other resource component objects from other RAID configurations where the data is mirrored. The module filters the retrieved journals for data that is missing from the journal of the previously offline component object and merges the filtered data into that journal.
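The filter-and-merge step can be sketched as follows, assuming each journal entry is a `(transaction_id, operation)` pair and that transaction identifiers are unique and ordered. Both the entry format and the function name `resync_journal` are assumptions made for illustration.

```python
def resync_journal(stale_journal, mirror_journals):
    """Return the stale journal with entries it missed while offline merged in."""
    seen = {txn_id for txn_id, _ in stale_journal}
    missing = {}
    for journal in mirror_journals:
        for txn_id, operation in journal:
            # Filter: keep only entries absent from the stale component's journal.
            if txn_id not in seen:
                missing.setdefault(txn_id, operation)
    # Merge: combine and restore transaction order.
    merged = list(stale_journal) + list(missing.items())
    return sorted(merged, key=lambda entry: entry[0])
```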
Abstract:
A method and system are disclosed for storing client data objects in a deduplicated storage system. Deduplicated data may be stored in a plurality of physical data blocks. A content map layer can provide a mapping between the physical data blocks and logical map objects associated with the client data objects. The deduplicated data may be mapped to logical data blocks that comprise the client data objects.
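The layering can be illustrated with a small sketch in which a logical map per client object points into a shared content map, which in turn points at physical blocks. The content hash (SHA-256), the reference counts, the fixed block size, and the `DedupStore` name are assumptions for illustration; the sketch also assumes each object name is written only once.

```python
import hashlib


class DedupStore:
    def __init__(self, block_size=4):
        self.block_size = block_size
        self.physical = {}      # physical address -> block bytes
        self.content_map = {}   # content hash -> [physical address, reference count]
        self.logical_maps = {}  # object name -> ordered list of content hashes

    def put(self, name, data):
        hashes = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            entry = self.content_map.get(digest)
            if entry is None:
                # First time this content is seen: allocate a physical block.
                addr = len(self.physical)
                self.physical[addr] = block
                entry = self.content_map[digest] = [addr, 0]
            entry[1] += 1  # one more logical block refers to this content
            hashes.append(digest)
        self.logical_maps[name] = hashes

    def get(self, name):
        # Logical map -> content map -> physical block.
        return b"".join(
            self.physical[self.content_map[digest][0]]
            for digest in self.logical_maps[name]
        )
```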
Abstract:
Techniques are disclosed for orchestrating high availability (HA) failover for virtual machines (VMs) running on host systems of a host cluster, where the host cluster aggregates locally-attached storage resources of the host systems to provide an object store, and where persistent data for one or more of the VMs is stored as per-VM storage objects across the locally-attached storage resources comprising the object store. In one embodiment, a host system in the host cluster executing a HA module determines a VM to be restarted on an active host system in the host cluster. The host system further determines if the VM's persistent data is stored in the object store. If so, the host system adds the VM to a list of VMs to be immediately restarted. Otherwise, the host system checks whether the VM is accessible to the host system by querying a storage layer of the host system configured to manage the object store.