Abstract:
A virtualized computer system employs a virtual disk. Multiple snapshots of the virtual disk can be created. After a snapshot is created, writes to the virtual disk are captured in delta disks. Two snapshots are consolidated by updating block references in snapshot metadata. The block reference update takes advantage of the fact that blocks for the two snapshots are managed within the same storage container and, therefore, can be moved in the snapshot logical space without incurring data copy operations. Consolidation of delta disks also gracefully handles failures during the consolidation operation and can be restarted anew after the system has recovered from failure.
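The reference-update idea can be illustrated with a minimal sketch. It assumes each snapshot is modeled as a dictionary from logical block address to a physical block ID within one shared container; the names `BlockMap` and `consolidate` are hypothetical and do not come from the abstract.

```python
from typing import Dict

# Hypothetical model: a snapshot maps logical block addresses to
# physical block IDs that all live in the same storage container.
BlockMap = Dict[int, int]


def consolidate(parent: BlockMap, child: BlockMap) -> BlockMap:
    """Merge a child snapshot into its parent by updating references only.

    No block contents are read or written. Where both snapshots map the
    same logical address, the child's (newer) reference wins.
    """
    merged = dict(parent)   # start from the older snapshot's references
    merged.update(child)    # newer references override older ones
    return merged


parent = {0: 100, 1: 101, 2: 102}
child = {1: 205, 3: 207}
merged = consolidate(parent, child)
```

Because the sketch builds a new map and leaves both inputs untouched, an interrupted consolidation could simply be run again from the start. That is one possible way to obtain the restart behavior the abstract mentions, not necessarily the one it describes.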
Abstract:
A virtualized storage stack includes logical layers above the physical storage layer. Each logical layer allocates data blocks, and the data block allocation is propagated down to the physical storage layer. To facilitate contiguous storage, each layer of the virtualized storage stack maintains additional metadata associated with data blocks. For each data block, the metadata indicates whether the data block is free or provisioned, and includes a tag that indicates when the data block was first written. Data blocks that were first written as part of the same write request share the same tag and are very likely to be physically co-located. Block allocations that reuse data blocks having the same tag are preferred. Such preference increases the likelihood of the blocks being contiguous in the physical storage, as these blocks were allocated as part of the same first write.
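A minimal sketch of a tag-preferring allocator for a single layer follows. The `BlockMeta` fields, the `allocate` function, and the tie-breaking rule (smallest tagged group that satisfies the request) are assumptions made for illustration; the abstract states only that reuse of same-tag blocks is preferred.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class BlockMeta:
    free: bool = True            # block is available for allocation
    provisioned: bool = False    # block is backed by the layer below
    tag: Optional[int] = None    # identifies the write that first wrote the block


def allocate(blocks: List[BlockMeta], count: int) -> List[int]:
    """Return indices of `count` free blocks, preferring a single shared tag."""
    free_by_tag: Dict[int, List[int]] = {}
    for index, meta in enumerate(blocks):
        if meta.free and meta.tag is not None:
            free_by_tag.setdefault(meta.tag, []).append(index)

    # Groups of same-tag free blocks large enough to satisfy the request.
    candidates = [idxs for idxs in free_by_tag.values() if len(idxs) >= count]
    if candidates:
        chosen = min(candidates, key=len)[:count]
    else:
        # Fallback (assumption): take any free blocks in index order.
        chosen = [i for i, meta in enumerate(blocks) if meta.free][:count]
        if len(chosen) < count:
            raise ValueError("not enough free blocks")

    for index in chosen:
        blocks[index].free = False
    return chosen


blocks = [BlockMeta(tag=1), BlockMeta(tag=2), BlockMeta(tag=1), BlockMeta(tag=2)]
first = allocate(blocks, 2)
second = allocate(blocks, 2)
```

In this example the first request is served entirely from blocks tagged 1 and the second from blocks tagged 2, even though the tags are interleaved in index order.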
Abstract:
Multiple servers sharing a distributed file system are used to perform copies of regions of a source file in parallel from a source storage unit to corresponding temporary files at a destination storage unit. These temporary files are then merged or combined into a single file at the destination storage unit in a way that preserves the inode structure and attributes of the source file. A substantial speedup is obtained by copying regions of the file in parallel.
Abstract:
A method of acquiring a lock by a node on a shared resource in a system of a plurality of interconnected nodes is disclosed. Each node that competes for a lock on the shared resource maintains a list of locks currently owned by the node. Lock metadata is maintained on a shared storage that is accessible to all nodes that may compete for locks on shared resources. A heartbeat region is maintained on a shared resource corresponding to each node so that nodes can register their liveness. A lock state is maintained in the lock metadata in the shared storage. A lock state may indicate that the lock is held exclusively, free, or in managed mode. If the lock is held in the managed mode, the ownership of the lock can be transferred to another node without the use of a mutual exclusion primitive such as a SCSI reservation.
Abstract:
A network-based method manages locks in a shared file system (SFS) for a group of hosts without requiring any configuration to identify a server for managing locks for the SFS. Each host in the group checks a predetermined storage location to determine whether a host ID is written there. If no host ID is written in the predetermined location, the first host to notice this condition writes its host ID in the predetermined location to identify itself as the server for managing locks. If a host ID is written in the predetermined location, the host keeps that ID in local memory as the host ID of the server for managing locks. When the host needs to perform IO operations on a file of the SFS, it communicates with the server for managing locks over the network, using the host ID of the server for managing locks, to obtain a lock on the file.
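The election step can be sketched as follows. The predetermined storage location is modeled by an in-memory object, and an atomic test-and-set on that location is assumed so that exactly one host wins when several notice the empty location at once. `SharedLocation`, `Host`, and their methods are hypothetical names, and the network exchange with the lock server is omitted.

```python
import threading
from typing import Optional


class SharedLocation:
    """Stand-in for the predetermined storage location (hypothetical)."""

    def __init__(self) -> None:
        self._host_id: Optional[str] = None
        # Models an atomic test-and-set on shared storage (assumption).
        self._guard = threading.Lock()

    def claim_if_empty(self, host_id: str) -> str:
        """Write host_id if no ID is recorded; return the recorded ID."""
        with self._guard:
            if self._host_id is None:
                self._host_id = host_id
            return self._host_id


class Host:
    def __init__(self, host_id: str, location: SharedLocation) -> None:
        self.host_id = host_id
        self.location = location
        self.lock_server_id: Optional[str] = None  # cached in local memory

    def discover_lock_server(self) -> str:
        """Become the lock server if none exists, else remember who it is."""
        self.lock_server_id = self.location.claim_if_empty(self.host_id)
        return self.lock_server_id

    def is_lock_server(self) -> bool:
        return self.lock_server_id == self.host_id


location = SharedLocation()
host_a = Host("host-a", location)
host_b = Host("host-b", location)
server_seen_by_a = host_a.discover_lock_server()
server_seen_by_b = host_b.discover_lock_server()
```

Here `host_a` checks first and becomes the lock server; `host_b` finds the ID already written and caches it for later lock requests.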
Abstract:
A virtualized computer system employs a virtual disk with a space efficient (SE) format to store data for virtual machines running therein. Data within a virtual disk with an SE format is stored in a grain, where multiple grains are included in a storage block. Writes to a grain within a shared storage block in a virtual disk with an SE format are serviced by allocating a new grain and storing the write data to the new grain. Metadata associated with the client that transmitted the write request to the virtual disk is then updated to point to the new grain instead of the grain within the shared storage block.
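The write path can be sketched as a redirect-on-write scheme. This simplified model tracks sharing per grain with a reference count rather than per storage block, and it overwrites unshared grains in place; both choices are assumptions, as are the names `SEDisk`, `share`, `write`, and `read`.

```python
from typing import Dict


class SEDisk:
    """Toy model of grain-level redirect-on-write (hypothetical)."""

    def __init__(self) -> None:
        self.grains: Dict[int, bytes] = {}           # grain ID -> data
        self.refs: Dict[int, int] = {}               # grain ID -> referencing clients
        self.maps: Dict[str, Dict[int, int]] = {}    # client -> {index -> grain ID}
        self._next_id = 0

    def share(self, src: str, dst: str) -> None:
        """Let dst reference every grain src references (for example, a clone)."""
        self.maps[dst] = dict(self.maps.get(src, {}))
        for grain_id in self.maps[dst].values():
            self.refs[grain_id] += 1

    def write(self, client: str, index: int, data: bytes) -> int:
        table = self.maps.setdefault(client, {})
        old = table.get(index)
        if old is not None and self.refs[old] == 1:
            self.grains[old] = data      # unshared grain: overwrite in place
            return old
        new = self._next_id              # shared or unmapped: allocate a new grain
        self._next_id += 1
        self.grains[new] = data
        self.refs[new] = 1
        table[index] = new               # repoint only this client's metadata
        if old is not None:
            self.refs[old] -= 1
        return new

    def read(self, client: str, index: int) -> bytes:
        return self.grains[self.maps[client][index]]


disk = SEDisk()
base_grain = disk.write("vm-a", 0, b"base")
disk.share("vm-a", "vm-b")
new_grain = disk.write("vm-b", 0, b"changed")
```

After the second write, `vm-a` still reads the original data while `vm-b` reads from the newly allocated grain.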
Abstract:
Decentralized deduplication operations in a computer system employ a hash index that is a variant of a B+ tree to support both efficient sequential updates and efficient random updates. Sequential update is selected when deduplication is performed infrequently, such as on the order of days, and random update is selected when deduplication is performed more frequently, such as on the order of seconds. More frequent deduplication may be beneficial during periods when large amounts of temporary duplicate data are created and the system may not have enough storage space to accommodate the temporary spike in demand.
Abstract:
Free storage blocks previously allocated to a logical block device are released back to an underlying storage system supporting the logical block device in a manner that does not conflict with write operations that may be issued to the free storage blocks at about the same time. According to a first technique, write operations on the same storage blocks to be released are paused until the underlying storage system has completed the releasing operation or, if the write operations are issued earlier than when the underlying storage system actually performs the releasing operation, such storage blocks are not released. According to a second technique, a special file is allocated the free storage blocks, which are then made available for safe releasing.
Abstract:
Input/output operations (IOs) are issued to a storage system using request queues that are each maintained for a resource targeted by the IOs. When an IO is requested, the target resource for the IO is first identified. If a request queue is maintained for the target resource, the IO is added to the request queue and the IO is issued to the storage system as the target resource becomes available. The availability of the target resource may be determined through periodic checks or by monitoring completions of IOs issued out of the request queue.
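A minimal sketch of per-resource request queues follows. It assumes at most one IO in flight per resource and detects availability by monitoring completions (the abstract also allows periodic checks). `IOScheduler`, `submit`, and `complete` are hypothetical names, and `issue` stands in for whatever sends an IO to the storage system.

```python
from collections import deque
from typing import Callable, Deque, Dict, Set


class IOScheduler:
    """Queues IOs per target resource and issues them as it frees up."""

    def __init__(self, issue: Callable[[str, str], None]) -> None:
        self.issue = issue                        # sends one IO to storage
        self.busy: Set[str] = set()               # resources with an IO in flight
        self.queues: Dict[str, Deque[str]] = {}   # resource -> pending IOs

    def submit(self, resource: str, io: str) -> None:
        if resource in self.busy:
            # Resource unavailable: hold the IO in that resource's queue.
            self.queues.setdefault(resource, deque()).append(io)
        else:
            self.busy.add(resource)
            self.issue(resource, io)

    def complete(self, resource: str) -> None:
        """Called when the in-flight IO for `resource` finishes."""
        queue = self.queues.get(resource)
        if queue:
            self.issue(resource, queue.popleft())  # resource stays busy
        else:
            self.busy.discard(resource)
            self.queues.pop(resource, None)


issued = []
scheduler = IOScheduler(lambda resource, io: issued.append((resource, io)))
scheduler.submit("lun0", "write-1")
scheduler.submit("lun0", "write-2")   # queued behind write-1
scheduler.submit("lun1", "read-1")    # different resource, issued at once
scheduler.complete("lun0")            # releases write-2
```

Queuing per resource keeps a busy target from delaying IOs aimed at other targets, which a single global queue would not guarantee.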