Abstract:
A method of acquiring a lock by a node, on a shared resource in a system of a plurality of interconnected nodes, is disclosed. Each node that competes for a lock on the shared resource maintains a list of locks currently owned by the node. A lock metadata is maintained on a shared storage that is accessible to all nodes that may compete for locks on shared resources. A heartbeat region is maintained on a shared resource corresponding to each node so nodes can register their liveness. A lock state is maintained in the lock metadata in the shared storage. A lock state may indicate lock held exclusively, lock free or lock in managed mode. If the lock is held in the managed mode, the ownership of the lock can be transferred to another node without a use of a mutual exclusion primitive such as the SCSI reservation.
Abstract:
A method for performing I/O operations on a file stored in a file system utilizing a shared data storage system and accessible by a plurality of host computers is disclosed. A host computer receives from a process executing on it, a request to read data stored in the file. The host computer then requests the data stored in the file without acquiring a lock from the file system. The host computer also maintains a timeout value associated with the file while reading the data. The host computer receives at least a portion of the data prior to an expiration of time, and if all the data has not been received before the expiration of time, it then assesses whether another of the host computers has acquired a lock on the file, and, if so, invalidates the received data without providing it to the requesting process.
Abstract:
Multiple servers sharing a distributed file system are used to perform copies of regions of a source file in parallel from a source storage unit to corresponding temporary files at a destination storage unit. These temporary files are then merged or combined into a single file at the destination storage unit in a way that preserves the inode structure and attributes of the source file. A substantial speedup is obtained by copying regions of the file in parallel.
Abstract:
Input/output operations (IOs) are issued to a storage system using request queues that are each maintained for a resource targeted by the IOs. When an IO is requested, the target resource for the IO is first identified. If a request queue is maintained for the target resource, the IO is added to the request queue and the IO is issued to the storage system as the target resource becomes available. The availability of the target resource may be determined through periodic checks or by monitoring completions of IOs issued out of the request queue.
Abstract:
A method for performing I/O operations on a file stored in a file system utilizing a shared data storage system and accessible by a plurality of host computers is disclosed. A host computer receives from a process executing on it, a request to read data stored in the file. The host computer then requests the data stored in the file without acquiring a lock from the file system. The host computer also maintains a timeout value associated with the file while reading the data. The host computer receives at least a portion of the data prior to an expiration of time, and if all the data has not been received before the expiration of time, it then assesses whether another of the host computers has acquired a lock on the file, and, if so, invalidates the received data without providing it to the requesting process.
Abstract:
Free storage blocks previously allocated to a logical block device are released back to an underlying storage system supporting the logical block device in a manner that does not conflict with write operations that may be issued to the free storage blocks at about the same time. According to a first technique, write operations on the same storage blocks to be released are paused until the underlying storage system has completed the releasing operation or, if the write operations are issued earlier than when the underlying storage system actually performs the releasing operation, such storage blocks are not released. According to a second technique, a special file is allocated the free storage blocks, which are then made available for safe releasing.
Abstract:
A virtualized computer system employs a virtual disk. Multiple snapshots of the virtual disk can be created. After a snapshot is created, writes to the virtual disk are captured in delta disks. Two snapshots are consolidated by updating block references in snapshot meta data. Block reference update takes advantage of the fact that blocks for the two snapshot are managed within the same storage container and, therefore, can be moved in the snapshot logical space without incurring data copy operations. Consolidation of delta disks also gracefully handles failures during the consolidation operation and can be restarted anew after the system has recovered from failure.
Abstract:
A virtualized computer system employs a virtual disk with a space efficient (SE) format to store data for virtual machines running therein. Data within a virtual disk with a SE format is stored in a grain, where multiple grains are included in a storage block. Writes to a grain within shared storage block in a virtual disk with an SE format are serviced by allocating a new grain and storing the write data to the new grain. Metadata associated with the client that transmitted the write request to the virtual disk is then updated to point to the new grain instead of the grain within the shared storage block.
Abstract:
A virtualized storage stack includes logical layers above the physical storage layer. Each logical layer allocates data blocks, and the data block allocation is propagated down to the physical storage layer. To facilitate contiguous storage, each layer of the virtualized storage stack maintains additional metadata associated with data blocks. For each data block, the metadata indicates whether the data block is free, provisioned and includes a tag that indicates when the data block was first written. Data blocks that were first written as part of the same write request share the same tag, and are mostly guaranteed to be physically co-located. Block allocations that reuse data blocks having the same tag are preferred. Such preference increases the likelihood of the blocks being contiguous in the physical storage as these blocks were allocated as part of the same first write.