Abstract:
A method of acquiring a lock by a node on a shared resource, in a system of a plurality of interconnected nodes, is disclosed. Each node that competes for a lock on the shared resource maintains a list of locks currently owned by the node. Lock metadata is maintained on shared storage that is accessible to all nodes that may compete for locks on shared resources. A heartbeat region corresponding to each node is maintained on a shared resource so that nodes can register their liveness. A lock state is maintained in the lock metadata on the shared storage. The lock state may indicate that the lock is held exclusively, is free, or is in managed mode. If the lock is held in managed mode, ownership of the lock can be transferred to another node without the use of a mutual exclusion primitive such as a SCSI reservation.
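The three lock states described in this abstract can be illustrated with a minimal in-memory sketch. Everything named here (`LockState`, `LockRecord`, `Node`, `acquire`, and the `reserve` callable standing in for a mutual exclusion primitive) is hypothetical. The abstract does not say how a transfer in managed mode is negotiated, so the sketch simply moves ownership, and the heartbeat region is not modelled.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class LockState(Enum):
    FREE = "free"
    EXCLUSIVE = "exclusive"
    MANAGED = "managed"


@dataclass
class LockRecord:
    # Stands in for the lock metadata kept on shared storage.
    state: LockState = LockState.FREE
    owner: Optional[str] = None


@dataclass
class Node:
    name: str
    # Locks this node currently owns.
    owned: List[str] = field(default_factory=list)


def acquire(node, lock_id, table, nodes, reserve):
    """Try to take lock_id for node and return True on success.

    reserve is a hypothetical callable that models a mutual exclusion
    primitive; it is invoked only when the lock is free.
    """
    rec = table[lock_id]
    if rec.state is LockState.EXCLUSIVE:
        return False  # held exclusively, cannot be taken
    if rec.state is LockState.FREE:
        reserve(lock_id)  # the primitive guards the metadata update
        rec.state = LockState.EXCLUSIVE
    else:
        # Managed mode: ownership moves without calling the primitive.
        # Assumption: the previous owner drops the lock from its list.
        nodes[rec.owner].owned.remove(lock_id)
    rec.owner = node.name
    node.owned.append(lock_id)
    return True
```

In this sketch a lock taken from the free state becomes exclusive, and a lock taken in managed mode stays in managed mode under its new owner; both choices are assumptions made for illustration.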
Abstract:
A network-based method for managing locks in a shared file system (SFS) for a group of hosts does not require any configuration to identify a server for managing locks for the SFS. Each host in the group checks a predetermined storage location to determine whether a host ID is written there. If no host ID is written in the predetermined location, the first host to notice this condition writes its own host ID in the predetermined location to identify itself as the server for managing locks. If a host ID is written in the predetermined location, the host keeps that host ID, which identifies the server for managing locks, in local memory. When the host needs to perform IO operations on a file of the SFS, it communicates with the server for managing locks over the network, using the stored host ID, to obtain a lock on the file.
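The discovery steps in this abstract can be sketched as follows. The names `SharedLocation`, `Host`, `write_if_empty`, `grant`, and `lock_file` are hypothetical. The network is modelled as a plain dictionary from host ID to host object, and the write to the predetermined location is assumed to be atomic, which the abstract does not state.

```python
class SharedLocation:
    """Stand-in for the predetermined storage location."""

    def __init__(self):
        self.host_id = None

    def write_if_empty(self, host_id):
        # Assumed to be atomic on real shared storage.
        if self.host_id is None:
            self.host_id = host_id
        return self.host_id


class Host:
    def __init__(self, host_id, location):
        self.host_id = host_id
        self.location = location
        self.lock_server = None  # server ID kept in local memory
        self.granted = {}  # path -> holder; used only on the lock server

    def discover_lock_server(self):
        found = self.location.host_id
        if found is None:
            # First host to notice the empty location claims the role.
            found = self.location.write_if_empty(self.host_id)
        self.lock_server = found
        return found

    def grant(self, path, requester):
        # Runs on the lock server: the first requester holds the lock.
        holder = self.granted.setdefault(path, requester)
        return holder == requester

    def lock_file(self, path, network):
        # network is a hypothetical dict of host ID -> Host.
        if self.lock_server is None:
            self.discover_lock_server()
        return network[self.lock_server].grant(path, self.host_id)
```

Lock release and server failure are left out, since the abstract does not describe them.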
Abstract:
Multiple servers sharing a distributed file system are used to perform copies of regions of a source file in parallel from a source storage unit to corresponding temporary files at a destination storage unit. These temporary files are then merged or combined into a single file at the destination storage unit in a way that preserves the inode structure and attributes of the source file. A substantial speedup is obtained by copying regions of the file in parallel.
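A single-machine sketch of the region-wise copy is given below, with threads standing in for the multiple servers. The function names and the `.partN` naming of the temporary files are hypothetical. The merge here is a plain concatenation followed by `shutil.copystat`, which copies attributes only; it does not reproduce the inode-preserving merge that the abstract describes.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def copy_region(src, tmp, offset, length):
    """Copy one region of src into its own temporary file."""
    with open(src, "rb") as fin, open(tmp, "wb") as fout:
        fin.seek(offset)
        fout.write(fin.read(length))
    return tmp


def parallel_copy(src, dst, region_size, workers=4):
    size = os.path.getsize(src)
    # One job per region: (temporary file, offset, length).
    jobs = [
        (f"{dst}.part{i}", offset, region_size)
        for i, offset in enumerate(range(0, size, region_size))
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda job: copy_region(src, *job), jobs))
    # Merge the temporary files in region order, then remove them.
    with open(dst, "wb") as out:
        for part in parts:
            with open(part, "rb") as fin:
                shutil.copyfileobj(fin, out)
            os.remove(part)
    shutil.copystat(src, dst)  # attributes only, not inode structure
```

A call such as `parallel_copy("disk.img", "/mnt/dst/disk.img", region_size=64 * 1024 * 1024)` would copy the file in 64 MiB regions; the paths are placeholders.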
Abstract:
A shared file system for a group of host computer systems is upgraded in place, in a manner that allows the shared file system to remain online and accessible to the host computer systems. First, each host computer system loads a new file system driver that is backward compatible with the driver it currently uses to interact with the file system. Second, one of the host computer systems acquires locks on the file system management data structures of the file system, upgrades those data structures, and, upon completion, notifies the other host computer systems that the upgrade to the file system management data structures is complete.
Abstract:
Metadata of a shared file in a clustered file system is changed in a way that ensures cache coherence amongst servers that can simultaneously access the shared file. Before a server changes the metadata of the shared file, it waits until no other server is attempting to access the shared file and all I/O operations to the shared file are blocked. After the metadata changes are written to the shared file, the local caches of the other servers are updated, as needed, and I/O operations to the shared file are unblocked.
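The ordering of steps in this abstract (wait, block I/O, write, update caches, unblock) can be sketched with in-memory objects. `Server` and `change_metadata` are hypothetical names, the shared file's metadata is a plain dictionary, and "as needed" is interpreted as "only for servers that have cached the metadata", which is an assumption.

```python
class Server:
    def __init__(self, name):
        self.name = name
        self.cache = {}  # locally cached metadata of the shared file
        self.accessing = False  # True while attempting to access the file
        self.io_blocked = False


def change_metadata(others, file_metadata, changes):
    """Apply changes on behalf of one server; others are its peers.

    Returns False if a peer is still accessing the file, in which case
    the caller is expected to wait and retry.
    """
    if any(server.accessing for server in others):
        return False
    for server in others:
        server.io_blocked = True  # block I/O to the shared file
    file_metadata.update(changes)  # write the metadata changes
    for server in others:
        if server.cache:  # assumption: refresh only populated caches
            server.cache.update(changes)
        server.io_blocked = False  # unblock I/O
    return True
```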
Abstract:
A data center comprising plural computer hosts and a storage system external to said hosts is disclosed. The storage system includes storage blocks for storing tangibly encoded data blocks. Each of said hosts includes a deduplicating file system for identifying and merging identical data blocks stored in respective storage blocks into one of said storage blocks so that a first file exclusively accessed by a first host of said hosts and a second file accessed exclusively by a second host of said hosts concurrently refer to the same one of said storage blocks.
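The merging of identical data blocks can be illustrated with a small sketch in which two files owned by different hosts end up referring to the same storage block. `BlockStore` and `deduplicate` are hypothetical names. Identifying identical blocks by SHA-256 digest is an assumption, since the abstract does not say how identical blocks are found.

```python
import hashlib


class BlockStore:
    """Stand-in for the storage system external to the hosts."""

    def __init__(self):
        self.blocks = {}  # block id -> data
        self.next_id = 0

    def write(self, data):
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = data
        return block_id


def deduplicate(store, files):
    """Merge identical blocks; files maps a file name to its block ids.

    Returns the number of storage blocks freed.
    """
    canonical = {}  # digest -> block id that is kept
    remap = {}  # every block id -> block id that is kept
    for block_id, data in store.blocks.items():
        digest = hashlib.sha256(data).digest()
        remap[block_id] = canonical.setdefault(digest, block_id)
    for block_ids in files.values():
        block_ids[:] = [remap[b] for b in block_ids]
    for block_id, kept in remap.items():
        if block_id != kept:
            del store.blocks[block_id]  # duplicate is no longer referenced
    return len(remap) - len(canonical)
```

A production system would typically also compare the block contents byte for byte before merging, rather than trusting the digest alone.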
Abstract:
Free storage blocks previously allocated to a logical block device are released back to an underlying storage system supporting the logical block device in a manner that does not conflict with write operations that may be issued to the free storage blocks at about the same time. According to a first technique, write operations on the same storage blocks to be released are paused until the underlying storage system has completed the releasing operation or, if the write operations are issued earlier than when the underlying storage system actually performs the releasing operation, such storage blocks are not released. According to a second technique, a special file is allocated the free storage blocks, which are then made available for safe releasing.
Abstract:
A virtualized computer system employs a virtual disk with a space efficient (SE) format to store data for virtual machines running therein. The SE format allows for defragmentation at a fine-grained level, where unused, stale, and zero blocks are moved to the end of the virtual disk so that the virtual disk may be truncated and space reclaimed by the underlying storage system as part of a special defragmentation process.
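The effect of the defragmentation step can be sketched on a toy virtual disk. The layout used here (a list of fixed-size physical blocks plus a logical-to-physical map) and the names `defragment` and `read_block` are hypothetical; the SE format itself is not described in the abstract. Packing live blocks at the front is equivalent to moving unused, stale, and zero blocks to the end and truncating.

```python
BLOCK_SIZE = 4
ZERO = bytes(BLOCK_SIZE)


def defragment(disk, mapping):
    """Return a truncated disk and an updated logical-to-physical map.

    disk is a list of physical blocks. Physical blocks that no logical
    block maps to are treated as unused or stale. Logical blocks whose
    data is all zeros are unmapped instead of stored.
    """
    live = sorted(
        (phys, logical)
        for logical, phys in mapping.items()
        if disk[phys] != ZERO
    )
    new_disk, new_map = [], {}
    for phys, logical in live:
        new_map[logical] = len(new_disk)
        new_disk.append(disk[phys])
    return new_disk, new_map


def read_block(disk, mapping, logical):
    # Unmapped logical blocks read back as zeros.
    phys = mapping.get(logical)
    return ZERO if phys is None else disk[phys]
```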
Abstract:
A virtualized storage stack includes logical layers above the physical storage layer. Each logical layer allocates data blocks, and the data block allocation is propagated down to the physical storage layer. To facilitate contiguous storage, each layer of the virtualized storage stack maintains additional metadata associated with data blocks. For each data block, the metadata indicates whether the data block is free or provisioned, and includes a tag that indicates when the data block was first written. Data blocks that were first written as part of the same write request share the same tag and are mostly guaranteed to be physically co-located. Block allocations that reuse data blocks having the same tag are preferred. This preference increases the likelihood that the blocks are contiguous in physical storage, because they were allocated as part of the same first write.
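The tag-based preference can be sketched as a small allocator. `BlockMeta` and `allocate` are hypothetical names, and the selection policy (take the smallest group of free blocks sharing one tag that is large enough, otherwise fall back to free blocks in index order) is one possible reading of "preferred", not a policy stated in the abstract.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlockMeta:
    free: bool = True
    provisioned: bool = False
    tag: Optional[int] = None  # identifies the first write of this block


def allocate(meta, count):
    """Return indices of count free blocks, preferring a shared tag."""
    groups = defaultdict(list)  # tag -> free block indices
    for index, block in enumerate(meta):
        if block.free and block.tag is not None:
            groups[block.tag].append(index)
    candidates = [g for g in groups.values() if len(g) >= count]
    if candidates:
        # Reuse blocks first written together: likely contiguous below.
        chosen = min(candidates, key=len)[:count]
    else:
        chosen = [i for i, block in enumerate(meta) if block.free][:count]
        if len(chosen) < count:
            raise ValueError("not enough free blocks")
    for index in chosen:
        meta[index].free = False
    return chosen
```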
Abstract:
A method for performing I/O operations on a file stored in a file system utilizing a shared data storage system and accessible by a plurality of host computers is disclosed. A host computer receives, from a process executing on it, a request to read data stored in the file. The host computer then requests the data stored in the file without acquiring a lock from the file system. The host computer also maintains a timeout value associated with the file while reading the data. The host computer receives at least a portion of the data before the timeout expires. If not all of the data has been received by the time the timeout expires, the host computer assesses whether another of the host computers has acquired a lock on the file and, if so, invalidates the received data without providing it to the requesting process.
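The read path in this abstract can be sketched as a function that consumes data in chunks and checks for a competing lock only once the timeout has expired. `lockless_read`, `lock_taken_by_other`, and the injectable `clock` are hypothetical. The abstract does not say what happens when the timeout expires and no other host holds a lock; restarting the timer in that case is an assumption.

```python
import time


def lockless_read(chunks, lock_taken_by_other, timeout, clock=time.monotonic):
    """Read without a lock; return the data, or None if invalidated.

    chunks yields pieces of the file as they arrive, and
    lock_taken_by_other reports whether another host now holds a lock.
    """
    deadline = clock() + timeout
    received = []
    for chunk in chunks:
        if clock() >= deadline:
            # The read is still incomplete when the timeout expires.
            if lock_taken_by_other():
                return None  # invalidate; nothing reaches the process
            deadline = clock() + timeout  # assumption: restart the timer
        received.append(chunk)
    return b"".join(received)
```

Passing the clock as a parameter keeps the sketch deterministic under test; a caller would normally rely on the default `time.monotonic`.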