Abstract:
The disclosed embodiments provide techniques for transferring and caching a cloud file in a cloud controller. Two or more cloud controllers collectively manage distributed filesystem data that is stored in one or more cloud storage systems; the cloud controllers cache and ensure data consistency for the stored data. During operation, a cloud controller receives a client request for a data block of a target file that is stored in the distributed filesystem but not currently cached in the cloud controller. The cloud controller initiates a request to a cloud storage system for a cloud file containing the requested data block. While receiving the cloud file from the cloud storage system, the cloud controller uses a set of block metadata in the portion of the cloud file that has already been received to determine which portions of the cloud file should be downloaded to and cached in the cloud controller.
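The metadata-driven download described above can be sketched as follows. This is only an illustration under stated assumptions: the cloud-file layout (a 4-byte length prefix, a JSON metadata header, then concatenated block payloads), the function names `build_cloud_file` and `stream_and_cache`, and the policy of stopping the transfer once the last block of the target file has arrived are all hypothetical and are not taken from the abstract.

```python
import json
import struct


def build_cloud_file(blocks):
    """Pack (file_id, block_no, payload) tuples into a hypothetical layout:
    4-byte big-endian header length, JSON block metadata, then the payloads."""
    meta, body, offset = [], b"", 0
    for file_id, block_no, payload in blocks:
        meta.append({"file": file_id, "block": block_no,
                     "offset": offset, "length": len(payload)})
        body += payload
        offset += len(payload)
    header = json.dumps(meta).encode()
    return struct.pack(">I", len(header)) + header + body


def stream_and_cache(chunks, target_file):
    """Consume chunks as they arrive. Once the metadata header is complete,
    work out which byte ranges belong to target_file and stop reading as
    soon as the last of those ranges has been received (assumed policy)."""
    buf = b""
    wanted = None   # metadata entries for the target file
    needed = None   # number of bytes that must arrive before we can stop
    for chunk in chunks:
        buf += chunk
        if wanted is None and len(buf) >= 4:
            (hlen,) = struct.unpack(">I", buf[:4])
            if len(buf) >= 4 + hlen:
                meta = json.loads(buf[4:4 + hlen])
                body_start = 4 + hlen
                wanted = [m for m in meta if m["file"] == target_file]
                needed = max((body_start + m["offset"] + m["length"]
                              for m in wanted), default=body_start)
        if needed is not None and len(buf) >= needed:
            break
    if wanted is None or len(buf) < needed:
        raise ValueError("cloud file ended before the needed data arrived")
    cache = {m["block"]: buf[body_start + m["offset"]:
                             body_start + m["offset"] + m["length"]]
             for m in wanted}
    return cache, len(buf)
```

Passing an iterator of fixed-size chunks to `stream_and_cache` returns the cached blocks of the target file together with the number of bytes actually consumed, which can be smaller than the full cloud file when the target's blocks sit near the front.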
Abstract:
The disclosed embodiments provide techniques that facilitate the recovery of a virtual machine using a distributed filesystem. Two or more cloud controllers collectively manage distributed filesystem data that is stored in one or more cloud storage systems; the cloud controllers ensure data consistency for the stored data, and each cloud controller caches portions of the distributed filesystem in a local storage pool. During operation, a host server executes program instructions for an application in a virtual machine (VM); data associated with this application and/or this virtual machine is stored in the distributed filesystem. Upon detecting a subsequent failure, the system can recover and resume the execution of the virtual machine and application using the application and virtual machine data that was previously stored in the distributed filesystem.
Abstract:
The disclosed embodiments provide techniques that facilitate anti-virus checks for a distributed filesystem. Two or more cloud controllers collectively manage distributed filesystem data that is stored in one or more cloud storage systems; the cloud controllers ensure data consistency for the stored data, and each cloud controller caches portions of the distributed filesystem. During operation, a cloud controller receives a write request from a client system that seeks to store a target file in the distributed filesystem. A scan is then performed for this target file. For instance, the scan may be an anti-virus scan that ensures that viruses are not spread to the distributed filesystem or the clients of the distributed filesystem.
Abstract:
The disclosed embodiments provide techniques for executing a cloud command for a distributed filesystem. Two or more cloud controllers collectively manage distributed filesystem data that is stored in one or more cloud storage systems; the cloud controllers ensure data consistency for the stored data, and each cloud controller caches portions of the distributed filesystem. During operation, a cloud controller presents a distributed-filesystem-specific capability to a client system as a file in the distributed filesystem (e.g., using a file abstraction). Upon receiving a request from the client system to access and/or operate upon this file, the cloud controller executes an associated cloud command. More specifically, the cloud controller initiates a specially-defined operation that accesses additional functionality for the distributed filesystem that exceeds the scope of individual reads and writes to a typical data file.
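The file abstraction for cloud commands can be illustrated with a toy in-memory controller. Everything named here is an assumption made for the example: the class `CommandController`, the special path `/.cloudcmd/snapshot`, and the choice of a snapshot as the command do not come from the abstract, which does not say which commands exist or how they are addressed.

```python
class CommandController:
    """Toy controller in which writing to a special path runs a command
    instead of storing data (all paths and commands are hypothetical)."""

    def __init__(self):
        self.files = {}       # ordinary file path -> bytes
        self.snapshots = []   # (label, copy of files) pairs
        # Special files that expose filesystem-level capabilities.
        self.commands = {"/.cloudcmd/snapshot": self._cmd_snapshot}

    def write(self, path, data):
        handler = self.commands.get(path)
        if handler is not None:
            # The write is interpreted as a command invocation.
            handler(data)
        else:
            self.files[path] = data
        return len(data)

    def read(self, path):
        return self.files[path]

    def _cmd_snapshot(self, arg):
        # The written bytes carry the command argument (a snapshot label).
        label = arg.decode().strip()
        self.snapshots.append((label, dict(self.files)))
```

From the client's point of view, `write("/.cloudcmd/snapshot", b"before-edit\n")` looks like an ordinary file write, yet the controller records a snapshot rather than storing the bytes.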
Abstract:
The disclosed embodiments provide a system that archives data for a distributed filesystem. Two or more cloud controllers collectively manage distributed filesystem data that is stored in one or more cloud storage systems; the cloud controllers cache and ensure data consistency for the stored data. During operation, a cloud controller receives a request from a client for a data block of a file stored in the distributed filesystem. Upon determining that the requested data block is not currently cached in the cloud controller, the cloud controller sends a peer cache request for the requested data block to a peer cloud controller in the distributed filesystem.
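A minimal sketch of the peer cache request follows. The class `PeerCachingController`, its method names, and the fallback to the cloud storage system when no peer holds the block are assumptions for illustration; the abstract only states that a peer cache request is sent on a local miss.

```python
class PeerCachingController:
    """Toy cloud controller that asks peers for a block before going to
    cloud storage (names and fallback order are hypothetical)."""

    def __init__(self, name, cloud_storage, peers=()):
        self.name = name
        self.cache = {}                    # (file, block_no) -> bytes
        self.cloud_storage = cloud_storage # dict standing in for the cloud
        self.peers = list(peers)

    def peer_lookup(self, key):
        """Answer a peer cache request from the local cache only."""
        return self.cache.get(key)

    def read_block(self, key):
        """Return (data, source), where source tells where the data came from."""
        if key in self.cache:
            return self.cache[key], "local"
        for peer in self.peers:
            data = peer.peer_lookup(key)
            if data is not None:
                self.cache[key] = data
                return data, "peer:" + peer.name
        # Assumed fallback: no peer had the block, so fetch it from the cloud.
        data = self.cloud_storage[key]
        self.cache[key] = data
        return data, "cloud"
```

The returned source label makes it easy to see which path served a request: a first read on one controller comes from the cloud, while the same block requested through a second controller that lists the first as a peer is served from that peer's cache.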
Abstract:
The disclosed embodiments provide a system that distributes data for a distributed filesystem across multiple cloud storage systems. Two or more cloud controllers collectively manage distributed filesystem data that is stored in one or more cloud storage systems; the cloud controllers cache and ensure data consistency for the stored data. Whenever a cloud controller receives new data from a client, it outputs an incremental metadata snapshot for the new data that is propagated to the other cloud controllers and an incremental data snapshot containing the new data that is sent to a cloud storage system. During operation, a backup cloud controller associated with the distributed filesystem is also configured to receive each (incremental) metadata snapshot, such that, upon determining the failure of a cloud controller, the backup cloud controller can immediately begin receiving data requests from clients associated with the failed cloud controller.
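The split between metadata snapshots and data snapshots, and the way a backup controller can take over, can be sketched with a toy model. The class `SnapshotController`, the key naming scheme, and the use of a plain dict as the cloud storage system are assumptions for this example only.

```python
class SnapshotController:
    """Toy controller: each write sends data to cloud storage and sends a
    small metadata update to every peer (hypothetical model)."""

    def __init__(self, name, cloud):
        self.name = name
        self.cloud = cloud      # dict standing in for a cloud storage system
        self.metadata = {}      # path -> key of the cloud object holding it
        self.peers = []         # other controllers, including any backup
        self._seq = 0

    def write(self, path, data):
        self._seq += 1
        key = f"{self.name}-{self._seq}"
        self.cloud[key] = data        # incremental data snapshot
        snapshot = {path: key}        # incremental metadata snapshot
        self.apply_metadata(snapshot)
        for peer in self.peers:
            peer.apply_metadata(snapshot)

    def apply_metadata(self, snapshot):
        self.metadata.update(snapshot)

    def read(self, path):
        # Any controller with up-to-date metadata can locate the data.
        return self.cloud[self.metadata[path]]
```

Because the backup controller receives every metadata snapshot, it already knows where each file's latest data lives in cloud storage and can answer reads for a failed controller's clients without a recovery step.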
Abstract:
The disclosed embodiments provide a system that archives data for a distributed filesystem. Two or more cloud controllers collectively manage distributed filesystem data that is stored in one or more cloud storage systems; the cloud controllers cache and ensure data consistency for the stored data. During operation, a cloud controller determines that a cloud file in a previously stored data snapshot is no longer being actively referenced in the distributed filesystem. The cloud controller transfers this cloud file from the (first) cloud storage system to an archival cloud storage system, thereby reducing storage costs while preserving the data in the cloud file in case it is ever needed again.
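The archival step can be sketched as a single pass over a snapshot's cloud files. The function name `archive_unreferenced`, its parameters, and the representation of both storage systems as dicts are assumptions; the abstract does not describe how references are tracked.

```python
def archive_unreferenced(metadata, snapshot_files, primary, archive):
    """Move cloud files that no current metadata entry points at from the
    primary store to the archival store. Returns the keys that were moved.

    metadata:       path -> cloud file key (current filesystem state)
    snapshot_files: keys of cloud files written by an earlier snapshot
    primary/archive: dicts standing in for the two cloud storage systems
    """
    referenced = set(metadata.values())
    moved = []
    for key in snapshot_files:
        if key not in referenced and key in primary:
            # Data is preserved in the archive, not deleted.
            archive[key] = primary.pop(key)
            moved.append(key)
    return moved
```

Files that are still referenced stay in the primary store, so reads of live data are unaffected; only unreferenced cloud files move to the cheaper tier.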
Abstract:
The disclosed embodiments provide techniques that help avoid client timeouts in a distributed filesystem. Multiple cloud controllers collectively manage distributed filesystem data that is stored in one or more cloud storage systems; the cloud controllers ensure data consistency for the stored data, and each cloud controller caches portions of the distributed filesystem in a local storage pool. During operation, a cloud controller receives from a client system a request for a data block in a target file that is stored in the distributed filesystem. Although the cloud controller is already caching the requested data block, the cloud controller delays transmission of the cached data block; this additional delay gives the cloud controller more time to access uncached data blocks for the target file from a cloud storage system, thereby ensuring that subsequent requests for such data blocks do not exceed a timeout interval on the client system.
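The effect of the deliberate delay can be checked with simple arithmetic. The model below is an assumption made for illustration: the client requests blocks back to back, the controller starts fetching the uncached blocks when the first request arrives, and each cached block is held for a fixed delay shorter than the client timeout. The function name and parameters are hypothetical.

```python
def wait_for_first_uncached_block(num_cached, fetch_time, timeout, delay):
    """Return how long the client waits on the first uncached block.

    num_cached: cached blocks requested before the first uncached one
    fetch_time: seconds needed to fetch the uncached data from the cloud
    timeout:    client-side timeout per request, in seconds
    delay:      seconds the controller holds each cached block
    """
    if not 0 <= delay < timeout:
        raise ValueError("delay must be non-negative and below the timeout")
    # Time that passes while the delayed cached blocks are being served;
    # the cloud fetch runs in the background during this whole period.
    elapsed = num_cached * delay
    return max(0.0, fetch_time - elapsed)
```

With 4 cached blocks, a 50-second fetch, and a 30-second timeout, no delay leaves a 50-second wait on the first uncached block, which exceeds the timeout; holding each cached block for 10 seconds cuts that wait to 10 seconds.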
Abstract:
The disclosed embodiments provide techniques for managing metadata and data storage for a cloud controller in a distributed filesystem. Two or more cloud controllers collectively manage distributed filesystem data that is stored in one or more cloud storage systems. More specifically, the cloud controllers cache and ensure data consistency for the data stored in the cloud storage systems, with each cloud controller maintaining (e.g., storing) in a local storage device: (1) one or more metadata regions containing a metadata hierarchy that reflects the current state of the distributed filesystem; and (2) cached data for the distributed filesystem. During operation, the cloud controller receives an incremental metadata snapshot that references new data written to the distributed filesystem. The cloud controller stores updated metadata from this incremental metadata snapshot in one of the metadata regions on the local storage device.