Abstract:
The invention relates generally to mass storage systems, and in particular to mass storage systems in which stored logical volumes are duplicated in mirrored form. The system includes a method for dynamically adjusting the mirror service policy for a disk drive system by periodically collecting statistics describing the reading and writing of data to mirrored logical volumes of the system in successive time periods and determining, from time to time, from the collected statistics, whether the mirror service policy should continue or should change. In particular, the system takes into account activity levels at the physical devices and results in more efficient accessing of logical volume pairs as well as a better balance of loading and accessing the logical volumes.
Abstract:
A versioned file system comprises a set of structured data representations, such as XML. Each structured data representation corresponds to a “version,” and each version comprises a tree of write-once objects rooted at a root directory manifest. Each version in the versioned file system has associated therewith a “borrow window.” When it is desired to reconstruct the file system to a point in time (or, more generally, a given state), i.e., to perform a “restore,” it is only required to walk (use) a single structured data representation (a tree). During a restore, metadata is pulled back from the cloud first, so users can see the existence of needed files immediately. The remainder of the data is then pulled back from the cloud if/when the user goes to open the file. As a result, the entire file system (or any portion thereof) can be restored to a previous time nearly instantaneously. A “fast” restore is performed if an object being restored exists within a “borrow window” of the version from which the system is restoring. A version is pruned from the versioned file system by deleting all objects in the tree (associated with the version) that, at the time of pruning: (i) are not being lent to any other version within the borrow window of the version being pruned, and (ii) are not referenced in any other version whose borrow window is sufficiently large that an object in the version could have been restored from that other version.
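The metadata-first restore described above can be illustrated with a minimal sketch. It is not the patented implementation: the class `LazyRestore`, the `fetch` callback, and the in-memory stand-in for the cloud are all hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, List


class LazyRestore:
    """Restore metadata eagerly; fetch file contents only on first open."""

    def __init__(self, metadata: Dict[str, int], fetch: Callable[[str], bytes]) -> None:
        # Metadata (here: path -> size) is pulled back first, so files are
        # visible immediately. The dict layout is an assumption.
        self.metadata = dict(metadata)
        self._fetch = fetch                   # hypothetical cloud download hook
        self._cache: Dict[str, bytes] = {}    # contents fetched so far

    def listdir(self) -> List[str]:
        # Listing needs only metadata; no file data is transferred.
        return sorted(self.metadata)

    def open(self, path: str) -> bytes:
        if path not in self.metadata:
            raise FileNotFoundError(path)
        if path not in self._cache:
            # Data is pulled back only if and when the file is opened.
            self._cache[path] = self._fetch(path)
        return self._cache[path]


fetched: List[str] = []  # records which paths were actually downloaded


def fake_cloud_fetch(path: str) -> bytes:
    """Stand-in for a cloud read, used only for this example."""
    fetched.append(path)
    return b"contents of " + path.encode()


restore = LazyRestore({"/a.txt": 18, "/b.txt": 18}, fake_cloud_fetch)
names = restore.listdir()         # both files visible, nothing fetched yet
first = restore.open("/a.txt")    # triggers one fetch
again = restore.open("/a.txt")    # served from the local cache
```

In this sketch the restore appears complete as soon as the metadata is loaded, while the cost of moving data is deferred to the first open of each file.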
Abstract:
A memory storage device has a file storage operating system that uses inodes to access file segments. Each inode has a plurality of rows. A portion of the rows can store extents pointing, directly or indirectly, to data blocks. Each extent has a field to indicate whether the extent is an indirect extent or a direct extent.
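A minimal sketch of how a per-extent direct/indirect flag might be used when resolving an inode to data blocks. The field names (`start`, `length`, `indirect`) and the dictionary that stands in for on-disk indirect blocks are assumptions made for illustration; the abstract does not specify a layout.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Extent:
    start: int      # first block number the extent points to
    length: int     # number of contiguous blocks covered
    indirect: bool  # flag field: True = indirect extent, False = direct extent


def resolve(extents: List[Extent], indirect_blocks: Dict[int, List[Extent]]) -> List[int]:
    """Return the data block numbers, in order, reachable from the extents."""
    blocks: List[int] = []
    for ext in extents:
        if ext.indirect:
            # An indirect extent points at blocks that themselves hold extents.
            for blk in range(ext.start, ext.start + ext.length):
                blocks.extend(resolve(indirect_blocks.get(blk, []), indirect_blocks))
        else:
            # A direct extent points straight at data blocks.
            blocks.extend(range(ext.start, ext.start + ext.length))
    return blocks


# Hypothetical inode rows: one direct extent and one indirect extent.
inode_rows = [Extent(100, 2, False), Extent(500, 1, True)]
# Block 500 holds a further direct extent covering blocks 200-202.
indirect_blocks = {500: [Extent(200, 3, False)]}
data_blocks = resolve(inode_rows, indirect_blocks)
```

Because the flag lives in each extent, direct and indirect extents can share the same rows of the inode rather than occupying fixed slots.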
Abstract:
Described are techniques for performing multi-sequential I/O operations in connection with data requests involving a data storage device. A single data request may involve more than one portion of data, each associated with its own job record; for example, a single request may involve more than a single track of data of a logical device. A single job record corresponds to a single track. A data structure arrangement is disclosed that includes multiple job records corresponding to the single data request involving more than a single track of data. The multiple job records for a single data request are connected together in a data structure arrangement that may be used in connection with a single read operation involving more than a single track of data. This data structure may also be used in connection with storing a plurality of pending write requests, such as in connection with writing data from cache locations to a plurality of tracks of a particular device in which the plurality of pending write requests are represented as a single data request.
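One way to picture the arrangement is a chain of job records, one per track, hanging off a single request. The sketch below assumes a singly linked list; the names `JobRecord`, `build_request`, and `tracks_of` are hypothetical, and the actual arrangement in the disclosure may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JobRecord:
    track: int                          # the single track this record covers
    next: Optional["JobRecord"] = None  # link to the next record of the same request


def build_request(first_track: int, track_count: int) -> JobRecord:
    """Chain one job record per track for a multi-track data request."""
    if track_count < 1:
        raise ValueError("a request must cover at least one track")
    head = JobRecord(first_track)
    tail = head
    for track in range(first_track + 1, first_track + track_count):
        tail.next = JobRecord(track)
        tail = tail.next
    return head


def tracks_of(head: Optional[JobRecord]) -> List[int]:
    """Walk the chain and list the tracks covered by the request."""
    tracks: List[int] = []
    node = head
    while node is not None:
        tracks.append(node.track)
        node = node.next
    return tracks


# A single read request spanning four consecutive tracks starting at track 12.
request = build_request(first_track=12, track_count=4)
covered = tracks_of(request)
```

The same chain could represent a batch of pending writes destined for several tracks of one device, treated as one request.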
Abstract:
Described are techniques for use in determining a dynamic mirror service policy (DMSP) for a plurality of mirror devices. The DMSP determines which of the plurality of mirror devices services I/O operations associated with a logical volume (LV), such as a read operation, at a particular point in time. The particular DMSP may subsequently be recalculated using device statistics from a different time interval. Part of determining a DMSP includes using device statistics to determine the activity level of each LV. The activity levels of multiple LVs may be combined to determine the activity level associated with a particular mirror device. A mirror device is selected if it has the minimum activity of all the plurality of mirror devices. Seek minimization processing is performed to minimize the distance between LVs stored on a single mirror device. Parameters used in connection with determining a DMSP may be stored in a configuration file and may be dynamically modified. Techniques described may also be used in an embodiment having a static MSP.
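The selection step can be sketched as follows: per-LV activity levels are summed per mirror device, and the device with the minimum total is chosen. The activity numbers, device names, and the plain sum used to combine them are assumptions for illustration; the abstract does not fix a formula, and seek minimization is not modeled here.

```python
from typing import Dict, List


def device_activity(lv_activity: Dict[str, float],
                    placement: Dict[str, List[str]]) -> Dict[str, float]:
    """Combine per-LV activity levels into a per-device activity level."""
    return {
        device: sum(lv_activity[lv] for lv in lvs)
        for device, lvs in placement.items()
    }


def select_mirror(mirrors: List[str], activity: Dict[str, float]) -> str:
    """Pick the mirror device with the minimum activity level."""
    return min(mirrors, key=lambda device: activity[device])


# Hypothetical activity levels derived from one statistics interval.
lv_activity = {"LV1": 30.0, "LV2": 10.0, "LV3": 5.0}
# Hypothetical layout: which LVs each mirror device is currently servicing.
placement = {"M1": ["LV1", "LV3"], "M2": ["LV2", "LV3"]}

activity = device_activity(lv_activity, placement)
chosen = select_mirror(["M1", "M2"], activity)
```

Recomputing `lv_activity` from a later statistics interval and calling the same functions again would correspond to recalculating the policy over time.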
Abstract:
A mechanism is described for optimizing predictive read performance in a data storage system that is connected to a geographically remote data storage system by a data link for remote replication of data in support of data recovery operations. The data storage system initiates a local prefetch and initiates via the data link a remote prefetch by the remote data storage system to retrieve data from storage devices coupled to the local and remote data storage systems, respectively. The remote prefetch read start address is offset from the local prefetch read start address by a programmable track offset value. The programmable track offset value is adjusted to tune the prefetch workload balance between the local and remote data storage systems.
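A small sketch of how a track offset could divide a prefetch between the two systems. The function name `plan_prefetch`, the fixed `window` size, and the rule that the local system reads up to the remote start address are assumptions; the abstract states only that the remote start address is offset from the local one by a programmable value.

```python
from typing import List, Tuple


def plan_prefetch(local_start: int, track_offset: int, window: int) -> Tuple[List[int], List[int]]:
    """Split a prefetch window of tracks between local and remote systems.

    Assumption: the local system reads from local_start up to the remote
    start address, and the remote system reads the rest of the window.
    """
    if not 0 <= track_offset <= window:
        raise ValueError("track_offset must lie within the prefetch window")
    remote_start = local_start + track_offset  # offset from the local start address
    local_tracks = list(range(local_start, remote_start))
    remote_tracks = list(range(remote_start, local_start + window))
    return local_tracks, remote_tracks


# With an offset of 3, the local system takes 3 of the 8 tracks.
local_tracks, remote_tracks = plan_prefetch(local_start=1000, track_offset=3, window=8)

# Raising the offset shifts more of the workload to the local system.
local_more, remote_less = plan_prefetch(local_start=1000, track_offset=6, window=8)
```

Under these assumptions, tuning the offset is a single-parameter way to rebalance how much of the prefetch each system services.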
Abstract:
A disk drive array controller generally has a host I/O port configured for connection to a host computer and a plurality of disk I/O ports configured for connection to a corresponding plurality of disks forming a disk drive array. A controller constructed in accordance with various aspects of the present invention may include a host I/O processor in communication with the host I/O port and configured to perform I/O transactions with the host computer through the host I/O port; a cache memory; a front end caching subsystem in communication with the host I/O processor and configured to cache blocks of data comprising host I/O transactions in the cache memory; a disk array I/O processor configured to access host data in the cache memory and in communication with the plurality of disk drives, the disk array I/O processor processing host I/O transactions into disk I/O transactions; and a back end caching subsystem in communication with the disk array I/O processor, the back end caching subsystem configured to cache disk array meta-data in the cache memory. Variations of this basic system are possible, and contemplated as within the scope of the present invention. The disk drive array controller may further include a communication path between the front end caching subsystem and the back end caching subsystem, whereby allocation of blocks in the caching subsystems is synchronized. The communication path may be, for example, a control store common to the front end caching subsystem and the back end caching subsystem, the control store holding a data structure through which caching and I/O transaction information is communicated between the front end caching subsystem and the back end caching subsystem.
Abstract:
A method of data sharing among multiple entities is provided. Each entity exports to a data store a structured data representation comprising a versioned file system local to that entity. The method begins by forming a sharing group that includes two or more entities. Sharing of the structured data representations by members of the sharing group is enabled. The filers use a single distributed lock to protect each version of the file system. This lock is managed to allow each filer access to the shared file system volume to create its new version. To share a fully-versioned file system, asynchronous updates at each of the filers are permitted, and each node is then allowed to “push” its individual changes to the store to form the next version of the file system. A mechanism also may be used to reduce the period during which filers in the group operate under lock.
Abstract:
A versioned file system comprises a set of structured data representations. At a first time, an interface creates and exports to a data store a first structured data representation corresponding to a first version of a local file system. The first structured data representation is an XML tree having a root element, one or more directory elements associated with the root element, and one or more file elements associated with a given directory element. Upon a change within the file system (e.g., file creation, file deletion, file modification, directory creation, directory deletion and directory modification), the interface creates and exports a second structured data representation corresponding to a second version of the file system. The second structured data representation differs from the first structured data representation up to and including the root element of the second structured data representation. The data store may comprise a cloud storage service provider.
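The statement that a new version differs from the old one "up to and including the root element" can be illustrated with path copying on an XML tree. This is a sketch only: the element tags, the `name` and `size` attributes, and the helper functions are hypothetical, and Python's `xml.etree.ElementTree` stands in for whatever representation the system actually uses.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def shallow_copy(elem: ET.Element) -> ET.Element:
    """Copy one element; its children are shared with the original, not copied."""
    clone = ET.Element(elem.tag, dict(elem.attrib))
    clone.text = elem.text
    clone.extend(list(elem))
    return clone


def new_version(root: ET.Element, path: List[str], new_attrs: Dict[str, str]) -> ET.Element:
    """Return a new root in which only the elements along `path` are new."""
    new_root = shallow_copy(root)
    if not path:
        new_root.attrib.update(new_attrs)  # the changed element itself
        return new_root
    for index, child in enumerate(list(new_root)):
        if child.get("name") == path[0]:
            # Replace the child on the changed path with a fresh copy.
            new_root[index] = new_version(child, path[1:], new_attrs)
            return new_root
    raise KeyError(path[0])


# First version: a root with two directories and one file.
v1 = ET.Element("root")
docs = ET.SubElement(v1, "dir", name="docs")
pics = ET.SubElement(v1, "dir", name="pics")
ET.SubElement(docs, "file", name="a.txt", size="10")

# A file modification produces a second version with a new root.
v2 = new_version(v1, ["docs", "a.txt"], {"size": "20"})
```

Here the file element, its directory element, and the root are all new in `v2`, while the untouched `pics` directory is shared between the two versions and `v1` is left unchanged.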
Abstract:
A cluster recovery process is implemented across a set of distributed archives, where each individual archive is a storage cluster of preferably symmetric nodes. Each node of a cluster typically executes an instance of an application that provides object-based storage of fixed content data and associated metadata. According to the storage method, an association or “link” between a first cluster and a second cluster is first established to facilitate replication. The first cluster is sometimes referred to as a “primary” whereas the second cluster is sometimes referred to as a “replica.” Once the link is made, the first cluster's fixed content data and metadata are then replicated from the first cluster to the second cluster, preferably in a continuous manner. Upon a failure of the first cluster, however, a failover operation occurs, and clients of the first cluster are redirected to the second cluster. Upon repair or replacement of the first cluster (a “restore”), the repaired or replaced first cluster resumes authority for servicing the clients of the first cluster. This restore operation preferably occurs in two stages: a “fast recovery” stage that involves preferably “bulk” transfer of the first cluster metadata, followed by a “fail back” stage that involves the transfer of the fixed content data. Upon receipt of the metadata from the second cluster, the repaired or replaced first cluster resumes authority for the clients irrespective of whether the fail back stage has completed or even begun.