Abstract:
A system and method reclaim unused storage space from a data container, such as a logical unit number (LUN) of a storage system. In particular, a novel technique is provided that allows a storage system to reclaim storage space not used by a client file system for which the storage system maintains storage, without requiring assistance from the client file system to determine storage usage. In other words, the storage system may independently reclaim storage space not used by the client file system, without that file system's intervention.
Abstract:
Techniques for a data storage cluster and a method for maintaining and updating reliability data and reducing data communication between nodes are disclosed herein. Each data object is written to a single data zone on a data node within the data storage cluster. Each data object includes one or more data chunks, and the data chunks of a data object are written to a data node in an append-only log format. When parity is determined for a reliability group including the data zone, there is no need to transmit data from the other data nodes where the rest of the data zones of the reliability group reside. Thus, inter-node data communication for determining reliability data is reduced.
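The following is a minimal sketch of why append-only writes can keep parity maintenance local. It assumes simple XOR parity and zero-padded zones, neither of which is stated in the abstract; the function names `xor_into` and `full_parity` are hypothetical. Because an appended chunk lands in a region that previously contributed only zeros, folding the chunk alone into the parity gives the same result as recomputing parity from every zone in the reliability group.

```python
from typing import List


def xor_into(parity: bytearray, offset: int, chunk: bytes) -> None:
    """Fold a newly appended chunk into the parity buffer in place.

    Assumption: the zone held zeros at [offset, offset + len(chunk)) before
    the append, so only the new chunk is needed, not the other zones' data.
    """
    end = offset + len(chunk)
    if end > len(parity):
        # Grow the parity buffer with zeros (zero is the XOR identity).
        parity.extend(bytes(end - len(parity)))
    for i, byte in enumerate(chunk):
        parity[offset + i] ^= byte


def full_parity(zones: List[bytes]) -> bytes:
    """Reference computation: XOR all zones together, zero-padding short ones."""
    size = max((len(zone) for zone in zones), default=0)
    out = bytearray(size)
    for zone in zones:
        for i, byte in enumerate(zone):
            out[i] ^= byte
    return bytes(out)
```

In this sketch, the node holding the parity only ever receives the appended chunk and its offset; `full_parity` exists purely to check the incremental result.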
Abstract:
A facility for comparing two datasets and identifying metadata differences between the two datasets irrespective of the manner in which the data is stored. In some embodiments, the facility includes a comparison unit and a catalog unit. The comparison unit compares a hierarchical hash of a first dataset with a hierarchical hash of a second dataset, the hierarchical hashes each including a plurality of hierarchical hash values, to identify differences in metadata of the first and second datasets by progressively comparing the hierarchical hash values of the first and second hierarchical hashes without comparing the metadata of the first and second datasets. The catalog unit generates a catalog of differences between the first and second datasets, the catalog indicating differences in metadata of the first and second datasets.
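The progressive comparison described above can be sketched as a Merkle-style tree walk. This is an illustration under assumptions, not the facility's actual design: the nested-dict dataset layout, the use of SHA-256, and the names `hash_tree` and `diff` are all hypothetical. Each node stores a hash of its own metadata and a hash covering its whole subtree, so equal subtree hashes let the walk skip a branch without reading any metadata.

```python
import hashlib
import json


def hash_tree(node: dict) -> dict:
    """Build hierarchical hashes for a node of the form
    {"meta": {...}, "children": {name: node, ...}} (both keys optional)."""
    meta_bytes = json.dumps(node.get("meta", {}), sort_keys=True).encode()
    meta_hash = hashlib.sha256(meta_bytes).hexdigest()
    children = {name: hash_tree(child)
                for name, child in node.get("children", {}).items()}
    subtree = hashlib.sha256(meta_hash.encode())
    for name in sorted(children):
        subtree.update(b"\0" + name.encode() + b"\0")
        subtree.update(children[name]["hash"].encode())
    return {"hash": subtree.hexdigest(), "meta_hash": meta_hash,
            "children": children}


def diff(a: dict, b: dict, path: str = "/") -> list:
    """Return a catalog of (path, description) differences, comparing hashes only."""
    if a["hash"] == b["hash"]:
        return []  # identical subtrees: nothing below needs to be visited
    catalog = []
    if a["meta_hash"] != b["meta_hash"]:
        catalog.append((path, "metadata changed"))
    for name in sorted(set(a["children"]) | set(b["children"])):
        child_path = path.rstrip("/") + "/" + name
        if name not in b["children"]:
            catalog.append((child_path, "only in first"))
        elif name not in a["children"]:
            catalog.append((child_path, "only in second"))
        else:
            catalog.extend(diff(a["children"][name], b["children"][name],
                                child_path))
    return catalog
```

The returned list plays the role of the catalog of differences; a real catalog unit would presumably record more detail than a one-line description.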
Abstract:
A distributed object store in a network storage system uses location-independent global object identifiers (IDs) for stored data objects. The global object ID enables a data object to be seamlessly moved from one location to another without affecting clients of the storage system, i.e., “transparent migration”. The global object ID can be part of a multilevel object handle, which also can include a location ID indicating the specific location at which the data object is stored, and a policy ID identifying a set of data management policies associated with the data object. The policy ID may be associated with the data object by a client of the storage system, for example when the client creates the object, thus allowing “inline” policy management. An object location subsystem (OLS) can be used to locate an object when a client request does not contain a valid location ID for the object.
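As a rough illustration of the multilevel handle and the OLS fallback, the sketch below models a handle with a global object ID, an optional location ID, and an optional policy ID. The class names, the in-memory dictionaries, and the `create`, `migrate`, and `read` methods are assumptions made for this example only; the abstract does not specify an interface.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ObjectHandle:
    global_id: str                      # location-independent identifier
    location_id: Optional[str] = None   # hint: where the object was last seen
    policy_id: Optional[str] = None     # data management policies for the object


class ObjectStore:
    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, bytes]] = {}  # location -> {global_id: data}
        self._ols: Dict[str, str] = {}                # global_id -> current location

    def create(self, global_id: str, data: bytes, location_id: str,
               policy_id: Optional[str] = None) -> ObjectHandle:
        """Store an object; the client may attach a policy ID at creation time."""
        self._data.setdefault(location_id, {})[global_id] = data
        self._ols[global_id] = location_id
        return ObjectHandle(global_id, location_id, policy_id)

    def migrate(self, global_id: str, new_location: str) -> None:
        """Move an object; handles already held by clients are not updated."""
        old_location = self._ols[global_id]
        data = self._data[old_location].pop(global_id)
        self._data.setdefault(new_location, {})[global_id] = data
        self._ols[global_id] = new_location

    def read(self, handle: ObjectHandle) -> Tuple[bytes, ObjectHandle]:
        """Return the data plus a handle whose location ID is known to be valid."""
        location = handle.location_id
        if location is None or handle.global_id not in self._data.get(location, {}):
            # Missing or stale location ID: fall back to the location lookup.
            location = self._ols[handle.global_id]
        fresh = ObjectHandle(handle.global_id, location, handle.policy_id)
        return self._data[location][handle.global_id], fresh
```

A client holding a handle issued before a migration can still read the object, which is the sense in which the migration is transparent in this sketch.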
Abstract:
A method and system for processing a plurality of requests for accessing “small files” stored at a storage device are provided. A user may define the file size that qualifies as small, and each small file may include one or more blocks of data. The requests are sorted based on the address of the first data block of each small file. The stored files are then accessed in the sorted order, instead of in the order in which the requests were received.
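The reordering can be sketched in a few lines. The `Request` fields, the `small_file_limit` parameter, and the decision to leave larger files in arrival order are assumptions for illustration; the abstract only states that small-file requests are sorted by first-block address.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Request:
    arrival: int       # order in which the request was received
    name: str          # file being requested
    first_block: int   # address of the file's first data block
    size: int          # file size in bytes


def sort_small_file_requests(
    requests: List[Request], small_file_limit: int
) -> Tuple[List[Request], List[Request]]:
    """Split requests into (small files sorted by first block, all others).

    small_file_limit stands in for the user-defined size threshold.
    Requests for larger files are returned in their original arrival order.
    """
    small = [r for r in requests if r.size <= small_file_limit]
    others = [r for r in requests if r.size > small_file_limit]
    small.sort(key=lambda r: r.first_block)
    return small, others
```

Servicing the sorted list walks the device in ascending block order, which is the usual motivation for address-ordered access on storage with seek costs.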
Abstract:
A system and method for nearly in-band search indexing. A network switch (or other intermediate network device) is configured to provide port mirroring so that data access requests directed to a storage system are forwarded to both the storage system and a search appliance. The search appliance collects index information from the received data access requests to update a search index. As the search appliance is nearly in-band, i.e., not directly in line with the data access request path, the storage system incurs no increase in latency when processing data access requests.
Abstract:
The techniques introduced here provide for efficient management of storage resources in a modern, dynamic data center through the use of virtual storage appliances. Virtual storage appliances perform storage operations and execute in or as a virtual machine on a hypervisor. A storage management system monitors a storage system to determine whether the storage system is satisfying a service level objective for an application. The storage management system then manages (e.g., instantiates, shuts down, or reconfigures) a virtual storage appliance on a physical server. The virtual storage appliance uses resources of the physical server to meet the storage related needs of the application that the storage system cannot provide. This automatic and dynamic management of virtual storage appliances by the storage management system allows storage systems to quickly react to changing storage needs of applications without requiring expensive excess storage capacity.
Abstract:
A method and apparatus for proxying search requests for a storage system and maintaining a central index for performing the search requests is described herein. An index manager on the storage system may initially produce the central index by examining each file in a file system and update the central index thereafter by examining only those files that have changed since the central index was initially produced or last updated. The index manager may receive a changed file list from a differencing layer configured for comparing snapshots of the file system at different time points to produce changed file lists. A search proxy module may receive search requests in a search protocol and proxy the search requests to a search engine by converting the search requests to another search protocol compatible with the search engine. The search engine may then use the central index for performing the search request.
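The incremental update can be illustrated with a toy inverted index. Everything here is a simplifying assumption: snapshots are modeled as dictionaries mapping paths to file text, the differencing step is a plain dictionary comparison, and `changed_files` and `CentralIndex` are hypothetical names. The point of the sketch is only that, after the initial build, files absent from the changed-file list are never re-read.

```python
from typing import Dict, List, Set, Tuple


def changed_files(old_snap: Dict[str, str],
                  new_snap: Dict[str, str]) -> Tuple[Set[str], Set[str]]:
    """Compare two snapshots; return (added or modified paths, removed paths)."""
    changed = {p for p, text in new_snap.items() if old_snap.get(p) != text}
    removed = set(old_snap) - set(new_snap)
    return changed, removed


class CentralIndex:
    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = {}       # word -> paths containing it
        self.words_by_path: Dict[str, Set[str]] = {}  # path -> words indexed for it

    def _drop(self, path: str) -> None:
        """Remove every posting that refers to the given path."""
        for word in self.words_by_path.pop(path, set()):
            self.postings[word].discard(path)
            if not self.postings[word]:
                del self.postings[word]

    def index_file(self, path: str, text: str) -> None:
        """Index (or re-index) a single file."""
        self._drop(path)
        words = set(text.lower().split())
        self.words_by_path[path] = words
        for word in words:
            self.postings.setdefault(word, set()).add(path)

    def update(self, old_snap: Dict[str, str], new_snap: Dict[str, str]) -> int:
        """Apply a changed-file list; return how many files were touched."""
        changed, removed = changed_files(old_snap, new_snap)
        for path in removed:
            self._drop(path)
        for path in changed:
            self.index_file(path, new_snap[path])
        return len(changed) + len(removed)

    def search(self, word: str) -> List[str]:
        return sorted(self.postings.get(word.lower(), set()))
```

The initial build corresponds to calling `index_file` for every file in the first snapshot; later calls to `update` touch only the files that differ between snapshots.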
Abstract:
Methods and systems for securely capturing workloads at a live network for replaying at a test network. The disclosed system captures file system states and workloads of a live server at the live network. In one embodiment, the captured data is anonymized to protect confidentiality of the data. A file system of a test server at the test network is mirrored from a captured state of the live server. An anonymized version of the captured workloads is replayed as requests to the test server. A lost or incomplete command is recreated from the states of the live server. The order of the commands during replay can be based on their order in the captured workload or on a causal relationship. Performance characteristics of the live network are determined based on the responses to the replayed commands.