Abstract:
A method for constructing an array of storage devices is disclosed. A number of storage devices is selected. Each storage device stores data in data blocks. A redundancy is established, the redundancy being a number of data blocks for a parity block. A plurality of parity sets is established, each parity set having the number of data blocks for a parity block and having a set parity block. The set parity block for each parity set is computed. Each set parity block is stored on one of the storage devices that does not store any data-block member of the parity set, so that more than one parity block is stored on each storage device, with the exception that two parity blocks for the same data storage location are precluded from being stored on the same storage device.
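The placement rule above can be illustrated with a minimal sketch. It assumes XOR parity, a greedy choice of the least-loaded eligible device, and a reading of "same data storage location" as "two parity sets that share a data block"; none of these details come from the abstract, and the names `xor_blocks` and `place_parity` are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks column by column into one parity block."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def place_parity(parity_sets, data, num_devices):
    """Compute and place one parity block per parity set.

    parity_sets: list of lists of (device, block_index) data locations.
    data: dict mapping (device, block_index) -> bytes.
    Returns a list of (target_device, parity_bytes), one per parity set.
    """
    # Data locations already covered by a parity block on each device.
    covered = [set() for _ in range(num_devices)]
    counts = [0] * num_devices  # parity blocks held by each device
    placement = []
    for members in parity_sets:
        member_devices = {device for device, _ in members}
        candidates = [
            d for d in range(num_devices)
            # Rule 1: the device holds no data member of this set.
            if d not in member_devices
            # Rule 2 (assumed reading): no parity on this device already
            # covers one of the same data locations.
            and not (covered[d] & set(members))
        ]
        if not candidates:
            raise ValueError("no device satisfies the placement rules")
        # Assumed policy: pick the eligible device with the fewest parity blocks.
        target = min(candidates, key=lambda d: (counts[d], d))
        covered[target] |= set(members)
        counts[target] += 1
        placement.append((target, xor_blocks([data[m] for m in members])))
    return placement


# Four devices, one data block each; three parity sets of two blocks.
data = {(0, 0): b"\x01", (1, 0): b"\x02", (2, 0): b"\x04", (3, 0): b"\x08"}
parity_sets = [[(0, 0), (1, 0)], [(2, 0), (3, 0)], [(0, 0), (2, 0)]]
placement = place_parity(parity_sets, data, num_devices=4)
```

The greedy policy is only one way to spread parity across devices; the abstract does not say how the target device is chosen among the eligible ones.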
Abstract:
Methods, systems, and computer executable instructions for performing distributed data analytics are provided. In one exemplary embodiment, a method of performing a distributed data analytics job includes collecting application-specific information in a processing node assigned to perform a task to identify data necessary to perform the task. The method also includes requesting a chunk of the necessary data from a storage server based on location information indicating one or more locations of the data chunk and prioritizing the request relative to other data requests associated with the job. The method also includes receiving the data chunk from the storage server in response to the request and storing the data chunk in a memory cache of the processing node which uses a same file system as the storage server.
Abstract:
A system for controlling power usage in a storage cluster by dynamically controlling membership in the storage cluster is disclosed. The storage cluster includes multiple storage servers that provide access to one or more storage subsystems. The power management system uses a power management policy to set parameters for controlling membership in the storage cluster and monitors the storage cluster based on the policy. Based on the monitoring, the system detects when the number of storage servers in the storage cluster should be reduced or increased. To reduce the number, the system selects a storage server to deactivate and directs the selected storage server to migrate storage resources (e.g. data, metadata) associated with the server to a different storage server. The system then deactivates the selected storage server by directing it to transition to a low power mode. The system may increase the number of servers in the storage cluster by reversing these steps.
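The shrink-and-grow cycle described above can be sketched as follows. The policy fields (`low_watermark`, `high_watermark`, `min_active`), the use of average load as the monitored signal, and the choice of the least-loaded server as the one to deactivate are all assumptions made for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    load: float
    resources: list = field(default_factory=list)
    active: bool = True


@dataclass
class Policy:
    low_watermark: float   # average load below which the cluster shrinks
    high_watermark: float  # average load above which the cluster grows
    min_active: int = 1    # never shrink below this many servers


def adjust_membership(servers, policy):
    """Apply one monitoring step and return a description of the action taken."""
    active = [s for s in servers if s.active]
    if not active:
        return "no change"
    average = sum(s.load for s in active) / len(active)

    if average < policy.low_watermark and len(active) > policy.min_active:
        victim = min(active, key=lambda s: s.load)
        target = min((s for s in active if s is not victim), key=lambda s: s.load)
        # Migrate the victim's storage resources before powering it down.
        target.resources.extend(victim.resources)
        target.load += victim.load
        victim.resources.clear()
        victim.load = 0.0
        victim.active = False  # stands in for the low-power transition
        return f"deactivated {victim.name}"

    if average > policy.high_watermark:
        standby = [s for s in servers if not s.active]
        if standby:
            # Only the wake-up is modeled; migrating resources back is omitted.
            standby[0].active = True
            return f"activated {standby[0].name}"

    return "no change"


servers = [
    Server("a", 0.2, ["vol1"]),
    Server("b", 0.1, ["vol2"]),
    Server("c", 0.3, ["vol3"]),
]
policy = Policy(low_watermark=0.3, high_watermark=0.8, min_active=2)
first = adjust_membership(servers, policy)
second = adjust_membership(servers, policy)
```

In this run the first step deactivates the least-loaded server and the second step leaves the cluster alone because it is already at `min_active`.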
Abstract:
Systems and methods for scheduling requests to access data may adjust the priority of such requests based on the presence of de-duplicated data blocks within the requested set of data blocks. A data de-duplication process operating on a storage device may build a de-duplication data map that stores information about the presence and location of de-duplicated data blocks on the storage device. An I/O scheduler that manages the access requests can employ the de-duplication data map to identify and quantify any de-duplicated data blocks within an access request. The I/O scheduler can then adjust the priority of the access request based, at least in part, on whether the de-duplicated data blocks provide a sequence of data blocks large enough that servicing the request, even if it causes a head seek operation, is unlikely to reduce the overall throughput of the storage system.
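A minimal sketch of the priority adjustment follows. It assumes the de-duplication map is a dictionary from logical block number to physical location, that "large enough sequence" means a run of physically contiguous de-duplicated blocks of at least `min_run`, and that the adjustment is a fixed boost; these choices and the function names are illustrative, not taken from the abstract.

```python
def longest_dedup_run(request_blocks, dedup_map):
    """Longest run of requested blocks that are de-duplicated and physically contiguous."""
    best = run = 0
    previous_location = None
    for block in request_blocks:
        location = dedup_map.get(block)
        if location is None:
            # Not a de-duplicated block: the run is broken.
            run, previous_location = 0, None
        elif previous_location is not None and location == previous_location + 1:
            run += 1
            previous_location = location
        else:
            # De-duplicated, but it starts a new physical run.
            run, previous_location = 1, location
        best = max(best, run)
    return best


def adjust_priority(base_priority, request_blocks, dedup_map, min_run=4, boost=1):
    """Raise the priority when the de-duplicated run is long enough to justify a seek."""
    if longest_dedup_run(request_blocks, dedup_map) >= min_run:
        return base_priority + boost
    return base_priority


# Logical blocks 10-13 map to contiguous physical locations 500-503.
dedup_map = {10: 500, 11: 501, 12: 502, 13: 503, 20: 900}
long_request = adjust_priority(5, [9, 10, 11, 12, 13, 14], dedup_map)
short_request = adjust_priority(5, [20, 21], dedup_map)
```

Here the first request contains a run of four contiguous de-duplicated blocks and is boosted, while the second contains only one and keeps its base priority.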
Abstract:
A storage area network (SAN)-attached storage system architecture is disclosed. The storage system provides strongly consistent distributed storage communication protocol semantics, such as SCSI target semantics. The system includes a mechanism for presenting a single distributed logical unit, comprising one or more logical sub-units, as a single logical unit of storage to a host system by associating each of the logical sub-units that make up the single distributed logical unit with a single host visible identifier that corresponds to the single distributed logical unit. The system further includes mechanisms to maintain consistent context information for each of the logical sub-units such that the logical sub-units are not visible to a host system as separate entities from the single distributed logical unit.
Abstract:
Described herein is a high-availability storage system having hierarchical levels of storage functions. The storage system may comprise one or more hierarchical levels, each hierarchical level comprising physical servers and being assigned to perform a particular set of storage functions. Each physical server may implement one or more virtual machines (VMs) configured to perform only the set of storage functions assigned to the hierarchical level on which the VM executes. VMs of a first hierarchical level may be configured to organize the VMs of a second hierarchical level into a redundant array of storage access servers for providing data reliability and high availability of the storage system. VMs of the first hierarchical level are configured to produce and route sub-requests to the VMs of the second hierarchical level. Failure of a sub-request is detected and remedied by a VM of the first hierarchical level.
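The routing and failure handling described above might look like the sketch below. It assumes one sub-request per block, that any second-level VM can serve any block because of the redundancy, and that the remedy for a failed sub-request is to retry it on the next VM; the abstract does not say how failures are remedied, so this is only one possibility. Second-level VMs are modeled as plain callables.

```python
def route_request(blocks, second_level_vms):
    """First-level VM: split a request into sub-requests and route them.

    Each sub-request goes to a second-level VM chosen round-robin. If that VM
    raises IOError, the sub-request is retried on the next VM in the array.
    """
    results = {}
    vm_count = len(second_level_vms)
    for index, block in enumerate(blocks):
        for attempt in range(vm_count):
            vm = second_level_vms[(index + attempt) % vm_count]
            try:
                results[block] = vm(block)
                break
            except IOError:
                continue  # failure detected; try the next VM
        else:
            raise IOError(f"all second-level VMs failed for {block}")
    return results


def healthy_vm(block):
    return f"data:{block}"


def failed_vm(block):
    raise IOError("VM unavailable")


results = route_request(["b0", "b1"], [failed_vm, healthy_vm])
```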
Abstract:
An apparatus comprising a remote storage array, a primary storage array and a network. The remote storage array may be configured to (i) define a queue size based on a performance capability of the remote storage array, (ii) generate a multiplier based on resources being used by the remote storage array, and (iii) adjust the queue size by the multiplier. The primary storage array may be configured to execute input/output (IO) requests between the remote storage array and the primary storage array based on the adjusted queue size. The network may be configured to connect the remote storage array to the primary storage array.
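The queue-size adjustment can be shown as a short worked example. The rule that the multiplier equals one minus the remote array's resource utilization is a hypothetical choice, as are the function names; the abstract only states that a multiplier is generated from the resources in use.

```python
def resource_multiplier(utilization):
    """Hypothetical rule: the busier the remote array, the smaller the multiplier."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return 1.0 - utilization


def adjusted_queue_size(base_queue_size, utilization):
    """Scale the performance-based queue size by the multiplier, keeping at least 1."""
    return max(1, int(base_queue_size * resource_multiplier(utilization)))


def can_issue(outstanding_requests, queue_size):
    """Primary array side: issue another IO request only while below the limit."""
    return outstanding_requests < queue_size


limit = adjusted_queue_size(64, 0.25)  # 64 * 0.75
```

With a base queue size of 64 and 25% of resources in use, the primary array would keep at most 48 requests outstanding under this rule.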
Abstract:
Methods and systems for periodically analyzing and correcting storage load imbalances in a storage network environment including virtual machines are described. These methods and systems account for various resource types, logical access paths, and relationships among different storage environment components. Load balancing may be managed in terms of input/output (I/O) traffic and storage utilization. Information aggregated from this analysis is stored and may be used to identify and correct load imbalances in a virtual server environment in order to prevent primary congestion and bottlenecks.
Abstract:
Described herein are a system and method for remote mirroring of data and metadata from a local node to a remote node using out-of-order delivery (OOD), while also providing data integrity at the remote node. OOD may utilize increased throughput of multiple connection paths between nodes. A mirroring layer/engine executing on the local node may receive related groups of data and metadata for storing to the remote node, each related group comprising one or more data sets and one metadata set that describes and is associated with each of the one or more data sets in the related group. The mirroring layer provides data integrity at the remote node by ensuring that the metadata set of a related group is stored to the remote node only after all the data sets in the related group are stored to the remote node, thus ensuring data consistency at the remote node.
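The ordering guarantee above can be sketched from the local node's side. The sketch assumes the remote node acknowledges each stored data set and that the mirroring layer holds back the metadata set until every acknowledgment for its group has arrived; the class name, the `send` callback, and the acknowledgment mechanism are assumptions, not details from the abstract.

```python
class MirroringLayer:
    """Sends data sets in any order, but metadata only after all data is stored."""

    def __init__(self, send):
        # send(kind, group_id, item_id, payload) is a hypothetical transport hook.
        self._send = send
        self._waiting = {}  # group_id -> (unacknowledged data ids, metadata)

    def mirror_group(self, group_id, data_sets, metadata):
        """Start mirroring a related group: data sets now, metadata later."""
        self._waiting[group_id] = (set(data_sets), metadata)
        for data_id, payload in data_sets.items():
            self._send("data", group_id, data_id, payload)

    def on_data_stored(self, group_id, data_id):
        """Handle an acknowledgment that one data set reached remote storage."""
        unacknowledged, metadata = self._waiting[group_id]
        unacknowledged.discard(data_id)
        if not unacknowledged:
            # Every data set in the group is stored; metadata may now follow.
            del self._waiting[group_id]
            self._send("metadata", group_id, None, metadata)


log = []
layer = MirroringLayer(lambda *message: log.append(message))
layer.mirror_group("g1", {"d1": b"a", "d2": b"b"}, {"count": 2})
layer.on_data_stored("g1", "d2")  # acknowledgments may arrive out of order
kinds_before = [message[0] for message in log]
layer.on_data_stored("g1", "d1")
kinds_after = [message[0] for message in log]
```

Because the metadata set is what makes a group visible as complete, holding it back means the remote node never describes data it does not yet have, whatever order the data sets arrive in.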
Abstract:
A method and apparatus for identifying ownership by a computer of a storage device connected to a computer network is described. A first ownership information is written into a selected sector of the storage device by a computer having ownership of the device as a first indicia of ownership. A second ownership information is written into a storage device label of the storage device by the computer having ownership as a second indicia of ownership, the storage device label visible to a plurality of computers connected to the computer network. In the event that at a future time the first indicia of ownership does not match the second indicia of ownership, the first indicia of ownership is taken as definitive of ownership of the storage device.
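The two-indicia scheme and its tie-break rule can be sketched as below. The `Disk` structure and function names are hypothetical; the only behavior taken from the abstract is that both the selected sector and the label are written by the owner, and that the sector value wins when the two disagree.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Disk:
    sector_owner: Optional[str] = None  # first indicia, in the selected sector
    label_owner: Optional[str] = None   # second indicia, in the device label


def claim(disk, owner):
    """The owning computer writes both indicia of ownership."""
    disk.sector_owner = owner
    disk.label_owner = owner


def resolve_owner(disk):
    """Return (owner, consistent); the sector value is definitive on mismatch."""
    consistent = disk.sector_owner == disk.label_owner
    return disk.sector_owner, consistent


disk = Disk()
claim(disk, "filer-A")
disk.label_owner = "filer-B"  # simulate a label that has drifted out of date
owner, consistent = resolve_owner(disk)
```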