Abstract:
When a client computer requests data from a disk or similar device at a server computer, the client exports the memory associated with an allocated read buffer by generating and storing one or more incoming MMU (IMMU) entries that map the read buffer to an assigned global address range. The remote data read request, along with the assigned global address range is communicated to the server node. At the server, the request is serviced by performing a memory import operation, in which one or more outgoing MMU (OMMU) entries are generated and stored for mapping the global address range specified in the read request to a corresponding range of local physical addresses. The mapped local physical addresses in the server are not locations in the server's memory. The server then performs a DMA operation for directly transferring the data specified in the request message from the disk to the mapped local physical addresses. The DMA operation transmits the specified data to the server's network interface, at which the mapped local physical addresses to which the data is transferred are converted into the corresponding global addresses. The specified data with the corresponding global addresses are then transmitted to the client node. The client converts the global addresses in the received specified data into the local physical addresses corresponding to the allocated receive buffer, and stores the received specified data in the allocated receive buffer.
Abstract:
In one exemplary embodiment, a method includes the step of instantiating, with at least one processor, a storage object. The storage object includes a unique identifier, a data element and a virtual storage object. The virtual storage object is formed in the storage object. The virtual storage object includes a virtual data element. A set of kernel functions of a client-side computing system utilizing the application-orientedmiddleware layer are invoked. The set of kernel functions implement formation of an application data object. The application data object maps files and directories to the virtual storage object and integrates into a filesystem interface of an operating system of the client-side computing system. The application data object is formed. A data routing service of the application data object is initiated.
Abstract:
The invention features a method for controlling storage of data in a plurality of storage devices each including storage blocks, for example, in a RAID array. The method includes receiving a plurality of write requests associated with data, and buffering the write requests. A file system defines a group of storage blocks, responsive to disk topology information. The group includes a plurality of storage blocks in each of the plurality of storage devices. Each data block of the data to be written is associated with a respective one of the storage blocks, for transmitting the association to the plurality of storage devices.
Abstract:
The invention provides a method and system for persistent context-based behavior injection in a computing system, such as in a redundant storage system or another system having a layered or modular architecture. Behaviors that are injected can be specified to have triggering conditions, such that the behavior is not injected unless the conditions are true. Triggering conditions may include a selected ordering of conditions and a selected context for each behavior. In a system having a layered architecture, behavior injection might be used to evaluate correct responses in the face of cascaded errors in a specific context or thread, other errors that are related by context, concurrent errors, or multiple errors. Behavior injection uses non-volatile memory to preserve persistence of filter context information across possible system errors, for reporting of the results of behavior injection, and to preserve information across recovery from system errors. Multiple behavior injection threads are also provided. Behavior injection can also be performed in a logically distributed system or from a logically remote system.
Abstract:
A method for operating a data storage system is described. The method first constructs an I/O tree representing a logical configuration of storage devices coupled to the storage system, the I/O tree representing a flow of I/O operations to the storage devices. Elements of the I/O tree are represented by objects. A freeze condition is imposed on a selected object of the I/O tree in order to disable a portion of the storage devices serviced by the selected object. Configuration management operations are performed on the portion of the storage devices serviced by the selected object. The freeze condition is removed from the selected object in response to completion of the configuration management, in order to resume I/O operations to the portion of the storage devices serviced by the selected object.
Abstract:
The invention provides flexible disabling of disk sets. One or more disks in a RAID sub-system may be identified as temporarily inactive. The disk or disks are then marked as inactive by setting one of a set of bits associated with each disk in the RAID subsystem. If an inactivated disk is a data disk, marking it as inactive also marks it as read only. If an inactivated disk is a parity disk, the RAID group to which it supplies parity is also inactivated and a file system must look to a mirror of the inactivated RAID sub-system for its data. When a data disk is reactivated it is marked as read/write by clearing its associated bit. When a parity disk is reactivated it is also marked as read/write by clearing its bit, however, it is not available for use until it has synchronized its operation with its mirror.
Abstract:
The present invention implements an I/O task architecture in which an I/O task requested by the storage manager, for example a stripe write, is decomposed into a number of lower-level asynchronous I/O tasks that can be scheduled independently. Resources needed by these lower-level I/O tasks are dynamically assigned, on an as-needed basis, to balance the load and use resources efficiently, achieving higher scalability. A hierarchical order is assigned to the I/O tasks to ensure that there is a forward progression of the higher-level I/O task and to ensure that resources do not become deadlocked.
Abstract:
In one embodiment, a first storage device and a second storage device form a mirror. When the first storage device loses synchronization with the second storage device, data present in the second storage device but not in the first storage device are identified. The identified data are then copied to the first storage device. In one embodiment, a method of rebuilding data in a storage device includes the act of replacing a failed storage device with a replacement storage device. Up-to-date data for the failed storage device, which may be stored in a corresponding mirror, may then be copied to the replacement storage device. Thereafter, the replacement storage device and any other storage devices that have lost synchronization with their mirror are resynchronized.
Abstract:
An improved method and apparatus for creating a snapshot of a file system. A record of which blocks are being used by a snapshot is included in the snapshot itself, allowing effectively instantaneous snapshot creation and deletion. The state of the active file system is described by a set of metafiles; in particular, a bitmap (henceforth the “active map”) describes which blocks are free and which are in use. The inode file describes which blocks are used by each file, including the metafiles. The inode file itself is described by a special root inode, also known as the “fsinfo block”. The system begins creating a new snapshot by making a copy of the root inode. This copy of the root inode becomes the root of the snapshot.
Abstract:
A data access request to a file system is decomposed into a plurality of lower-level I/O tasks. A logical combination of physical storage components is represented as a hierarchical set of objects. A parent I/O task is generated from a first object in response to the data access request. A child I/O task is generated from a second object to implement a portion of the parent I/O task. The parent I/O task is suspended until the child I/O task completes. The child I/O task is executed in response to an occurrence of an event that a resource required by the child I/O task is available. The parent I/O task is resumed upon an event indicating completion of the child I/O task. Scheduling of any child I/O task is not conditional on execution of the parent I/O task, and a state diagram regulates the child I/O tasks.