Abstract:
In an embodiment, a mapping method for an accelerated application-oriented middleware layer is provided. The method includes, using a first mapper, determining for an input/output (I/O) operation whether a data storage location has been designated for storing corresponding data in a virtual storage object, the I/O operation involving the corresponding data. The method further includes, using the first mapper and at least one processor, acquiring the virtual element identification of the corresponding data. The method also includes performing the I/O operation using the virtual element identification and the corresponding data.
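By way of illustration only, the following C sketch shows one way a first mapper of the kind described above might behave: it checks whether a storage location has been designated for a slot of a virtual storage object, designates one if not, and returns the virtual element identification used to perform the I/O. All type and function names (vmap_t, mapper_acquire, and so on) are hypothetical and not taken from the embodiment.

/* Minimal sketch of the first-mapper lookup described above.
 * All names (vmap_t, velem_id_t, etc.) are illustrative, not from the patent. */
#include <stdio.h>
#include <stdint.h>

#define VOBJ_ELEMS 8
#define UNMAPPED   UINT32_MAX

typedef uint32_t velem_id_t;

/* A virtual storage object: a fixed array of element slots.
 * UNMAPPED means no storage location has been designated yet. */
typedef struct {
    velem_id_t map[VOBJ_ELEMS];   /* logical slot -> virtual element id */
    velem_id_t next_free;         /* next id to hand out on first write */
} vmap_t;

/* Determine whether a location is designated for this slot; if not,
 * designate one, then return the virtual element id used for the I/O. */
static velem_id_t mapper_acquire(vmap_t *m, unsigned slot)
{
    if (m->map[slot] == UNMAPPED)          /* no location designated yet */
        m->map[slot] = m->next_free++;     /* designate one now */
    return m->map[slot];                   /* id used to perform the I/O */
}

int main(void)
{
    vmap_t m = { .next_free = 100 };
    for (unsigned i = 0; i < VOBJ_ELEMS; i++)
        m.map[i] = UNMAPPED;

    /* Two I/O operations on slot 3 reuse the same virtual element id. */
    printf("first  I/O on slot 3 -> element %u\n", (unsigned)mapper_acquire(&m, 3));
    printf("second I/O on slot 3 -> element %u\n", (unsigned)mapper_acquire(&m, 3));
    return 0;
}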
Abstract:
A data access request to a file system is decomposed into a plurality of lower-level I/O tasks. A logical combination of physical storage components is represented as a hierarchical set of objects. A parent I/O task is generated from a first object in response to the data access request. A child I/O task is generated from a second object to implement a portion of the parent I/O task. The parent I/O task is suspended until the child I/O task completes. The child I/O task is executed in response to an occurrence of an event that a resource required by the child I/O task is available. The parent I/O task is resumed upon an event indicating completion of the child I/O task. Scheduling of any child I/O task is not conditional on execution of the parent I/O task, and a state diagram regulates the child I/O tasks.
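As a rough illustration of the parent/child decomposition described above, the C sketch below suspends a parent I/O task until its child tasks complete, and runs each child only when an event signals that its required resource is available. The io_task_t type, its states, and the event handler are assumptions made for the example, not the claimed implementation.

/* Sketch of parent/child I/O task decomposition with suspend/resume.
 * Task states and the io_task_t type are illustrative assumptions. */
#include <stdio.h>

typedef enum { T_PENDING, T_RUNNING, T_SUSPENDED, T_DONE } tstate_t;

typedef struct io_task {
    const char *name;
    tstate_t    state;
    int         children_left;     /* parent only: outstanding child tasks */
    struct io_task *parent;        /* child only: task to resume on finish */
} io_task_t;

/* A child executes only when the resource it needs becomes available. */
static void child_on_resource_available(io_task_t *c)
{
    c->state = T_RUNNING;
    printf("%s: executing\n", c->name);
    c->state = T_DONE;
    if (--c->parent->children_left == 0) {     /* completion event          */
        c->parent->state = T_RUNNING;          /* resume suspended parent   */
        printf("%s: resumed, finishing\n", c->parent->name);
        c->parent->state = T_DONE;
    }
}

int main(void)
{
    io_task_t parent = { "parent-write", T_RUNNING, 2, NULL };
    io_task_t c1 = { "child-stripe-0", T_PENDING, 0, &parent };
    io_task_t c2 = { "child-stripe-1", T_PENDING, 0, &parent };

    parent.state = T_SUSPENDED;            /* suspend until children finish */
    child_on_resource_available(&c1);      /* resource events arrive in any */
    child_on_resource_available(&c2);      /* order, independent of parent  */
    return 0;
}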
Abstract:
A method for storing data on a plurality of storage devices of a storage system is disclosed. The data is received as data blocks from a plurality of write requests. The data blocks are saved as buffered data for writing to the storage devices in a single write request. An indication is received identifying the storage blocks on the plurality of storage devices that are available for writing. The buffered data is associated with selected storage blocks from among those available for writing. The buffered data is written to the selected storage blocks in a single write request.
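The following C sketch illustrates the buffering scheme in simplified form: data blocks from several write requests are staged in memory, bound to storage blocks reported as available, and flushed in a single write request. Block sizes and the helper names (buffer_block, flush_to_free_blocks) are illustrative assumptions.

/* Sketch of coalescing many small write requests into one device write. */
#include <stdio.h>
#include <string.h>

#define BLK_SZ   8
#define MAX_BLKS 16

static char buffered[MAX_BLKS][BLK_SZ];    /* data blocks awaiting a write */
static int  nbuffered;

/* Stage one data block from an incoming write request. */
static void buffer_block(const char *data)
{
    strncpy(buffered[nbuffered++], data, BLK_SZ);
}

/* Given the device's list of free block numbers, bind each buffered block
 * to a free block and issue a single write request covering all of them. */
static void flush_to_free_blocks(const int *free_blocks, int nfree)
{
    int n = nbuffered < nfree ? nbuffered : nfree;
    printf("single write request covering %d blocks:\n", n);
    for (int i = 0; i < n; i++)
        printf("  data '%s' -> storage block %d\n", buffered[i], free_blocks[i]);
    nbuffered = 0;
}

int main(void)
{
    buffer_block("blk-A");                 /* from write request 1 */
    buffer_block("blk-B");                 /* from write request 2 */
    buffer_block("blk-C");                 /* from write request 3 */

    int free_blocks[] = { 42, 43, 47 };    /* reported as available */
    flush_to_free_blocks(free_blocks, 3);
    return 0;
}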
Abstract:
A technique coherently suspends input/output (I/O) operations in a RAID subsystem of a storage system. A configuration tree of the RAID subsystem has a plurality of objects representing a logical configuration of storage devices coupled to the system. According to the technique, a “freeze” condition may be imposed on an object of the configuration tree to suspend I/O operations directed to that object. In order to freeze, I/O operations underway (“in flight”) in the RAID subsystem and directed to the object need to complete sufficiently so as to reach a recoverable state in the event the subsystem subsequently fails prior to an I/O restart procedure. Once a freeze condition has been imposed, new I/O requests directed to the object are inserted onto a freeze list of pending requests at the RAID subsystem and are blocked from processing until the object is “unfrozen” (i.e., the freeze condition is lifted).
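One way to picture the freeze mechanism is the C sketch below: while an object of the configuration tree is frozen, new requests are placed on a freeze list and are replayed when the freeze condition is lifted. The raid_object_t type and its fields are assumptions for the example only; completion of in-flight I/O is omitted.

/* Sketch of per-object freeze: queue new requests while frozen, drain on unfreeze. */
#include <stdio.h>
#include <stdbool.h>

#define MAX_PENDING 8

typedef struct {
    const char *name;
    bool        frozen;
    const char *freeze_list[MAX_PENDING];  /* pending I/O requests */
    int         npending;
} raid_object_t;

/* Either process the request or queue it while the object is frozen. */
static void submit_io(raid_object_t *o, const char *req)
{
    if (o->frozen) {
        o->freeze_list[o->npending++] = req;
        printf("%s frozen: queued '%s'\n", o->name, req);
    } else {
        printf("%s: processing '%s'\n", o->name, req);
    }
}

/* Lift the freeze condition and restart the blocked requests. */
static void unfreeze(raid_object_t *o)
{
    o->frozen = false;
    for (int i = 0; i < o->npending; i++)
        printf("%s unfrozen: processing '%s'\n", o->name, o->freeze_list[i]);
    o->npending = 0;
}

int main(void)
{
    raid_object_t vol = { .name = "volume-0" };
    submit_io(&vol, "read-1");     /* normal path             */
    vol.frozen = true;             /* freeze condition imposed */
    submit_io(&vol, "write-2");    /* blocked on freeze list   */
    unfreeze(&vol);                /* freeze lifted, restart   */
    return 0;
}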
Abstract:
The invention provides a method and system for persistent context-based behavior injection in a computing system, such as in a redundant storage system or another system having a layered or modular architecture. Behaviors that are injected can be specified to have triggering conditions, such that the behavior is not injected unless the conditions are true. Triggering conditions may include a selected ordering of conditions and a selected context for each behavior. In a system having a layered architecture, behavior injection might be used to evaluate correct responses in the face of cascaded errors in a specific context or thread, other errors that are related by context, concurrent errors, or multiple errors. Behavior injection uses non-volatile memory to preserve filter context information across possible system errors, to report the results of behavior injection, and to preserve information across recovery from system errors. Multiple behavior injection threads are also provided. Behavior injection can also be performed in a logically distributed system or from a logically remote system.
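As a simplified illustration of context-triggered injection, the C sketch below fires an injected behavior only when the current I/O context matches the injection's triggering conditions, and counts hits for later reporting; persistence in non-volatile memory is only noted in a comment. All names are hypothetical, not the patented implementation.

/* Sketch of context-based behavior injection with a triggering condition. */
#include <stdio.h>
#include <string.h>
#include <stdbool.h>

typedef struct {
    const char *context;                 /* e.g. which thread/volume/layer  */
    const char *operation;               /* e.g. "write", "parity-compute"  */
} io_ctx_t;

typedef struct {
    const char *match_context;           /* triggering condition: context   */
    const char *match_operation;         /* triggering condition: operation */
    int         hits;                    /* count kept for later reporting  */
} injection_t;

/* Returns true if the injected error behavior should replace the normal path. */
static bool maybe_inject(injection_t *inj, const io_ctx_t *ctx)
{
    if (strcmp(ctx->context, inj->match_context) == 0 &&
        strcmp(ctx->operation, inj->match_operation) == 0) {
        inj->hits++;                     /* would be preserved in NVRAM      */
        return true;
    }
    return false;
}

int main(void)
{
    injection_t inj = { "volume-A", "write", 0 };
    io_ctx_t a = { "volume-A", "write" };
    io_ctx_t b = { "volume-B", "write" };

    printf("volume-A write injected: %d\n", maybe_inject(&inj, &a));  /* 1 */
    printf("volume-B write injected: %d\n", maybe_inject(&inj, &b));  /* 0 */
    return 0;
}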
Abstract:
A method and apparatus for a reliable data storage system using block-level checksums appended to data blocks. Files are stored on hard disks in storage blocks, including data blocks and block-appended checksums. The block-appended checksum includes a checksum of the data block, a VBN, a DBN, and an embedded checksum for checking the integrity of the block-appended checksum itself. A file system includes file blocks with block-appended checksums associated with the data blocks. The file blocks with block-appended checksums are written to storage blocks. In a preferred embodiment, a collection of disk drives is formatted with 520 bytes of data per sector. For each 4,096-byte file block, a corresponding 64-byte block-appended checksum is appended to the file block, with the first 7 sectors including most of the file block data while the 8th sector includes the remaining file block data and the 64-byte block-appended checksum.
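The sector arithmetic in the preferred embodiment can be checked with a short C program: a 4,096-byte file block plus a 64-byte block-appended checksum occupies exactly eight 520-byte sectors, with the first seven sectors holding most of the data and the eighth holding the remainder plus the checksum. The struct layout below is an illustrative assumption; only the field list (data checksum, VBN, DBN, embedded checksum) follows the abstract.

/* Check the 520-byte-sector layout of a file block plus its appendix. */
#include <stdio.h>
#include <stdint.h>

#define SECTOR_SZ   520
#define FILE_BLK_SZ 4096

typedef struct {
    uint32_t data_checksum;   /* checksum of the 4,096-byte data block  */
    uint32_t vbn;             /* volume block number                    */
    uint32_t dbn;             /* disk block number                      */
    uint32_t embedded;        /* checksum over the fields above         */
    uint8_t  pad[48];         /* pad the appendix out to 64 bytes       */
} block_appended_checksum_t;

int main(void)
{
    unsigned appendix = (unsigned)sizeof(block_appended_checksum_t);
    unsigned total    = FILE_BLK_SZ + appendix;

    printf("appendix size        : %u bytes\n", appendix);       /* 64   */
    printf("file block + appendix: %u bytes\n", total);          /* 4160 */
    printf("sectors needed       : %u\n", total / SECTOR_SZ);    /* 8    */
    printf("data in sectors 1-7  : %u bytes\n", 7 * SECTOR_SZ);  /* 3640 */
    printf("data in sector 8     : %u bytes (plus the 64-byte appendix)\n",
           FILE_BLK_SZ - 7 * SECTOR_SZ);                         /* 456  */
    return 0;
}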
Abstract:
The invention provides a method and system for reducing RAID parity computation following a RAID subsystem failure. Ranges of RAID stripes are assigned to bits in a bitmap that is stored on disk. When writes to the RAID are in progress, the bit in the bitmap associated with the range of stripes being written is set. When a failure occurs during the write process, the bitmap is analyzed on reboot to determine which ranges of stripes were in the process of being written, and the parity data for only those ranges of stripes is recomputed. Efficiency is increased by use of an in-memory write counter that tracks multiple writes to each stripe range. Using the write counter, the bitmap is written to disk only after each cycle in which its associated bitmap bit is set to 1 and then returns to zero. The invention may be installed, modified, and removed at will from a RAID array, and this may be accomplished while the system is in operation.
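A minimal C sketch of the stripe-range bitmap and in-memory write counter follows. The counter suppresses redundant bitmap updates while multiple writes to the same range are outstanding; the exact flush policy used here (flushing on both the dirty and clean transitions) is a simplifying assumption, as are the range size and names.

/* Sketch of a stripe-range dirty bitmap guarded by an in-memory write counter. */
#include <stdio.h>
#include <stdint.h>

#define STRIPES_PER_RANGE 64
#define NRANGES           16

static uint8_t  bitmap[NRANGES];     /* on-disk copy: 1 = writes in flight */
static unsigned counter[NRANGES];    /* in-memory: writes active per range */

static void bitmap_write_to_disk(void) { printf("  (bitmap flushed to disk)\n"); }

/* Before writing a stripe: mark its range dirty; touch the on-disk bitmap
 * only on the 0 -> 1 transition of the counter. */
static void write_begin(unsigned stripe)
{
    unsigned r = stripe / STRIPES_PER_RANGE;
    if (counter[r]++ == 0) {
        bitmap[r] = 1;
        bitmap_write_to_disk();
    }
    printf("write to stripe %u under way (range %u)\n", stripe, r);
}

/* After the write completes: clear the bit only when the counter returns to 0. */
static void write_end(unsigned stripe)
{
    unsigned r = stripe / STRIPES_PER_RANGE;
    if (--counter[r] == 0) {
        bitmap[r] = 0;
        bitmap_write_to_disk();
    }
}

int main(void)
{
    write_begin(3);      /* range 0 goes dirty, one bitmap update  */
    write_begin(10);     /* same range, no extra bitmap update     */
    write_end(3);
    write_end(10);       /* range 0 clean again, one bitmap update */
    /* After a crash, only ranges whose bit is 1 need parity recomputation. */
    return 0;
}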
Abstract:
The invention provides a method and system for recovery of file system data in file servers having mirrored file system volumes. The invention makes use of a "snapshot" feature of a robust file system (the "WAFL File System") disclosed in the Incorporated Disclosures to rapidly determine which of two or more mirrored volumes is most up-to-date, and which file blocks of the most recent mirrored volume have been changed relative to each of the mirrored file systems. In a preferred embodiment, among a plurality of mirrored volumes, the invention rapidly determines which is the most up-to-date by examining a consistency point number maintained by the WAFL File System at each mirrored volume. The invention then rapidly determines, pairwise, which blocks are shared between the most up-to-date mirrored volume and each other mirrored volume, in response to a snapshot of the file system maintained at each mirrored volume and stored in common between that volume and the most up-to-date mirrored volume. The invention resynchronizes only those blocks that have been changed between the common snapshot and the most up-to-date snapshot.
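The selection-and-resynchronization step can be sketched in a few lines of C: the mirror with the highest consistency point number is taken as most up-to-date, and only blocks no longer shared with the common snapshot are copied. The mirror_t structure and the per-block snapshot flags are assumptions made for the example, not the WAFL File System's own structures.

/* Sketch of choosing the most up-to-date mirror and resyncing changed blocks only. */
#include <stdio.h>
#include <stdint.h>

#define NBLOCKS 8

typedef struct {
    const char *name;
    uint64_t    cp_number;              /* consistency point number            */
    uint8_t     block_in_snap[NBLOCKS]; /* 1 = unchanged since common snapshot */
} mirror_t;

int main(void)
{
    mirror_t a = { "mirror-A", 1042, {1,1,1,0,1,0,1,1} };
    mirror_t b = { "mirror-B", 1038, {1,1,1,1,1,1,1,1} };

    /* The highest consistency point number identifies the most up-to-date mirror. */
    mirror_t *latest = (a.cp_number >= b.cp_number) ? &a : &b;
    mirror_t *stale  = (latest == &a) ? &b : &a;
    printf("most up-to-date: %s (CP %llu)\n",
           latest->name, (unsigned long long)latest->cp_number);

    /* Resynchronize only blocks no longer shared with the common snapshot. */
    for (int i = 0; i < NBLOCKS; i++)
        if (!latest->block_in_snap[i])
            printf("copy block %d from %s to %s\n", i, latest->name, stale->name);
    return 0;
}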
Abstract:
When a client computer requests data from a disk or similar device at a server computer, the client exports the memory associated with an allocated read buffer by generating and storing one or more incoming MMU (IMMU) entries that map the read buffer to an assigned global address range. The remote data read request, along with the assigned global address range, is communicated to the server node. At the server, the request is serviced by performing a memory import operation, in which one or more outgoing MMU (OMMU) entries are generated and stored for mapping the global address range specified in the read request to a corresponding range of local physical addresses. The mapped local physical addresses in the server are not locations in the server's memory. The server then performs a DMA operation for directly transferring the data specified in the request message from the disk to the mapped local physical addresses. The DMA operation transmits the specified data to the server's network interface, at which the mapped local physical addresses to which the data is transferred are converted into the corresponding global addresses. The specified data with the corresponding global addresses are then transmitted to the client node. The client converts the global addresses in the received specified data into the local physical addresses corresponding to the allocated receive buffer, and stores the received specified data in the allocated receive buffer.
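The address translation underlying the export/import steps can be illustrated with a small C sketch: an IMMU-style entry maps the client's read buffer to a global address range, and an OMMU-style entry maps that same global range to local addresses on the server side, so a DMA addressed through the global range ultimately reaches the client's buffer. The entry layout and the example addresses are assumptions, not the patented structures.

/* Sketch of global-address export (client IMMU) and import (server OMMU). */
#include <stdio.h>
#include <stdint.h>

typedef struct {            /* one translation entry: global -> local */
    uint64_t global_base;
    uint64_t local_base;
    uint64_t len;
} mmu_entry_t;

/* Translate an address that falls inside an entry's global range. */
static uint64_t translate(const mmu_entry_t *e, uint64_t global)
{
    return e->local_base + (global - e->global_base);
}

int main(void)
{
    /* Client export: read buffer at local 0x2000 mapped to global 0x9000. */
    mmu_entry_t immu = { 0x9000, 0x2000, 0x1000 };
    /* Server import: the same global range mapped to server-side local addresses. */
    mmu_entry_t ommu = { 0x9000, 0x7000, 0x1000 };

    uint64_t global = 0x9010;                      /* address carried in the request */
    printf("server DMA target (local)   : 0x%llx\n",
           (unsigned long long)translate(&ommu, global));
    printf("client buffer destination   : 0x%llx\n",
           (unsigned long long)translate(&immu, global));
    return 0;
}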