摘要:
The invention provides a method and system for recovery of file system data in file servers having mirrored file system volumes. The invention makes use of a “snapshot” feature of a robust file system (the “WAFL File System”) disclosed in the Incorporated Disclosures, to rapidly determined which of two or more mirrored volumes is most up-to-date, and which file blocks of the most recent mirrored volume have been changed from each one of the mirrored file systems. In a preferred embodiment, among a plurality of mirrored volumes, the invention rapidly determines which is the most up-to-date by examining a consistency point number maintained by the WAFL File System at each mirrored volume. The invention rapidly pairwise determines what blocks are shared between that most up-to-date mirrored volume and each other mirrored volume, in response to a snapshot of the file system maintained at each mirrored volume and are stored in common pairwise between each mirrored volume and the most up-to-date mirrored volume. The invention re synchronizes only those blocks that have been changed between the common snapshot and the most up-to-date snapshot.
摘要:
The invention features a method for controlling storage of data in a plurality of storage devices each including storage blocks, for example, in a RAID array. The method includes receiving a plurality of write requests associated with data, and buffering the write requests. A file system defines a group of storage blocks, responsive to disk topology information. The group includes a plurality of storage blocks in each of the plurality of storage devices. Each data block of the data to be written is associated with a respective one of the storage blocks, for transmitting the association to the plurality of storage devices.
摘要:
In one embodiment, a first storage device and a second storage device form a mirror. When the first storage device loses synchronization with the second storage device, data present in the second storage device but not in the first storage device are identified. The identified data are then copied to the first storage device. In one embodiment, a method of rebuilding data in a storage device includes the act of replacing a failed storage device with a replacement storage device. Up-to-date data for the failed storage device, which may be stored in a corresponding mirror, may then be copied to the replacement storage device. Thereafter, the replacement storage device and any other storage devices that have lost synchronization with their mirror are resynchronized.
摘要:
A method for storing data on a plurality of storage devices of a storage system is disclosed. The data is received as data blocks from a plurality of write requests. The data blocks are saved as buffered data for writing to the storage devices in a single write request. An indication is received indicating the available storage blocks on the plurality of storage devices which are available for writing. The buffered data is associated with selected storage blocks of the storage blocks which are available for writing. The buffered data is written to the selected storage blocks in a single write request.
摘要:
A method and apparatus for a reliable data storage system using block level checksums appended to data blocks. Files are stored on hard disks in storage blocks, including data blocks and block-appended checksums. The block-appended checksum includes a checksum of the data block, a VBN, a DBN, and an embedded checksum for checking the integrity of the block-appended checksum itself. A file system includes file blocks with associated block-appended checksum to the data blocks. The file blocks with block-appended checksums are written to storage blocks. In a preferred embodiment a collection of disk drives are formatted with 520 bytes of data per sector. For each 4,096-byte file block, a corresponding 64-byte block-appended checksum is appended to the file block with the first 7 sectors including most of the file block data while the 8th sector includes the remaining file block data and the 64-byte block-appended checksum.
摘要:
The invention provides a method and system for recovery of file system data in file servers having mirrored file system volumes. The invention makes use of a “snapshot” feature of a robust file system (the “WAFL File System”) disclosed in the Incorporated Disclosures, to rapidly determined which of two or more mirrored volumes is most up-to-date, and which file blocks of the most recent mirrored volume have been changed from each one of the mirrored file systems. In a preferred embodiment, among a plurality of mirrored volumes, the invention rapidly determines which is the most up-to-date by examining a consistency point number maintained by the WAFL File System at each mirrored volume. The invention rapidly pairwise determines what blocks are shared between that most up-to-date mirrored volume and each other mirrored volume, in response to a snapshot of the file system maintained at each mirrored volume and are stored in common pairwise between each mirrored volume and the most up-to-date mirrored volume. The invention re synchronizes only those blocks that have been changed between the common snapshot and the most up-to-date snapshot.
摘要:
The invention features a method for controlling storage of data in a plurality of storage devices each including storage blocks, for example, in a RAID array. The method includes receiving a plurality of write requests associated with data, and buffering the write requests. A file system defines a group of storage blocks, responsive to disk topology information. The group includes a plurality of storage blocks in each of the plurality of storage devices. Each data block of the data to be written is associated with a respective one of the storage blocks, for transmitting the association to the plurality of storage devices.
摘要:
The present invention implements an I/O task architecture in which an I/O task requested by the storage manager, for example a stripe write, is decomposed into a number of lower-level asynchronous I/O tasks that can be scheduled independently. Resources needed by these lower-level I/O tasks are dynamically assigned, on an as-needed basis, to balance the load and use resources efficiently, achieving higher scalability. A hierarchical order is assigned to the I/O tasks to ensure that there is a forward progression of the higher-level I/O task and to ensure that resources do not become deadlocked.
摘要:
A storage operating system is configured to assign volume block numbers (VBNs) to a volume. The system has a plurality of disks, and each disk of the plurality of disks is assigned disk block numbers (DBNs). A raidmap is configured to map the VBNs to the DBNs of the plurality of physical disks, the mapping for a particular disk stored in a disk label for the particular disk. The disk label for the particular disk is then written to the particular disk.
摘要:
A system and method are disclosed for rendering devices on a cluster globally visible, wherein the cluster includes a plurality of nodes on which the devices are attached. The system establishes for each of the devices in the cluster at least one globally unique identifier enabling global access to the device. The system includes a device registrar that creates the identifiers and a global file system. The identifiers include a globally unique logical name by which users of the cluster identify the device and a globally unique physical name by which the global file system identifies the device. The registrar creates a one-to-one mapping between the logical name and the physical name for each of the devices. The system also includes a device information (dev.sub.-- info) data structure maintained by the device registrar that represents physical associations of the devices within the cluster. Each association corresponds to the physical name of a device file maintained by the global file system. The device registrar determines for an attached device a globally unique, device type (dev.sub.-- t) value; creates dev.sub.-- info data structure entry and a corresponding physical name; generates a logical name based on the dev.sub.-- t value and the physical name; and associates the dev.sub.-- t value with the device file representing the attached device. Given this framework, a user of the cluster can access any of the devices by issuing the global file system an access request identifying the device to be accessed by its logical name.