METHOD AND APPARATUS FOR DECOMPOSING I/O TASKS IN A RAID SYSTEM
    13.
    Invention application
    Legal status: In force

    Publication number: US20110191780A1

    Publication date: 2011-08-04

    Application number: US13048513

    Application date: 2011-03-15

    Abstract: A data access request to a file system is decomposed into a plurality of lower-level I/O tasks. A logical combination of physical storage components is represented as a hierarchical set of objects. A parent I/O task is generated from a first object in response to the data access request. A child I/O task is generated from a second object to implement a portion of the parent I/O task. The parent I/O task is suspended until the child I/O task completes. The child I/O task is executed in response to an event indicating that a resource it requires is available. The parent I/O task is resumed upon an event indicating completion of the child I/O task. Scheduling of any child I/O task is not conditional on execution of the parent I/O task, and a state diagram regulates the child I/O tasks.
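
    The decomposition can be illustrated with a small event-driven model. This is a minimal Python sketch under stated assumptions, not the patented implementation: `IOTask`, `Scheduler`, `spawn_child`, `resource_available` and the states in `State` are hypothetical names, and "running" a child is collapsed into completing it. A child runs only when an event reports its resource available, and the parent resumes when its last child's completion event arrives.

```python
from collections import deque
from enum import Enum, auto


class State(Enum):
    # Hypothetical states; the abstract only says a state diagram governs tasks.
    READY = auto()
    WAITING_RESOURCE = auto()
    SUSPENDED = auto()   # parent parked until all of its children complete
    DONE = auto()


class IOTask:
    def __init__(self, name, resource=None, parent=None):
        self.name = name
        self.resource = resource          # resource needed before the task can run
        self.parent = parent
        self.pending_children = 0
        self.state = State.WAITING_RESOURCE if resource else State.READY


class Scheduler:
    """Event-driven: a child runs when its resource event fires, not when its parent runs."""

    def __init__(self):
        self.waiting = {}                 # resource name -> deque of blocked tasks
        self.log = []                     # completion order, for inspection

    def spawn_child(self, parent, name, resource):
        child = IOTask(name, resource, parent)
        parent.pending_children += 1
        parent.state = State.SUSPENDED    # parent suspends until its children finish
        self.waiting.setdefault(resource, deque()).append(child)
        return child

    def resource_available(self, resource):
        # Event: every task blocked on this resource runs; in this sketch
        # running and completing are a single step.
        queue = self.waiting.pop(resource, deque())
        while queue:
            self._complete(queue.popleft())

    def _complete(self, task):
        task.state = State.DONE
        self.log.append(task.name)
        parent = task.parent
        if parent is not None:
            parent.pending_children -= 1
            if parent.pending_children == 0:
                # Event: last child done, so the parent resumes (and, here, finishes).
                self._complete(parent)


# Usage: one volume-level write decomposed into two per-disk child writes.
sched = Scheduler()
parent = IOTask("volume_write")
sched.spawn_child(parent, "disk0_write", "disk0")
sched.spawn_child(parent, "disk1_write", "disk1")
sched.resource_available("disk1")
sched.resource_available("disk0")
print(sched.log)   # ['disk1_write', 'disk0_write', 'volume_write']
```

    Because children are queued per resource rather than invoked by the parent, the order in which they run follows resource events, not the parent's execution.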

    Technique for coherent suspension of I/O operations in a RAID subsystem
    15.
    Invention grant
    Legal status: In force

    Publication number: US07328364B1

    Publication date: 2008-02-05

    Application number: US10394917

    Application date: 2003-03-21

    CPC classification number: G06F11/2087

    Abstract: A technique coherently suspends input/output (I/O) operations in a RAID subsystem of a storage system. A configuration tree of the RAID subsystem has a plurality of objects representing a logical configuration of storage devices coupled to the system. According to the technique, a “freeze” condition may be imposed on an object of the configuration tree to suspend I/O operations directed to that object. For the freeze to take effect, I/O operations that are underway (“in flight”) in the RAID subsystem and directed to the object must progress far enough to reach a recoverable state, in case the subsystem fails before an I/O restart procedure. Once a freeze condition has been imposed, new I/O requests directed to the object are inserted onto a freeze list of pending requests at the RAID subsystem and are blocked from processing until the object is “unfrozen” (i.e., the freeze condition is lifted).
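
    A minimal Python sketch of the freeze/unfreeze protocol, assuming a single configuration-tree object and treating "recoverable state" as simply "completed" (a simplification; the abstract only requires in-flight I/O to progress far enough to be recoverable). `ConfigObject` and its methods are hypothetical names, not the patented interface.

```python
from collections import deque


class ConfigObject:
    """Hypothetical node of the RAID configuration tree (e.g. a volume or RAID group)."""

    def __init__(self, name):
        self.name = name
        self.frozen = False
        self.in_flight = set()        # ids of requests currently being processed
        self.freeze_list = deque()    # new requests parked while the object is frozen
        self.dispatched = []          # order in which requests actually started

    def submit(self, req_id):
        if self.frozen:
            self.freeze_list.append(req_id)   # blocked until unfreeze()
            return False
        self.in_flight.add(req_id)
        self.dispatched.append(req_id)
        return True

    def complete(self, req_id):
        self.in_flight.discard(req_id)

    def freeze(self):
        # New work is blocked at once; the freeze is coherent only after
        # in-flight requests drain (simplified here to "have completed").
        self.frozen = True
        return self.is_quiesced()

    def is_quiesced(self):
        return self.frozen and not self.in_flight

    def unfreeze(self):
        # Lift the freeze condition and replay parked requests in arrival order.
        self.frozen = False
        while self.freeze_list:
            self.submit(self.freeze_list.popleft())


vol = ConfigObject("vol0")
vol.submit(1)
print(vol.freeze())        # False: request 1 is still in flight
vol.submit(2)              # parked on the freeze list
vol.complete(1)
print(vol.is_quiesced())   # True
vol.unfreeze()
print(vol.dispatched)      # [1, 2]
```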

    Persistent context-based behavior injection or testing of a computing system
    16.
    Invention grant
    Legal status: In force

    Publication number: US06976189B1

    Publication date: 2005-12-13

    Application number: US10105060

    Application date: 2002-03-22

    CPC classification number: G06F11/3672

    Abstract: The invention provides a method and system for persistent context-based behavior injection in a computing system, such as a redundant storage system or another system having a layered or modular architecture. Injected behaviors can be specified to have triggering conditions, so that a behavior is not injected unless its conditions are true. Triggering conditions may include a selected ordering of conditions and a selected context for each behavior. In a system having a layered architecture, behavior injection might be used to evaluate whether the system responds correctly to cascaded errors in a specific context or thread, other errors related by context, concurrent errors, or multiple errors. Behavior injection uses non-volatile memory to keep filter context information persistent across possible system errors, to report the results of behavior injection, and to preserve information across recovery from system errors. Multiple behavior injection threads are also provided. Behavior injection can also be performed in a logically distributed system or from a logically remote system.
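
    The triggering and persistence rules can be sketched as follows. This Python example is illustrative only: `BehaviorInjector`, the rule fields, context strings such as `raid:reconstruct`, and the use of a JSON file in place of non-volatile memory are all assumptions. A rule fires in a given context on its nth matching hit, and an optional `after` rule expresses an ordering condition, which is one way to model cascaded errors.

```python
import json
import os
import tempfile


class BehaviorInjector:
    """Context-triggered behavior injection; a JSON file stands in for NVRAM."""

    def __init__(self, state_path):
        self.state_path = state_path
        self.rules = {}
        self.state = {"hits": {}, "fired": []}
        if os.path.exists(state_path):
            with open(state_path) as f:
                self.state = json.load(f)     # state survives a simulated reboot

    def add_rule(self, name, context, nth_hit=1, after=None):
        # Fire only in `context`, on the nth matching hit, and only once rule
        # `after` (if any) has fired; `after` expresses a required ordering.
        self.rules[name] = {"context": context, "nth": nth_hit, "after": after}

    def check(self, context):
        """Return the behavior to inject at this point, or None."""
        fired = None
        for name, rule in self.rules.items():
            if rule["context"] != context or name in self.state["fired"]:
                continue
            if rule["after"] and rule["after"] not in self.state["fired"]:
                continue
            hits = self.state["hits"].get(name, 0) + 1
            self.state["hits"][name] = hits
            if hits >= rule["nth"]:
                self.state["fired"].append(name)
                fired = name
                break
        self._persist()                        # write through before returning
        return fired

    def _persist(self):
        with open(self.state_path, "w") as f:
            json.dump(self.state, f)


path = os.path.join(tempfile.mkdtemp(), "inject_state.json")
inj = BehaviorInjector(path)
inj.add_rule("disk_read_error", context="raid:reconstruct", nth_hit=2)
inj.add_rule("second_disk_fail", context="raid:reconstruct", after="disk_read_error")
results = [inj.check("raid:reconstruct") for _ in range(3)]
print(results)   # [None, 'disk_read_error', 'second_disk_fail']
print(BehaviorInjector(path).state["fired"])   # ['disk_read_error', 'second_disk_fail']
```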

    Block-appended checksums
    17.
    Invention grant
    Legal status: In force

    Publication number: US06952797B1

    Publication date: 2005-10-04

    Application number: US09696666

    Application date: 2000-10-25

    CPC classification number: G06F11/1076 G11B20/18 H03M13/096

    Abstract: A method and apparatus for a reliable data storage system using block-level checksums appended to data blocks. Files are stored on hard disks in storage blocks, including data blocks and block-appended checksums. The block-appended checksum includes a checksum of the data block, a VBN, a DBN, and an embedded checksum for checking the integrity of the block-appended checksum itself. The file system associates a block-appended checksum with each file block, and the file blocks with their block-appended checksums are written to storage blocks. In a preferred embodiment, a collection of disk drives is formatted with 520 bytes of data per sector. For each 4,096-byte file block, a corresponding 64-byte block-appended checksum is appended, so that the first 7 sectors hold most of the file block data while the 8th sector holds the remaining file block data and the 64-byte block-appended checksum.
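
    The sector arithmetic works out exactly: 8 × 520 = 4,160 bytes = 4,096 bytes of file data + 64 bytes of checksum, so 7 sectors hold 3,640 bytes of data and the 8th holds the remaining 456 bytes followed by the checksum. The Python sketch below shows that layout. The internal field sizes of the 64-byte checksum and the use of CRC-32 (`zlib.crc32`) are assumptions for illustration; the abstract names the fields but not their encoding or the checksum algorithm.

```python
import struct
import zlib

SECTOR = 520          # bytes per sector on the reformatted drives
BLOCK = 4096          # file system block size
BAC_SIZE = 64         # block-appended checksum size
# Hypothetical 64-byte BAC layout: data checksum (u32), VBN (u64), DBN (u64),
# zero padding, then an embedded checksum (u32) over the preceding 60 bytes.
BAC_HEAD = struct.Struct("<IQQ")


def make_bac(data, vbn, dbn):
    body = BAC_HEAD.pack(zlib.crc32(data), vbn, dbn).ljust(BAC_SIZE - 4, b"\0")
    return body + struct.pack("<I", zlib.crc32(body))


def to_sectors(data, vbn, dbn):
    if len(data) != BLOCK:
        raise ValueError("expected a 4096-byte block")
    raw = data + make_bac(data, vbn, dbn)          # 4160 bytes == 8 * 520
    return [raw[i:i + SECTOR] for i in range(0, len(raw), SECTOR)]


def verify(sectors, vbn, dbn):
    raw = b"".join(sectors)
    data, bac = raw[:BLOCK], raw[BLOCK:]
    body, embedded = bac[:-4], struct.unpack("<I", bac[-4:])[0]
    if zlib.crc32(body) != embedded:
        return False                               # the BAC itself is damaged
    crc, got_vbn, got_dbn = BAC_HEAD.unpack(body[:BAC_HEAD.size])
    # Comparing VBN/DBN can also flag a block read back from the wrong location.
    return crc == zlib.crc32(data) and (got_vbn, got_dbn) == (vbn, dbn)


block = bytes(range(256)) * 16
sectors = to_sectors(block, vbn=1234, dbn=42)
print(len(sectors), BLOCK - 7 * SECTOR)            # 8 456
corrupt = list(sectors)
corrupt[3] = b"\xff" + corrupt[3][1:]
print(verify(sectors, 1234, 42), verify(corrupt, 1234, 42))   # True False
```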

    Reparity bitmap RAID failure recovery
    18.
    Invention grant
    Legal status: In force

    Publication number: US06799284B1

    Publication date: 2004-09-28

    Application number: US09797007

    Application date: 2001-02-28

    CPC classification number: G06F11/1076

    Abstract: The invention provides a method and system for reducing RAID parity computation following a RAID subsystem failure. Ranges of RAID stripes are assigned to bits in a bitmap that is stored on disk. While a write to the RAID is in progress, the bitmap bit associated with that write's range of stripes is set. When a failure occurs during the write process, the bitmap is analyzed on reboot to determine which ranges of stripes were in the process of being written, and parity data is recomputed only for those ranges. Efficiency is increased by an in-memory write counter that tracks multiple writes to each stripe range. With the write counter, the bitmap is written to disk only after each cycle in which a range's bit is set to 1 and then returns to zero. The invention may be installed, modified, and removed at will from a RAID array, even while the system is in operation.
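
    A minimal Python sketch of the bitmap with in-memory write counters. `ReparityBitmap` and its methods are hypothetical, the on-disk bitmap is simulated with a list, and the flush policy (persist on each 0-to-1 and 1-to-0 transition of a range's bit) is one plausible reading of the abstract rather than the patented rule. Under this policy, overlapping writes to the same range cost no extra bitmap writes.

```python
class ReparityBitmap:
    """One bit per range of stripes, with in-memory counters of outstanding writes."""

    def __init__(self, num_stripes, stripes_per_range):
        self.num_stripes = num_stripes
        self.per_range = stripes_per_range
        n = -(-num_stripes // stripes_per_range)    # ceiling division
        self.bits = [0] * n          # in-memory bitmap
        self.on_disk = [0] * n       # simulated persistent copy
        self.counters = [0] * n      # outstanding writes per range (memory only)
        self.disk_writes = 0

    def _flush(self):
        self.on_disk = list(self.bits)
        self.disk_writes += 1

    def begin_write(self, stripe):
        r = stripe // self.per_range
        self.counters[r] += 1
        if self.counters[r] == 1:
            # First outstanding write in this range: the set bit must reach
            # disk before the data and parity writes are issued.
            self.bits[r] = 1
            self._flush()

    def end_write(self, stripe):
        r = stripe // self.per_range
        self.counters[r] -= 1
        if self.counters[r] == 0:
            self.bits[r] = 0         # range is consistent again
            self._flush()

    def stripes_to_reparity(self):
        # After a crash only the on-disk copy survives; recompute parity just here.
        return [s for r, bit in enumerate(self.on_disk) if bit
                for s in range(r * self.per_range,
                               min((r + 1) * self.per_range, self.num_stripes))]


bm = ReparityBitmap(num_stripes=1000, stripes_per_range=100)
for s in (5, 17, 42):       # three overlapping writes to range 0
    bm.begin_write(s)
bm.begin_write(250)         # one write to range 2
for s in (5, 17, 42):
    bm.end_write(s)
# Simulated crash: the write to stripe 250 never completed.
print(bm.disk_writes)                      # 3
print(bm.stripes_to_reparity()[:3])        # [200, 201, 202]
```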

    Recovery of file system data in file servers mirrored file system volumes
    19.
    Invention grant
    Legal status: In force

    Publication number: US06654912B1

    Publication date: 2003-11-25

    Application number: US09684487

    Application date: 2000-10-04

    Abstract: The invention provides a method and system for recovery of file system data in file servers having mirrored file system volumes. The invention uses a “snapshot” feature of a robust file system (the “WAFL File System”) disclosed in the Incorporated Disclosures to rapidly determine which of two or more mirrored volumes is most up-to-date, and which file blocks of that most up-to-date volume have changed relative to each of the other mirrored file systems. In a preferred embodiment, among a plurality of mirrored volumes, the invention rapidly determines which is the most up-to-date by examining a consistency point number maintained by the WAFL File System at each mirrored volume. Using the snapshots of the file system maintained at each mirrored volume, the invention then rapidly determines, pair by pair, which blocks are shared between the most up-to-date mirrored volume and each other mirrored volume, that is, which blocks are stored in a snapshot common to both. The invention resynchronizes only those blocks that have changed between the common snapshot and the most up-to-date snapshot.
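
    The recovery steps (pick the mirror with the highest consistency point number, find the newest snapshot it shares with each other mirror, and copy only blocks that changed since then) can be sketched in Python. This is not WAFL code: `Mirror`, `plan_resync`, the snapshot names, and the representation of a snapshot as a map from block number to block identifier are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Mirror:
    name: str
    cp_number: int                                   # consistency point count
    snapshots: dict = field(default_factory=dict)   # snapshot -> {block no: block id}
    order: list = field(default_factory=list)        # snapshot names, oldest first

    def take_snapshot(self, snap, blocks):
        self.snapshots[snap] = dict(blocks)
        self.order.append(snap)


def plan_resync(mirrors):
    # The mirror with the highest consistency point number is the most up to date.
    source = max(mirrors, key=lambda m: m.cp_number)
    latest = source.snapshots[source.order[-1]]
    plan = {}
    for target in mirrors:
        if target is source:
            continue
        # Newest snapshot held by both mirrors; with none, everything is copied.
        common = next((s for s in reversed(source.order) if s in target.snapshots), None)
        base = source.snapshots[common] if common else {}
        changed = sorted(b for b in latest.keys() | base.keys()
                         if latest.get(b) != base.get(b))
        plan[target.name] = (common, changed)
    return source.name, plan


a = Mirror("mirrorA", cp_number=120)
b = Mirror("mirrorB", cp_number=97)
for m in (a, b):
    m.take_snapshot("hourly.1", {0: "x0", 1: "x1", 2: "x2"})
a.take_snapshot("hourly.0", {0: "x0", 1: "y1", 2: "x2", 3: "y3"})
print(plan_resync([a, b]))   # ('mirrorA', {'mirrorB': ('hourly.1', [1, 3])})
```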

    System and method for efficient remote disk I/O
    20.
    Invention grant
    Legal status: In force

    Publication number: US6049808A

    Publication date: 2000-04-11

    Application number: US216507

    Application date: 1998-12-18

    Abstract: When a client computer requests data from a disk or similar device at a server computer, the client exports the memory associated with an allocated read buffer by generating and storing one or more incoming MMU (IMMU) entries that map the read buffer to an assigned global address range. The remote data read request, along with the assigned global address range, is communicated to the server node. At the server, the request is serviced by performing a memory import operation, in which one or more outgoing MMU (OMMU) entries are generated and stored for mapping the global address range specified in the read request to a corresponding range of local physical addresses. The mapped local physical addresses in the server are not locations in the server's memory. The server then performs a DMA operation to transfer the data specified in the request message directly from the disk to the mapped local physical addresses. The DMA operation transmits the specified data to the server's network interface, where the mapped local physical addresses to which the data is transferred are converted into the corresponding global addresses. The specified data, with the corresponding global addresses, is then transmitted to the client node. The client converts the global addresses in the received data into the local physical addresses corresponding to the allocated receive buffer, and stores the received data in that buffer.
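
    The address translations can be traced with a toy Python model. It is a sketch only: `MMU`, `remote_read`, the 4 KiB page size, and the specific page numbers are hypothetical, and the DMA and network transfer are simulated by copying byte chunks. It shows the mappings in order: server local window to global (OMMU, at the server's network interface), then global to client local (IMMU), so the data lands directly in the client's read buffer.

```python
PAGE = 4096


class MMU:
    """Page-granular translation table; a stand-in for an IMMU or OMMU."""

    def __init__(self):
        self.entries = {}                       # input page -> output page

    def map_range(self, in_page, out_page, npages):
        for i in range(npages):
            self.entries[in_page + i] = out_page + i

    def translate(self, addr):
        page, offset = divmod(addr, PAGE)
        return self.entries[page] * PAGE + offset


def remote_read(disk, offset, length, global_page, client_mem, client_page, window_page):
    npages = -(-length // PAGE)                  # ceiling division
    # Client exports its read buffer: IMMU entries map global -> client local pages.
    immu = MMU()
    immu.map_range(global_page, client_page, npages)
    # Server imports the range: OMMU entries map server local pages -> global.
    # The window pages are I/O addresses at the network interface, not server memory.
    ommu = MMU()
    ommu.map_range(window_page, global_page, npages)
    # Server "DMA" from disk into the window; the interface turns each local
    # address into a global one and emits (global address, payload) packets.
    packets = []
    for i in range(npages):
        chunk = disk[offset + i * PAGE: offset + min((i + 1) * PAGE, length)]
        packets.append((ommu.translate((window_page + i) * PAGE), chunk))
    # Client side: global -> local, deposited straight into the read buffer.
    for gaddr, chunk in packets:
        laddr = immu.translate(gaddr)
        client_mem[laddr:laddr + len(chunk)] = chunk


disk = bytes(i % 251 for i in range(5 * PAGE))
client_mem = bytearray(8 * PAGE)
remote_read(disk, offset=PAGE + 10, length=6000, global_page=1000,
            client_mem=client_mem, client_page=3, window_page=50)
print(client_mem[3 * PAGE: 3 * PAGE + 6000] == disk[PAGE + 10: PAGE + 6010])   # True
```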
