Flash optimized, log-structured layer of a file system
    51.
    Granted invention patent (status: in force)

    Publication number: US08880788B1

    Publication date: 2014-11-04

    Application number: US14160991

    Filing date: 2014-01-22

    Applicant: NetApp, Inc.

    Abstract: In one embodiment, a flash-optimized, log-structured layer of a file system of a storage input/output (I/O) stack executes on one or more nodes of a cluster. The log-structured layer of the file system provides sequential storage of data and metadata on solid state drives (SSDs) to reduce write amplification, while leveraging variable compression and variable length data features of the storage I/O stack. The data may be organized as an arbitrary number of variable-length extents of one or more host-visible logical units (LUNs). The metadata may include mappings from host-visible logical block address ranges of a LUN to extent keys, as well as mappings of the extent keys to SSD storage locations of the extents. The storage location of an extent on SSD is effectively “virtualized” by its mapped extent key such that relocation of the extent on SSD does not require update to volume layer metadata.
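
    The two levels of mapping described in this abstract can be illustrated with a minimal sketch. All class, method, and key names below are hypothetical and are not taken from the patent; the sketch only shows why relocating an extent on SSD leaves the volume layer metadata untouched.

```python
# Hypothetical names throughout; a sketch of the two-level mapping only.

class ExtentStore:
    """Maps extent keys to SSD storage locations (second-level metadata)."""

    def __init__(self):
        self.location_of = {}  # extent key -> (ssd_id, offset, length)

    def put(self, key, location):
        self.location_of[key] = location

    def relocate(self, key, new_location):
        # Only the extent store map changes; the extent key stays the same.
        self.location_of[key] = new_location


class VolumeLayer:
    """Maps LBA ranges of a LUN to extent keys (first-level metadata)."""

    def __init__(self, store):
        self.store = store
        self.key_of = {}  # (lun, start_lba, block_count) -> extent key

    def map_range(self, lun, start_lba, block_count, key):
        self.key_of[(lun, start_lba, block_count)] = key

    def locate(self, lun, start_lba, block_count):
        key = self.key_of[(lun, start_lba, block_count)]
        return self.store.location_of[key]


store = ExtentStore()
volume = VolumeLayer(store)
store.put("k1", ("ssd0", 4096, 8192))
volume.map_range("lun0", 0, 16, "k1")

volume_metadata_before = dict(volume.key_of)
store.relocate("k1", ("ssd3", 0, 8192))  # the extent moves on SSD
```

    After the relocation, `volume.key_of` is unchanged, yet `volume.locate("lun0", 0, 16)` resolves to the new location through the extent key.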

    Extent metadata update logging and checkpointing
    52.
    Granted invention patent (status: in force)

    Publication number: US08880787B1

    Publication date: 2014-11-04

    Application number: US14160259

    Filing date: 2014-01-21

    Applicant: NetApp, Inc.

    Abstract: In one embodiment, an extent store layer of a storage input/output (I/O) stack executing on one or more nodes of a cluster manages efficient logging and checkpointing of metadata. The metadata managed by the extent store layer, i.e., the extent store metadata, resides in a memory (in-core) of each node and is illustratively organized as a key-value extent store embodied as one or more data structures, e.g., a set of hash tables. Changes to the set of hash tables are recorded as a continuous stream of changes to SSD embodied as an extent store layer log. A separate log stream structure (e.g., an in-core buffer) may be associated respectively with each hash table such that changed (i.e., dirtied) slots of the hash table are recorded as entries in the log stream structure. The hash tables are written to SSD using a fuzzy checkpointing technique.
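
    A rough sketch of how a per-table log stream and a fuzzy checkpoint can work together follows. The abstract does not spell out the checkpointing procedure, so this assumes the general database-style approach: the table is copied while updates continue, and recovery replays log entries recorded after the checkpoint began. All names are hypothetical.

```python
class LoggedHashTable:
    """Hash table whose dirtied slots are recorded in an in-core log stream."""

    def __init__(self, num_slots):
        self.slots = [None] * num_slots
        self.log_stream = []  # entries of (sequence, slot_index, value)
        self.sequence = 0

    def update(self, slot_index, value):
        self.sequence += 1
        self.slots[slot_index] = value
        self.log_stream.append((self.sequence, slot_index, value))


def recover(checkpoint_slots, checkpoint_start_seq, log_entries):
    """Rebuild the table from a fuzzy checkpoint plus later log entries."""
    slots = list(checkpoint_slots)
    for seq, slot_index, value in log_entries:
        # Entries hold absolute slot values, so replaying one twice is harmless.
        if seq > checkpoint_start_seq:
            slots[slot_index] = value
    return slots


table = LoggedHashTable(4)
table.update(0, "a")
table.update(3, "d")

start_seq = table.sequence      # the checkpoint begins here
checkpoint = table.slots[:2]    # first half of the table is copied
table.update(0, "a2")           # slot already copied: checkpoint misses it
table.update(3, "d2")           # slot not yet copied: checkpoint sees it
checkpoint += table.slots[2:]   # second half of the table is copied

recovered = recover(checkpoint, start_seq, table.log_stream)
```

    The checkpoint alone is inconsistent because it mixes old and new slot values, but replaying the log entries recorded after `start_seq` brings it back in line with the live table.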

    Technique for reducing metadata stored in a memory of a node

    Publication number: US10762070B2

    Publication date: 2020-09-01

    Application number: US15895593

    Filing date: 2018-02-13

    Applicant: NetApp, Inc.

    Abstract: A technique reduces an amount of metadata stored in a memory of a node in a cluster. An extent store layer of a storage input/output (I/O) stack executing on the node stores key-value pairs in a plurality of data structures, e.g., cuckoo hash tables, resident in the memory. The cuckoo hash table embodies metadata that describes an extent and, as such, may be organized to associate a location on disk with a value that identifies the location on disk. The value may be embodied as a locator that includes a reference count used to support deduplication functionality of the extent store layer with respect to the extent. The reference count is divided into two portions: a delta count portion stored in memory for each slot of the hash table and an overflow count portion stored on disk in a header of each extent. One bit of the delta count portion is reserved as an overflow bit that indicates whether the in-memory reference count has overflowed. Another bit of the delta count portion is reserved as a sign bit that indicates whether the value of the remaining delta count portion, which stores the “delta” of the reference count, is positive or negative. Overflow updates to the overflow count portion on disk are postponed until all of the bits of the delta count portion are consumed as negative/positive transitions.
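
    The split reference count can be sketched as follows. The bit width, the flush policy (fold the in-memory delta into the on-disk count only when the magnitude bits would be exhausted), and every name here are assumptions made for illustration; the abstract does not specify them.

```python
MAGNITUDE_BITS = 3                        # assumed width of the delta magnitude
MAX_MAGNITUDE = (1 << MAGNITUDE_BITS) - 1


class SplitRefCount:
    """Reference count split into an in-memory delta and an on-disk count."""

    def __init__(self):
        self.delta = 0           # signed in-memory delta for the hash table slot
        self.overflowed = False  # overflow bit: the on-disk portion is in use
        self.on_disk = 0         # overflow count kept in the extent header
        self.disk_writes = 0     # how many times the header was rewritten

    def adjust(self, change):
        """Apply a reference count change of +1 or -1."""
        new_delta = self.delta + change
        if abs(new_delta) > MAX_MAGNITUDE:
            # Magnitude bits are exhausted: fold the delta into the disk count.
            self.on_disk += self.delta
            self.disk_writes += 1
            self.overflowed = True
            new_delta = change
        self.delta = new_delta

    def pack(self):
        """Pack overflow bit, sign bit, and magnitude into one small integer."""
        sign = 1 if self.delta < 0 else 0
        return (
            (int(self.overflowed) << (MAGNITUDE_BITS + 1))
            | (sign << MAGNITUDE_BITS)
            | abs(self.delta)
        )

    def total(self):
        return self.on_disk + self.delta


count = SplitRefCount()
for _ in range(10):
    count.adjust(+1)
for _ in range(5):
    count.adjust(-1)
```

    In this run, fifteen reference count changes cost a single write to the extent header; the remaining changes are absorbed by the signed in-memory delta.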

    Hybrid message-based scheduling technique

    Publication number: US10185681B2

    Publication date: 2019-01-22

    Application number: US15051057

    Filing date: 2016-02-23

    Applicant: NetApp, Inc.

    Abstract: A hybrid message-based scheduling technique efficiently load balances a storage I/O stack partitioned into one or more non-blocking (i.e., free-running) messaging kernel (MK) threads that execute non-blocking message handlers (i.e., non-blocking services) and one or more operating system kernel blocking threads that execute blocking services. The technique combines the blocking and non-blocking services within a single coherent extended programming environment. The messaging kernel (MK) operates on processors apart from the operating system kernel that are allocated from a predetermined number of logical processors (i.e., hyper-threads) for use by an MK scheduler to schedule the non-blocking services within storage I/O stack as well as allocate a remaining number of logical processors for use by the blocking services. In addition, the technique provides a variation on a synchronization primitive that allows signaling between the two types of services (i.e., non-blocking and blocking) within the extended programming environment.
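
    The division of work between non-blocking handlers and blocking services might be sketched as below. This is only an analogy built from Python's standard library: the function names, the mailbox, and the use of a thread pool are hypothetical, threads are not actually pinned to logical processors, and the loop waits on its mailbox where a free-running MK thread would poll.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def partition_processors(logical_processors, mk_count):
    """Split logical processors between the MK scheduler and blocking services."""
    if not 0 <= mk_count <= len(logical_processors):
        raise ValueError("mk_count must fit within the available processors")
    return logical_processors[:mk_count], logical_processors[mk_count:]


mailbox = queue.Queue()  # messages destined for the non-blocking side
results = []


def blocking_service(value):
    # Stands in for work that may block (for example, a system call).
    return value * 2


def on_request(value, pool):
    # Non-blocking handler: hand the work to a blocking thread and return at once.
    future = pool.submit(blocking_service, value)
    # Completion is signaled back as a message instead of waiting on the future.
    future.add_done_callback(lambda done: mailbox.put(("done", done.result())))


def on_done(value, pool):
    results.append(value)


HANDLERS = {"request": on_request, "done": on_done}


def run_message_loop(pool, expected):
    # Simplification: mailbox.get() waits for the next message.
    while len(results) < expected:
        kind, payload = mailbox.get()
        HANDLERS[kind](payload, pool)


mk_cpus, blocking_cpus = partition_processors(list(range(8)), 6)

with ThreadPoolExecutor(max_workers=len(blocking_cpus)) as pool:
    for value in (1, 2, 3):
        mailbox.put(("request", value))
    run_message_loop(pool, expected=3)
```

    The completion callback plays the role of the signaling primitive between the two kinds of services: the blocking side never calls into a handler directly, it only posts a message.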

    GRANULAR SYNC/SEMI-SYNC ARCHITECTURE
    55.
    Invention application

    Publication number: US20180124172A1

    Publication date: 2018-05-03

    Application number: US15844705

    Filing date: 2017-12-18

    Applicant: NetApp Inc.

    Abstract: Data consistency and availability can be provided at the granularity of logical storage objects in storage solutions that use storage virtualization in clustered storage environments. To ensure consistency of data across different storage elements, synchronization is performed across the different storage elements. Changes to data are synchronized across storage elements in different clusters by propagating the changes from a primary logical storage object to a secondary logical storage object. To satisfy the strictest RPOs while maintaining performance, change requests are intercepted prior to being sent to a filesystem that hosts the primary logical storage object and propagated to a different managing storage element associated with the secondary logical storage object.

    Technique for reducing metadata stored in a memory of a node

    Publication number: US09934264B2

    Publication date: 2018-04-03

    Application number: US14728482

    Filing date: 2015-06-02

    Applicant: NetApp, Inc.

    Abstract: A technique reduces an amount of metadata stored in a memory of a node in a cluster. An extent store layer of a storage input/output (I/O) stack executing on the node stores key-value pairs in a plurality of data structures, e.g., cuckoo hash tables, resident in the memory. The cuckoo hash table embodies metadata that describes an extent and, as such, may be organized to associate a location on disk with a value that identifies the location on disk. The value may be embodied as a locator that includes a reference count used to support deduplication functionality of the extent store layer with respect to the extent. The reference count is divided into two portions: a delta count portion stored in memory for each slot of the hash table and an overflow count portion stored on disk in a header of each extent. One bit of the delta count portion is reserved as an overflow bit that indicates whether the in-memory reference count has overflowed. Another bit of the delta count portion is reserved as a sign bit that indicates whether the value of the remaining delta count portion, which stores the “delta” of the reference count, is positive or negative. Overflow updates to the overflow count portion on disk are postponed until all of the bits of the delta count portion are consumed as negative/positive transitions.

    CONSISTENCY GROUP MANAGEMENT
    57.
    Invention application

    Publication number: US20170315728A1

    Publication date: 2017-11-02

    Application number: US15142767

    Filing date: 2016-04-29

    Applicant: NetApp, Inc.

    Abstract: A consistency group is used as a basic unit of data management of storage containers served by a storage input/output (I/O) stack executing on one or more nodes of a cluster. The storage container may be a LUN embodied as parent volume (active volume), a snapshot (represented as an independent volume embodied as read-only copy of the active volume), and a clone (represented as another independent volume embodied as a read-write copy (clone) of the active volume). A consistency group (CG) is a set (i.e., collection) of objects, e.g., LUNs or other CGs (nested CG), which may be managed and operated upon collectively by an administrative command via a Storage Area Network administration layer (SAL) of the storage I/O stack. The SAL may interact with one or more layers of the storage I/O stack to (i) create a clone of a set of object members of the CG; (ii) create one or more snapshots of the set of object members of the CG; (iii) restore the set of object members of the CG from a group of CG snapshots; (iv) replicate the set of object members of the CG as a single entity; and (v) delete a CG and a nested CG according to specific semantics.
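
    Because a CG may contain LUNs or other CGs, a collective operation naturally recurses through the membership tree. The sketch below shows this for snapshots only; the class names, the `snapshot` method, and the `name@tag` naming scheme are hypothetical and do not describe the SAL interface.

```python
class Lun:
    """A single storage container."""

    def __init__(self, name):
        self.name = name

    def snapshot(self, tag):
        # Returns the names of the snapshots created for this object.
        return [f"{self.name}@{tag}"]


class ConsistencyGroup:
    """A set of LUNs and/or nested consistency groups managed together."""

    def __init__(self, name, members=()):
        self.name = name
        self.members = list(members)

    def snapshot(self, tag):
        # One command fans out to every member, including nested CGs.
        names = []
        for member in self.members:
            names.extend(member.snapshot(tag))
        return names


database = ConsistencyGroup("db", [Lun("data"), Lun("log")])
application = ConsistencyGroup("app", [database, Lun("web")])

snapshot_names = application.snapshot("t1")
```

    A single call on the outer group yields one snapshot per LUN, including the LUNs of the nested group.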

    Clustered raid data organization
    58.
    Granted invention patent (status: in force)

    Publication number: US09483349B2

    Publication date: 2016-11-01

    Application number: US14157828

    Filing date: 2014-01-17

    Applicant: NetApp, Inc.

    Abstract: In one embodiment, a node of a cluster having a plurality of nodes, executes a storage input/output (I/O) stack having a redundant array of independent disks (RAID) layer. The RAID layer organizes solid state drives (SSDs) within one or more storage arrays as a plurality of RAID groups associated with one or more extent stores. The RAID groups are formed from slices of storage spaces of the SSDs instead of entire storage spaces of the SSDs. This provides for RAID groups to co-exist on a same set of the SSDs.
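
    One simple way to picture RAID groups built from slices is to cut every SSD into equal slices and let group i take slice i of each drive. The equal slice size, the layout, and the function name below are assumptions for illustration and are not taken from the patent.

```python
def build_raid_groups(ssd_ids, ssd_capacity, slice_size):
    """Form one RAID group per slice index across the same set of SSDs.

    Each group member is a (ssd_id, start, end) range, not a whole drive.
    """
    slices_per_ssd = ssd_capacity // slice_size
    groups = []
    for index in range(slices_per_ssd):
        start = index * slice_size
        groups.append([(ssd, start, start + slice_size) for ssd in ssd_ids])
    return groups


groups = build_raid_groups(["ssd0", "ssd1", "ssd2"], ssd_capacity=1000, slice_size=500)
```

    Both resulting groups span the same three SSDs but occupy disjoint address ranges, which is what allows them to co-exist on one set of drives.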

    Flash optimized, log-structured layer of a file system
    59.
    Granted invention patent (status: in force)

    Publication number: US09448924B2

    Publication date: 2016-09-20

    Application number: US14150717

    Filing date: 2014-01-08

    Applicant: NetApp, Inc.

    Abstract: In one embodiment, storage arrays of solid state drives (SSDs) coupled to a node are organized as redundant array of independent disks (RAID) groups. Each storage array includes one or more segments. Each segment has contiguous free space on the SSDs. Data and metadata is organized on the SSDs with a sequential log-structured layout, with the data organized as variable-length extents of one or more logical units (LUNs). Segment cleaning is performed to clean a selected segment by moving the extents of the selected segment that contain valid data to one or more different segments so as to free the selected segment. Additional extents are written as a sequence of contiguous range write operations to the entire free segment with temporal locality to reduce data relocation within the SSDs as a result of the write operations.
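
    Segment cleaning as described here can be sketched in a few lines. The dictionary layout, the `valid` flag, and the function name are hypothetical; the sketch covers only the relocation step, not victim selection or the subsequent sequential writes into the freed segment.

```python
def clean_segment(segments, victim, destination):
    """Move valid extents out of the victim segment so it becomes entirely free.

    Returns the number of extents that were relocated.
    """
    if victim == destination:
        raise ValueError("victim and destination must be different segments")
    survivors = [extent for extent in segments[victim] if extent["valid"]]
    segments[destination].extend(survivors)
    segments[victim] = []  # the whole segment is now contiguous free space
    return len(survivors)


segments = {
    "seg0": [
        {"key": "k1", "valid": True},
        {"key": "k2", "valid": False},  # overwritten or deleted data
        {"key": "k3", "valid": True},
    ],
    "seg1": [],
}

moved = clean_segment(segments, "seg0", "seg1")
```

    Only the two valid extents are copied; the invalid one is simply dropped, and `seg0` is left empty and ready to receive new extents as one sequential run of writes.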

    HYBRID MESSAGE-BASED SCHEDULING TECHNIQUE
    60.
    Invention application (status: under examination, published)

    Publication number: US20160246742A1

    Publication date: 2016-08-25

    Application number: US15051057

    Filing date: 2016-02-23

    Applicant: NetApp, Inc.

    CPC classification number: G06F13/36 G06F3/061 G06F3/0631 G06F3/0683

    Abstract: A hybrid message-based scheduling technique efficiently load balances a storage I/O stack partitioned into one or more non-blocking (i.e., free-running) messaging kernel (MK) threads that execute non-blocking message handlers (i.e., non-blocking services) and one or more operating system kernel blocking threads that execute blocking services. The technique combines the blocking and non-blocking services within a single coherent extended programming environment. The messaging kernel (MK) operates on processors apart from the operating system kernel that are allocated from a predetermined number of logical processors (i.e., hyper-threads) for use by an MK scheduler to schedule the non-blocking services within storage I/O stack as well as allocate a remaining number of logical processors for use by the blocking services. In addition, the technique provides a variation on a synchronization primitive that allows signaling between the two types of services (i.e., non-blocking and blocking) within the extended programming environment.
