Distributed File System with Reduced Write and Read Latencies

    Publication No.: US20250068600A1

    Publication date: 2025-02-27

    Application No.: US18942186

    Application date: 2024-11-08

    Applicant: NetApp, Inc.

    Abstract: A method for reducing write latency in a distributed file system. A write request that includes a volume identifier is received at a data management subsystem deployed on a node within a distributed storage system. The data management subsystem maps the volume identifier to a file system volume and maps the file system volume to a set of logical block addresses in a logical block device in a storage management subsystem deployed on the node. The storage management subsystem maps the logical block device to a metadata object for the logical block device on the node that is used to process the write request. The mapping of the file system volume to the set of logical block addresses in the logical block device enables co-locating the metadata object with the file system volume on the node, which reduces the write latency associated with processing the write request.
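
The mapping chain in this abstract can be sketched as three lookups that all resolve on the same node. This is a minimal illustration under stated assumptions, not the patented implementation: the `Node` class, its field names, and `resolve_write` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node hosting both subsystems (all names here are hypothetical)."""
    name: str
    # Data management subsystem: volume identifier -> file system volume.
    volumes: dict = field(default_factory=dict)
    # File system volume -> (logical block device, set of logical block addresses).
    lba_map: dict = field(default_factory=dict)
    # Storage management subsystem: logical block device -> metadata object.
    metadata: dict = field(default_factory=dict)

    def resolve_write(self, volume_id):
        """Follow volume id -> file system volume -> LBAs -> metadata object."""
        fs_volume = self.volumes[volume_id]
        device, lbas = self.lba_map[fs_volume]
        meta = self.metadata[device]
        return fs_volume, device, lbas, meta


node = Node(
    name="node-1",
    volumes={"vol-7": "fsvol-a"},
    lba_map={"fsvol-a": ("lbd-0", range(0, 1024))},
    metadata={"lbd-0": {"owner": "node-1"}},
)
print(node.resolve_write("vol-7"))
```

Because every table lives on the same `Node`, no lookup in the sketch crosses to another node, which is the co-location property the abstract credits for the lower write latency.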

    FILE SYSTEM FORMAT FOR PERSISTENT MEMORY

    Publication No.: US20250038756A1

    Publication date: 2025-01-30

    Application No.: US18914423

    Application date: 2024-10-14

    Applicant: NetApp, Inc.

    Abstract: Techniques are provided for implementing a file system format for persistent memory. A node, comprising persistent memory, receives an operation comprising a file identifier and file system instance information. A list of file system info objects is evaluated to identify a file system info object matching the file system instance information. An inofile, identified by the file system info object as being associated with inodes of files within an instance of the file system targeted by the operation, is traversed to identify an inode matching the file identifier. If the inode comprises an indicator that the file is tiered into the persistent memory, then the inode is utilized to facilitate execution of the operation upon the persistent memory. Otherwise, the operation is routed to a storage file system tier for execution by a storage file system upon storage associated with the node.
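
The lookup and routing decision described in this abstract might look roughly like the following. It is a sketch only: the dictionary layout, the `tiered_in_pmem` flag name, and `route_operation` are hypothetical, and treating a missing inode as "not tiered" is an assumption the abstract does not state.

```python
def route_operation(fs_info_list, instance_info, file_id):
    """Return "pmem" or "storage" for an operation on (instance_info, file_id)."""
    # Evaluate the list of file system info objects for a matching instance.
    fs_info = next(
        (f for f in fs_info_list if f["instance"] == instance_info), None
    )
    if fs_info is None:
        raise LookupError(f"no file system info object for {instance_info!r}")

    # Traverse the inofile (modelled here as a dict) for the matching inode.
    inode = fs_info["inofile"].get(file_id)

    # Assumption: a missing inode is handled like an inode without the indicator.
    if inode is not None and inode.get("tiered_in_pmem"):
        return "pmem"
    return "storage"


fs_infos = [
    {
        "instance": "fs-A",
        "inofile": {
            10: {"tiered_in_pmem": True},
            11: {"tiered_in_pmem": False},
        },
    }
]
print(route_operation(fs_infos, "fs-A", 10))
print(route_operation(fs_infos, "fs-A", 11))
```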

    Multi-tier write allocation
    Granted invention patent

    Publication No.: US12124716B2

    Publication date: 2024-10-22

    Application No.: US18357206

    Application date: 2023-07-24

    Applicant: NetApp Inc.

    CPC classification number: G06F3/0631 G06F3/061 G06F3/0665 G06F3/067

    Abstract: Techniques are provided for multi-tier write allocation. A storage system may store data within a multi-tier storage environment comprising a first storage tier (e.g., storage devices maintained by the storage system), a second storage tier (e.g., a remote object store provided by a third party storage provider), and/or other storage tiers. A determination is made that data (e.g., data of a write request received by the storage system) is to be stored within the second storage tier. The data is stored into a staging area of the first storage tier. A second storage tier location identifier, for referencing the data according to a format utilized by the second storage tier, is assigned to the data and provided to a file system hosting the data. The data is then destaged from the staging area into the second storage tier, such as within an object stored within the remote object store.
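
The stage, assign, destage sequence in this abstract can be illustrated with a small in-memory model. Everything here is a hypothetical sketch: `TieredStore`, the `(object name, slot)` shape of the second-tier location identifier, and the single object `obj-0` are assumptions made for illustration.

```python
class TieredStore:
    """Toy model of a first tier with a staging area and a remote object tier."""

    def __init__(self):
        self.staging = {}       # first-tier staging area: location id -> data
        self.object_store = {}  # second tier: object name -> {slot: data}
        self.fs_refs = {}       # file system view: block name -> location id
        self._next_slot = 0

    def write(self, block_name, data):
        # Assign the second-tier location identifier before the data leaves
        # the first tier, and hand that identifier to the file system.
        location = ("obj-0", self._next_slot)  # hypothetical identifier format
        self._next_slot += 1
        self.staging[location] = data
        self.fs_refs[block_name] = location
        return location

    def destage(self):
        # Move staged data into objects in the second tier.
        for (obj, slot), data in self.staging.items():
            self.object_store.setdefault(obj, {})[slot] = data
        self.staging.clear()

    def read(self, block_name):
        location = self.fs_refs[block_name]
        if location in self.staging:
            return self.staging[location]
        obj, slot = location
        return self.object_store[obj][slot]


store = TieredStore()
store.write("blk-1", b"hello")
store.destage()
print(store.read("blk-1"))
```

In this sketch the file system's reference never changes across `destage()`, because the identifier was already in the second tier's format when the data was staged.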

    Slice file recovery using dead replica slice files

    Publication No.: US12014056B2

    Publication date: 2024-06-18

    Application No.: US17893511

    Application date: 2022-08-23

    Applicant: NetApp Inc.

    CPC classification number: G06F3/0619 G06F3/064 G06F3/067

    Abstract: Techniques are provided for repairing a primary slice file, affected by a storage device error, by using one or more dead replica slice files. The primary slice file is used by a node of a distributed storage architecture as an indirection layer between storage containers (e.g., a volume or LUN) and physical storage where data is physically stored. To improve resiliency of the distributed storage architecture, changes to the primary slice file are replicated to replica slice files hosted by other nodes. If a replica slice file falls out of sync with the primary slice file, then the replica slice file is considered dead (out of sync) and could potentially comprise stale data. If a storage device error affects blocks storing data of the primary slice file, then the techniques provided herein can repair the primary slice file using non-stale data from one or more dead replica slice files.
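
The abstract does not say how non-stale data is recognized in a dead replica. One way to picture the repair, assuming (hypothetically) that an expected checksum for each damaged primary block is known from an independent record, is to accept a replica's copy only when its checksum matches. The function name, the dictionary layout, and the use of CRC32 are all illustrative assumptions.

```python
import zlib


def repair_primary(primary, damaged, expected_crc, dead_replicas):
    """Fill damaged primary entries from dead replicas.

    Returns the list of block ids that could not be repaired.
    Assumption: expected_crc[block_id] survives the storage device error.
    """
    unrepaired = []
    for block_id in damaged:
        for replica in dead_replicas:
            candidate = replica.get(block_id)
            # A dead replica may hold stale data, so verify before using it.
            if candidate is not None and zlib.crc32(candidate) == expected_crc[block_id]:
                primary[block_id] = candidate
                break
        else:
            unrepaired.append(block_id)
    return unrepaired


good = b"lba 5 -> block 0xA1"
primary = {1: b"intact", 5: None}
left = repair_primary(
    primary,
    damaged=[5],
    expected_crc={5: zlib.crc32(good)},
    dead_replicas=[{5: b"stale entry"}, {5: good}],
)
print(left, primary[5])
```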

    BLOCK ALLOCATION FOR PERSISTENT MEMORY DURING AGGREGATE TRANSITION

    Publication No.: US20240103744A1

    Publication date: 2024-03-28

    Application No.: US18528556

    Application date: 2023-12-04

    Applicant: NetApp Inc.

    CPC classification number: G06F3/0631 G06F3/0604 G06F3/064 G06F3/065 G06F3/0679

    Abstract: Techniques are provided for block allocation for persistent memory during aggregate transition. In a high availability pair including first and second nodes, the first node makes a determination that control of a first aggregate is to transition from the first node to the second node. A portion of available free storage space is allocated from a first persistent memory of the first node as allocated pages within the first persistent memory. Metadata information for the allocated pages is updated with an identifier of the first aggregate to create updated metadata information reserving the allocated pages for the first aggregate. The updated metadata information is mirrored to the second node, so that the second node also reserves those pages. Control of the first aggregate is transitioned to the second node. As a result, the nodes do not attempt allocating the same free pages to different aggregates during a transition.
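
The reservation step in this abstract can be modelled with two copies of the page metadata, one per node. This is a simplified sketch: representing the metadata as a `page -> owning aggregate` dictionary, and the names `reserve_for_transition` and `allocate`, are assumptions for illustration.

```python
def reserve_for_transition(local_meta, partner_meta, aggregate_id, count):
    """Reserve `count` free pages for an aggregate and mirror the reservation."""
    free_pages = [p for p in sorted(local_meta) if local_meta[p] is None][:count]
    if len(free_pages) < count:
        raise RuntimeError("not enough free persistent memory pages")
    for page in free_pages:
        local_meta[page] = aggregate_id    # update metadata on the first node
        partner_meta[page] = aggregate_id  # mirror it to the second node
    return free_pages


def allocate(meta, aggregate_id):
    """Allocate the lowest-numbered page that is still free in this view."""
    for page in sorted(meta):
        if meta[page] is None:
            meta[page] = aggregate_id
            return page
    raise RuntimeError("no free pages")


first_node = {0: None, 1: None, 2: None, 3: None}
second_node = dict(first_node)
reserve_for_transition(first_node, second_node, "aggr1", count=2)
# The second node now skips the pages reserved for aggr1.
print(allocate(second_node, "aggr2"))
```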

    Co-located Journaling and Data Storage for Write Requests

    Publication No.: US20240061603A1

    Publication date: 2024-02-22

    Application No.: US18497925

    Application date: 2023-10-30

    Applicant: NetApp, Inc.

    Abstract: Methods and systems for co-locating journaling and data storage are provided. Separate journal and volume partitions may be maintained within each logical storage unit (e.g., Logical Unit Number (LUN)) of a distributed storage system. Journaling of metadata associated with write requests received from one or more clients may be distributed by identifying a destination logical storage unit to which data associated with a given write request is to be stored and causing the data and metadata to be persisted to disk by journaling the metadata and the data to respective portions of an active log within the journal partition of the destination logical storage unit. By using the same logical storage unit for both journaling of write requests and writing the data associated with such write requests, the bottleneck due to there being only a single device or storage unit handling all metadata for all write requests can be avoided.
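
A rough model of the per-LUN journaling described here follows. It is a sketch under assumptions: the `LogicalStorageUnit` class, the `placement` table that picks the destination unit, and the `flush` step that moves logged data into the volume partition are hypothetical and not taken from the abstract.

```python
class LogicalStorageUnit:
    """A logical storage unit with its own journal and volume partitions."""

    def __init__(self, name):
        self.name = name
        self.log_metadata = []  # metadata portion of the active log
        self.log_data = []      # data portion of the active log
        self.volume = {}        # volume partition: key -> data


def handle_write(luns, placement, key, data):
    """Journal a write in the same unit that will store its data."""
    lun = luns[placement[key]]  # identify the destination logical storage unit
    lun.log_metadata.append({"key": key, "length": len(data)})
    lun.log_data.append(data)
    return lun.name


def flush(lun):
    """Assumed follow-up step: apply logged writes to the volume partition."""
    for meta, data in zip(lun.log_metadata, lun.log_data):
        lun.volume[meta["key"]] = data
    lun.log_metadata.clear()
    lun.log_data.clear()


luns = {n: LogicalStorageUnit(n) for n in ("lun-0", "lun-1")}
placement = {"a": "lun-0", "b": "lun-1"}
handle_write(luns, placement, "a", b"xx")
handle_write(luns, placement, "b", b"yyy")
print({name: len(lun.log_metadata) for name, lun in luns.items()})
```

Each unit journals only its own writes, so in this model no single journal sees the metadata for every request.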

    Journal replay optimization
    Granted invention patent

    Publication No.: US11861198B2

    Publication date: 2024-01-02

    Application No.: US17728441

    Application date: 2022-04-25

    Applicant: NetApp Inc.

    CPC classification number: G06F3/064 G06F3/067 G06F3/0619 G06F3/0656 G06F3/0659

    Abstract: Techniques are provided for journal replay optimization. A distributed storage architecture can implement a journal within memory for logging write operations into log records. Latency of executing the write operations is improved because the write operations can be responded back to clients as complete once logged within the journal without having to store the data to higher latency disk storage. If there is a failure, then a replay process is performed to replay the write operations logged within the journal in order to bring a file system up-to-date. The time to complete the replay of the write operations is significantly reduced by caching metadata (e.g., indirect blocks, checksums, buftree identifiers, file block numbers, and consistency point counts) directly into log records. Replay can quickly access this metadata for replaying the write operations because the metadata does not need to be retrieved from the higher latency disk storage into memory.
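
The benefit of caching metadata in log records can be seen in a toy replay loop that counts reads from slow storage. This is only an illustration: `SlowDisk`, the record layout, and the `cached_meta` key are hypothetical, and `fbn` stands in for the file block number mentioned in the abstract.

```python
class SlowDisk:
    """Stand-in for higher-latency disk storage; counts metadata reads."""

    def __init__(self, metadata):
        self.metadata = metadata
        self.reads = 0

    def read_metadata(self, file_id):
        self.reads += 1
        return self.metadata[file_id]


def replay(journal, disk):
    """Replay logged writes, preferring metadata cached in each log record."""
    file_system = {}
    for record in journal:
        meta = record.get("cached_meta")
        if meta is None:
            # Without cached metadata, replay must go to the slow disk.
            meta = disk.read_metadata(record["file_id"])
        file_system[(record["file_id"], meta["fbn"])] = record["data"]
    return file_system


disk = SlowDisk({"f1": {"fbn": 4}})
journal = [{"file_id": "f1", "data": b"new", "cached_meta": {"fbn": 4}}]
replay(journal, disk)
print(disk.reads)
```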
