Transactional allocation and deallocation of blocks in a block store

    Publication No.: US11620214B2

    Publication date: 2023-04-04

    Application No.: US17161323

    Filing date: 2021-01-28

    Applicant: NUTANIX, INC.

    Abstract: Various embodiments set forth techniques for transactional allocation and deallocation of blocks in a block store. A first technique includes sending a first request that causes a non-persistent allocation of a block. The first technique also includes adding a first entry in a log recording the allocation as tentative, sending a second request that causes persistence of the allocation, and adding a second entry in a log recording the allocation as finalized. A second technique includes adding a first entry in a log recording a deallocation of a block, sending a first request that causes the deallocation of the block and causes the block to be unavailable for reallocation in a non-persistent manner, adding a second entry in the log recording that the deallocation is finalized, and sending a second request that causes the block to be made available for reallocation.
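
The two sequences in this abstract can be illustrated with a minimal sketch. It assumes an in-memory toy block store and a plain Python list as the log; the names `BlockStore`, `transactional_allocate`, and `transactional_deallocate` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class BlockStore:
    """Toy block store (hypothetical). Sets stand in for allocator state."""
    free: set
    tentative: set = field(default_factory=set)    # allocated, not yet persisted
    persisted: set = field(default_factory=set)    # allocation made durable
    quarantined: set = field(default_factory=set)  # deallocated, not yet reusable

    def allocate(self):
        block = min(self.free)
        self.free.remove(block)
        self.tentative.add(block)
        return block

    def persist(self, block):
        self.tentative.remove(block)
        self.persisted.add(block)

    def deallocate(self, block):
        self.persisted.remove(block)
        self.quarantined.add(block)

    def release(self, block):
        self.quarantined.remove(block)
        self.free.add(block)


def transactional_allocate(store, log):
    block = store.allocate()                    # first request: non-persistent allocation
    log.append(("alloc", block, "tentative"))   # first log entry
    store.persist(block)                        # second request: persist the allocation
    log.append(("alloc", block, "finalized"))   # second log entry
    return block


def transactional_deallocate(store, log, block):
    log.append(("dealloc", block, "recorded"))   # first log entry
    store.deallocate(block)                      # first request: block not reusable yet
    log.append(("dealloc", block, "finalized"))  # second log entry
    store.release(block)                         # second request: block reusable again
```

In this sketch, a block whose log shows only a tentative allocation could be rolled back after a crash, while a block with a recorded deallocation stays out of the free set until the final request releases it.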

    TECHNIQUE FOR REPLICATING OPLOG INDEX AMONG NODES OF A CLUSTER

    Publication No.: US20220244856A1

    Publication date: 2022-08-04

    Application No.: US17218465

    Filing date: 2021-03-31

    Applicant: Nutanix, Inc.

    Abstract: A technique replicates an index of an operations log (oplog) from a primary node to a secondary node of a cluster in the event of a failure of the primary node. The oplog functions as a staging area to coalesce random write operations directed to a virtual disk (vdisk) stored on a backend storage tier organized as an extent store. The oplog temporarily caches data associated with the random write operations (i.e., write data) as well as metadata describing the write data. The metadata includes descriptors to the write data corresponding to virtual address regions, i.e., offset ranges, of the vdisk; the descriptors are used to identify the offset ranges of write data for the vdisk that are cached in the oplog. To facilitate fast lookup operations of the offset ranges when determining whether write data is cached in the oplog, an oplog index provides a state of the latest data for offset ranges of the vdisk. The technique enables fast failover of metadata used to construct the oplog index in memory of a node, such as the secondary node, without downtime or significant metadata replay.
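
The lookup role of the oplog index can be sketched as a map from vdisk offset ranges to the newest cached write. The sketch below is an assumption-laden illustration: the class `OplogIndex`, its methods, and the integer record ids are hypothetical, and replication between nodes is not modeled.

```python
class OplogIndex:
    """Hypothetical index: non-overlapping ranges [start, end) -> newest record id."""

    def __init__(self):
        self._ranges = []  # sorted list of (start, end, record_id)

    def record_write(self, offset, length, record_id):
        """Register a newer write; it overrides any older overlapping ranges."""
        start, end = offset, offset + length
        kept = []
        for s, e, rid in self._ranges:
            if e <= start or s >= end:   # no overlap with the new write
                kept.append((s, e, rid))
                continue
            if s < start:                # older data survives on the left
                kept.append((s, start, rid))
            if e > end:                  # older data survives on the right
                kept.append((end, e, rid))
        kept.append((start, end, record_id))
        kept.sort()
        self._ranges = kept

    def lookup(self, offset, length):
        """Return the cached sub-ranges that intersect [offset, offset + length)."""
        start, end = offset, offset + length
        return [(max(s, start), min(e, end), rid)
                for s, e, rid in self._ranges
                if s < end and e > start]
```

An empty result from `lookup` means no write data for that range is cached, so a read would go to the backend store; a non-empty result names the latest record for each cached sub-range.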

    Technique for replicating oplog index among nodes of a cluster

    Publication No.: US11614879B2

    Publication date: 2023-03-28

    Application No.: US17218465

    Filing date: 2021-03-31

    Applicant: Nutanix, Inc.

    Abstract: A technique replicates an index of an operations log (oplog) from a primary node to a secondary node of a cluster in the event of failure. The oplog functions as a staging area to coalesce random write operations directed to a virtual disk (vdisk) stored on a backend storage tier. The oplog temporarily caches write data as well as metadata describing the write data. The metadata includes descriptors to the write data corresponding to offset ranges of the vdisk and are used to identify ranges of write data for the vdisk that are cached in the oplog. To facilitate fast lookup operations of whether write data is cached in the oplog, an oplog index provides a state of the latest data for offset ranges of the vdisk that enables fast failover of metadata used to construct the oplog index in memory without downtime or significant metadata replay.

    FREE SPACE MANAGEMENT IN A BLOCK STORE

    Publication No.: US20220138095A1

    Publication date: 2022-05-05

    Application No.: US17161518

    Filing date: 2021-01-28

    Applicant: NUTANIX, INC.

    Abstract: Various embodiments set forth techniques for free space management in a block store. The techniques include receiving a request to allocate one or more blocks in a block store, accessing a sparse hierarchical data structure to identify an allocator page identifying a region of a backing store having a greatest number of free blocks, and allocating the one or more blocks.
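
The allocation step can be sketched as "pick the region with the most free blocks, then allocate from it". The sketch below makes several assumptions: the hierarchy is collapsed to a single summary level, sparseness is modeled by storing only regions that have been touched, and the class `FreeSpaceMap` and its methods are hypothetical names.

```python
class FreeSpaceMap:
    """Hypothetical free-space map with sparse per-region allocator pages."""

    def __init__(self, num_regions, blocks_per_region):
        self.num_regions = num_regions
        self.blocks_per_region = blocks_per_region
        # Sparse: a region appears here only after its first allocation.
        self.pages = {}  # region -> set of allocated block indices

    def free_count(self, region):
        return self.blocks_per_region - len(self.pages.get(region, ()))

    def best_region(self):
        """Region with the greatest number of free blocks (lowest id wins ties)."""
        return max(range(self.num_regions), key=self.free_count)

    def allocate(self, count):
        region = self.best_region()
        if self.free_count(region) < count:
            raise MemoryError("no region has enough free blocks")
        used = self.pages.setdefault(region, set())
        picked = [b for b in range(self.blocks_per_region) if b not in used][:count]
        used.update(picked)
        return [(region, b) for b in picked]
```

A real structure would keep per-subtree summaries so that the best region is found without scanning every region; the linear scan here only keeps the example short.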

    Efficient metadata management
    Type: Granted invention patent

    Publication No.: US10831521B2

    Publication date: 2020-11-10

    Application No.: US15965656

    Filing date: 2018-04-27

    Applicant: Nutanix, Inc.

    Abstract: Systems for high-performance distributed computing. The systems include techniques for managing data and metadata across multiple nodes. A method embodiment commences by storing data at a node using a first storage mechanism that is local to the node. A first set of metadata is configured to identify a storage location for the stored data. The first set of metadata is stored using the same first storage mechanism that is local to the node. For accessing the first set of metadata, a second set of metadata is configured to identify a storage location for the first set of metadata. The second set of metadata is stored using a second storage mechanism that comprises a distributed metadata storage facility that stores metadata across multiple storage locations having at least one of the multiple storage locations that is not local to the node that stores data and metadata using the first storage mechanism.
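
The two-level metadata arrangement can be sketched as follows. The sketch assumes a dictionary stands in for the distributed metadata facility and a small in-memory class stands in for node-local storage; `LocalStore`, `write`, and `read` are hypothetical names.

```python
class LocalStore:
    """Hypothetical node-local storage mechanism (holds data and first-level metadata)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.slots = {}
        self._next = 0

    def put(self, value):
        key = self._next
        self._next += 1
        self.slots[key] = value
        return key

    def get(self, key):
        return self.slots[key]


def write(node, distributed_meta, name, payload):
    data_loc = node.put(payload)                       # data stored locally
    meta_loc = node.put({"data_location": data_loc})   # first metadata, same local mechanism
    distributed_meta[name] = (node.node_id, meta_loc)  # second metadata, distributed facility


def read(nodes, distributed_meta, name):
    node_id, meta_loc = distributed_meta[name]         # second metadata locates the first
    node = nodes[node_id]
    first_meta = node.get(meta_loc)                    # first metadata locates the data
    return node.get(first_meta["data_location"])
```

Only the small second-level entry lives in the shared facility in this sketch; the data and its first-level metadata stay on the node that wrote them.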

    Dynamically formatted storage allocation record

    Publication No.: US11733894B2

    Publication date: 2023-08-22

    Application No.: US17514221

    Filing date: 2021-10-29

    Applicant: NUTANIX, INC.

    CPC classification number: G06F3/064 G06F3/0604 G06F3/0631 G06F3/0673

    Abstract: One or more non-transitory computer-readable media can store program instructions that, when executed by one or more processors, cause the one or more processors to perform steps of organizing storage as a set of storage regions, each storage region having a fixed size; and for each storage region, storing a storage allocation structure of the storage region formatted in a first format selected from a format set including at least two formats, determining a change of an allocation feature of the storage region, based on the allocation feature of the storage region, selecting, from the format set, a second format of the storage allocation structure, and reformatting the storage allocation structure in the second format.
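
The reformatting step can be sketched with a format set of two formats. The sketch assumes the formats are a bitmap and an extent list, and that the "allocation feature" is the number of allocated runs in the region; these choices, the threshold `max_extents`, and all names are illustrative assumptions, not details from the patent.

```python
def to_extents(bits):
    """Convert 0/1 flags into a list of (start, length) allocated runs."""
    extents, start = [], None
    for i, bit in enumerate(bits):
        if bit and start is None:
            start = i
        elif not bit and start is not None:
            extents.append((start, i - start))
            start = None
    if start is not None:
        extents.append((start, len(bits) - start))
    return extents


def to_bitmap(extents, size):
    """Convert (start, length) runs back into a list of 0/1 flags."""
    bits = [0] * size
    for start, length in extents:
        for i in range(start, start + length):
            bits[i] = 1
    return bits


def choose_format(bits, max_extents=4):
    """Few runs: extent list is compact. Many runs: fall back to a bitmap."""
    return "extents" if len(to_extents(bits)) <= max_extents else "bitmap"


class RegionRecord:
    """Hypothetical allocation record for one fixed-size storage region."""

    def __init__(self, size):
        self.size = size
        self.format = "extents"
        self.data = []

    def bits(self):
        if self.format == "bitmap":
            return list(self.data)
        return to_bitmap(self.data, self.size)

    def set_block(self, index, allocated):
        bits = self.bits()
        bits[index] = 1 if allocated else 0
        self.format = choose_format(bits)   # re-select the format after the change
        self.data = bits if self.format == "bitmap" else to_extents(bits)
```

Each call to `set_block` re-evaluates the feature and rewrites the record in whichever format the threshold selects, so a region can move between formats as its fragmentation changes.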

    Efficient metadata management
    Type: Granted invention patent

    Publication No.: US11734040B2

    Publication date: 2023-08-22

    Application No.: US17093462

    Filing date: 2020-11-09

    Applicant: Nutanix, Inc.

    CPC classification number: G06F9/45558 G06F2009/45583 H04L67/1097

    Abstract: Systems for high-performance distributed computing. The systems include techniques for managing data and metadata across multiple nodes. A method embodiment commences by storing data at a node using a first storage mechanism that is local to the node. A first set of metadata is configured to identify a storage location for the stored data. The first set of metadata is stored using the same first storage mechanism that is local to the node. For accessing the first set of metadata, a second set of metadata is configured to identify a storage location for the first set of metadata. The second set of metadata is stored using a second storage mechanism that comprises a distributed metadata storage facility that stores metadata across multiple storage locations having at least one of the multiple storage locations that is not local to the node that stores data and metadata using the first storage mechanism.

    Common framework for kernel-assisted device polling

    Publication No.: US11615042B2

    Publication date: 2023-03-28

    Application No.: US17364549

    Filing date: 2021-06-30

    Applicant: Nutanix, Inc.

    Abstract: This disclosure relates to high-performance computing, and more particularly to techniques for kernel-assisted device polling of user-space devices. A common kernel-based polling mechanism is provided for concurrently handling both kernel-based polling for kernel-space devices such as network interfaces (e.g., network NICs) and kernel-based polling for user-space devices such as remote direct memory access devices (e.g., RDMA NICs). Embodiments perform kernel-based polling on a first device that has a corresponding device driver in an operating system kernel. Using the same polling mechanism, the kernel-based polling is performed on a second device, the second device being a user-space device wherein the kernel-based polling on the second device is configured by creating a second device file descriptor that is not associated with a corresponding device driver in the operating system kernel. The kernel-based polling mechanism implements a single polling schedule that is applied to cover both kernel-space device events and user-space device events.
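
The idea of one polling schedule covering two kinds of event source can be shown with a user-space analogy. This is not the kernel mechanism the abstract describes: the sketch uses Python's standard `selectors` module, and two socket pairs stand in for a kernel-space device and for the extra descriptor created on behalf of a user-space device. The labels and the helper names `poll_once` and `demo` are hypothetical.

```python
import selectors
import socket


def poll_once(selector, timeout=1.0):
    """One pass of the shared polling loop; returns labels of ready sources."""
    return sorted(key.data for key, _events in selector.select(timeout))


def demo():
    selector = selectors.DefaultSelector()
    # Stand-in for a device backed by a kernel driver (e.g. a network interface).
    nic_rx, nic_tx = socket.socketpair()
    # Stand-in for a descriptor created only to signal user-space device events.
    rdma_rx, rdma_tx = socket.socketpair()
    try:
        selector.register(nic_rx, selectors.EVENT_READ, data="kernel-space device")
        selector.register(rdma_rx, selectors.EVENT_READ, data="user-space device")
        nic_tx.send(b"packet")        # simulate a kernel-space device event
        rdma_tx.send(b"completion")   # simulate a user-space device event
        return poll_once(selector)    # both sources are seen by the same loop
    finally:
        selector.close()
        for sock in (nic_rx, nic_tx, rdma_rx, rdma_tx):
            sock.close()
```

Both descriptors are registered with the same selector, so a single call to `select` reports events from either source without a second polling loop.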

    Free space management in a block store

    Publication No.: US11580013B2

    Publication date: 2023-02-14

    Application No.: US17161518

    Filing date: 2021-01-28

    Applicant: NUTANIX, INC.

    Abstract: Various embodiments set forth techniques for free space management in a block store. The techniques include receiving a request to allocate one or more blocks in a block store, accessing a sparse hierarchical data structure to identify an allocator page identifying a region of a backing store having a greatest number of free blocks, and allocating the one or more blocks.
