Abstract:
A sorted key-value store is implemented using a write-back cache maintained in memory, a B-tree data structure maintained on disk, and logical and physical logs for providing transactions. The logical log and write-back cache are used to answer client requests, while dirty blocks in the write-back cache are periodically flushed to disk using the physical log.
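The division of labor between the cache and the two logs can be illustrated with a small single-process sketch. It makes several assumptions that are not in the abstract. A Python dict stands in for the write-back cache. A sorted list maintained with `bisect` stands in for the on-disk B-tree. Both logs are in-memory lists. Dirty tracking is per key rather than per block. The class and method names (`SortedKVStore`, `flush`, `scan`) are hypothetical.

```python
import bisect


class SortedKVStore:
    """Toy sorted key-value store with a write-back cache and two logs.

    Hypothetical simplifications: a sorted Python list stands in for the on-disk
    B-tree, both logs are in-memory lists, and dirty tracking is per key rather
    than per block.
    """

    def __init__(self):
        self.cache = {}          # write-back cache: key -> value
        self.dirty = set()       # keys changed in the cache but not yet on "disk"
        self.logical_log = []    # client operations, e.g. ("put", key, value)
        self.physical_log = []   # batches of (key, value) pairs written to "disk"
        self.disk_keys = []      # sorted keys, standing in for the B-tree
        self.disk_vals = {}

    def put(self, key, value):
        # Append the logical operation first, then update only the cache.
        self.logical_log.append(("put", key, value))
        self.cache[key] = value
        self.dirty.add(key)

    def get(self, key):
        # Answer from the cache when possible, otherwise from "disk".
        if key in self.cache:
            return self.cache[key]
        return self.disk_vals.get(key)

    def flush(self):
        # Log the physical changes before applying them to the "B-tree".
        batch = [(k, self.cache[k]) for k in sorted(self.dirty)]
        self.physical_log.append(batch)
        for k, v in batch:
            if k not in self.disk_vals:
                bisect.insort(self.disk_keys, k)
            self.disk_vals[k] = v
        self.dirty.clear()
        self.logical_log.clear()  # these operations are now reflected on "disk"

    def scan(self, lo, hi):
        # Sorted range scan over [lo, hi), with cached values taking precedence.
        i = bisect.bisect_left(self.disk_keys, lo)
        j = bisect.bisect_left(self.disk_keys, hi)
        keys = set(self.disk_keys[i:j])
        keys.update(k for k in self.cache if lo <= k < hi)
        return [(k, self.get(k)) for k in sorted(keys)]
```

For example, after `put("b", 2)`, `put("a", 1)`, `flush()` and `put("c", 3)`, the key `"c"` lives only in the cache and the logical log, yet `scan("a", "z")` still returns all three pairs in key order.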
Abstract:
System and method for managing storage metadata utilize a metadata data structure containing allocation information of storage blocks of a storage system in which a portion of the metadata data structure that corresponds to a group of the storage blocks can be reserved to a requesting client, which then manages the portion of the metadata data structure using a copy of the portion of the metadata data structure.
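One way to picture the reservation is as an allocation bitmap split into fixed-size groups. A client reserves a group and receives a private copy of that slice of the bitmap. It allocates blocks against the copy without further coordination, then writes the copy back. The sketch below makes several assumptions. The bitmap layout and group size are illustrative. The explicit `commit` that releases the reservation is assumed. The names `AllocationMetadata`, `reserve`, `commit` and `allocate` are hypothetical.

```python
class AllocationMetadata:
    """Hypothetical allocation bitmap divided into fixed-size groups of blocks."""

    def __init__(self, num_blocks, group_size):
        self.bitmap = [False] * num_blocks   # True means the block is allocated
        self.group_size = group_size
        self.owner = {}                      # group index -> client holding the reservation

    def reserve(self, client, group):
        # Only one client may hold a given group's portion of the metadata at a time.
        holder = self.owner.get(group, client)
        if holder != client:
            raise RuntimeError(f"group {group} is reserved by {holder}")
        self.owner[group] = client
        start = group * self.group_size
        return list(self.bitmap[start:start + self.group_size])  # client's private copy

    def commit(self, client, group, copy):
        # Write the client's modified copy back and release the reservation (assumed step).
        if self.owner.get(group) != client:
            raise RuntimeError("commit without a reservation")
        start = group * self.group_size
        self.bitmap[start:start + self.group_size] = copy
        del self.owner[group]


def allocate(copy, n):
    """Client-side allocation of up to n free blocks inside its private copy."""
    picked = [i for i, used in enumerate(copy) if not used][:n]
    for i in picked:
        copy[i] = True
    return picked  # indices are relative to the start of the group
```

Because the client works on its own copy, allocations inside a reserved group need no locking against other clients. The shared structure is touched only at reserve and commit time.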
Abstract:
A distributed computing application is described that provides a highly elastic and multi-tenant platform for Hadoop applications and other workloads running in a virtualized environment. Production, test, and development deployments of a Hadoop application may be executed using multiple compute clusters and a shared instance of a distributed filesystem, or in other cases, multiple instances of the distributed filesystem. Data nodes executing as virtual machines (VMs) for test and development deployments can be linked clones of data nodes executing as VMs for a production deployment to reduce duplicated data and provide a shared storage space.
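The space saving from linked clones comes from copy-on-write at the virtual-disk level. A clone stores only the blocks it has written and reads every other block from the shared parent disk. The sketch below assumes the production data node's disk is frozen once clones exist, as a snapshot would be. It also assumes unwritten blocks read back as `None`. The `VirtualDisk` class is hypothetical and not a hypervisor API.

```python
class VirtualDisk:
    """Block-level virtual disk; a linked clone stores only the blocks it writes."""

    def __init__(self, parent=None):
        self.parent = parent   # shared base disk (assumed frozen), or None for a full disk
        self.blocks = {}       # block number -> data written to this disk

    def write(self, block, data):
        # Copy-on-write: the clone's write never modifies the shared parent.
        self.blocks[block] = data

    def read(self, block):
        if block in self.blocks:
            return self.blocks[block]
        if self.parent is not None:
            return self.parent.read(block)   # fall through to the shared base
        return None  # hypothetical: unwritten block

    def private_blocks(self):
        # Storage consumed by this disk alone, excluding anything shared with the parent.
        return len(self.blocks)
```

A test or development data node created as `VirtualDisk(parent=production_disk)` sees all production HDFS blocks immediately. It consumes storage only for the blocks it changes.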
Abstract:
A virtual file system and method for performing virtual file system operations uses a vnode descriptor to access a vnode for a vnode operation. If the vnode is not found in a vnode cache using the vnode descriptor, the vnode is reconstructed using information regarding the vnode found outside of the vnode cache using the vnode descriptor.
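The lookup path can be sketched as a cache keyed by the vnode descriptor that falls back to reconstruction on a miss. Several pieces are assumptions made for illustration. The descriptor fields (`fs_id`, `inode_number`) are invented. The dict standing in for metadata "outside of the vnode cache" is a stand-in. The FIFO eviction policy is a choice of this sketch. The point shown is that a vnode operation succeeds whether or not the vnode is still cached.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VnodeDescriptor:
    """Hypothetical descriptor with enough information to locate a file's metadata."""
    fs_id: int
    inode_number: int


@dataclass
class Vnode:
    descriptor: VnodeDescriptor
    size: int


class VnodeCache:
    def __init__(self, backing_store, capacity=2):
        self.backing_store = backing_store  # descriptor -> attributes held outside the cache
        self.capacity = capacity
        self.cache = {}
        self.reconstructions = 0

    def get(self, desc):
        vnode = self.cache.get(desc)
        if vnode is None:
            # Cache miss: rebuild the vnode from information found outside the cache.
            attrs = self.backing_store[desc]
            vnode = Vnode(descriptor=desc, size=attrs["size"])
            self.reconstructions += 1
            if len(self.cache) >= self.capacity:
                self.cache.pop(next(iter(self.cache)))  # evict the oldest entry (FIFO)
            self.cache[desc] = vnode
        return vnode

    def read_size(self, desc):
        # A vnode operation that behaves the same whether or not the vnode was cached.
        return self.get(desc).size
```

Since callers hold descriptors rather than vnode pointers, the cache can evict vnodes freely without invalidating anything a caller holds.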
Abstract:
System and method for accessing a distributed storage system uses a storage-level access control process at a distributed file system that interfaces with the distributed storage system. In response to a file system request from a particular client to access a second file system object in a particular first file system object, the process determines whether the particular client has access to the particular first file system object using an identifier of the particular client and storage-level access control rules. The storage-level access control rules are defined for a plurality of clients and a plurality of first file system objects of the distributed storage system, and they allow the particular client to access the second file system object in the particular first file system object only if the particular client has been determined to have access to the particular first file system object according to the storage-level access control rules.
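The two-level check can be sketched as a gate that runs before any per-file permission logic. The request is rejected unless the storage-level rules grant the client access to the enclosing first file system object. Several details are hypothetical. The rule table is a dict. First file system objects are modeled as the top-level component of a path, such as a volume. The names `STORAGE_RULES`, `parent_object` and `handle_request` are invented.

```python
# Hypothetical storage-level rules: client id -> first file system objects
# (modeled here as top-level volumes) that the client may access.
STORAGE_RULES = {
    "client-a": {"vol1"},
    "client-b": {"vol1", "vol2"},
}


def parent_object(path):
    """Map a path such as 'vol2/dir/file' to its first file system object 'vol2'."""
    return path.split("/", 1)[0]


def handle_request(client_id, path, rules=STORAGE_RULES):
    # Storage-level check on the enclosing object runs before any access to the inner object.
    first_obj = parent_object(path)
    if first_obj not in rules.get(client_id, set()):
        raise PermissionError(f"{client_id} may not access {first_obj}")
    return f"{client_id} opened {path}"
```

Unknown clients fall through to an empty set and are denied, so the check fails closed.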
Abstract:
A file system uses a B-tree data structure to organize file data. The file system may maintain an index node (inode) representing a file and having entries that map to extents of the file. When the file system detects that an index node has exceeded a threshold number of extents as a result of updates, the file system converts the file to a copy-on-write (COW) B-tree data structure containing the entries representing the extents of the file. To clone the file, the file system uses copies of the index node and the root node of the COW B-tree data structure.
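The conversion and cheap cloning can be sketched with a deliberately shallow tree: a root node whose children are leaf nodes of extents. This is a simplification, not a full B-tree, because the root's fan-out is unbounded and there are no interior levels. The threshold values and all names (`Inode`, `Root`, `Leaf`, `add_extent`, `clone`) are hypothetical. What the sketch does show is the COW rule. Leaves are never modified in place, so a clone made by copying only the inode and root node can share every leaf with the original.

```python
import bisect
from dataclasses import dataclass, field
from typing import Optional

INLINE_THRESHOLD = 3   # hypothetical: max extents stored directly in the inode
LEAF_CAPACITY = 4      # hypothetical: max extents per leaf before it splits


@dataclass
class Leaf:
    extents: list      # sorted (file_offset, disk_block, length) tuples


@dataclass
class Root:
    leaves: list       # child leaves in offset order; may be shared between clones


@dataclass
class Inode:
    extents: list = field(default_factory=list)   # used while below the threshold
    root: Optional[Root] = None                    # used after conversion


def add_extent(inode, extent):
    if inode.root is None:
        bisect.insort(inode.extents, extent)
        if len(inode.extents) > INLINE_THRESHOLD:
            # Convert: move the inline extents into a tree under a new root node.
            inode.root = Root([Leaf(inode.extents)])
            inode.extents = []
        return
    # COW insert: build a new leaf (or two, on split) and a new root; never mutate shared nodes.
    leaves = inode.root.leaves
    i = max(0, bisect.bisect_right([lf.extents[0] for lf in leaves], extent) - 1)
    new_extents = sorted(leaves[i].extents + [extent])
    if len(new_extents) > LEAF_CAPACITY:
        mid = len(new_extents) // 2
        replacement = [Leaf(new_extents[:mid]), Leaf(new_extents[mid:])]
    else:
        replacement = [Leaf(new_extents)]
    inode.root = Root(leaves[:i] + replacement + leaves[i + 1:])


def clone(inode):
    # Copy only the inode and the root node; all leaves stay shared.
    root_copy = Root(list(inode.root.leaves)) if inode.root else None
    return Inode(list(inode.extents), root_copy)


def all_extents(inode):
    if inode.root is None:
        return list(inode.extents)
    return [e for leaf in inode.root.leaves for e in leaf.extents]
```

After a clone, writes to either file replace only the leaves on their own path. Untouched leaves remain physically shared between the two files.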
Abstract:
A virtualized computing system for executing a distributed computing application, such as Hadoop, is discussed. The virtualized computing system stores data in a distributed filesystem, such as the Hadoop Distributed File System (HDFS), and processes data with awareness of a topology that includes the virtualization layer of the virtualized computing system. The virtualized computing system employs locality-related policies, including replica placement policies, replica choosing policies, balancer policies, and task scheduling policies, that take advantage of this awareness of the virtualization topology.
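A replica placement policy with virtualization awareness can be sketched by adding a physical-host constraint to a rack-aware policy. The rack-aware part follows the HDFS default placement: the first replica on the writer's node, the second on a different rack, and the third on the same rack as the second. The added constraint is that no two replicas may land on VMs that share a hypervisor host. The `DataNode` type, its `host` field and `place_replicas` are hypothetical names for this illustration. Candidate order is simply list order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataNode:
    name: str
    host: str   # physical hypervisor host that runs this data-node VM
    rack: str


def place_replicas(writer, nodes, count=3):
    """Pick replica targets so that no two replicas share a physical host."""
    chosen = [writer]
    used_hosts = {writer.host}

    def pick(predicate):
        # Take the first candidate on an unused host that satisfies the predicate.
        for n in nodes:
            if n.host not in used_hosts and predicate(n):
                chosen.append(n)
                used_hosts.add(n.host)
                return n
        return None

    second = pick(lambda n: n.rack != writer.rack) if count >= 2 else None
    if second is not None and count >= 3:
        # Same rack as the second replica, but a different host (not merely a different VM).
        pick(lambda n: n.rack == second.rack)
    while len(chosen) < count and pick(lambda n: True) is not None:
        pass  # fall back to any remaining node on an unused host
    return chosen
```

In a cluster where `vm4` and `vm5` both run on `hostC` in `rack2`, a policy that only knows about nodes could put the second and third replicas on those two VMs. A single host failure would then lose both. The host check steers the third replica to a VM on a different host in the same rack.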
Abstract:
Exemplary methods, apparatuses, and systems include a controller node receiving a request to perform a consistency check of a distributed file system. The controller node transmits, to each of a plurality of nodes, a request for the node to use logical metadata of the distributed file system owned by the node to construct an expected copy of physical metadata mapped to the logical metadata, determine which of the plurality of nodes own actual portions of the physical metadata, transmit corresponding portions of the expected copy of the physical metadata to each of the nodes determined to own actual portions of the physical metadata, and compare expected copies of the physical metadata received from other nodes to the actual physical metadata owned by the node. The controller node receives a result of the comparison from each of the nodes, aggregates the received results, and generates an error report.
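The two phases of the check can be sketched in-process. In the first phase, each node derives the block allocations its logical metadata implies and sends each slice to the node that owns that part of the physical metadata. In the second phase, each owner compares what it received with what it actually has, and the controller keeps the mismatches. Several details are hypothetical. Logical metadata is modeled as file-to-block lists. Physical metadata is a set of allocated block numbers. Ownership is `block % num_nodes`. Node ids equal list positions, and messages are direct method calls rather than network traffic.

```python
from collections import defaultdict


def block_owner(block, num_nodes):
    # Hypothetical partitioning: physical metadata for block b lives on node b % num_nodes.
    return block % num_nodes


class Node:
    def __init__(self, node_id, logical, actual_allocated):
        self.node_id = node_id
        self.logical = logical                   # file -> physical blocks it maps to
        self.actual = set(actual_allocated)      # blocks this node's physical metadata marks used
        self.received = set()                    # expected allocations sent by all nodes

    def build_and_send(self, nodes):
        # Build the expected physical metadata from owned logical metadata, split by owner.
        outgoing = defaultdict(set)
        for blocks in self.logical.values():
            for b in blocks:
                outgoing[block_owner(b, len(nodes))].add(b)
        for owner, expected in outgoing.items():
            nodes[owner].received |= expected

    def compare(self):
        return {
            "node": self.node_id,
            "leaked": sorted(self.actual - self.received),   # allocated but unreferenced
            "missing": sorted(self.received - self.actual),  # referenced but not allocated
        }


def consistency_check(nodes):
    # Controller: every node sends its expected copies before any node compares.
    for n in nodes:
        n.build_and_send(nodes)
    results = [n.compare() for n in nodes]
    return [r for r in results if r["leaked"] or r["missing"]]  # aggregated error report
```

The barrier between the two loops matters. An owner can only judge a block "leaked" once every node that might reference it has sent its expected copy.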