Abstract:
Systems for high-performance computing. A storage control architecture is implemented by a plurality of nodes, where a node comprises combinations of executable containers that execute in cooperation with virtual machines running above a hypervisor. The containers run in a virtual machine above a hypervisor, and/or can be integrated directly into the operating system of a host node. Sensitive information such as credit card information may be isolated from the containers in a separate virtual machine that is configured to be threat resistant, and which can be accessed through a threat resistant interface module. One of the virtual machines of the node may be a node-specific control virtual machine that is configured to operate as a dedicated storage controller for a node. One of the virtual machines of the node may be a node-specific container service machine that is configured to provide storage-related and other support to a hosted executable container.
Abstract:
Systems for cluster computing. A method for detection and remediation of degraded nodes in a cluster commences upon measuring operational aspects of the nodes in the cluster, then determining, based on the measurements and other factors, a suspect set of nodes comprising one or more suspect nodes from the nodes in the cluster that have measurements that are determined to be outliers with respect to remaining nodes that are determined not to be the outliers. A density-based spatial clustering analysis is performed over the suspect set and remediation actions are initiated when results of the density-based spatial clustering analysis identify a suspect node as being a degraded node.
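The two-stage flow described above (an outlier screen that yields a suspect set, followed by density-based clustering over that set) can be sketched as follows. This is a minimal illustration, not the claimed method: the median-absolute-deviation screen, the choice of latency and error rate as measurements, the `eps` and `min_pts` values, and the rule that a suspect labeled as noise by the clustering is treated as degraded are all assumptions, since the abstract does not specify them.

```python
import math
from statistics import median

def find_suspects(latency_ms, threshold=3.0):
    """Stage 1 (assumed): flag nodes whose latency is a median/MAD outlier."""
    values = list(latency_ms.values())
    med = median(values)
    mad = median(abs(v - med) for v in values) or 1e-9  # avoid division by zero
    return {n for n, v in latency_ms.items() if abs(v - med) / mad > threshold}

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN over {name: coordinates}; returns {name: cluster id}, -1 = noise."""
    def neighbors(p):
        return [q for q in points if math.dist(points[p], points[q]) <= eps]

    labels, cluster = {}, 0
    for p in points:
        if p in labels:
            continue
        nbrs = neighbors(p)
        if len(nbrs) < min_pts:
            labels[p] = -1  # provisionally noise; may become a border point later
            continue
        labels[p] = cluster
        queue = [q for q in nbrs if q != p]
        while queue:
            q = queue.pop()
            if labels.get(q) == -1:
                labels[q] = cluster  # noise reachable from a core point is a border point
            if q in labels:
                continue
            labels[q] = cluster
            q_nbrs = neighbors(q)
            if len(q_nbrs) >= min_pts:
                queue.extend(q_nbrs)  # q is a core point, keep expanding
        cluster += 1
    return labels

# Hypothetical measurements: latency in ms, and (latency, error %) per node.
latency_ms = {"n1": 10, "n2": 11, "n3": 9, "n4": 10, "n5": 12,
              "n6": 40, "n7": 41, "n8": 95}
features = {"n6": (40, 0.5), "n7": (41, 0.6), "n8": (95, 7.0)}

suspects = find_suspects(latency_ms)
labels = dbscan({n: features[n] for n in suspects}, eps=5.0, min_pts=2)
# Assumed decision rule: a suspect that joins no dense cluster is degraded.
degraded = sorted(n for n, label in labels.items() if label == -1)
```

With this sample data the screen flags n6, n7, and n8, and only n8 ends up as noise. In practice the features would need scaling before a Euclidean distance is meaningful, and a library implementation of DBSCAN would normally replace the hand-written one.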
Abstract:
An approach for reduced size extent identifiers for a file system may be implemented by generating a full-size extent or file identifier and generating a smaller identifier from a portion of the full-size identifier. A check may be performed as to whether the smaller identifier is unique within a file system and if it is unique, the smaller identifier may be used in place of the full-size identifier. If not unique, the size of the smaller identifier may be increased. In some embodiments, the size of the smaller identifier is increased until a unique identifier is found.
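The grow-until-unique loop can be illustrated with a short sketch. The function name, the use of a leading prefix as the "portion" of the full-size identifier, the starting length of four characters, and the in-memory set standing in for the file system's uniqueness check are all assumptions made for illustration.

```python
def shortest_unique_prefix(full_id, in_use, start_len=4):
    """Return the shortest prefix of full_id (at least start_len long) not already in use."""
    for length in range(start_len, len(full_id) + 1):
        candidate = full_id[:length]
        if candidate not in in_use:  # uniqueness check within the file system
            return candidate
        # Not unique: fall through and try a longer portion of the full identifier.
    raise ValueError("full-size identifier is already in use")

# Hypothetical full-size identifiers that share their first four characters.
in_use = set()
short_a = shortest_unique_prefix("abcd1234", in_use)
in_use.add(short_a)
short_b = shortest_unique_prefix("abcd5678", in_use)
in_use.add(short_b)
```

Here the first identifier is shortened to `abcd`, and the second must grow to `abcd5` because `abcd` is taken. The sketch assumes reduced identifiers are always looked up by exact match, so one identifier being a prefix of another is not a problem.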
Abstract:
Systems and methods for high availability computing systems. Systems and methods include disaster recovery of two-node computing clusters. A method embodiment commences upon identifying a computing cluster having two nodes, the two nodes corresponding to a first node and a second node that each send and receive heartbeat indications periodically while performing storage I/O operations. One or both of the two nodes detect a heartbeat failure between the two nodes, and in response to detecting the heartbeat failure, one or both of the nodes temporarily cease storage I/O operations. A witness node is accessed on an on-demand basis as a result of detecting the heartbeat failure. The witness performs a leadership election operation to provide a leadership lock to only one requestor. The leader then resumes storage I/O operations and performs one or more disaster remediation operations. After remediation, the computing cluster is restored to a configuration having two nodes.
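The heartbeat-failure sequence can be sketched as a small in-process simulation. The class and method names are hypothetical, the witness is modeled as a local object rather than a remote service, and "first requestor wins" is an assumed election policy; the abstract only states that the lock goes to a single requestor.

```python
import threading

class Witness:
    """Grants a leadership lock to exactly one requestor (assumed: the first one)."""
    def __init__(self):
        self._guard = threading.Lock()
        self._leader = None

    def request_leadership(self, node_id):
        with self._guard:  # serialize concurrent requests from both nodes
            if self._leader is None:
                self._leader = node_id
            return self._leader == node_id

class Node:
    def __init__(self, node_id, witness):
        self.node_id = node_id
        self.witness = witness
        self.io_enabled = True

    def on_heartbeat_failure(self):
        self.io_enabled = False  # temporarily cease storage I/O
        # The witness is contacted only now, on demand.
        if self.witness.request_leadership(self.node_id):
            self.io_enabled = True  # the leader resumes storage I/O
        return self.io_enabled

witness = Witness()
node_a, node_b = Node("A", witness), Node("B", witness)
a_is_leader = node_a.on_heartbeat_failure()
b_is_leader = node_b.on_heartbeat_failure()
```

Whichever node reaches the witness first resumes I/O; the other stays quiesced, which prevents both sides from writing during a partition. Remediation and the return to a two-node configuration are outside this sketch.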
Abstract:
Systems for low-latency data access in distributed computing systems. A method embodiment commences upon generating a first storage area in local storage of a first computing node. Access to the first storage area is provided through the first computing node. A second storage area is generated wherein the second storage area comprises a first set of metadata that comprises local storage device locations of at least some of the local storage areas of the first storage area. A set of physical access locations of the second storage area is stored to a database that manages updates to the second set of metadata pertaining to the second storage area. Accesses to the first storage area are accomplished by querying the database to retrieve a location of the second set of metadata, and then accessing the first storage area through one or more additional levels of metadata that are node-wise collocated.
Abstract:
Systems for low-latency data access in distributed computing systems. A method embodiment commences upon generating a first storage area in local storage of a first computing node. Access to the first storage area is provided through the first computing node. A second storage area is generated wherein the second storage area comprises a first set of metadata that comprises local storage device locations of at least some of the local storage areas of the first storage area. A set of physical access locations of the second storage area is stored to a database that manages updates to the second set of metadata pertaining to the second storage area. Accesses to the first storage area are accomplished by querying the database to retrieve a location of the second set of metadata, and then accessing the first storage area through one or more additional levels of metadata that are node-wise collocated.
Abstract:
A method and apparatus for data driven and cluster specific version/update control. The apparatus includes an automated multi-cluster management apparatus that interfaces with a plurality of remote clusters to provide data driven version/update control on a cluster by cluster basis. Generally, operation includes collection/identification of cluster specific data pertaining to software, hardware, and cluster requirements. The cluster specific data is later compared/analyzed against multi-cluster data pertaining to software releases, hardware characteristics, and known bugs/issues for each. The results of the comparison/analysis can then be ranked according to various metrics to identify different possible solutions and to differentiate the less desirable results from the more desirable results. Thus, the automated multi-cluster management apparatus provides for selection of versions/updates that is dependent on the cluster specific data. Additionally, the present disclosure provides for scheduling and distribution planning for selected versions/updates.
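The compare-then-rank step can be illustrated with a toy example. Everything here is assumed for illustration: the data classes, the hardware and feature names, the version strings, and the single ranking metric (the summed severity of known bugs that touch features the cluster actually uses).

```python
from dataclasses import dataclass, field

@dataclass
class Release:
    version: str
    supported_hardware: set
    known_bugs: dict = field(default_factory=dict)  # affected feature -> severity

@dataclass
class Cluster:
    hardware: str
    features_in_use: set

def rank_releases(cluster, releases):
    """Drop incompatible releases, then order the rest from most to least desirable."""
    scored = []
    for rel in releases:
        if cluster.hardware not in rel.supported_hardware:
            continue  # release cannot run on this cluster's hardware
        # Only bugs in features this cluster uses count against a release.
        penalty = sum(sev for feat, sev in rel.known_bugs.items()
                      if feat in cluster.features_in_use)
        scored.append((penalty, rel.version))
    return [version for _, version in sorted(scored, key=lambda s: s[0])]

cluster = Cluster(hardware="model-a", features_in_use={"dedup", "snapshots"})
releases = [
    Release("5.0", {"model-a", "model-b"}, {"dedup": 8}),
    Release("5.1", {"model-a"}, {"erasure-coding": 9}),
    Release("5.2", {"model-b"}),
]
ranking = rank_releases(cluster, releases)
```

Release 5.2 is excluded because it does not support the cluster's hardware, and 5.1 ranks above 5.0 because its only known bug affects a feature this cluster does not use. The same release list could therefore produce a different ranking for a different cluster.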
Abstract:
Embodiments serve to balance overall performance of a finite-sized caching system having a first cache of a first cache size and a second cache of a second cache size. A tail portion and a head portion of each of the caches are defined wherein incoming data elements are initially stored in a respective head portion and wherein evicted data elements are evicted from a respective tail portion. Performance metrics are defined wherein a performance metric includes a predicted miss cost that would be incurred when replacing an evicted data element. A quantitative function is defined to include cache performance metrics and a cache reallocation amount. The cache performance metrics are evaluated periodically to determine a then-current cache reallocation amount. The caches can be balanced by increasing the first cache size by the cache reallocation amount and decreasing the second cache size by the cache reallocation amount.
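One way to picture the periodic evaluation is the sketch below. The quantitative function is an assumption, not the one in the disclosure: it estimates each cache's predicted miss cost as hits observed in its tail portion multiplied by a per-miss cost, and moves capacity toward the cache with the higher cost, bounded by a hypothetical `max_step`.

```python
def reallocation_amount(tail_hits_1, miss_cost_1, tail_hits_2, miss_cost_2, max_step):
    """Signed number of slots to move from cache 2 to cache 1 (negative = reverse)."""
    cost_1 = tail_hits_1 * miss_cost_1  # cost if cache 1's tail had been evicted
    cost_2 = tail_hits_2 * miss_cost_2
    if cost_1 + cost_2 == 0:
        return 0
    imbalance = (cost_1 - cost_2) / (cost_1 + cost_2)  # in [-1, 1]
    return int(imbalance * max_step)

def rebalance(size_1, size_2, amount, min_size=1):
    """Grow cache 1 by amount and shrink cache 2 by the same amount."""
    # Clamp so that neither cache drops below min_size.
    amount = max(min(amount, size_2 - min_size), -(size_1 - min_size))
    return size_1 + amount, size_2 - amount

# Hypothetical counters collected during one evaluation period.
amount = reallocation_amount(tail_hits_1=300, miss_cost_1=2.0,
                             tail_hits_2=100, miss_cost_2=2.0, max_step=100)
new_size_1, new_size_2 = rebalance(600, 400, amount)
```

With these numbers the first cache's tail is three times as costly to lose, so 50 slots move to it and the total size stays at 1000. Running the evaluation each period lets the split track workload changes.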
Abstract:
Various embodiments set forth techniques for managing and/or accessing metadata associated with a vblock, systems implementing said techniques, and computer-readable media storing instructions for performing said techniques. In some embodiments, one or more computer-readable media store instructions that, when executed by one or more processors, cause the one or more processors to perform steps including receiving a request for metadata associated with a vblock; accessing a merged metadata record associated with the vblock, where the merged metadata record comprises metadata corresponding to metadata in metadata records for all but a last snapshot or a live vblock having a metadata record, and a first identifier of the last snapshot or the live vblock having a metadata record; and returning the requested metadata based on the metadata in the merged metadata record and metadata in the metadata record identified by the first identifier.
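The lookup path through a merged metadata record can be sketched as follows. The representation is assumed for illustration: each metadata record is a dictionary from block offset to an extent name, newer records override older ones, and the helper names are hypothetical.

```python
def build_merged_record(ordered_ids, records):
    """Fold every record except the last into one; remember the last record's identifier."""
    entries = {}
    for record_id in ordered_ids[:-1]:  # oldest first, so newer entries overwrite older
        entries.update(records[record_id])
    return {"entries": entries, "last_id": ordered_ids[-1]}

def lookup(offset, merged, records):
    """Answer from the last snapshot or live record first, then from the merged record."""
    last = records[merged["last_id"]]
    if offset in last:
        return last[offset]
    return merged["entries"].get(offset)

# Hypothetical vblock history: two snapshots followed by the live vblock.
records = {
    "snap1": {0: "extent-1", 1: "extent-2"},
    "snap2": {1: "extent-3"},
    "live": {2: "extent-4"},
}
merged = build_merged_record(["snap1", "snap2", "live"], records)
results = [lookup(offset, merged, records) for offset in (0, 1, 2, 5)]
```

A request touches at most two records (the merged record and the one named by the identifier) regardless of how many snapshots exist, which is the benefit the merged record is meant to provide in this sketch.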