摘要:
An archival storage cluster of preferably symmetric nodes includes a metadata management system that organizes and provides access to given metadata, preferably in the form of metadata objects. Each metadata object may have a unique name, and metadata objects are organized into regions. Preferably, a region is selected by hashing one or more object attributes (e.g., the object's name) and extracting a given number of bits of the resulting hash value. The number of bits may be controlled by a configuration parameter. Each region is stored redundantly. A region comprises a set of region copies. In particular, there is one authoritative copy of the region, and zero or more backup copies. The number of backup copies may be controlled by a configuration parameter. Region copies are distributed across the nodes of the cluster so as to balance the number of authoritative region copies per node, as well as the number of total region copies per node. Backup region copies are maintained synchronized to their associated authoritative region copy.
摘要:
An archival storage cluster of preferably symmetric nodes includes a data protection management system that periodically organizes the then-available nodes into one or more protection sets, with each set comprising a set of n nodes, where “n” refers to a configurable “data protection level” (DPL). At the time of its creation, a given protection set is closed in the sense that each then available node is a member of one, and only one, protection set. When an object is to be stored within the archive, the data protection management system stores the object in a given node of a given protection set and then constrains the distribution of copies of that object to other nodes within the given protection set. As a consequence, all DPL copies of an object are all stored within the same protection set, and only that protection set. This scheme significantly improves MTDL for the cluster as a whole, as the data can only be lost if multiple failures occur within nodes of a given protection set. This is far more unlikely than failures occurring across any random distribution of nodes within the cluster.
摘要:
An archive cluster application runs across a redundant array of independent nodes. Each node runs an archive cluster application instance comprising a set of software processes: a request manager, a storage manager, a metadata manager, and a policy manager. The request manager manages requests for data, the storage manager manages data read/write functions, and the metadata manager facilitates metadata transactions and recovery. The policy manager implements policies, which are operations that determine the behavior of an “archive object” within the cluster. The archive cluster application provides object-based storage. It associates metadata and policies with the raw archived data, which together comprise an archive object. Object policies govern the object's behavior in the archive. The archive manages itself independently of client applications, acting automatically to ensure that object policies are valid.
摘要:
An archive cluster application runs in a distributed manner across a redundant array of independent nodes. Each node preferably runs a complete archive cluster application instance. A given nodes provides a data repository, which stores up to a large amount (e.g., a terabyte) of data, while also acting as a portal that enables access to archive files. Each symmetric node has a set of software processes, e.g., a request manager, a storage manager, a metadata manager, and a policy manager. The request manager manages requests to the node for data (i.e., file data), the storage manager manages data read/write functions from a disk associated with the node, and the metadata manager facilitates metadata transactions and recovery across the distributed database. The policy manager implements one or more policies, which are operations that determine the behavior of an “archive object” within the cluster. The archive cluster application provides object-based storage. Preferably, the application permanently associates metadata and policies with the raw archived data, which together comprise an archive object. Object policies govern the object's behavior in the archive. As a result, the archive manages itself independently of client applications, acting automatically to ensure that all object policies are valid.
摘要:
A generic testing framework to automatically allocate, install and verify a given version of a system under test, to exercise the system against a series of tests in a “hands-off” objective manner, and then to export information about the tests to one or more developer repositories (such as a query-able database, an email list, a developer web server, a source code version control system, a defect tracking system, or the like). The framework does not “care” or concern itself with the particular implementation language of the test as long as the test can issue directives via a command line or configuration file. During the automated testing of a given test suite having multiple tests, and after a particular test is run, the framework preferably generates an “image” of the system under test and makes that information available to developers, even while additional tests in the suite are being carried out. In this manner, the framework preserves the system “state” to facilitate concurrent or after-the-fact debugging. The framework also will re-install and verify a given version of the system between tests, which may be necessary in the event a given test is destructive or otherwise places the system in an unacceptable condition.
摘要:
An archive cluster application runs in a distributed manner across a redundant array of independent nodes. Each node preferably runs a complete archive cluster application instance. A given nodes provides a data repository, which stores up to a large amount (e.g., a terabyte) of data, while also acting as a portal that enables access to archive files. Each symmetric node has a set of software processes, e.g., a request manager, a storage manager, a metadata manager, and a policy manager. The request manager manages requests to the node for data (i.e., file data), the storage manager manages data read/write functions from a disk associated with the node, and the metadata manager facilitates metadata transactions and recovery across the distributed database. The policy manager implements one or more policies, which are operations that determine the behavior of an “archive object” within the cluster. The archive cluster application provides object-based storage. Preferably, the application permanently associates metadata and policies with the raw archived data, which together comprise an archive object. Object policies govern the object's behavior in the archive. As a result, the archive manages itself independently of client applications, acting automatically to ensure that all object policies are valid.
摘要:
A file protection scheme for fixed content in a distributed data archive uses computations that leverage permutation operators of a cyclic code. In an illustrative embodiment, an N+K coding technique is described for use to protect data that is being distributed in a redundant array of independent nodes (RAIN). The data itself may be of any type, and it may also include system metadata. According to the invention, the data to be distributed is encoded by a dispersal operation that uses a group of permutation ring operators. In a preferred embodiment, the dispersal operation is carried out using a matrix of the form [IN—C] where IN is an n×n identity sub-matrix and C is a k×n sub-matrix of code blocks. The identity sub-matrix is used to preserve the data blocks intact. The sub-matrix C preferably comprises a set of permutation ring operators that are used to generate the code blocks. The operators are preferably superpositions that are selected from a group ring of a permutation group with base ring Z2.
摘要:
A generic testing framework to automatically allocate, install and verify a given version of a system under test, to exercise the system against a series of tests in a “hands-off” objective manner, and then to export information about the tests to one or more developer repositories (such as a query-able database, an email list, a developer web server, a source code version control system, a defect tracking system, or the like). The framework does not “care” or concern itself with the particular implementation language of the test as long as the test can issue directives via a command line or configuration file. During the automated testing of a given test suite having multiple tests, and after a particular test is run, the framework preferably generates an “image” of the system under test and makes that information available to developers, even while additional tests in the suite are being carried out. In this manner, the framework preserves the system “state” to facilitate concurrent or after-the-fact debugging. The framework also will re-install and verify a given version of the system between tests, which may be necessary in the event a given test is destructive or otherwise places the system in an unacceptable condition.
摘要:
An automated system that randomly generates test cases for use in hardware or software quality assurance testing, wherein a given test case comprises a sequence (or “chain”) of discrete, atomic steps (or “building blocks”). A particular test case (i.e., a given sequence) has a variable number of building blocks. The system takes a set of test actions (or even test cases) and links them together in a relevant and useful manner to create a much larger library of test cases or “chains.” The chains comprise a large number of random sequence tests that facilitate “chaos-like” or exploratory testing of the overall system under test. Upon execution in the system under test, the test case is considered successful (i.e., a pass) if each building block in the chain executes successfully; if any building block fails, the test case, in its entirety, is considered a failure. The system adapts and dynamically generates new test cases as underlying data changes (e.g., a building block is added, deleted, modified) or new test cases themselves are generated. The system also is tunable to generate test sequences that have a given (e.g., higher) likelihood of finding bugs or generating errors from which the testing entity can then assess the system operation. Generated chains can be replayed easily to provide test reproducibility.
摘要:
An archive cluster application runs in a distributed manner across a redundant array of independent nodes. Each node preferably runs a complete archive cluster application instance. A given nodes provides a data repository, which stores up to a large amount (e.g., a terabyte) of data, while also acting as a portal that enables access to archive files. Each symmetric node has a set of software processes, e.g., a request manager, a storage manager, a metadata manager, and a policy manager. The request manager manages requests to the node for data (i.e., file data), the storage manager manages data read/write functions from a disk associated with the node, and the metadata manager facilitates metadata transactions and recovery across the distributed database. The policy manager implements one or more policies, which are operations that determine the behavior of an “archive object” within the cluster. The archive cluster application provides object-based storage. Preferably, the application permanently associates metadata and policies with the raw archived data, which together comprise an archive object. Object policies govern the object's behavior in the archive. As a result, the archive manages itself independently of client applications, acting automatically to ensure that all object policies are valid.