Abstract:
Systems and methods for data management and data processing are provided. Embodiments may include systems and methods for fast data selection with reasonably high-quality results, and may combine a faster data selection function with a slower data selection function. Various embodiments may include systems and methods relating to data hashing and/or data redundancy identification and elimination for a data set or a string of data. In some embodiments, a first selection function is used to pre-select boundary points or data blocks/windows from a data set or data stream, and a second selection function is used to refine those boundary points or data blocks/windows. The second selection function may be better at determining the best places for boundary points or data blocks/windows in the data set or data stream. In various embodiments, data may be processed by a faster first hash function and a slower, more discriminating second hash function.
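
The two-stage selection described in this abstract can be illustrated with a minimal sketch. It assumes a rolling byte sum as the fast pre-selection function and a truncated SHA-1 of the window as the slower, more discriminating function; the window size, the masks, and the names `slow_hash` and `select_boundaries` are hypothetical choices for illustration and are not taken from the abstract.

```python
import hashlib


def slow_hash(window: bytes) -> int:
    """Slower, more discriminating hash (assumed here: first 4 bytes of SHA-1)."""
    return int.from_bytes(hashlib.sha1(window).digest()[:4], "big")


def select_boundaries(data: bytes, window: int = 16,
                      fast_mask: int = 0x07, slow_mask: int = 0x07) -> list:
    """Return candidate cut offsets that pass both selection stages.

    Stage 1: a cheap rolling byte sum pre-selects window end positions.
    Stage 2: the slow hash is computed only for pre-selected windows.
    """
    if len(data) < window:
        return []
    boundaries = []
    rolling = sum(data[:window])  # fast hash of the first window
    for end in range(window, len(data) + 1):
        if end > window:
            # Slide the window by one byte: add the new byte, drop the oldest.
            rolling += data[end - 1] - data[end - window - 1]
        if rolling & fast_mask == 0:  # fast pre-selection
            if slow_hash(data[end - window:end]) & slow_mask == 0:  # refinement
                boundaries.append(end)
    return boundaries


if __name__ == "__main__":
    sample = bytes(range(256)) * 4
    print(select_boundaries(sample))
```

Because the slow hash runs only on positions that the fast stage has already accepted, most of the input is touched only by the cheap rolling sum.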
Abstract:
Data is stored in a distributed data storage system comprising a plurality of disks. When a disk fails, system reliability is restored by executing a set of reconstructions according to a schedule. System reliability is characterized by a dynamic Normalcy Deviation Score. The schedule for executing the set of reconstructions is determined by a minimum intersection policy. A set of reconstructions is received and divided into a set of queues rank-ordered by redundancy level, from the lowest redundancy level to the highest. For the reconstructions in each queue, an intersection matrix is calculated. A diskscore is calculated for each disk. The schedule for the set of reconstructions is based at least in part on the intersection matrices, the Normalcy Deviation Scores, and the diskscores.
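
The queueing and intersection steps can be sketched as follows. The abstract does not define the intersection matrix, the diskscore, or the Normalcy Deviation Score, so this sketch assumes that entry (i, j) of the matrix counts the disks shared by reconstructions i and j, and it uses a simple greedy ordering that prefers the least overlap with work already scheduled. Diskscores and the Normalcy Deviation Score are left out. All names and the record layout (`name`, `redundancy`, `disks`) are hypothetical.

```python
from collections import defaultdict


def build_queues(reconstructions):
    """Group reconstructions into queues ordered from lowest to highest redundancy."""
    queues = defaultdict(list)
    for rec in reconstructions:
        queues[rec["redundancy"]].append(rec)
    return [queues[level] for level in sorted(queues)]


def intersection_matrix(queue):
    """Assumed definition: entry (i, j) is the number of disks shared by i and j."""
    n = len(queue)
    return [
        [0 if i == j else len(queue[i]["disks"] & queue[j]["disks"])
         for j in range(n)]
        for i in range(n)
    ]


def schedule_queue(queue):
    """Greedy order: pick next the reconstruction overlapping least with those chosen."""
    matrix = intersection_matrix(queue)
    remaining = list(range(len(queue)))
    chosen = []
    while remaining:
        best = min(remaining,
                   key=lambda i: (sum(matrix[i][j] for j in chosen), i))
        chosen.append(best)
        remaining.remove(best)
    return [queue[i]["name"] for i in chosen]


if __name__ == "__main__":
    recs = [
        {"name": "A", "redundancy": 0, "disks": {1, 2}},
        {"name": "B", "redundancy": 0, "disks": {2, 3}},
        {"name": "C", "redundancy": 0, "disks": {4, 5}},
        {"name": "D", "redundancy": 1, "disks": {1, 5}},
    ]
    for queue in build_queues(recs):
        print(schedule_queue(queue))
```

Processing the lowest-redundancy queue first reflects the rank ordering in the abstract: data with the fewest remaining copies is the most exposed to a further failure.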
Abstract:
Information, such as files received from a client, is stored in a storage system, such as a content-addressable storage system. A file server receives data from a client and chunks the data into blocks of data. The file server also generates metadata for use in forming a data structure. The blocks of data are stored in a block store, and copies of the data blocks and the metadata are cached locally at the file server. A commit server retrieves the metadata. In at least one embodiment, the metadata is retrieved from an update log shared between the file server and the commit server. Based on the retrieved metadata, the commit server generates a version of a data structure. The data structure is then stored at the block store.
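
The flow between the file server, the shared update log, and the commit server can be sketched in a few lines. This is a minimal in-memory illustration under stated assumptions: the block store is a dictionary keyed by SHA-256 digest, chunking is fixed-size, the update log is a plain list, and the versioned data structure is a JSON map from file name to block keys. The class names and methods are hypothetical.

```python
import hashlib
import json


class BlockStore:
    """Content-addressed store: each block is keyed by its SHA-256 digest."""

    def __init__(self):
        self.blocks = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.blocks[key] = data
        return key


class FileServer:
    """Chunks client data, stores blocks, caches them, and logs metadata."""

    def __init__(self, store, log, chunk_size=4):
        self.store, self.log, self.chunk_size = store, log, chunk_size
        self.cache = {}  # local copy of blocks written by this server

    def write(self, name: str, data: bytes) -> None:
        keys = []
        for off in range(0, len(data), self.chunk_size):
            block = data[off:off + self.chunk_size]
            key = self.store.put(block)
            self.cache[key] = block
            keys.append(key)
        self.log.append({"file": name, "blocks": keys})  # metadata entry


class CommitServer:
    """Reads new metadata from the shared log and stores a new version."""

    def __init__(self, store, log):
        self.store, self.log = store, log
        self.position = 0  # index of the next unread log entry
        self.tree = {}     # file name -> list of block keys

    def commit(self) -> str:
        for entry in self.log[self.position:]:
            self.tree[entry["file"]] = entry["blocks"]
        self.position = len(self.log)
        encoded = json.dumps(self.tree, sort_keys=True).encode()
        return self.store.put(encoded)  # the version is itself a block


if __name__ == "__main__":
    store, log = BlockStore(), []
    FileServer(store, log).write("a.txt", b"hello world")
    print(CommitServer(store, log).commit())
```

Keeping the commit step separate means the file server can acknowledge writes once blocks and log entries exist, while the data structure is built later from the log.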
Abstract:
System(s) and method(s) are provided for data management and data processing. For example, various embodiments may include systems and methods in which relatively larger groups of data are selected with comparable or better selection results (e.g., high data redundancy elimination and/or average chunk size). In various embodiments, the system(s) and method(s) may include, for example, a data group, block, or chunk combining technique and/or a data group, block, or chunk splitting technique. Various embodiments may include a first standard or typical data grouping, blocking, or chunking technique and/or a data group, block, or chunk combining technique and/or a data group, block, or chunk splitting technique. Exemplary system(s) and method(s) may relate to data hashing and/or data elimination. Embodiments may include a look-ahead buffer and may determine whether to emit small chunks or large chunks based on characteristics of the underlying data and/or the particular application of the invention (e.g., backup).
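
One way to picture the look-ahead decision is the sketch below. It assumes the input has already been cut into small chunks and that a set of digests of previously stored chunks is available. The policy shown, which merges a run of consecutive unseen small chunks inside the look-ahead window into one large chunk and emits everything else as small chunks, is an illustrative assumption rather than the method of the abstract. The names `digest` and `emit_chunks` are hypothetical.

```python
import hashlib


def digest(chunk: bytes) -> bytes:
    """Identify a chunk by its SHA-1 digest."""
    return hashlib.sha1(chunk).digest()


def emit_chunks(small_chunks, seen, lookahead=4):
    """Emit ("large", data) for runs of new chunks, ("small", data) otherwise.

    `seen` holds digests of chunks that are already stored. Small chunks are
    kept around likely duplicates so they can still be deduplicated.
    """
    out = []
    i = 0
    while i < len(small_chunks):
        window = small_chunks[i:i + lookahead]  # look-ahead buffer
        run = 0
        for chunk in window:  # count leading chunks that are not yet stored
            if digest(chunk) in seen:
                break
            run += 1
        if run >= 2:
            out.append(("large", b"".join(window[:run])))
            i += run
        else:
            out.append(("small", small_chunks[i]))
            i += 1
    return out


if __name__ == "__main__":
    chunks = [b"aa", b"bb", b"cc", b"dd", b"ee"]
    already_stored = {digest(b"cc")}
    print(emit_chunks(chunks, already_stored))
```

In this sketch the large chunks reduce the number of stored items for new data, while the small chunk at a known duplicate keeps the redundancy elimination intact.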