Abstract:
Disclosed is a method of scanning a data stream in a single pass to obtain uniform data samples from selected intervals. The method comprises randomly selecting elements from the stream for storage in one or more data buckets and then randomly selecting multiple samples from the bucket(s). Each sample is associated with a specified interval immediately prior to a selected point in time. The probability of storing an element in a bucket is balanced against the probability of selecting a stored element for a sample, so that every element scanned during the specified interval is included in the sample with equal probability. The samples can then be used to estimate the degree of sortedness of the stream by counting how many elements in the sequence are the rightmost point of an interval in which a majority of the elements are inverted with respect to that rightmost element.
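The sortedness measure named in the last sentence can be made concrete with a brute-force reference count. The sketch below is not the single-pass sampling method of the abstract. It is an exact quadratic-time check written under two assumptions the abstract does not spell out: that "inverted" means strictly greater than the rightmost element, and that the interval is the window of elements immediately preceding that element. The function name `count_majority_inverted` is hypothetical.

```python
def count_majority_inverted(seq):
    """Count positions i for which some window seq[j:i] (with j < i),
    ending just before i, has a strict majority of elements greater
    than seq[i]. Assumption: 'inverted' means strictly greater."""
    count = 0
    for i in range(1, len(seq)):
        inverted = 0
        # Grow the window leftwards from position i - 1 down to 0.
        for j in range(i - 1, -1, -1):
            if seq[j] > seq[i]:
                inverted += 1
            window_len = i - j
            if 2 * inverted > window_len:
                # A qualifying window exists; count i once and stop.
                count += 1
                break
    return count
```

Under these assumptions a sorted sequence yields a count of zero, and the count grows as more elements sit out of order. A sampling scheme such as the one in the abstract would presumably replace the inner loop with a majority test on uniform samples drawn from each window.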
Abstract:
Cloud data storage systems, methods, and techniques partition system data symbols into groups of a predefined size, encode each group to form corresponding parity symbols, encode all data symbols into global redundant symbols, and store each symbol (data, parity, and redundant) in a different failure domain in a manner that ensures independence of failures. In several implementations, the resulting cloud-encoded data features both data locality and the ability to recover from up to a predefined threshold number of simultaneous erasures (unavailable data symbols) without any information loss. In addition, certain implementations also place cloud-encoded data in domains (nodes or node groups) to provide similar locality and redundancy features simultaneously with the recovery of an entire domain of data that is unavailable due to software or hardware upgrades or failures.
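The group-level part of this scheme can be illustrated with a minimal sketch, assuming that each group's parity symbol is a plain XOR of the group's data symbols. The abstract does not name the code it uses, so XOR is a stand-in here. The global redundant symbols are left out, because they would need coefficients over a finite field that the abstract does not specify. All function names are hypothetical.

```python
from functools import reduce


def xor_bytes(blocks):
    """XOR equal-length byte strings together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def encode_groups(symbols, group_size):
    """Partition symbols into fixed-size groups and pair each group
    with its local parity symbol (assumption: XOR parity)."""
    groups = [symbols[i:i + group_size]
              for i in range(0, len(symbols), group_size)]
    return [(group, xor_bytes(group)) for group in groups]


def recover_in_group(group, parity, lost_index):
    """Rebuild one erased symbol using only the surviving symbols of
    its own group plus that group's local parity."""
    survivors = [s for k, s in enumerate(group) if k != lost_index]
    return xor_bytes(survivors + [parity])
```

A short usage example: with `symbols = [b"ab", b"cd", b"ef", b"gh"]` and `group_size = 2`, losing `b"cd"` is repaired by reading only `b"ab"` and the first group's parity, which is the data-locality property the abstract describes. Tolerating several simultaneous erasures would additionally rely on the global redundant symbols omitted here.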