Method of obtaining data samples from a data stream and of estimating the sortedness of the data stream based on the samples
    1.
    Patent application
    Status: in force

    Publication No.: US20070244891A1

    Publication date: 2007-10-18

    Application No.: US11405994

    Filing date: 2006-04-18

    IPC classification: G06F17/30

    CPC classification: G06F7/22, G06F17/30864

    Abstract: Disclosed is a method of scanning a data stream in a single pass to obtain uniform data samples from selected intervals. The method comprises randomly selecting elements from the stream for storage in one or more data buckets and then randomly selecting multiple samples from the bucket(s). Each sample is associated with a specified interval immediately prior to a selected point in time. The probability of storing an element in a bucket is balanced against the probability of selecting it from the bucket for a sample, so that every element scanned during the specified interval is included in the sample with equal probability. The samples can then be used to estimate the degree of sortedness of the stream by counting how many elements in the sequence are the rightmost point of an interval in which a majority of the elements are inverted with respect to that rightmost element.
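
The idea of drawing, in one pass, a uniform sample from the interval immediately before each point in time can be illustrated with a short sketch. This is not the bucket scheme of the patent: it swaps in priority sampling, a simpler technique, purely to show the property the abstract asks for. The names `window_samples`, `window`, and `rng` are hypothetical.

```python
import random
from collections import deque


def window_samples(stream, window, rng=None):
    """Yield (position, sample), where sample is drawn uniformly from
    the last `window` items seen so far (fewer at the very start)."""
    rng = rng or random.Random()
    # Each entry is (index, priority, value). Priorities increase from
    # left to right, so the leftmost entry holds the window minimum.
    candidates = deque()
    for i, value in enumerate(stream):
        priority = rng.random()
        # Older entries with a larger priority can never again be the
        # minimum while this newer element is in the window.
        while candidates and candidates[-1][1] >= priority:
            candidates.pop()
        candidates.append((i, priority, value))
        # Expire entries that have slid out of the window.
        while candidates[0][0] <= i - window:
            candidates.popleft()
        # The minimum-priority element of the window is a uniform pick,
        # because the priorities are independent and identically distributed.
        yield i, candidates[0][2]
```

Each element receives an independent random priority, so the element holding the smallest priority inside the current window is equally likely to be any of them. Memory use is not fixed: the number of retained candidates varies from step to step.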

    Method of obtaining data samples from a data stream and of estimating the sortedness of the data stream based on the samples
    2.
    Granted patent
    Status: in force

    Publication No.: US07797326B2

    Publication date: 2010-09-14

    Application No.: US11405994

    Filing date: 2006-04-18

    IPC classification: G06F7/00, G06F17/30

    CPC classification: G06F7/22, G06F17/30864

    Abstract: Disclosed is a method of scanning a data stream in a single pass to obtain uniform data samples from selected intervals. The method comprises randomly selecting elements from the stream for storage in one or more data buckets and then randomly selecting multiple samples from the bucket(s). Each sample is associated with a specified interval immediately prior to a selected point in time. The probability of storing an element in a bucket is balanced against the probability of selecting it from the bucket for a sample, so that every element scanned during the specified interval is included in the sample with equal probability. The samples can then be used to estimate the degree of sortedness of the stream by counting how many elements in the sequence are the rightmost point of an interval in which a majority of the elements are inverted with respect to that rightmost element.
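
The sortedness measure in the last sentence of the abstract can be made concrete with an exact, brute-force count. The sketch below rests on assumptions the abstract does not spell out: ascending order is the target, an element is "inverted" when it is larger than the interval's rightmost element, the interval covers only the elements before that rightmost element, and "majority" means strictly more than half. The function name `count_flagged` is hypothetical, and the quadratic scan stands in for the sample-based estimate that the method itself uses.

```python
def count_flagged(seq):
    """Count positions i that end some interval seq[j..i-1] in which
    more than half of the elements are greater than seq[i]."""
    flagged = 0
    for i in range(1, len(seq)):
        inverted = 0
        # Grow the interval leftwards from i-1 down to 0.
        for j in range(i - 1, -1, -1):
            if seq[j] > seq[i]:
                inverted += 1
            interval_length = i - j
            if 2 * inverted > interval_length:
                flagged += 1
                break  # one qualifying interval is enough
    return flagged
```

Under these assumptions a sorted sequence gives 0, a strictly decreasing sequence of five elements gives 4, and `[1, 2, 10, 3, 4, 5]` gives 1, because only the element 3 directly follows the misplaced 10.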

    Cloud data storage using redundant encoding
    3.
    Patent application
    Status: in force

    Publication No.: US20130054549A1

    Publication date: 2013-02-28

    Application No.: US13221928

    Filing date: 2011-08-31

    IPC classification: G06F17/30

    CPC classification: G06F17/30557

    Abstract: Cloud data storage systems, methods, and techniques partition system data symbols into groups of a predefined size, encode each group to form corresponding parity symbols, encode all data symbols into global redundant symbols, and store each symbol (data, parity, and redundant) in a different failure domain in a manner that ensures independence of failures. In several implementations, the resultant cloud-encoded data features both data locality and the ability to recover from up to a predefined threshold number of simultaneous erasures (unavailable data symbols) without any information loss. In addition, certain implementations place cloud-encoded data in domains (nodes or node groups) so that similar locality and redundancy features remain available while an entire domain of data that is unavailable due to software or hardware upgrades or failures is being recovered.
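
The per-group parity step, and the data locality it provides, can be sketched as follows. This is a minimal illustration under stated assumptions, not the encoding claimed in the application: symbols are equal-length byte strings, each group gets a single XOR parity symbol, and the global redundant symbols are left out entirely. The names `xor_bytes`, `encode_groups`, and `recover` are hypothetical.

```python
from functools import reduce


def xor_bytes(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode_groups(symbols, group_size):
    """Split data symbols into fixed-size groups and attach one XOR
    parity symbol to each group."""
    groups = [symbols[i:i + group_size]
              for i in range(0, len(symbols), group_size)]
    return [(group, xor_bytes(group)) for group in groups]


def recover(group, parity, missing_index):
    """Rebuild one erased symbol from the rest of its own group.
    The entry at `missing_index` is ignored, as if it were unavailable."""
    survivors = [s for k, s in enumerate(group) if k != missing_index]
    return xor_bytes(survivors + [parity])
```

Recovering a single erased symbol reads only the other members of its group and the group parity, which is the locality property the abstract refers to. Tolerating several simultaneous erasures would additionally need the global redundant symbols, which this sketch does not model.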

    Cloud data storage using redundant encoding
    4.
    Granted patent
    Status: in force

    Publication No.: US09141679B2

    Publication date: 2015-09-22

    Application No.: US13221928

    Filing date: 2011-08-31

    IPC classification: G06F7/00, G06F17/30

    CPC classification: G06F17/30557

    Abstract: Cloud data storage systems, methods, and techniques partition system data symbols into groups of a predefined size, encode each group to form corresponding parity symbols, encode all data symbols into global redundant symbols, and store each symbol (data, parity, and redundant) in a different failure domain in a manner that ensures independence of failures. In several implementations, the resultant cloud-encoded data features both data locality and the ability to recover from up to a predefined threshold number of simultaneous erasures (unavailable data symbols) without any information loss. In addition, certain implementations place cloud-encoded data in domains (nodes or node groups) so that similar locality and redundancy features remain available while an entire domain of data that is unavailable due to software or hardware upgrades or failures is being recovered.
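
The placement rule in the abstract, one symbol per failure domain, can be illustrated with a small sketch. It assumes the simplest possible policy, a one-to-one assignment of symbols to domains, and makes no claim about how the patented system chooses domains. The names `place_symbols` and `lost_symbols`, and the symbol and domain labels in the usage example, are hypothetical.

```python
def place_symbols(symbol_ids, domains):
    """Assign every symbol to its own failure domain."""
    if len(domains) < len(symbol_ids):
        raise ValueError("need at least one failure domain per symbol")
    return dict(zip(symbol_ids, domains))


def lost_symbols(placement, failed_domain):
    """List the symbols that become unavailable when one domain fails."""
    return [s for s, d in placement.items() if d == failed_domain]


# Hypothetical layout: four data symbols, two group parities, one
# global redundant symbol, spread over seven domains.
symbols = ["d0", "d1", "d2", "d3", "p0", "p1", "g0"]
domains = [f"rack-{n}" for n in range(7)]
placement = place_symbols(symbols, domains)
```

With this layout, taking any single domain offline removes exactly one symbol, so a whole-domain outage looks like a single erasure to the code.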
