发明申请
- 专利标题: Method of obtaining data samples from a data stream and of estimating the sortedness of the data stream based on the samples
- 专利标题(中): 从数据流获取数据样本并基于样本估计数据流的排序的方法
-
申请号: US11405994申请日: 2006-04-18
-
公开(公告)号: US20070244891A1公开(公告)日: 2007-10-18
- 发明人: Parikshit Gopalan , Robert Krauthgamer , Jayram Thathachar
- 申请人: Parikshit Gopalan , Robert Krauthgamer , Jayram Thathachar
- 申请人地址: US NY Armonk
- 专利权人: International Business Machines Corporation
- 当前专利权人: International Business Machines Corporation
- 当前专利权人地址: US NY Armonk
- 主分类号: G06F17/30
- IPC分类号: G06F17/30
摘要:
Disclosed is a method of scanning a data stream in a single pass to obtain uniform data samples from selected intervals. The method comprises randomly selecting elements from the stream for storage in one or more data buckets and, then, randomly selecting multiple samples from the bucket(s). Each sample is associated with a specified interval immediately prior to a selected point in time. There is a balance of probabilities between the selection of elements stored in the bucket and the selection of elements included in the samples so that elements scanned during the specified interval are included in the sample with equal probability. Samples can then be used to estimate the degree of sortedness of the stream, based on counting how many elements in the sequence are the rightmost point of an interval such that majority of the interval's elements are inverted with respect to the interval's rightmost element.
公开/授权文献
信息查询