Invention patent application
- Patent title: Method for summarizing data in unaggregated data streams
- Patent title (Chinese): 用于汇总未分组数据流中的数据的方法
- Application number: US12653831
- Filing date: 2009-12-18
- Publication number: US20110153554A1
- Publication date: 2011-06-23
- Inventors: Edith Cohen, Nicholas Duffield, Haim Kaplan, Carsten Lund, Mikkel Thorup
- Applicants: Edith Cohen, Nicholas Duffield, Haim Kaplan, Carsten Lund, Mikkel Thorup
- Assignee: AT&T Intellectual Property I, L.P.
- Current assignee: AT&T Intellectual Property I, L.P.
- Main classification: G06F17/30
- IPC classification: G06F17/30
Abstract:
A method for producing a summary A of data points in an unaggregated data stream wherein the data points are in the form of weighted keys (a, w) where a is a key and w is a weight, and the summary is a sample of k keys a with adjusted weights w_a. A first reservoir L includes keys having adjusted weights which are additions of weights of individual data points of included keys and a second reservoir T includes keys having adjusted weights which are each equal to a threshold value τ whose value is adjusted based upon tests of new data points arriving in the data stream. The summary combines the keys and adjusted weights of the first reservoir L with the keys and adjusted weights of the second reservoir T to form the sample representing the data stream upon which further analysis may be performed. The method proceeds by first merging new data points in the stream into the reservoir L until the reservoir contains k different keys and thereafter applying a series of tests to new arriving data points to determine what keys and weights are to be added to or removed from the reservoirs L and T to provide a summary with a variance that approaches the minimum possible for aggregated data sets. The method is composable, can be applied to high speed data streams such as those found on the Internet, and can be implemented efficiently.
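The two-reservoir layout described in the abstract can be illustrated with a minimal Python sketch. It covers only the parts the abstract states explicitly: the initial phase that merges weighted keys into L until L holds k different keys, and the final step that combines L (summed weights) with T (every key weighted τ) into one summary. The tests that later move keys between L and T and adjust τ are not given in the abstract and are deliberately not sketched. The names `fill_reservoir`, `combine`, and `estimate_subset_sum` are hypothetical, and the subset-sum query is an assumed example of the "further analysis" the abstract mentions, not part of the claimed method.

```python
from typing import Callable, Dict, Hashable, Iterable, Set, Tuple

Point = Tuple[Hashable, float]  # a weighted key (a, w)


def fill_reservoir(points: Iterable[Point], k: int) -> Tuple[Dict[Hashable, float], int]:
    """Initial phase: merge points into L until L holds k different keys.

    Returns L (key -> sum of the weights seen for that key) and the
    number of points consumed from the stream.
    """
    L: Dict[Hashable, float] = {}
    consumed = 0
    for key, weight in points:
        if len(L) == k:
            break  # later points would go through the threshold tests (not sketched)
        L[key] = L.get(key, 0.0) + weight
        consumed += 1
    return L, consumed


def combine(L: Dict[Hashable, float], T: Set[Hashable], tau: float) -> Dict[Hashable, float]:
    """Form the summary: keys of L keep their summed weights, keys of T get weight tau."""
    summary = dict(L)
    for key in T:
        if key in summary:
            # Assumption of this sketch: a key lives in exactly one reservoir.
            raise ValueError(f"key {key!r} appears in both reservoirs")
        summary[key] = tau
    return summary


def estimate_subset_sum(summary: Dict[Hashable, float],
                        selects: Callable[[Hashable], bool]) -> float:
    """Assumed example query: total adjusted weight of the keys matched by `selects`."""
    return sum(w for key, w in summary.items() if selects(key))


# Initial phase on a tiny made-up stream with k = 3.
stream = [("a", 2.0), ("b", 1.0), ("a", 3.0), ("c", 4.0), ("d", 1.0)]
L, consumed = fill_reservoir(stream, k=3)

# A hypothetical later state with k = 3 keys in total: one in L, two in T.
summary = combine({"a": 5.0}, {"x", "y"}, tau=2.5)
total_ax = estimate_subset_sum(summary, lambda key: key in {"a", "x"})
```

In this sketch the point `("d", 1.0)` is left unconsumed because L already holds three different keys when it arrives; in the described method that point would be handled by the tests that update L, T, and τ.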
Published/granted documents
- US08195710B2 Method for summarizing data in unaggregated data streams (publication/grant date: 2012-06-05)