Methods and apparatus for representing probabilistic data using a probabilistic histogram
    1.
    Granted patent
    Status: Lapsed

    Publication No.: US08145669B2

    Publication date: 2012-03-27

    Application No.: US12636544

    Filing date: 2009-12-11

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30536

    Abstract: Methods and apparatus for representing probabilistic data using a probabilistic histogram are disclosed. An example method comprises partitioning a plurality of ordered data items into a plurality of buckets, each of the data items capable of having a data value from a plurality of possible data values with a probability characterized by a respective individual probability distribution function (PDF), each bucket associated with a respective subset of the ordered data items bounded by a respective beginning data item and a respective ending data item, and determining a first representative PDF for a first bucket associated with a first subset of the ordered data items by partitioning the plurality of possible data values into a first plurality of representative data ranges and respective representative probabilities based on an error between the first representative PDF and a first plurality of individual PDFs characterizing the first subset of the ordered data items.
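The idea of a per-bucket representative PDF can be illustrated with a minimal Python sketch. It assumes a discrete value domain, a fixed partition of that domain into value ranges, and a sum-squared error; the function name `bucket_representative` and these choices are illustrative assumptions, not the claimed method, which also searches over the ranges themselves.

```python
from typing import List, Sequence, Tuple


def bucket_representative(
    pdfs: Sequence[Sequence[float]],
    ranges: Sequence[Tuple[int, int]],
) -> Tuple[List[float], float]:
    """Fit one piecewise-constant PDF to all item PDFs in a bucket.

    pdfs   : individual PDFs of the items in the bucket, each a list of
             probabilities indexed by possible data value.
    ranges : inclusive (lo, hi) value-index ranges that partition the domain
             (assumed given here; choosing them is the hard part).
    Returns the representative PDF and its sum-squared error.
    """
    n_values = len(pdfs[0])
    # Mean probability of each value across the items in the bucket.
    mean = [sum(p[v] for p in pdfs) / len(pdfs) for v in range(n_values)]

    rep = [0.0] * n_values
    for lo, hi in ranges:
        # For a fixed range, the constant minimizing squared error is the
        # average of the mean probabilities inside that range.
        level = sum(mean[lo:hi + 1]) / (hi - lo + 1)
        for v in range(lo, hi + 1):
            rep[v] = level

    err = sum((p[v] - rep[v]) ** 2 for p in pdfs for v in range(n_values))
    return rep, err


# Two items over four possible values, summarized with two value ranges.
rep, err = bucket_representative(
    [[0.5, 0.5, 0.0, 0.0], [0.25, 0.25, 0.25, 0.25]],
    [(0, 1), (2, 3)],
)
```

Comparing `err` across candidate bucket boundaries and value ranges is what would drive the choice of partition in a fuller implementation.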


    Method for distributed tracking of approximate join size and related summaries
    2.
    Patent application
    Status: In force

    Publication No.: US20070240061A1

    Publication date: 2007-10-11

    Application No.: US11392440

    Filing date: 2006-03-29

    IPC classification: G06F15/173 G06F15/177

    Abstract: A method of distributed approximate query tracking relies on tracking general-purpose randomized sketch summaries of local streams at remote sites along with concise prediction models of local site behavior in order to produce highly communication-efficient and space/time-efficient solutions. A powerful approximate query tracking framework readily incorporates several complex analysis queries, including distributed join and multi-join aggregates and approximate wavelet representations, thus giving the first known low-overhead tracking solution for such queries in the distributed-streams model.
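As a rough illustration of how a randomized sketch supports join-size estimation, the following Python sketch builds an AMS-style linear sketch per stream and estimates the join size from the products of matching counters. The SHA-256 based `sign` function stands in for the four-wise independent hash families used in the literature, and `needs_update` is a simplified, hypothetical drift test against a "no change" prediction; neither is taken from the patent text.

```python
import hashlib
from statistics import median
from typing import Iterable, List, Sequence


def sign(item: str, row: int) -> int:
    """Deterministic +1/-1 value per (row, item); a stand-in hash family."""
    digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
    return 1 if digest[0] & 1 else -1


def sketch(stream: Iterable[str], rows: int = 15) -> List[int]:
    """Linear sketch: each counter is a random +1/-1 projection of the
    stream's frequency vector."""
    counters = [0] * rows
    for item in stream:
        for r in range(rows):
            counters[r] += sign(item, r)
    return counters


def join_size_estimate(sk_a: Sequence[int], sk_b: Sequence[int]) -> float:
    """Median of per-row counter products approximates sum_x f_a(x) * f_b(x)."""
    return median(a * b for a, b in zip(sk_a, sk_b))


def needs_update(current: Sequence[int], predicted: Sequence[int],
                 theta: float) -> bool:
    """Assumed rule: a site reports only when its sketch has drifted from the
    coordinator's prediction by more than theta times its own norm."""
    drift = sum((c - p) ** 2 for c, p in zip(current, predicted)) ** 0.5
    norm = sum(c * c for c in current) ** 0.5
    return drift > theta * norm


# Streams that share a single item: every row product is 3 * 4.
print(join_size_estimate(sketch(["x"] * 3), sketch(["x"] * 4)))  # prints 12
```

Because the sketch is linear, a coordinator can add the sketches received from remote sites to obtain the sketch of the combined stream, which is what makes infrequent, drift-triggered updates sufficient.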


    Fast approximate wavelet tracking on streams
    3.
    Patent application
    Status: In force

    Publication No.: US20070237410A1

    Publication date: 2007-10-11

    Application No.: US11389040

    Filing date: 2006-03-24

    IPC classification: G06K9/46 G06F17/30

    CPC classification: G06K9/00516

    Abstract: The first fast solution to the problem of tracking wavelet representations of one-dimensional and multi-dimensional data streams, based on a stream synopsis called the Group-Count Sketch (GCS), is provided. By imposing a hierarchical structure of groups over the data and applying the GCS, our algorithms can quickly recover the most important wavelet coefficients with guaranteed accuracy. A tradeoff between query time and update time is established by varying the hierarchical structure of groups, allowing the right balance to be found for specific data streams. Experimental analysis confirmed this tradeoff and showed that all the methods significantly outperformed previously known methods in terms of both update time and query time, while maintaining a high level of accuracy.
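To make "the most important wavelet coefficients" concrete, the Python sketch below computes an exact Haar decomposition offline and ranks coefficients by their normalized magnitude. This is the target that a streaming synopsis would approximate; it is not the Group-Count Sketch itself, and the function names and the choice of an unnormalized averaging transform are assumptions made for illustration.

```python
import math
from typing import List, Tuple


def haar_transform(data: List[float]) -> List[float]:
    """Unnormalized Haar decomposition (pairwise averages and half-differences).

    The input length must be a power of two. Index 0 holds the overall
    average; detail coefficients follow from coarsest to finest level.
    """
    n = len(data)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    current = list(data)
    out = [0.0] * n
    length = n
    while length > 1:
        half = length // 2
        avgs = [(current[2 * i] + current[2 * i + 1]) / 2 for i in range(half)]
        diffs = [(current[2 * i] - current[2 * i + 1]) / 2 for i in range(half)]
        out[half:length] = diffs
        current = avgs
        length = half
    out[0] = current[0]
    return out


def top_k(coeffs: List[float], k: int) -> List[Tuple[int, float]]:
    """Keep the k coefficients with the largest normalized magnitude.

    A coefficient covering `support` data points is weighted by
    sqrt(support), which matches the L2-optimal selection rule.
    """
    n = len(coeffs)

    def weight(j: int) -> float:
        support = n if j == 0 else n >> (j.bit_length() - 1)
        return abs(coeffs[j]) * math.sqrt(support)

    ranked = sorted(range(n), key=weight, reverse=True)
    return [(j, coeffs[j]) for j in ranked[:k]]


coeffs = haar_transform([2, 2, 0, 2, 3, 5, 4, 4])
print(coeffs)  # prints [2.75, -1.25, 0.5, 0.0, 0.0, -1.0, -1.0, 0.0]
```

In a streaming setting the data vector is never stored, so the coefficients cannot be computed this way; the sketch only shows what quantity the hierarchical search is trying to recover.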


    METHODS AND APPARATUS TO CONSTRUCT HISTOGRAM AND WAVELET SYNOPSES FOR PROBABILISTIC DATA
    4.
    Patent application
    Status: In force

    Publication No.: US20100153328A1

    Publication date: 2010-06-17

    Application No.: US12334264

    Filing date: 2008-12-12

    IPC classification: G06N5/02

    Abstract: Example methods and apparatus to construct histogram and wavelet synopses for probabilistic data are disclosed. A disclosed example method involves receiving probabilistic data associated with probability measures and generating a plurality of histograms based on the probabilistic data. Each histogram is generated based on items represented by the probabilistic data. In addition, each histogram is generated using a different quantity of buckets containing different ones of the items. An error measure associated with each of the plurality of histograms is determined and one of the plurality of histograms is selected based on its associated error measure. The method also involves displaying parameter information associated with the one of the plurality of histograms to represent the data.
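The generate, score, and select loop described in the abstract can be sketched in Python as follows. The sketch makes several simplifying assumptions that are not in the source: probabilistic data arrives as `(item, probability)` tuples, histograms are equi-width over expected frequencies, the error measure is sum-squared error, and the selection rule picks the smallest bucket count whose error is within a tolerance.

```python
from typing import Iterable, List, Sequence, Tuple

Histogram = List[Tuple[int, int, float]]  # (lo, hi_exclusive, representative)


def expected_frequencies(tuples: Iterable[Tuple[int, float]],
                         domain_size: int) -> List[float]:
    """Expected frequency of each item when tuple (item, p) exists with
    probability p."""
    freq = [0.0] * domain_size
    for item, prob in tuples:
        freq[item] += prob
    return freq


def equi_width_histogram(freq: Sequence[float], buckets: int) -> Histogram:
    """Split the domain into `buckets` contiguous ranges (buckets <= len(freq))
    and represent each range by its average expected frequency."""
    n = len(freq)
    hist = []
    for b in range(buckets):
        lo, hi = b * n // buckets, (b + 1) * n // buckets
        hist.append((lo, hi, sum(freq[lo:hi]) / (hi - lo)))
    return hist


def sse(freq: Sequence[float], hist: Histogram) -> float:
    """Sum-squared error between the frequencies and the bucket values."""
    return sum((freq[i] - rep) ** 2 for lo, hi, rep in hist for i in range(lo, hi))


def select_histogram(freq: Sequence[float], bucket_counts: Sequence[int],
                     tolerance: float) -> Tuple[float, int, Histogram]:
    """Build one histogram per bucket count, score each, and return
    (error, bucket_count, histogram) for the selected one."""
    scored = []
    for b in bucket_counts:
        hist = equi_width_histogram(freq, b)
        scored.append((sse(freq, hist), b, hist))
    within = [s for s in scored if s[0] <= tolerance]
    if within:
        return min(within, key=lambda s: s[1])  # fewest buckets that is good enough
    return min(scored, key=lambda s: s[0])      # otherwise lowest error


freq = expected_frequencies(
    [(0, 0.5), (0, 0.5), (1, 1.0), (2, 0.25), (3, 0.25)], domain_size=4)
err, buckets, hist = select_histogram(freq, [1, 2, 4], tolerance=0.01)
print(buckets, hist)  # prints 2 [(0, 2, 1.0), (2, 4, 0.25)]
```

Other error measures or bucketing schemes would slot into `sse` and `equi_width_histogram` without changing the selection loop.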


    Methods and apparatus to construct histogram and wavelet synopses for probabilistic data
    5.
    Granted patent
    Status: In force

    Publication No.: US08386412B2

    Publication date: 2013-02-26

    Application No.: US12334264

    Filing date: 2008-12-12

    IPC classification: G06F9/44 G06N7/02 G06N7/06

    Abstract: Example methods and apparatus to construct histogram and wavelet synopses for probabilistic data are disclosed. A disclosed example method involves receiving probabilistic data associated with probability measures and generating a plurality of histograms based on the probabilistic data. Each histogram is generated based on items represented by the probabilistic data. In addition, each histogram is generated using a different quantity of buckets containing different ones of the items. An error measure associated with each of the plurality of histograms is determined and one of the plurality of histograms is selected based on its associated error measure. The method also involves displaying parameter information associated with the one of the plurality of histograms to represent the data.


    Method and apparatus for globally approximating quantiles in a distributed monitoring environment
    6.
    Patent application
    Status: In force

    Publication No.: US20070136285A1

    Publication date: 2007-06-14

    Application No.: US11301387

    Filing date: 2005-12-13

    IPC classification: G06F7/00

    Abstract: The invention comprises a method and apparatus for determining a rank of a query value. Specifically, the method comprises receiving a rank query request, determining, for each of the at least one remote monitor, a predicted lower-bound rank value and upper-bound rank value, wherein the predicted lower-bound rank value and upper-bound rank value are determined according to at least one respective prediction model used by each of the at least one remote monitor to compute the at least one local quantile summary, computing a predicted average rank value for each of the at least one remote monitor using the at least one predicted lower-bound rank value and the at least one predicted upper-bound rank value associated with the respective at least one remote monitor, and computing the rank of the query value using the at least one predicted average rank value associated with the respective at least one remote monitor.
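The final two steps of the abstract, averaging each monitor's predicted bounds and combining the averages, can be shown with a short Python sketch. The abstract does not say how the per-monitor averages are combined; summing them is an assumption here, as is the function name `global_rank`. The prediction models that produce the bounds are out of scope for this sketch.

```python
from typing import Iterable, Tuple


def global_rank(bounds: Iterable[Tuple[float, float]]) -> float:
    """Estimate the global rank of a query value.

    bounds: one (predicted_lower_rank, predicted_upper_rank) pair per remote
            monitor, as produced by that monitor's prediction model.
    The per-monitor estimate is the midpoint of its bounds; the global
    estimate is assumed to be the sum of the midpoints.
    """
    return sum((lower + upper) / 2 for lower, upper in bounds)


# Two monitors: local rank in [10, 14] at one, in [3, 5] at the other.
print(global_rank([(10, 14), (3, 5)]))  # prints 16.0
```

Under this assumption the worst-case error of the global estimate is half the sum of the per-monitor bound widths, which is why tight local bounds matter.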


    METHODS AND APPARATUS FOR REPRESENTING PROBABILISTIC DATA USING A PROBABILISTIC HISTOGRAM
    7.
    Patent application
    Status: Lapsed

    Publication No.: US20110145223A1

    Publication date: 2011-06-16

    Application No.: US12636544

    Filing date: 2009-12-11

    IPC classification: G06F17/30

    CPC classification: G06F17/30536

    Abstract: Methods and apparatus for representing probabilistic data using a probabilistic histogram are disclosed. An example method comprises partitioning a plurality of ordered data items into a plurality of buckets, each of the data items capable of having a data value from a plurality of possible data values with a probability characterized by a respective individual probability distribution function (PDF), each bucket associated with a respective subset of the ordered data items bounded by a respective beginning data item and a respective ending data item, and determining a first representative PDF for a first bucket associated with a first subset of the ordered data items by partitioning the plurality of possible data values into a first plurality of representative data ranges and respective representative probabilities based on an error between the first representative PDF and a first plurality of individual PDFs characterizing the first subset of the ordered data items.


    Streaming algorithms for robust, real-time detection of DDoS attacks
    8.
    Granted patent
    Status: In force

    Publication No.: US07669241B2

    Publication date: 2010-02-23

    Application No.: US10954901

    Filing date: 2004-09-30

    IPC classification: G06F12/14

    Abstract: A distinct-count estimate is obtained in a guaranteed small footprint using a two-level hash distinct-count sketch. A first hash fills the first-level hash buckets with an exponentially decreasing number of data-elements. These are then uniformly hashed to an array of second-level hash tables, each with an associated total-element counter and bit-location counters. These counters are used to identify singletons and so provide a distinct-sample and a distinct-count. An estimate of the total distinct-count is obtained by dividing the distinct-count by the probability of mapping a data-element to that bucket. An estimate of the total distinct-source frequencies of destination addresses can be found in a similar fashion. By further associating the distinct-count sketch with a list of singletons, a total singleton count and a heap containing the destination addresses ordered by their distinct-source frequencies, a tracking distinct-count sketch may be formed that has considerably improved query time.
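The role of the total-element counter and bit-location counters can be illustrated with a minimal Python sketch of a single second-level bucket. It assumes elements are non-negative integers of a fixed bit width and that deletions never exceed insertions. The class name `CountSignature`, the SHA-256 based `first_level` hash, and `scale_up` are illustrative assumptions rather than the patented construction.

```python
import hashlib
from typing import List, Optional


class CountSignature:
    """One bucket: a total counter plus one counter per bit position."""

    def __init__(self, bits: int = 32) -> None:
        self.total = 0
        self.bit_counts: List[int] = [0] * bits

    def update(self, element: int, delta: int = 1) -> None:
        """Insert (delta=+1) or delete (delta=-1) one occurrence."""
        self.total += delta
        for j in range(len(self.bit_counts)):
            if (element >> j) & 1:
                self.bit_counts[j] += delta

    def singleton(self) -> Optional[int]:
        """Return the single distinct value in the bucket, or None.

        The bucket holds exactly one distinct value when every bit counter
        is either 0 or equal to the total; the value is read off the bits.
        """
        if self.total <= 0:
            return None
        value = 0
        for j, count in enumerate(self.bit_counts):
            if count == self.total:
                value |= 1 << j
            elif count != 0:
                return None
        return value


def first_level(element: int, levels: int = 32) -> int:
    """Map an element to level l with probability about 2 ** -(l + 1),
    using the position of the lowest set bit of a hash value."""
    h = int.from_bytes(hashlib.sha256(str(element).encode()).digest()[:8], "big")
    lsb = (h & -h).bit_length() - 1 if h else levels - 1
    return min(lsb, levels - 1)


def scale_up(distinct_in_level: int, level: int) -> int:
    """Divide a level's distinct count by its mapping probability 2 ** -(level + 1)."""
    return distinct_in_level * 2 ** (level + 1)


bucket = CountSignature(bits=8)
bucket.update(5)
bucket.update(5)
print(bucket.singleton())  # prints 5
bucket.update(6)
print(bucket.singleton())  # prints None
```

Because every counter supports decrements, the same structure handles deletions, which is what lets the sketch follow flows that open and later close.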


    Tracking set-expression cardinalities over continuous update streams
    9.
    Granted patent
    Status: In force

    Publication No.: US07596544B2

    Publication date: 2009-09-29

    Application No.: US11025355

    Filing date: 2004-12-29

    IPC classification: G06F7/00

    Abstract: A method of estimating set-expression cardinalities over data streams with guaranteed small maintenance time per data-element update is disclosed. The method only examines each data element once and uses a limited amount of memory. The time-efficient stream synopsis extends 2-level hash-sketches by randomly, but uniformly, pre-hashing data-elements prior to logarithmically hashing them to a first-level hash-table. This generates a set of independent 2-level hash-sketches. The set-union cardinality can be estimated by determining the smallest hash-bucket index j at which only a predetermined fraction of the b hash-buckets has a non-empty union |A∪B|. Once a set-union cardinality is estimated, general set-expression cardinalities may be estimated by counting witness elements for the set-expression, i.e., those first-level hash-buckets that are both a singleton for the set-expression and a set-union singleton. The set-expression cardinality is the set-union cardinality times the number of witness elements divided by the number of hash-buckets.
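The closing sentence of the abstract can be written as a single estimator. In the notation below, which is introduced here for illustration, $E$ is the set expression, $\widehat{|A \cup B|}$ is the estimated set-union cardinality, $w$ is the number of witness elements, and $b$ is the number of hash-buckets as in the abstract.

```latex
\widehat{|E|} \;=\; \widehat{|A \cup B|} \cdot \frac{w}{b}
% Worked example with assumed numbers:
% \widehat{|A \cup B|} = 5000,\; w = 20,\; b = 100
% \;\Rightarrow\; \widehat{|E|} = 5000 \cdot \tfrac{20}{100} = 1000
```

The ratio $w/b$ acts as an estimate of the fraction of the union that satisfies the set expression, so the product rescales the union estimate to the expression.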
