1. Sketch-based multi-query processing over data streams
    Patent application
    Legal status: in force

    Publication number: US20060161566A1

    Publication date: 2006-07-20

    Application number: US11025211

    Filing date: 2004-12-29

    IPC classification: G06F7/00

    Abstract: A method of efficiently providing estimated answers to workloads of aggregate, multi-join SQL-like queries over a number of input data streams. The method examines each data element only once and uses a limited amount of computer memory. The method uses join graphs and atomic sketches, which are essentially pseudo-random summaries formed using random binary variables. The estimated answer is the product of all the atomic sketches for all the vertices in the query join graph. A query workload is processed efficiently by identifying and sharing atomic sketches common to distinct queries, while ensuring that the join graphs remain well formed. The method may automatically minimize either the average query error or the maximum query error over the workload.
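
    A minimal sketch of the single-query estimator the abstract describes, assuming AMS-style atomic sketches built from +1/-1 random variables and a three-way chain join F ⋈_A G ⋈_B H, whose join graph has one edge per join attribute. Each vertex (stream) gets one atomic sketch, the estimate is their product, and a median of averages over independent copies reduces variance. The function names, the memoized sign generator (a stand-in for the limited-independence hash families a true one-pass implementation would use), and the s1/s2 parameters are our assumptions; the workload-level sketch sharing and error minimization are not shown.

```python
import itertools
import random
from statistics import median


def sign_family(seed):
    """Random +1/-1 per value, memoized. A one-pass implementation would instead
    use a 4-wise independent hash so that no per-value state is stored."""
    rng, cache = random.Random(seed), {}

    def xi(v):
        if v not in cache:
            cache[v] = rng.choice((-1, 1))
        return cache[v]

    return xi


def sketch_product(F, G, H, xi, eta):
    """F holds A-values, G holds (A, B) pairs, H holds B-values.
    xi is shared by the F-G edge (attribute A), eta by the G-H edge (attribute B)."""
    x_f = sum(xi(a) for a in F)                # atomic sketch of vertex F
    x_g = sum(xi(a) * eta(b) for a, b in G)    # atomic sketch of vertex G
    x_h = sum(eta(b) for b in H)               # atomic sketch of vertex H
    return x_f * x_g * x_h                     # expectation equals the join size


def estimate_chain_join(F, G, H, s1=40, s2=5, seed=0):
    """Median of s2 averages of s1 independent sketch products.
    For clarity each copy rescans the lists instead of updating all copies per element."""
    prods = [sketch_product(F, G, H, sign_family(seed + 2 * k), sign_family(seed + 2 * k + 1))
             for k in range(s1 * s2)]
    means = [sum(prods[j * s1:(j + 1) * s1]) / s1 for j in range(s2)]
    return median(means)


def exact_expectation(F, G, H):
    """Average the sketch product over every +1/-1 assignment; this equals the join size."""
    A = sorted(set(F) | {a for a, _ in G})
    B = sorted(set(H) | {b for _, b in G})
    total, count = 0, 0
    for sa in itertools.product((-1, 1), repeat=len(A)):
        for sb in itertools.product((-1, 1), repeat=len(B)):
            xi, eta = dict(zip(A, sa)), dict(zip(B, sb))
            total += sketch_product(F, G, H, xi.__getitem__, eta.__getitem__)
            count += 1
    return total / count
```

    With F = [1, 1, 2], G = [(1, "x"), (2, "x"), (2, "y")] and H = ["x", "y", "y"], the three-way join has 5 result tuples and `exact_expectation` returns 5.0, illustrating that the product of atomic sketches is unbiased; `estimate_chain_join` returns a randomized approximation of the same value.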

2. Sketch-based multi-query processing over data streams
    Granted patent
    Legal status: in force

    Publication number: US07328220B2

    Publication date: 2008-02-05

    Application number: US11025211

    Filing date: 2004-12-29

    IPC classification: G06F17/00

    Abstract: A method of efficiently providing estimated answers to workloads of aggregate, multi-join SQL-like queries over a number of input data streams. The method examines each data element only once and uses a limited amount of computer memory. The method uses join graphs and atomic sketches, which are essentially pseudo-random summaries formed using random binary variables. The estimated answer is the product of all the atomic sketches for all the vertices in the query join graph. A query workload is processed efficiently by identifying and sharing atomic sketches common to distinct queries, while ensuring that the join graphs remain well formed. The method may automatically minimize either the average query error or the maximum query error over the workload.

3. Streaming algorithms for robust, real-time detection of DDoS attacks
    Granted patent
    Legal status: in force

    Publication number: US07669241B2

    Publication date: 2010-02-23

    Application number: US10954901

    Filing date: 2004-09-30

    IPC classification: G06F12/14

    Abstract: A distinct-count estimate is obtained in a guaranteed small footprint using a two-level hash, distinct-count sketch. A first hash fills the first-level hash buckets with an exponentially decreasing number of data elements. These are then uniformly hashed to an array of second-level hash tables, which have an associated total-element counter and bit-location counters. These counters are used to identify singletons and so provide a distinct sample and a distinct count. An estimate of the total distinct count is obtained by dividing the distinct count by the probability of mapping a data element to that bucket. An estimate of the total distinct-source frequencies of destination addresses can be found in a similar fashion. By further associating the distinct-count sketch with a list of singletons, a total singleton count, and a heap containing the destination addresses ordered by their distinct-source frequencies, a tracking distinct-count sketch may be formed that has considerably improved query time.
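
    A minimal sketch of the two-level hash idea, assuming 32-bit integer element ids (for example IPv4 source addresses), a first-level index given by the lowest set bit of a hash (so level i receives a 2^-(i+1) fraction of elements), and one small second-level table of counter signatures per level. A bucket is a singleton exactly when every bit-location counter is either 0 or equal to the total counter, and the surviving bits spell out the element. The class names, the 16-bucket second level, and the "first sparse level" rule are our assumptions; this simplified estimator also drops elements lost to second-level collisions, so it tends to undercount when buckets are crowded. The destination-frequency tracking and the heap are not shown.

```python
import hashlib

BITS = 32  # element ids assumed to be non-negative 32-bit integers


def hash64(x, salt):
    digest = hashlib.blake2b(f"{salt}:{x}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class CountSignature:
    """Total-element counter plus one counter per bit location of the element id."""

    def __init__(self):
        self.total = 0
        self.bits = [0] * BITS

    def update(self, x, delta):
        self.total += delta
        for j in range(BITS):
            if (x >> j) & 1:
                self.bits[j] += delta

    def singleton(self):
        """Return the bucket's only distinct element, or None if empty or shared."""
        if self.total <= 0:
            return None
        x = 0
        for j, c in enumerate(self.bits):
            if c == self.total:
                x |= 1 << j
            elif c != 0:
                return None  # two different ids disagree on bit j
        return x


class TwoLevelSketch:
    def __init__(self, second=16, seed=0):
        self.second, self.seed = second, seed
        # table[level][bucket]: first level by lowest set bit, second level uniform
        self.table = [[CountSignature() for _ in range(second)] for _ in range(BITS)]

    def level(self, x):
        v = hash64(x, f"first-{self.seed}") & ((1 << BITS) - 1)
        return (v & -v).bit_length() - 1 if v else BITS - 1  # P(level = i) = 2^-(i+1)

    def update(self, x, delta=1):
        if not 0 <= x < 1 << BITS:
            raise ValueError("element ids must fit in BITS bits")
        bucket = hash64(x, f"second-{self.seed}") % self.second
        self.table[self.level(x)][bucket].update(x, delta)

    def distinct_sample(self, lvl):
        """Ids recovered from singleton buckets at first-level index >= lvl."""
        sample = set()
        for row in self.table[lvl:]:
            for sig in row:
                x = sig.singleton()
                if x is not None:
                    sample.add(x)
        return sample

    def estimate(self, max_fill=0.5):
        """Scale the sample from the first sparse level by 1 / P(level >= lvl) = 2^lvl."""
        for lvl in range(BITS):
            filled = sum(1 for sig in self.table[lvl] if sig.total > 0)
            if filled <= max_fill * self.second:
                return len(self.distinct_sample(lvl)) * 2 ** lvl
        return 0
```

    Inserting the id 123456 five times yields an estimate of 1, and deleting those five copies brings it back to 0, because the counters accept negative updates.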

4. Tracking set-expression cardinalities over continuous update streams
    Granted patent
    Legal status: in force

    Publication number: US07596544B2

    Publication date: 2009-09-29

    Application number: US11025355

    Filing date: 2004-12-29

    IPC classification: G06F7/00

    Abstract: A method of estimating set-expression cardinalities over data streams with guaranteed small maintenance time per data-element update. The method examines each data element only once and uses a limited amount of memory. The time-efficient stream synopsis extends 2-level hash sketches by randomly, but uniformly, pre-hashing data elements before logarithmically hashing them to a first-level hash table. This generates a set of independent 2-level hash sketches. The set-union cardinality can be estimated by determining the smallest hash-bucket index j at which only a predetermined fraction of the b hash buckets has a non-empty union |A∪B|. Once the set-union cardinality is estimated, general set-expression cardinalities may be estimated by counting witness elements for the set expression, i.e., those first-level hash buckets that are both a singleton for the set expression and a set-union singleton. The set-expression cardinality is the set-union cardinality times the number of witness elements divided by the number of hash buckets.
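
    A rough sketch of the witness-counting idea for two streams, under our own simplifying assumptions: both streams are summarized with identical hash functions; each element is pre-hashed to one of b sketches and then hashed logarithmically to a level; the union cardinality n is recovered from the fraction f of non-empty union buckets at the chosen level by inverting f = 1 - (1 - p)^n; and the set-expression estimate divides the witness count by the number of union-singleton buckets at that level (our reading of "the number of hash buckets"). The 0.3 threshold, b = 64, and all names are illustrative, not taken from the patent.

```python
import hashlib
import math

BITS = 32  # element ids assumed to be non-negative 32-bit integers


def hash64(x, salt):
    digest = hashlib.blake2b(f"{salt}:{x}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class SetSketch:
    """b independent 2-level hash sketches; each element is pre-hashed to exactly one."""

    def __init__(self, b=64):
        self.b = b
        # cells[k][j] = [total counter, bit-location counters...] for sketch k, level j
        self.cells = [[[0] * (BITS + 1) for _ in range(BITS)] for _ in range(b)]

    def update(self, x, delta=1):
        k = hash64(x, "pre") % self.b                     # uniform pre-hash
        v = hash64(x, "level") & ((1 << BITS) - 1)
        j = (v & -v).bit_length() - 1 if v else BITS - 1  # logarithmic first-level hash
        cell = self.cells[k][j]
        cell[0] += delta
        for bit in range(BITS):
            if (x >> bit) & 1:
                cell[bit + 1] += delta


def union_cell(sa, sb, k, j):
    return [p + q for p, q in zip(sa.cells[k][j], sb.cells[k][j])]


def is_singleton(cell):
    total = cell[0]
    return total > 0 and all(c in (0, total) for c in cell[1:])


def union_estimate(sa, sb, frac=0.3):
    """Smallest level j where at most `frac` of the b union buckets are non-empty,
    then invert P(non-empty) = 1 - (1 - 2^-(j+1) / b)^n for n."""
    for j in range(BITS):
        f = sum(union_cell(sa, sb, k, j)[0] > 0 for k in range(sa.b)) / sa.b
        if f <= frac:
            p = 2.0 ** -(j + 1) / sa.b
            return j, max(0.0, math.log(1 - f) / math.log(1 - p))
    return BITS - 1, 0.0  # not reached in practice: the top levels are almost always empty


def expression_estimate(sa, sb, expr, frac=0.3):
    """|E| ~= |A u B| * (witnesses / union singletons) at the chosen level."""
    j, n_union = union_estimate(sa, sb, frac)
    singles = witnesses = 0
    for k in range(sa.b):
        if is_singleton(union_cell(sa, sb, k, j)):
            singles += 1
            in_a = sa.cells[k][j][0] > 0  # the lone union element is in A
            in_b = sb.cells[k][j][0] > 0  # and/or in B
            witnesses += bool(expr(in_a, in_b))
    return n_union * witnesses / singles if singles else 0.0
```

    Here `expr` is a hypothetical callback that encodes the set expression on one element's membership flags, for example `lambda in_a, in_b: in_a and not in_b` for A - B. Because a union-singleton bucket holds exactly one distinct element, the A and B totals in that bucket reveal that element's membership directly.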

5. Processing data-stream join aggregates using skimmed sketches
    Granted patent
    Legal status: in force

    Publication number: US07483907B2

    Publication date: 2009-01-27

    Application number: US11025578

    Filing date: 2004-12-29

    IPC classification: G06F17/30

    Abstract: A method of estimating an aggregate of a join over data streams in real time using skimmed sketches, which examines each data element only once and has a worst-case space requirement of O(n²/J), where J is the size of the join and n is the number of data elements. The skimmed sketch is an atomic sketch, formed as the inner product of the data-stream frequency vector and random binary variables, from which the frequency values that exceed a predetermined threshold have been skimmed off and placed in a dense frequency vector. The join size is estimated as the sum of the sub-joins of skimmed sketches and dense frequency vectors. The atomic sketches may be arranged in a hash structure so that processing a data element only requires updating a single sketch per hash table. This keeps the per-element overhead logarithmic in the domain and stream sizes.
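
    A minimal sketch of the skimming idea, assuming count-sketch-style hash tables (each update touches one +/-1-weighted counter per table, in the spirit of "updating a single sketch per hash table"), a small enumerable domain in which to look for dense values, and a caller-chosen threshold. The join size is assembled from the four sub-joins: dense x dense, dense x sparse, sparse x dense, and sparse x sparse. The patent's threshold choice and its procedure for locating dense values are not reproduced; all names and parameters here are hypothetical.

```python
import hashlib
from statistics import median


def hash64(*parts):
    digest = hashlib.blake2b(":".join(map(str, parts)).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class HashSketch:
    """t hash tables; each value updates one +/-1-weighted counter per table."""

    def __init__(self, tables=5, buckets=64, seed=0):
        self.t, self.w, self.seed = tables, buckets, seed
        self.c = [[0] * buckets for _ in range(tables)]

    def _slot(self, i, v):
        h = hash64(self.seed, i, v)
        return h % self.w, 1 if (h >> 32) & 1 else -1  # bucket index, random sign

    def update(self, v, weight=1):
        for i in range(self.t):
            b, s = self._slot(i, v)
            self.c[i][b] += s * weight

    def point(self, v):
        """Median-of-tables frequency estimate for v."""
        ests = []
        for i in range(self.t):
            b, s = self._slot(i, v)
            ests.append(s * self.c[i][b])
        return median(ests)

    def skim(self, domain, threshold):
        """Move values whose estimated frequency >= threshold into a dense vector."""
        dense = {}
        for v in domain:
            f = self.point(v)
            if f >= threshold:
                dense[v] = f
                self.update(v, -f)  # remove the skimmed mass from the sketch
        return dense

    def inner(self, other):
        """Median over tables of the bucket-wise inner product (sparse x sparse)."""
        return median(sum(a * b for a, b in zip(self.c[i], other.c[i])) for i in range(self.t))


def skimmed_join(F, G, domain, threshold):
    """F and G must share seed and shape so that their hash functions agree."""
    fd, gd = F.skim(domain, threshold), G.skim(domain, threshold)
    est = sum(f * gd.get(v, 0) for v, f in fd.items())  # dense x dense
    est += sum(f * G.point(v) for v, f in fd.items())   # dense x sparse
    est += sum(g * F.point(v) for v, g in gd.items())   # sparse x dense
    est += F.inner(G)                                   # sparse x sparse
    return est
```

    If F contains value 7 a hundred times and G contains it fifty times, both copies are skimmed into the dense vectors and `skimmed_join(F, G, [7], threshold=10)` returns 5000 from the dense x dense term alone; values below the threshold are handled entirely by the sketch inner product.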

6. Distributed set-expression cardinality estimation
    Patent application
    Legal status: in force

    Publication number: US20060149744A1

    Publication date: 2006-07-06

    Application number: US11026499

    Filing date: 2004-12-30

    IPC classification: G06F17/30

    Abstract: A method and system for answering set-expression cardinality queries while lowering data communication costs. A coordinator site provides global knowledge of the distribution of certain frequently occurring stream elements, which significantly reduces the transmission of element state information to the central site. Optionally, the semantics of the input set expression are captured in a Boolean logic formula, and models of the formula are used to determine whether an element state change at a remote site can affect the set-expression result.
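
    A minimal sketch of the optional Boolean-formula step, assuming the coordinator already knows some of an element's membership flags at other sites and simply enumerates truth assignments for the rest (practical only for expressions over a few streams; the abstract does not say how the patent computes its models). A remote site would need to ship a state change for that element only when `can_affect` returns True. The function names and the example expression are hypothetical.

```python
from itertools import product


def can_affect(formula, variables, known, flipped):
    """Return True if toggling `flipped` can change the formula's value under some
    assignment of the still-unknown variables that agrees with `known`."""
    unknown = [v for v in variables if v not in known and v != flipped]
    for bits in product((False, True), repeat=len(unknown)):
        env = {**known, **dict(zip(unknown, bits))}
        if formula({**env, flipped: False}) != formula({**env, flipped: True}):
            return True  # found a model in which the change is visible
    return False


def expr(m):
    """E = (A union B) - C for one element, given its membership flags."""
    return (m["A"] or m["B"]) and not m["C"]
```

    For example, once the coordinator knows an element is in C, `can_affect(expr, ["A", "B", "C"], {"C": True}, "A")` is False: the element can never be in (A ∪ B) - C, so its arrival at or departure from A need not be reported.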

8. Method and apparatus for globally approximating quantiles in a distributed monitoring environment
    Patent application
    Legal status: in force

    Publication number: US20070136285A1

    Publication date: 2007-06-14

    Application number: US11301387

    Filing date: 2005-12-13

    IPC classification: G06F7/00

    Abstract: The invention comprises a method and apparatus for determining the rank of a query value. Specifically, the method comprises: receiving a rank query request; determining, for each of at least one remote monitor, a predicted lower-bound rank value and a predicted upper-bound rank value according to at least one prediction model that the remote monitor uses to compute its local quantile summary; computing a predicted average rank value for each remote monitor from its predicted lower-bound and upper-bound rank values; and computing the rank of the query value from the predicted average rank values of the remote monitors.
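
    The rank computation in the abstract can be sketched as follows, assuming each remote monitor last shipped a Greenwald-Khanna-style quantile summary (sorted values with lower and upper rank bounds) and that the coordinator's prediction model simply scales those bounds by a predicted stream-growth factor. The growth model, the summary layout, and all names are our assumptions; the abstract does not specify the patent's prediction models.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class MonitorState:
    values: list         # sorted summary values last shipped by the monitor
    rmin: list           # lower bound on each value's rank at that time
    rmax: list           # upper bound on each value's rank at that time
    n_sync: int          # local stream length when the summary was shipped
    growth: float = 1.0  # hypothetical prediction model: predicted n / n_sync


def predicted_bounds(m, q):
    """Predicted lower/upper bounds on the number of local items <= q."""
    i = bisect_right(m.values, q)  # m.values[:i] are <= q
    lo = m.rmin[i - 1] if i > 0 else 0
    hi = m.rmax[i] - 1 if i < len(m.values) else m.n_sync
    return lo * m.growth, hi * m.growth


def global_rank(monitors, q):
    """Sum over monitors of the average of the predicted lower and upper bounds."""
    return sum(sum(predicted_bounds(m, q)) / 2 for m in monitors)
```

    For example, two monitors holding exact summaries of [10, 20, 30, 40] and [5, 15, 25] (rmin equal to rmax, growth 1.0) give `global_rank` 5.0 for q = 25, matching the five items that are at most 25; doubling the first monitor's growth factor raises the predicted rank to 7.0.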

10. Streaming algorithms for robust, real-time detection of DDoS attacks
    Patent application
    Legal status: in force

    Publication number: US20060075489A1

    Publication date: 2006-04-06

    Application number: US10954901

    Filing date: 2004-09-30

    IPC classification: G06F12/14

    Abstract: A distinct-count estimate is obtained in a guaranteed small footprint using a two-level hash, distinct-count sketch. A first hash fills the first-level hash buckets with an exponentially decreasing number of data elements. These are then uniformly hashed to an array of second-level hash tables, which have an associated total-element counter and bit-location counters. These counters are used to identify singletons and so provide a distinct sample and a distinct count. An estimate of the total distinct count is obtained by dividing the distinct count by the probability of mapping a data element to that bucket. An estimate of the total distinct-source frequencies of destination addresses can be found in a similar fashion. By further associating the distinct-count sketch with a list of singletons, a total singleton count, and a heap containing the destination addresses ordered by their distinct-source frequencies, a tracking distinct-count sketch may be formed that has considerably improved query time.