Variance-optimal sampling-based estimation of subset sums
    1.
    发明授权
    Variance-optimal sampling-based estimation of subset sums 失效
    基于方差最优采样的子集合估计

    公开(公告)号:US08005949B2

    公开(公告)日:2011-08-23

    申请号:US12325340

    申请日:2008-12-01

    IPC分类号: G06F15/173

    摘要: The present invention relates to a method of obtaining a generic sample of an input stream. The method is designated as VAROPTk. The method comprises receiving an input stream of items arriving one at a time, and maintaining a sample S of items i. The sample S has a capacity for at most k items i. The sample S is filled with k items i. An nth item i is received. It is determined whether the nth item i should be included in sample S. If the nth item i is included in sample S, then a previously included item i is dropped from sample S. The determination is made based on weights of items without distinguishing between previously included items i and the nth item i. The determination is implemented thereby updating weights of items i in sample S. The method is repeated until no more items are received.

    摘要翻译: 本发明涉及一种获得输入流的通用样本的方法。 该方法被指定为VAROPTk。 该方法包括一次接收一个物品的输入流,并且保持项目i的样本S. 样本S具有最多k个项目i的容量。 样本S填充有k个项目i。 收到第n项。 确定第n个项目i是否应该包含在样本S中。如果第n个项目i包括在样本S中,则先前包括的项目i从样本S中丢弃。根据项目的权重进行确定,而不区分 以前包括项目i和第n项目i。 由此实现确定,从而更新样本S中的项目i的权重。重复该方法,直到不再收到项目。

    Algorithms and estimators for summarization of unaggregated data streams
    2.
    发明授权
    Algorithms and estimators for summarization of unaggregated data streams 失效
    用于汇总未分类数据流的算法和估计

    公开(公告)号:US07764625B2

    公开(公告)日:2010-07-27

    申请号:US12136705

    申请日:2008-06-10

    IPC分类号: H04L12/26

    摘要: The invention relates to streaming algorithms useful for obtaining summaries over unaggregated packet streams and for providing unbiased estimators for characteristics, such as, the amount of traffic that belongs to a specified subpopulation of flows. Packets are sampled from a packet stream and aggregated into flows and counted by implementation of: (a) Adaptive Sampled NetFlow (ANF), and adjusted weight (AANF) of a flow (f) is calculated as follows: AANF(f)=i(f)/p′; i(f) being the number of packets counted for a flow f, and p′ being the sampling rate at end of a measurement period; or (b) Adaptive Sample-and-Hold (ASH), and adjusted weight (AASH) of a flow (f) is calculated as follows: AASH(f)=i(f)+(1−p′)/p′; i(f) being the number of packets counted for a flow f, and p′ being the sampling rate at end of a measurement period.

    摘要翻译: 本发明涉及用于在未分组的分组流上获得摘要的用于提供用于特征的无偏估计器的流式传输算法,例如属于指定的流量子群的业务量。 分组从分组流中采样并聚合成流,并通过实现计算:(a)自适应采样NetFlow(ANF)和流(f)的调整权重(AANF)计算如下:AANF(f)= i (f)/ p'; i(f)是流f计数的分组数,p'是测量周期结束时的采样率; 或(b)自适应采样保持(ASH)和流(f)的调整权重(AASH)如下计算:AASH(f)= i(f)+(1-p')/ p' ; i(f)是流f计数的分组数,p'是测量周期结束时的采样率。

    Algorithms and estimators for summarization of unaggregated data streams
    3.
    发明授权
    Algorithms and estimators for summarization of unaggregated data streams 失效
    用于汇总未分类数据流的算法和估计

    公开(公告)号:US07746808B2

    公开(公告)日:2010-06-29

    申请号:US12136725

    申请日:2008-06-10

    IPC分类号: H04L12/26

    CPC分类号: H04L43/024

    摘要: The invention relates to streaming algorithms useful for obtaining summaries over unaggregated packet streams and for providing unbiased estimators for characteristics, such as, the amount of traffic that belongs to a specified subpopulation of flows. Packets are sampled from a packet stream and aggregated into flows and counted by implementation of Adaptive Sample-and-Hold (ASH) or Adaptive NetFlow (ANF), adjusting the sampling rate based on a quantity of flows to obtain a sketch having a predetermined size, the sampling rate being adjusted in steps; and transferring the count of aggregated packets from SRAM to DRAM and initializing the count in SRAM following adjustment of the sampling rate.

    摘要翻译: 本发明涉及用于在未分组的分组流上获得摘要的用于提供用于特征的无偏估计器的流式传输算法,例如属于指定的流量子群的业务量。 分组从分组流中采样并聚合成流,并通过实施自适应采样保持(ASH)或自适应净流(ANF)进行计数,根据流量调整采样率,以获得具有预定尺寸的草图 采样率逐步调整; 并将汇总数据包从SRAM传输到DRAM,并在采样率调整后初始化SRAM中的计数。

    Method for summarizing data in unaggregated data streams
    4.
    发明申请
    Method for summarizing data in unaggregated data streams 有权
    用于汇总未分组数据流中的数据的方法

    公开(公告)号:US20110153554A1

    公开(公告)日:2011-06-23

    申请号:US12653831

    申请日:2009-12-18

    IPC分类号: G06F17/30

    CPC分类号: H04L43/028 H04L43/04

    摘要: A method for producing a summary A of data points in an unaggregated data stream wherein the data points are in the form of weighted keys (a, w) where a is a key and w is a weight, and the summary is a sample of k keys a with adjusted weights wa. A first reservoir L includes keys having adjusted weights which are additions of weights of individual data points of included keys and a second reservoir T includes keys having adjusted weights which are each equal to a threshold value τ whose value is adjusted based upon tests of new data points arriving in the data stream. The summary combines the keys and adjusted weights of the first reservoir L with the keys and adjusted weights of the second reservoir T to form the sample representing the data stream upon which further analysis may be performed. The method proceeds by first merging new data points in the stream into the reservoir L until the reservoir contains k different keys and thereafter applying a series of tests to new arriving data points to determine what keys and weights are to be added to or removed the reservoirs L and T to provide a summary with a variance that approaches the minimum possible for aggregated data sets. The method is composable, can be applied to high speed data streams such as those found on the Internet, and can be implemented efficiently.

    摘要翻译: 一种用于产生未聚集数据流中的数据点的摘要A的方法,其中数据点是加权密钥(a,w)的形式,其中a是密钥,w是权重,并且摘要是k的样本 键a与调整权重wa。 第一储存器L包括具有调整权重的密钥,这些密钥是附加密钥的各个数据点的加权的加法,而第二储存器T包括具有调整的权重的密钥,其各自等于基于新数据的测试来调整其值的阈值τ 到达数据流的点。 总结将第一储层L的密钥和调整的权重与密钥和第二储存器T的调整权重组合,以形成表示可以进行进一步分析的数据流的样本。 该方法通过首先将流中的新数据点合并到储存器L中,直到储存器包含k个不同的密钥,然后对新的到达数据点应用一系列测试,以确定要添加到或移除存储器的哪些密钥和权重 L和T提供一个总结,其方差接近汇总数据集的最小可能性。 该方法是可组合的,可以应用于诸如在因特网上发现的高速数据流,并且可以有效地实现。

    Variance-Optimal Sampling-Based Estimation of Subset Sums
    5.
    发明申请
    Variance-Optimal Sampling-Based Estimation of Subset Sums 失效
    子集合的方差最优采样估计

    公开(公告)号:US20100138529A1

    公开(公告)日:2010-06-03

    申请号:US12325340

    申请日:2008-12-01

    IPC分类号: G06F17/18 G06F15/16

    摘要: The present invention relates to a method of obtaining a generic sample of an input stream. The method is designated as VAROPTk. The method comprises receiving an input stream of items arriving one at a time, and maintaining a sample S of items i. The sample S has a capacity for at most k items i. The sample S is filled with k items i. An nth item i is received. It is determined whether the nth item i should be included in sample S. If the nth item i is included in sample S, then a previously included item i is dropped from sample S. The determination is made based on weights of items without distinguishing between previously included items i and the nth item i. The determination is implemented thereby updating weights of items i in sample S. The method is repeated until no more items are received.

    摘要翻译: 本发明涉及一种获得输入流的通用样本的方法。 该方法被指定为VAROPTk。 该方法包括一次接收一个物品的输入流,并且保持项目i的样本S. 样本S具有最多k个项目i的容量。 样本S填充有k个项目i。 收到第n项。 确定第n个项目i是否应该包含在样本S中。如果第n个项目i包括在样本S中,则先前包括的项目i从样本S中丢弃。根据项目的权重进行确定,而不区分 以前包括项目i和第n项目i。 由此实现确定,从而更新样本S中的项目i的权重。重复该方法,直到不再收到项目。

    Method for summarizing data in unaggregated data streams
    6.
    发明授权
    Method for summarizing data in unaggregated data streams 有权
    用于汇总未分组数据流中的数据的方法

    公开(公告)号:US08195710B2

    公开(公告)日:2012-06-05

    申请号:US12653831

    申请日:2009-12-18

    IPC分类号: G06F17/00

    CPC分类号: H04L43/028 H04L43/04

    摘要: A method for producing a summary A of data points in an unaggregated data stream wherein the data points are in the form of weighted keys (a, w) where a is a key and w is a weight, and the summary is a sample of k keys a with adjusted weights wa. A first reservoir L includes keys having adjusted weights which are additions of weights of individual data points of included keys and a second reservoir T includes keys having adjusted weights which are each equal to a threshold value τ whose value is adjusted based upon tests of new data points arriving in the data stream. The summary combines the keys and adjusted weights of the first reservoir L with the keys and adjusted weights of the second reservoir T to form the sample representing the data stream upon which further analysis may be performed. The method proceeds by first merging new data points in the stream into the reservoir L until the reservoir contains k different keys and thereafter applying a series of tests to new arriving data points to determine what keys and weights are to be added to or removed the reservoirs L and T to provide a summary with a variance that approaches the minimum possible for aggregated data sets. The method is composable, can be applied to high speed data streams such as those found on the Internet, and can be implemented efficiently.

    摘要翻译: 一种用于产生未聚集数据流中的数据点的摘要A的方法,其中数据点是加权密钥(a,w)的形式,其中a是密钥,w是权重,并且摘要是k的样本 键a与调整权重wa。 第一储存器L包括具有调整权重的密钥,这些密钥是附加密钥的各个数据点的加权的加法,而第二储存器T包括具有调整的权重的密钥,其各自等于基于新数据的测试来调整其值的阈值τ 到达数据流的点。 总结将第一储层L的密钥和调整的权重与密钥和第二储存器T的调整权重组合,以形成表示可以进行进一步分析的数据流的样本。 该方法通过首先将流中的新数据点合并到储存器L中,直到储存器包含k个不同的密钥,然后对新的到达数据点应用一系列测试,以确定要添加到或移除存储器的哪些密钥和权重 L和T提供一个总结,其方差接近汇总数据集的最小可能性。 该方法是可组合的,可以应用于诸如在因特网上发现的高速数据流,并且可以有效地实现。

    Algorithms and Estimators for Summarization of Unaggregated Data Streams
    7.
    发明申请
    Algorithms and Estimators for Summarization of Unaggregated Data Streams 失效
    用于汇总未分类数据流的算法和估计器

    公开(公告)号:US20090303901A1

    公开(公告)日:2009-12-10

    申请号:US12136725

    申请日:2008-06-10

    IPC分类号: H04L12/26

    CPC分类号: H04L43/024

    摘要: The invention relates to streaming algorithms useful for obtaining summaries over unaggregated packet streams and for providing unbiased estimators for characteristics, such as, the amount of traffic that belongs to a specified subpopulation of flows. Packets are sampled from a packet stream and aggregated into flows and counted by implementation of Adaptive Sample-and-Hold (ASH) or Adaptive NetFlow (ANF), adjusting the sampling rate based on a quantity of flows to obtain a sketch having a predetermined size, the sampling rate being adjusted in steps; and transferring the count of aggregated packets from SRAM to DRAM and initializing the count in SRAM following adjustment of the sampling rate.

    摘要翻译: 本发明涉及用于在未分组的分组流上获得摘要的用于提供用于特征的无偏估计器的流式传输算法,例如属于指定的流量子群的业务量。 分组从分组流中采样并聚合成流,并通过实施自适应采样保持(ASH)或自适应净流(ANF)进行计数,根据流量调整采样率,以获得具有预定尺寸的草图 采样率逐步调整; 并将汇总数据包从SRAM传输到DRAM,并在采样率调整后初始化SRAM中的计数。

    Algorithms and Estimators for Summarization of Unaggregated Data Streams
    8.
    发明申请
    Algorithms and Estimators for Summarization of Unaggregated Data Streams 失效
    用于汇总未分类数据流的算法和估计器

    公开(公告)号:US20090303879A1

    公开(公告)日:2009-12-10

    申请号:US12136705

    申请日:2008-06-10

    IPC分类号: H04L12/56

    摘要: The invention relates to streaming algorithms useful for obtaining summaries over unaggregated packet streams and for providing unbiased estimators for characteristics, such as, the amount of traffic that belongs to a specified subpopulation of flows. Packets are sampled from a packet stream and aggregated into flows and counted by implementation of: (a) Adaptive Sampled NetFlow (ANF), and adjusted weight (AANF) of a flow (f) is calculated as follows: AANF(f)=i(f)/p′; i(f) being the number of packets counted for a flow f, and p′ being the sampling rate at end of a measurement period; or (b) Adaptive Sample-and-Hold (ASH), and adjusted weight (AASH) of a flow (f) is calculated as follows: AASH(f)=i(f)+(1−p′)/p′; i(f) being the number of packets counted for a flow f, and p′ being the sampling rate at end of a measurement period.

    摘要翻译: 本发明涉及用于在未分组的分组流上获得摘要的用于提供用于特征的无偏估计器的流式传输算法,例如属于指定的流量子群的业务量。 分组从分组流中采样并聚合成流,并通过实现计算:(a)自适应采样NetFlow(ANF)和流(f)的调整权重(AANF)计算如下:AANF(f)= i (f)/ p'; i(f)是流f计数的分组数,p'是测量周期结束时的采样率; 或(b)自适应采样保持(ASH)和流(f)的调整权重(AASH)如下计算:AASH(f)= i(f)+(1-p')/ p' ; i(f)是流f计数的分组数,p'是测量周期结束时的采样率。