Retrieval system and method
    1.
    Granted patent
    Status: lapsed

    Publication number: US5950189A

    Publication date: 1999-09-07

    Application number: US775913

    Filing date: 1997-01-02

    Abstract: The invention is an improved retrieval system and method. Many pattern recognition tasks, including estimation, classification, and the finding of similar objects, make use of linear models. For example, many text retrieval systems represent queries as linear functions, and retrieve documents whose vector representation has a high dot product with the query. The fundamental operation in such tasks is the computation of the dot product between a query vector and a large database of instance vectors. Often instance vectors which have high dot products with the query are of interest. The invention relates to a random sampling based retrieval system that can identify, for any given query vector, those instance vectors which have large dot products, while avoiding explicit computation of all dot products.
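
The sampling idea in this abstract can be illustrated with a minimal sketch. It assumes non-negative vector entries, and the function names (`build_index`, `sample_hits`) are hypothetical; it is one plausible reading of "random sampling based retrieval", not necessarily the patented procedure. A coordinate i is drawn with probability proportional to q[i] times the column sum of coordinate i, then an instance j is drawn with probability proportional to x_j[i]. Under that scheme, instance j is hit with probability proportional to its dot product with the query, so frequently hit instances are candidates for large dot products.

```python
import itertools
import random
from collections import Counter

def build_index(instances):
    """Query-independent preprocessing: per-coordinate cumulative weights."""
    dims = len(instances[0])
    cum = [list(itertools.accumulate(x[i] for x in instances)) for i in range(dims)]
    col_sums = [c[-1] for c in cum]
    return cum, col_sums

def sample_hits(index, query, num_samples, seed=0):
    """Count how often each instance is drawn; hits are proportional
    in expectation to the instance's dot product with the query."""
    cum, col_sums = index
    rng = random.Random(seed)
    coord_weights = [q * s for q, s in zip(query, col_sums)]
    hits = Counter()
    if sum(coord_weights) == 0:
        return hits  # every dot product is zero
    coords = rng.choices(range(len(query)), weights=coord_weights, k=num_samples)
    n = len(cum[0])
    for i in coords:
        # Coordinate i has a positive column sum, otherwise it is never drawn.
        j = rng.choices(range(n), cum_weights=cum[i], k=1)[0]
        hits[j] += 1
    return hits

instances = [[1, 0], [0, 1], [5, 5]]   # dot products with [1, 1]: 1, 1, 10
hits = sample_hits(build_index(instances), [1, 1], num_samples=2000)
best = hits.most_common(1)[0][0]
```

Only the instances with the most hits would then need an exact dot product computed.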

    Algorithms and estimators for summarization of unaggregated data streams
    2.
    Granted patent
    Status: lapsed

    Publication number: US07746808B2

    Publication date: 2010-06-29

    Application number: US12136725

    Filing date: 2008-06-10

    CPC classification number: H04L43/024

    Abstract: The invention relates to streaming algorithms useful for obtaining summaries over unaggregated packet streams and for providing unbiased estimators for characteristics such as the amount of traffic that belongs to a specified subpopulation of flows. Packets are sampled from a packet stream and aggregated into flows and counted by implementation of Adaptive Sample-and-Hold (ASH) or Adaptive NetFlow (ANF), adjusting the sampling rate based on a quantity of flows to obtain a sketch having a predetermined size, the sampling rate being adjusted in steps; and transferring the count of aggregated packets from SRAM to DRAM and initializing the count in SRAM following adjustment of the sampling rate.
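
A rough sketch of the adaptive sample-and-hold idea follows. The function name, the `step` factor, and the resampling rule applied when the rate drops are assumptions made for illustration; the SRAM-to-DRAM transfer and the unbiased estimators mentioned in the abstract are not modelled. A packet of a flow already in the sketch is always counted, a packet of a new flow is admitted with the current sampling rate, and the rate is lowered in steps whenever the sketch holds more than `max_flows` flows.

```python
import random

def adaptive_sample_and_hold(packets, max_flows, step=0.5, seed=0):
    """Return (counts, rate) after one pass over an iterable of flow keys."""
    rng = random.Random(seed)
    rate = 1.0
    counts = {}
    for flow in packets:
        if flow in counts:
            counts[flow] += 1              # held flows count every packet
        elif rng.random() < rate:
            counts[flow] = 1               # new flow admitted by sampling
        while len(counts) > max_flows:     # sketch too large: lower the rate
            new_rate = rate * step
            for key in list(counts):
                c = counts[key]
                # The first counted packet survives with prob new_rate / rate.
                if rng.random() >= new_rate / rate:
                    c -= 1
                    # Each remaining packet is re-tested at the new rate.
                    while c > 0 and rng.random() >= new_rate:
                        c -= 1
                if c == 0:
                    del counts[key]
                else:
                    counts[key] = c
            rate = new_rate
    return counts, rate

stream = ["a"] * 50 + ["b"] * 30 + [f"f{i}" for i in range(20)]
counts, rate = adaptive_sample_and_hold(stream, max_flows=3)
```

The counts never exceed the true packet counts, so a rate-dependent correction is still needed before they can serve as unbiased estimates.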

    Method and apparatus for improving end to end performance of a data network
    3.
    Granted patent
    Status: lapsed

    Publication number: US06330561B1

    Publication date: 2001-12-11

    Application number: US09105018

    Filing date: 1998-06-26

    Abstract: A method and apparatus provide improved cache coherency and more effective caching operations without placing an undue burden on network links. A proxy receives a request for a resource and then, depending on information in the proxy cache, generates a resource request for transmission to a resource server. The proxy appends a proxy filter to the request. The resource server maintains one or more volumes of resources based on some predetermined criterion that can be either static or dynamic in nature. Upon receipt of the request and the proxy filter, the resource server generates a request response and a piggybacked list of additional resources selected from the volume with which the requested resource is associated.
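
The request flow described here can be sketched as a toy model. The abstract does not say what a proxy filter contains, so the fields `max_items` and `exclude`, the class names, and the response layout are all hypothetical. Volumes are simply given as a mapping, and the server piggybacks a filtered list of other resources from the requested resource's volume.

```python
from dataclasses import dataclass, field

@dataclass
class ProxyFilter:
    max_items: int = 3                          # hypothetical cap on piggybacked entries
    exclude: set = field(default_factory=set)   # hypothetical: resources the proxy does not want

class ResourceServer:
    def __init__(self, volumes):
        # volumes: volume name -> {resource name: last-modified timestamp}
        self.volumes = volumes
        self.volume_of = {r: v for v, members in volumes.items() for r in members}

    def handle(self, resource, proxy_filter):
        """Answer the request and piggyback entries from the same volume."""
        volume = self.volumes[self.volume_of[resource]]
        candidates = sorted(
            r for r in volume if r != resource and r not in proxy_filter.exclude
        )
        piggyback = [(r, volume[r]) for r in candidates[: proxy_filter.max_items]]
        return {
            "resource": resource,
            "last_modified": volume[resource],
            "piggyback": piggyback,
        }

server = ResourceServer({
    "/docs": {"/docs/a.html": 10, "/docs/b.html": 20, "/docs/c.html": 30},
    "/img": {"/img/x.png": 5},
})
response = server.handle("/docs/a.html", ProxyFilter(max_items=1, exclude={"/docs/b.html"}))
```

Letting the proxy cap and filter the piggybacked list is what keeps the extra information from burdening the network link.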

    Methods and systems to estimate query responses based on data set sketches
    4.
    Granted patent
    Status: in force

    Publication number: US08738618B2

    Publication date: 2014-05-27

    Application number: US12334152

    Filing date: 2008-12-12

    CPC classification number: G06F17/3053 G06F17/30979

    Abstract: Methods and systems for estimate derivation are described. In one embodiment, a query may be received with a predicate for sets over a collection of items. Associated samples associated with the query may be accessed. Items of an associated sample may be accessed from the collection of items. A determination of whether the predicate is an attribute-based selection from a union of at least some sets may be made. Available items of the particular associated sample may be selected from the items. Identified items may be identified among the available items in the associated sample that satisfy the predicate. An adjusted weight may be assigned to an item based on a weight of the item and a distribution of the associated samples. An estimate may be generated based on the adjusted weight of the identified items of the associated samples that satisfy the predicate. Additional methods and systems are disclosed.
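
The role of adjusted weights can be shown with a standard Horvitz-Thompson style sketch. It assumes a single independent (Poisson) sample with inclusion probability min(1, w / tau); the patent's handling of several associated samples over multiple sets is not reproduced, and the function names are hypothetical. Each sampled item gets its weight divided by its inclusion probability, and a subpopulation estimate is the sum of adjusted weights over sampled items that satisfy the predicate.

```python
import random

def threshold_sample(weights, tau, seed=0):
    """Sample each item with probability min(1, w / tau); return adjusted weights."""
    rng = random.Random(seed)
    sample = {}
    for key, w in weights.items():
        p = min(1.0, w / tau)
        if rng.random() < p:
            sample[key] = w / p        # adjusted weight, unbiased for w
    return sample

def estimate(sample, predicate):
    """Estimate the total weight of items that satisfy the predicate."""
    return sum(adj for key, adj in sample.items() if predicate(key))

weights = {"a": 5, "b": 8, "c": 2, "d": 10}
full = threshold_sample(weights, tau=1.0)        # every item has p = 1
exact = estimate(full, lambda k: k in {"a", "d"})
sparse = threshold_sample(weights, tau=100.0)    # most items are dropped
```

With `tau=1.0` every item is kept and the estimate equals the true subpopulation weight; larger thresholds trade accuracy for a smaller sample.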

    Method and apparatus for processing of top-K queries from samples
    5.
    Granted patent
    Status: lapsed

    Publication number: US08706737B2

    Publication date: 2014-04-22

    Application number: US12347474

    Filing date: 2008-12-31

    CPC classification number: G06F17/3053 G06F17/30536 H04L41/12

    Abstract: A method and apparatus for processing top-k queries are disclosed. For example, the method receives a top-k query with a value for a number of samples and a value of a confidence parameter. The method samples in accordance with the number of samples, and determines a top-k weight of a sample top-k set. The method bounds the top-k weight in an interval having an upper bound and a lower bound such that the top-k weight is in the interval with a probability equal to one minus the value of the confidence parameter, and provides a response to the top-k query in accordance with the upper and lower bounds.

    Method for summarizing data in unaggregated data streams
    6.
    Granted patent
    Status: in force

    Publication number: US08195710B2

    Publication date: 2012-06-05

    Application number: US12653831

    Filing date: 2009-12-18

    CPC classification number: H04L43/028 H04L43/04

    Abstract: A method for producing a summary A of data points in an unaggregated data stream wherein the data points are in the form of weighted keys (a, w) where a is a key and w is a weight, and the summary is a sample of k keys a with adjusted weights w_a. A first reservoir L includes keys having adjusted weights which are additions of weights of individual data points of included keys and a second reservoir T includes keys having adjusted weights which are each equal to a threshold value τ whose value is adjusted based upon tests of new data points arriving in the data stream. The summary combines the keys and adjusted weights of the first reservoir L with the keys and adjusted weights of the second reservoir T to form the sample representing the data stream upon which further analysis may be performed. The method proceeds by first merging new data points in the stream into the reservoir L until the reservoir contains k different keys and thereafter applying a series of tests to new arriving data points to determine what keys and weights are to be added to or removed from the reservoirs L and T to provide a summary with a variance that approaches the minimum possible for aggregated data sets. The method is composable, can be applied to high speed data streams such as those found on the Internet, and can be implemented efficiently.

    Systems, devices, and/or methods for determining dataset estimators
    7.
    Granted patent
    Status: lapsed

    Publication number: US08140539B1

    Publication date: 2012-03-20

    Application number: US12186997

    Filing date: 2008-08-06

    CPC classification number: G06F17/30536

    Abstract: Certain exemplary embodiments can provide a method, which can comprise automatically storing a sketch of a dataset that supports automatic determination of an estimator of properties of a dataset. The automatic determination can be based upon computed adjusted weights to the items included in a sketch of the dataset. The adjusted weights can be used to compute estimates on the weight of any subpopulation of the items in the dataset that is specified using a selection predicate. We propose the rank conditioning, the subset conditioning, and/or a Markov-chain based method to compute these adjusted weights. We also provide a method that provides upper and lower confidence bounds on the size of a subpopulation.
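
As an illustration of a sketch with adjusted weights, the following uses priority sampling, a well-known bottom-k style scheme. Whether it coincides with the "rank conditioning" method named in this abstract is an assumption, and the function name is hypothetical. Each item gets priority w / u with u uniform in (0, 1], the k highest priorities form the sketch, and every sketched item receives the adjusted weight max(w, tau), where tau is the (k+1)-th highest priority.

```python
import random

def priority_sample(weights, k, seed=0):
    """Return {key: adjusted weight} for a size-k priority sample."""
    rng = random.Random(seed)
    # 1.0 - random() lies in (0, 1], which avoids division by zero.
    priority = {key: w / (1.0 - rng.random()) for key, w in weights.items()}
    ranked = sorted(priority, key=priority.get, reverse=True)
    # The threshold is the (k+1)-th highest priority, or 0 if all items fit.
    tau = priority[ranked[k]] if len(ranked) > k else 0.0
    return {key: max(weights[key], tau) for key in ranked[:k]}

weights = {"a": 5, "b": 8, "c": 2, "d": 10}
sketch = priority_sample(weights, k=2)
everything = priority_sample(weights, k=10)

# Estimated weight of the subpopulation selected by a predicate.
estimate = sum(adj for key, adj in sketch.items() if key in {"a", "d"})
```

Summing adjusted weights over the sketched items that satisfy a selection predicate yields the subpopulation estimate; the confidence bounds mentioned in the abstract are not covered by this sketch.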

    METHODS AND SYSTEMS FOR ESTIMATE DERIVATION
    8.
    Patent application
    Status: in force

    Publication number: US20100153387A1

    Publication date: 2010-06-17

    Application number: US12334152

    Filing date: 2008-12-12

    CPC classification number: G06F17/3053 G06F17/30979

    Abstract: Methods and systems for estimate derivation are described. In one embodiment, a query may be received with a predicate for sets over a collection of items. Associated samples associated with the query may be accessed. Items of an associated sample may be accessed from the collection of items. A determination of whether the predicate is an attribute-based selection from a union of at least some sets may be made. Available items of the particular associated sample may be selected from the items. Identified items may be identified among the available items in the associated sample that satisfy the predicate. An adjusted weight may be assigned to an item based on a weight of the item and a distribution of the associated samples. An estimate may be generated based on the adjusted weight of the identified items of the associated samples that satisfy the predicate. Additional methods and systems are disclosed.

    METHODS AND APPARATUS TO BOUND NETWORK TRAFFIC ESTIMATION ERROR FOR MULTISTAGE MEASUREMENT SAMPLING AND AGGREGATION
    9.
    Patent application
    Status: lapsed

    Publication number: US20100150004A1

    Publication date: 2010-06-17

    Application number: US12335074

    Filing date: 2008-12-15

    CPC classification number: H04L43/16 H04L41/0681 H04L41/12 H04L43/02

    Abstract: Methods and apparatus to bound network traffic estimation error for multistage measurement sampling and aggregation are disclosed. An example method disclosed herein comprises determining a hierarchical sampling topology representative of multiple data sampling and aggregation stages, the hierarchical sampling topology comprising a plurality of nodes connected by a plurality of edges, each node corresponding to at least one of a data source and a data aggregation operation, and each edge corresponding to a data sampling operation characterized by a generalized sampling threshold, selecting a first generalized sampling threshold from a set of generalized sampling thresholds associated with a respective set of edges originating at a respective set of descendent nodes of a target node undergoing network traffic estimation, and transforming a measured sample of network traffic into a confidence interval for a network traffic estimate associated with the target node using the first generalized sampling threshold and an error parameter.
