Incremental maintenance of inverted indexes for approximate string matching
    31.
    Type: Granted patent
    Status: Lapsed

    Publication No.: US08271499B2

    Publication date: 2012-09-18

    Application No.: US12481693

    Filing date: 2009-06-10

    IPC classification: G06F7/00

    Abstract: In embodiments of the disclosed technology, indexes, such as inverted indexes, are updated only as necessary to guarantee answer precision within predefined thresholds which are determined with little cost in comparison to the updates of the indexes themselves. With the present technology, a batch of daily updates can be processed in a matter of minutes, rather than a few hours for rebuilding an index, and a query may be answered with assurances that the results are accurate or within a threshold of accuracy.

    Conservation dependencies
    32.
    Type: Patent application
    Status: In force

    Publication No.: US20120130935A1

    Publication date: 2012-05-24

    Application No.: US12927770

    Filing date: 2010-11-23

    IPC classification: G06N5/02

    CPC classification: G06Q40/025 G06N5/02

    Abstract: Given a set of data for which a conservation law is an appropriate characterization, “hold” and/or “fail” tableaux are provided for the underlying conservation law, thereby providing a conservation dependency whereby portions of the data for which the law approximately holds or fails can be discovered and summarized in a semantically meaningful way.

    Verification of outsourced data streams
    33.
    Type: Granted patent
    Status: In force

    Publication No.: US08112802B2

    Publication date: 2012-02-07

    Application No.: US12275879

    Filing date: 2008-11-21

    IPC classification: G06F11/00

    Abstract: Embodiments disclosed herein are directed to verifying query results of an untrusted server. A data owner outsources a data stream to the untrusted server, which is configured to respond to a query from a client with the query result, which is returned to the client. The data owner can maintain a vector associated with query results returned by the server and can generate a verification synopsis using the vector and a seed. The verification synopsis includes a polynomial, where coefficients of the polynomial are determined based on the seed. The data owner outputs the verification synopsis and the seed to a client for verification of the query results.
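
    The seed-based check in this abstract can be illustrated with a simplified polynomial fingerprint. The sketch below is an assumption-laden illustration, not the patent's construction: the names `Synopsis` and `verify`, the prime modulus `P`, and the use of Python's `random` module to derive an evaluation point from the seed are all hypothetical. The data owner folds each stream update into a single value, and a client who holds the seed recomputes the same value from the server's claimed result vector.

```python
import random

# Assumed modulus: a Mersenne prime, chosen only for illustration.
P = 2**61 - 1


class Synopsis:
    """Fingerprint of a vector v: sum(v[i] * alpha**i) mod P.

    alpha is derived from the seed. random.Random is NOT cryptographically
    secure; a real deployment would need a proper secret source.
    """

    def __init__(self, seed: int) -> None:
        self.alpha = random.Random(seed).randrange(1, P)
        self.value = 0

    def update(self, index: int, delta: int) -> None:
        # Fold one stream update (add delta to v[index]) into the fingerprint.
        self.value = (self.value + delta * pow(self.alpha, index, P)) % P


def verify(claimed_vector: dict[int, int], seed: int, synopsis_value: int) -> bool:
    """Client side: recompute the fingerprint of the server's claimed vector."""
    check = Synopsis(seed)
    for index, count in claimed_vector.items():
        check.update(index, count)
    return check.value == synopsis_value


# Data owner observes the stream and keeps only the constant-size synopsis.
owner = Synopsis(seed=12345)
for index, delta in [(1, 1), (4, 2), (1, 2)]:
    owner.update(index, delta)

honest = verify({1: 3, 4: 2}, 12345, owner.value)
tampered = verify({1: 3, 4: 1}, 12345, owner.value)
```

    In this sketch the owner stores one integer regardless of stream length, and a server that does not know the seed cannot easily craft a wrong vector with the same fingerprint.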

    Multicast with adaptive dual-state
    34.
    Type: Granted patent
    Status: In force

    Publication No.: US08064446B2

    Publication date: 2011-11-22

    Application No.: US12060723

    Filing date: 2008-04-01

    IPC classification: H04L12/28 H04J3/26

    Abstract: A method and system are described to multicast with an adaptive dual state. The system receives multicast traffic over a membership tree including a first plurality of nodes connected in a first topology destined for a plurality of multicast members of a first multicast group. Next, the system determines a rate of multicast traffic that exceeds a predetermined threshold based on the receiving the multicast traffic. Next, the system generates a dissemination tree including a second plurality of nodes connected in a second topology to reduce a number of hops to communicate the multicast traffic to the plurality of multicast members of the first multicast group. Finally, the system forwards the multicast traffic to the plurality of multicast members of the first multicast group over the dissemination tree.
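
    The switch between the two trees can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not say how the dissemination tree is built, so the sketch substitutes a breadth-first shortest-path tree as one way to reduce hop count, and the names `shortest_path_tree` and `choose_tree`, the adjacency-list graph, and the rate units are hypothetical.

```python
from collections import deque


def shortest_path_tree(graph, source, members):
    """Return the edges of a BFS shortest-path tree from source to members."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    edges = set()
    for member in members:
        node = member
        # Walk back toward the source; unreachable members add no edges.
        while parent.get(node) is not None:
            edges.add((parent[node], node))
            node = parent[node]
    return edges


def choose_tree(rate, threshold, membership_tree, graph, source, members):
    """Use the membership tree for low-rate groups, a hop-reducing tree otherwise."""
    if rate > threshold:
        return shortest_path_tree(graph, source, members)
    return membership_tree


graph = {
    "s": ["a", "d"],
    "a": ["s", "b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["s", "c"],
}
membership_tree = {("s", "a"), ("a", "b"), ("b", "c"), ("c", "d")}

low_rate_tree = choose_tree(1, 5, membership_tree, graph, "s", ["d"])
high_rate_tree = choose_tree(10, 5, membership_tree, graph, "s", ["d"])
```

    In this toy topology the membership tree reaches member `d` in four hops, while the tree built for the high-rate case reaches it in one.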

    Set similarity selection queries at interactive speeds
    35.
    Type: Granted patent
    Status: In force

    Publication No.: US07921100B2

    Publication date: 2011-04-05

    Application No.: US12006332

    Filing date: 2008-01-02

    IPC classification: G06F17/30

    CPC classification: G06F17/30442

    Abstract: The similarity between a query set comprising query set tokens and a database set comprising database set tokens is determined by a similarity score. The database sets belong to a data collection set, which contains all database sets from which information may be retrieved. If the similarity score is greater than or equal to a user-defined threshold, the database set has information relevant to the query set. The similarity score is calculated with an inverse document frequency method (IDF) similarity measure independent of term frequency. The document frequency is based at least in part on the number of database sets in the data collection set and the number of database sets which contain at least one query set token. The length of the query set and the length of the database set are normalized.
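
    A length-normalized IDF score of the kind described above can be sketched as follows. The exact formula is an assumption, since the abstract does not give one: the sketch uses idf(t) = log(1 + N / df(t)), defines a set's length as the square root of the sum of its squared token weights, and scores a pair by the squared weights of the shared tokens divided by the product of the two lengths. The function names are hypothetical.

```python
import math
from collections import Counter


def build_idf(collection):
    """Map each token to log(1 + N / df), where df counts sets containing it."""
    n = len(collection)
    df = Counter(token for tokens in collection for token in set(tokens))
    return {token: math.log(1 + n / count) for token, count in df.items()}


def length(tokens, idf):
    """Normalized length of a set; tokens unseen in the collection weigh 0 here."""
    return math.sqrt(sum(idf.get(token, 0.0) ** 2 for token in set(tokens)))


def idf_similarity(query, db_set, idf):
    """Term-frequency-independent IDF similarity in [0, 1]."""
    denom = length(query, idf) * length(db_set, idf)
    if denom == 0:
        return 0.0
    shared = set(query) & set(db_set)
    return sum(idf.get(token, 0.0) ** 2 for token in shared) / denom


def select(query, collection, idf, threshold):
    """Return the database sets whose score meets the user-defined threshold."""
    return [s for s in collection if idf_similarity(query, s, idf) >= threshold]


collection = [{"main", "st"}, {"main", "ave"}, {"park", "ave"}]
idf = build_idf(collection)
matches = select({"main", "st"}, collection, idf, threshold=0.9)
```

    With these three sets, only the exact match clears a threshold of 0.9; the set sharing just the common token `main` scores well below it because rare tokens carry more weight.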

    User-Powered Recommendation System
    37.
    Type: Patent application
    Status: In force

    Publication No.: US20100138443A1

    Publication date: 2010-06-03

    Application No.: US12616892

    Filing date: 2009-11-12

    IPC classification: G06F17/30

    Abstract: Recommendation systems are widely used in Internet applications. In current recommendation systems, users only play a passive role and have limited control over the recommendation generation process. As a result, there is often considerable mismatch between the recommendations made by these systems and the actual user interests, which are fine-grained and constantly evolving. With a user-powered distributed recommendation architecture, individual users can flexibly define fine-grained communities of interest in a declarative fashion and obtain recommendations accurately tailored to their interests by aggregating opinions of users in such communities. By combining a progressive sampling technique with data perturbation methods, the recommendation system is both scalable and privacy-preserving.

    Method and systems for content access and distribution
    38.
    Type: Granted patent
    Status: In force

    Publication No.: US07623534B1

    Publication date: 2009-11-24

    Application No.: US11322828

    Filing date: 2005-12-30

    IPC classification: H04L12/56

    Abstract: Distribution of content between publishers and consumers is accomplished using an overlay network that may make use of XML language to facilitate content identification. The overlay network includes a plurality of routers that may be in communication with each other and the publishers and consumers on the Internet. Content and queries are identified by content descriptors that are routed from the originator to a nearest router in the overlay network. The nearest router, for each unique content descriptor, generates a hash identification of the content descriptor which is used by remaining routers in the overlay network to provide the appropriate functions with respect to the content descriptor. In particular, this allows all routers in the overlay network except the nearest router to properly route content without processing every content descriptor.
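
    The division of labor between the nearest router and the remaining routers can be sketched as follows. This is a hedged illustration only: the abstract does not name a hash function or a table layout, so SHA-1 via Python's `hashlib`, the class names `EdgeRouter` and `CoreRouter`, and the sample XML descriptor are all assumptions.

```python
import hashlib


def descriptor_id(descriptor: str) -> str:
    """Hash identification of a content descriptor (SHA-1 is an assumed choice)."""
    return hashlib.sha1(descriptor.encode("utf-8")).hexdigest()


class EdgeRouter:
    """Nearest router: the only one that processes the full descriptor."""

    def __init__(self) -> None:
        self.cache: dict[str, str] = {}

    def ingest(self, descriptor: str, payload: str) -> tuple[str, str]:
        # Hash each unique descriptor once, then reuse the identifier.
        if descriptor not in self.cache:
            self.cache[descriptor] = descriptor_id(descriptor)
        return self.cache[descriptor], payload


class CoreRouter:
    """Remaining routers: forward on the hash alone, never parsing descriptors."""

    def __init__(self) -> None:
        self.table: dict[str, list[str]] = {}

    def subscribe(self, hash_id: str, next_hop: str) -> None:
        self.table.setdefault(hash_id, []).append(next_hop)

    def forward(self, packet: tuple[str, str]) -> list[str]:
        hash_id, _payload = packet
        return self.table.get(hash_id, [])


descriptor = "<content><topic>weather</topic><region>NJ</region></content>"
edge, core = EdgeRouter(), CoreRouter()
core.subscribe(descriptor_id(descriptor), "consumer-1")

next_hops = core.forward(edge.ingest(descriptor, "sunny"))
```

    The point of the design is that descriptor parsing, which may be expensive for XML, happens once at the edge, while every other hop performs a fixed-size key lookup.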

    Generating conditional functional dependencies
    39.
    Type: Patent application
    Status: Lapsed

    Publication No.: US20090287721A1

    Publication date: 2009-11-19

    Application No.: US12380858

    Filing date: 2009-03-03

    IPC classification: G06F17/30

    CPC classification: G06F17/30604

    Abstract: Techniques are disclosed for generating conditional functional dependency (CFD) pattern tableaux having the desirable properties of support, confidence and parsimony. These techniques include both a greedy algorithm for generating a tableau and, for large data sets, an “on-demand” algorithm that outperforms the basic greedy algorithm in running time by an order of magnitude. In addition, a range tableau, as a generalization of a pattern tableau, can achieve even more parsimony.
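
    One way to read the greedy approach is as a set-cover loop, sketched below. The details are assumptions rather than the patent's algorithm: a pattern assigns each left-hand-side attribute either a constant or the wildcard `_`, confidence is taken as the largest fraction of matched rows that can be kept without violating the dependency, and the greedy step repeatedly adds the sufficiently confident pattern that covers the most uncovered rows until the support target is met. All names and the sample data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

WILDCARD = "_"


def matches(pattern, row, lhs):
    return all(p == WILDCARD or p == row[a] for p, a in zip(pattern, lhs))


def confidence(rows, lhs, rhs):
    """Largest fraction of rows that can be kept so that lhs -> rhs holds."""
    if not rows:
        return 0.0
    groups = defaultdict(Counter)
    for row in rows:
        groups[tuple(row[a] for a in lhs)][row[rhs]] += 1
    return sum(max(c.values()) for c in groups.values()) / len(rows)


def candidate_patterns(rows, lhs):
    """Every pattern obtained by wildcarding some attributes of some row."""
    patterns = set()
    positions = range(len(lhs))
    for row in rows:
        values = [row[a] for a in lhs]
        for k in range(len(lhs) + 1):
            for keep in combinations(positions, k):
                patterns.add(tuple(values[i] if i in keep else WILDCARD for i in positions))
    return patterns


def greedy_tableau(rows, lhs, rhs, min_conf, min_support):
    candidates = []
    for pattern in candidate_patterns(rows, lhs):
        cover = {i for i, row in enumerate(rows) if matches(pattern, row, lhs)}
        if confidence([rows[i] for i in cover], lhs, rhs) >= min_conf:
            candidates.append((pattern, cover))
    covered, tableau = set(), []
    while len(covered) < min_support * len(rows):
        # Pick the pattern with the largest marginal cover; ties break on the pattern.
        best = max(candidates, key=lambda pc: (len(pc[1] - covered), pc[0]), default=None)
        if best is None or not (best[1] - covered):
            break  # no remaining pattern adds coverage
        tableau.append(best[0])
        covered |= best[1]
    return tableau, len(covered) / len(rows)


rows = [
    {"cc": "01", "ac": "908", "city": "MH"},
    {"cc": "01", "ac": "908", "city": "MH"},
    {"cc": "01", "ac": "212", "city": "NYC"},
    {"cc": "44", "ac": "131", "city": "EDI"},
    {"cc": "44", "ac": "131", "city": "GLA"},
]
tableau, support = greedy_tableau(rows, ("cc", "ac"), "city", min_conf=1.0, min_support=0.6)
```

    On this sample the single pattern `("01", "_")` suffices: it matches three of five rows with no violations, whereas the all-wildcard pattern is rejected because the two `44`/`131` rows disagree on `city`. Enumerating every candidate pattern is exponential in the number of attributes, so the sketch is suitable only for small inputs.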
