Summary-based routing for content-based event distribution networks
    1.
    Patent application
    Summary-based routing for content-based event distribution networks (under examination, published)

    Publication No.: US20070168550A1

    Publication date: 2007-07-19

    Application No.: US11702856

    Filing date: 2007-02-06

    IPC classification: G06F15/173

    Abstract: A system and method for enabling highly scalable multi-node event distribution networks through the use of summary-based routing, particularly event distribution networks using a content-based publish/subscribe model to distribute information. By allowing event routers to use imprecise summaries of the subscriptions hosted by matcher nodes, an event router can eliminate itself as a bottleneck, thus improving overall event distribution network throughput even though the use of imprecise summaries results in some false positive event traffic. False positive event traffic is reduced by using a filter set partitioning that provides for good subscription set locality at each matcher node, while at the same time avoiding overloading any one matcher node. Good subscription set locality is maintained by routing new subscriptions to a matcher node with a subscription summary that best covers the new subscription. Where event space partitioning is desirable, an over-partitioning scheme is described that enables load balancing without repartitioning.
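The abstract above does not specify how summaries are represented, so the following is only a minimal sketch under stated assumptions: subscriptions are numeric ranges over a single attribute, each matcher's summary is one bounding interval, and "best covers" is read as "needs the least widening". The names `Matcher`, `growth`, `place`, and `route` are hypothetical and not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Matcher:
    name: str
    capacity: int                               # cap that keeps one node from overloading
    subs: list = field(default_factory=list)    # precise (lo, hi) subscription ranges

    def summary(self):
        """Imprecise summary: one bounding interval over all hosted ranges."""
        if not self.subs:
            return None
        return (min(lo for lo, _ in self.subs), max(hi for _, hi in self.subs))

    def matches(self, value):
        """Precise matching, performed only at the matcher node."""
        return [s for s in self.subs if s[0] <= value <= s[1]]


def growth(summary, sub):
    """How far a summary must widen to cover sub (0 means already covered)."""
    if summary is None:
        return sub[1] - sub[0]
    lo, hi = summary
    return max(0, lo - sub[0]) + max(0, sub[1] - hi)


def place(matchers, sub):
    """Host a new subscription on the non-full matcher whose summary covers it best."""
    open_nodes = [m for m in matchers if len(m.subs) < m.capacity]
    best = min(open_nodes, key=lambda m: growth(m.summary(), sub))  # raises if all full
    best.subs.append(sub)
    return best


def route(matchers, value):
    """Router side: forward the event to every matcher whose summary admits it."""
    targets = []
    for m in matchers:
        s = m.summary()
        if s is not None and s[0] <= value <= s[1]:
            targets.append(m)
    return targets
```

With two matchers of capacity 2 and subscriptions (0, 10), (50, 60), (2, 8), (5, 12) placed in that order, the last one lands on the second matcher because the first is full. Its summary widens to (5, 60), so an event with value 30 is forwarded there even though no hosted subscription matches it: a false positive of the kind the abstract mentions.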

    Summary-based routing for content-based event distribution networks
    2.
    Granted patent
    Summary-based routing for content-based event distribution networks (in force)

    Publication No.: US07200675B2

    Publication date: 2007-04-03

    Application No.: US10389623

    Filing date: 2003-03-14

    IPC classification: G06F15/173 G06F15/16

    Abstract: A system and method for enabling highly scalable multi-node event distribution networks through the use of summary-based routing, particularly event distribution networks using a content-based publish/subscribe model to distribute information. By allowing event routers to use imprecise summaries of the subscriptions hosted by matcher nodes, an event router can eliminate itself as a bottleneck, thus improving overall event distribution network throughput even though the use of imprecise summaries results in some false positive event traffic. False positive event traffic is reduced by using a filter set partitioning that provides for good subscription set locality at each matcher node, while at the same time avoiding overloading any one matcher node. Good subscription set locality is maintained by routing new subscriptions to a matcher node with a subscription summary that best covers the new subscription. Where event space partitioning is desirable, an over-partitioning scheme is described that enables load balancing without repartitioning.
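The over-partitioning idea in the last sentence of the abstract can be illustrated roughly as follows. This is a sketch under assumptions, not the patented scheme: the event space is a numeric range cut into many more fixed partitions than there are nodes, and load is balanced by reassigning whole partitions rather than moving boundaries. All function names and the greedy move rule are hypothetical.

```python
def partition_of(value, lo, hi, n_parts):
    """Fixed, fine-grained partition index for a value in [lo, hi)."""
    width = (hi - lo) / n_parts
    return min(n_parts - 1, int((value - lo) / width))


def initial_assignment(n_parts, nodes):
    """Spread the many small partitions round-robin over the few nodes."""
    return {p: nodes[p % len(nodes)] for p in range(n_parts)}


def node_loads(assignment, part_load, nodes):
    """Total observed load per node, given per-partition load counts."""
    loads = {n: 0 for n in nodes}
    for p, n in assignment.items():
        loads[n] += part_load.get(p, 0)
    return loads


def rebalance_once(assignment, part_load, nodes):
    """Move one partition from the busiest node to the idlest, if that narrows the gap.

    Partition boundaries never change; only the partition-to-node map does.
    Returns the moved partition index, or None when no move helps.
    """
    loads = node_loads(assignment, part_load, nodes)
    hot = max(nodes, key=loads.get)
    cold = min(nodes, key=loads.get)
    gap = loads[hot] - loads[cold]
    movable = [p for p, n in assignment.items()
               if n == hot and 0 < part_load.get(p, 0) < gap]
    if not movable:
        return None
    # Prefer the partition whose load is closest to half the gap.
    p = min(movable, key=lambda q: abs(part_load[q] - gap / 2))
    assignment[p] = cold
    return p
```

Because only the assignment map changes, events keep hashing to the same partition index before and after a move, which is what lets the load shift without repartitioning the event space.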

    Just-in-time analytics on large file systems
    4.
    Granted patent
    Just-in-time analytics on large file systems (in force)

    Publication No.: US09244975B2

    Publication date: 2016-01-26

    Application No.: US13328810

    Filing date: 2011-12-16

    IPC classification: G06F17/30

    CPC classification: G06F17/30442

    Abstract: As file systems reach the petabyte scale, users and administrators are increasingly interested in acquiring high-level analytical information for file management and analysis. Two particularly important tasks are the processing of aggregate and top-k queries which, unfortunately, cannot be answered quickly by hierarchical file systems such as ext3 and NTFS. Existing preprocessing-based solutions, e.g., file system crawling and index building, consume a significant amount of time and space (for generating and maintaining the indexes), which in many cases cannot be justified by the infrequent usage of such solutions. User interests can often be sufficiently satisfied by approximate (i.e., statistically accurate) answers. A just-in-time sampling-based system can, after consuming a small number of disk accesses, produce extremely accurate answers for a broad class of aggregate and top-k queries over a file system without requiring any prior knowledge. The system is efficient, accurate and scalable.
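The abstract does not spell out the sampling procedure. One generic way to estimate an aggregate such as the total file count from a few directory reads is a random descent with inverse-probability weighting, sketched below. The in-memory tree (dicts with hypothetical `files` and `dirs` keys) stands in for a real file system, and the estimator is an illustrative assumption rather than the patent's method.

```python
import random


def true_count(d):
    """Exact file count by full traversal (what sampling tries to avoid)."""
    return d["files"] + sum(true_count(c) for c in d["dirs"])


def one_descent(root, rng):
    """One random root-to-leaf walk giving an unbiased estimate of the file count."""
    estimate, prob, node = 0.0, 1.0, root
    while True:
        estimate += node["files"] / prob    # weight by inverse probability of reaching node
        if not node["dirs"]:
            return estimate
        prob /= len(node["dirs"])           # uniform choice among subdirectories
        node = rng.choice(node["dirs"])


def estimate_count(root, walks, seed=0):
    """Average several independent descents to reduce variance."""
    rng = random.Random(seed)
    return sum(one_descent(root, rng) for _ in range(walks)) / walks


# Small example tree: 2 + 10 + 4 + 0 + 1 = 17 files in total.
tree = {"files": 2, "dirs": [
    {"files": 10, "dirs": [{"files": 4, "dirs": []}, {"files": 0, "dirs": []}]},
    {"files": 1, "dirs": []},
]}
```

Each walk touches only one directory per level, so its cost grows with tree depth rather than tree size. A single walk on the example tree returns 38, 22, or 4 with probabilities 1/4, 1/4, and 1/2, which averages to the true count of 17.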

    SUPPORTING UNIFIED QUERYING OVER AUTONOMOUS UNSTRUCTURED AND STRUCTURED DATABASES
    5.
    Patent application
    SUPPORTING UNIFIED QUERYING OVER AUTONOMOUS UNSTRUCTURED AND STRUCTURED DATABASES (in force)

    Publication No.: US20090248619A1

    Publication date: 2009-10-01

    Application No.: US12059350

    Filing date: 2008-03-31

    IPC classification: G06F17/30

    CPC classification: G06F17/30545

    Abstract: Methods, systems and computer products perform cost estimation to determine an efficient approach to answering a query according to one of several unified query plans. One unified query plan involves querying an unstructured database, referencing a unified index, and probing a structured database based on matches discovered in the unified index. The results of the unstructured database query are used to look up entries in a unified index associated with the structured database. Then the structured database is probed by querying only the subset of the structured database gleaned from the unstructured database query.
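The three steps of the unified query plan described above can be sketched as follows. The data layout is assumed for illustration only: the unstructured database is a dict of document texts, the unified index maps document ids to structured row keys, and the structured database is a dict of rows. The function names and the substring search are hypothetical stand-ins.

```python
def search_unstructured(docs, keyword):
    """Step 1: query the unstructured database (here, a case-insensitive substring search)."""
    return [doc_id for doc_id, text in docs.items() if keyword.lower() in text.lower()]


def unified_plan(docs, unified_index, table, keyword, predicate):
    """Run the plan: unstructured query, unified index lookup, then a narrow probe."""
    doc_ids = search_unstructured(docs, keyword)
    # Step 2: translate matching documents into structured row keys.
    keys = sorted({k for d in doc_ids for k in unified_index.get(d, [])})
    # Step 3: probe only that subset of the structured database.
    return [table[k] for k in keys if k in table and predicate(table[k])]


docs = {
    1: "Disk failure on server alpha",
    2: "Routine check completed",
    3: "disk replaced on beta",
}
unified_index = {1: ["alpha"], 3: ["beta"]}
table = {
    "alpha": {"host": "alpha", "dc": "east"},
    "beta": {"host": "beta", "dc": "west"},
    "gamma": {"host": "gamma", "dc": "east"},
}
rows = unified_plan(docs, unified_index, table, "disk", lambda r: r["dc"] == "east")
```

In this example the row for `gamma` is never examined, because no matching document points to it. A cost estimator would weigh that saving against the cost of the unstructured query and index lookups when choosing among plans.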

    JUST-IN-TIME ANALYTICS ON LARGE FILE SYSTEMS
    8.
    Patent application
    JUST-IN-TIME ANALYTICS ON LARGE FILE SYSTEMS (in force)

    Publication No.: US20120166478A1

    Publication date: 2012-06-28

    Application No.: US13328810

    Filing date: 2011-12-16

    IPC classification: G06F17/30

    CPC classification: G06F17/30442

    Abstract: As file systems reach the petabyte scale, users and administrators are increasingly interested in acquiring high-level analytical information for file management and analysis. Two particularly important tasks are the processing of aggregate and top-k queries which, unfortunately, cannot be answered quickly by hierarchical file systems such as ext3 and NTFS. Existing preprocessing-based solutions, e.g., file system crawling and index building, consume a significant amount of time and space (for generating and maintaining the indexes), which in many cases cannot be justified by the infrequent usage of such solutions. User interests can often be sufficiently satisfied by approximate (i.e., statistically accurate) answers. A just-in-time sampling-based system can, after consuming a small number of disk accesses, produce extremely accurate answers for a broad class of aggregate and top-k queries over a file system without requiring any prior knowledge. The system is efficient, accurate and scalable.

    Just-in-time analytics on large file systems and hidden databases
    9.
    Granted patent
    Just-in-time analytics on large file systems and hidden databases (in force)

    Publication No.: US09244976B1

    Publication date: 2016-01-26

    Application No.: US13402764

    Filing date: 2012-02-22

    Applicant: Nan Zhang, Gautam Das

    Inventor: Nan Zhang, Gautam Das

    IPC classification: G06F17/30

    CPC classification: G06F17/30442

    Abstract: A just-in-time sampling-based system can, after consuming a small number of disk accesses or queries, produce extremely accurate answers for a broad class of aggregate and top-k queries over a file system or database without requiring prior knowledge. The system is efficient, accurate, and scalable. The system performs aggregate estimations of a hidden database through its web interface by employing techniques that use a small number of queries to produce unbiased estimates with small variance. It conducts domain discovery over a hidden database through its web interface by employing techniques that provide guarantees on the effectiveness of domain discovery. Systems and methods enhance forms used by mobile devices to access hidden databases. They employ data analytics to improve the usage of form fields, including providing context-sensitive auto-completion suggestions, highlighting selections in drop-down boxes and eliminating suggestions in drop-down boxes.
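The abstract does not describe the estimator itself. As an illustration of how a handful of form queries can yield an unbiased size estimate, the sketch below uses a random drill-down with inverse-probability weighting. It assumes Boolean attributes, distinct rows, and a simulated top-k interface; `make_interface`, `drill_down`, and `estimate_size` are hypothetical names, not the patent's terminology.

```python
import random


def make_interface(rows, k):
    """Simulated top-k form interface: returns (at most k matching rows, overflow flag)."""
    def query(prefix):
        hits = [r for r in rows if r[:len(prefix)] == prefix]
        return hits[:k], len(hits) > k
    return query


def drill_down(query, n_attrs, rng):
    """Add random attribute values until the query stops overflowing.

    A query reached with probability 2**-d that returns c rows contributes
    c * 2**d, which makes the expected value equal to the database size.
    """
    prefix = ()
    while True:
        hits, overflow = query(prefix)
        if not overflow:
            return len(hits) * 2 ** len(prefix)
        if len(prefix) == n_attrs:
            raise ValueError("overflow at full depth: rows are not distinct enough for k")
        prefix += (rng.randint(0, 1),)


def estimate_size(query, n_attrs, walks, seed=0):
    """Average independent drill-downs to shrink the variance."""
    rng = random.Random(seed)
    return sum(drill_down(query, n_attrs, rng) for _ in range(walks)) / walks


# Seven distinct rows over four Boolean attributes, behind a top-2 interface.
rows = [(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 0), (0, 0, 1, 1),
        (0, 1, 1, 1), (1, 0, 0, 0), (1, 1, 1, 1)]
query = make_interface(rows, k=2)
```

On this example a single drill-down returns either 4 or 16, and the weighted average over all paths is exactly 7, the true number of rows. The client never sees more than two rows per query.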

    Ranking database query results
    10.
    Patent application
    Ranking database query results (lapsed)

    Publication No.: US20050289102A1

    Publication date: 2005-12-29

    Application No.: US10879450

    Filing date: 2004-06-29

    IPC classification: G06F7/00 G06Q30/00 G06Q50/00

    Abstract: A system and methods rank results of database queries. An automated approach for ranking database query results is disclosed that leverages data and workload statistics and associations. Ranking functions are based upon the principles of probabilistic models from Information Retrieval that are adapted for structured data. The ranking functions are encoded into an intermediate knowledge representation layer. The system is generic, as the ranking functions can be further customized for different applications. Benefits of the disclosed system and methods include the use of adapted probabilistic information retrieval (PIR) techniques that leverage relational/structured data, such as columns, to provide natural groupings of data values. This permits the inference and use of pair-wise associations between data values across columns, which is usually not possible with text data.
