Systems and methods for fast reachability queries in large graphs
    1.
    发明申请
    Systems and methods for fast reachability queries in large graphs 审中-公开
    大图中快速可达性查询的系统和方法

    公开(公告)号:US20060271304A1

    公开(公告)日:2006-11-30

    申请号:US11141548

    申请日:2005-05-31

    IPC分类号: G06F19/00 G06F7/00

    CPC分类号: G06F16/9027 G16B5/00

    摘要: A method which identifies different types of substructures within a graph and encodes them using techniques suitable to the characteristics of each of them. The method is embodied by an efficient two-phase algorithm, where the first phase identifies and encodes strongly connected components as well as tree substructures, and the second phase encodes the remaining reachability relationships by compressing dense rectangular submatrices in the transitive closure matrix.

    摘要翻译: 一种识别图形内不同类型的子结构的方法,并使用适合于它们每个特征的技术对它们进行编码。 该方法由有效的两相算法体现,其中第一阶段识别和编码强连接的分量以及树状子结构,第二阶段通过压缩传递闭包矩阵中的密集矩形子矩阵来编码剩余的可达性关系。

    Space and time efficient XML graph labeling
    2.
    发明申请
    Space and time efficient XML graph labeling 失效
    空间和时间有效的XML图形标注

    公开(公告)号:US20070230488A1

    公开(公告)日:2007-10-04

    申请号:US11396502

    申请日:2006-03-31

    IPC分类号: H04L12/56

    CPC分类号: H04L45/48 H04L45/02

    摘要: There is provided a method for determining reachability between any two nodes within a graph. The inventive method utilizes a dual-labeling scheme. Initially, a spanning tree is defined for a group of nodes within a graph. Each node in the spanning tree is assigned a unique interval-based label, that describes its dependency from an ancestor node. Non-tree labels are then assigned to each node in the spanning tree that is connected to another node in the spanning tree by a non-tree link. From these labels, reachability of any two nodes in the spanning tree is determined by using only the interval-based labels and the non-tree labels.

    摘要翻译: 提供了一种用于确定图中任何两个节点之间的可达性的方法。 本发明的方法利用双标记方案。 最初,为图中的一组节点定义了生成树。 生成树中的每个节点都被分配一个唯一的基于间隔的标签,它描述了从祖先节点的依赖关系。 然后,非树标签被分配给生成树中通过非树形链接连接到生成树中的另一个节点的每个节点。 从这些标签中,生成树中任何两个节点的可达性通过仅使用基于间隔的标签和非树标签来确定。

    System and method for ranked keyword search on graphs
    3.
    发明授权
    System and method for ranked keyword search on graphs 有权
    在图表上排名关键词搜索的系统和方法

    公开(公告)号:US07702620B2

    公开(公告)日:2010-04-20

    申请号:US11693471

    申请日:2007-03-29

    IPC分类号: G06F17/30

    摘要: Arrangements and methods for providing for the efficient implementation of ranked keyword searches on graph-structured data. Since it is difficult to directly build indexes for general schemaless graphs, conventional techniques highly rely on graph traversal in running time. The previous lack of more knowledge about graphs also resulted in great difficulties in applying pruning techniques. To address these problems, there is introduced herein a new scoring function while the block is used as an intermediate access level; the result is an opportunity to create sophisticated indexes for keyword search. Also proposed herein is a cost-balanced expansion algorithm to conduct a backward search, which provides a good theoretical guarantee in terms of the search cost.

    摘要翻译: 用于提供在图形结构化数据上有效执行排名关键词搜索的安排和方法。 由于难以直接构建一般无法图的索引,常规技术高度依赖于运行时间的图遍历。 以前缺乏对图形的更多了解也导致了应用修剪技术的巨大困难。 为了解决这些问题,这里引入了一个新的评分功能,而块被用作中间访问级别; 结果是为关键字搜索创建复杂索引的机会。 这里还提出了一种用于进行后向搜索的成本平衡的扩展算法,这在搜索成本方面提供了良好的理论保证。

    SYSTEM AND METHOD FOR RANKED KEYWORD SEARCH ON GRAPHS
    4.
    发明申请
    SYSTEM AND METHOD FOR RANKED KEYWORD SEARCH ON GRAPHS 有权
    排序关键字搜索的系统和方法

    公开(公告)号:US20080243811A1

    公开(公告)日:2008-10-02

    申请号:US11693471

    申请日:2007-03-29

    IPC分类号: G06F17/30

    摘要: Arrangements and methods for providing for the efficient implementation of ranked keyword searches on graph-structured data. Since it is difficult to directly build indexes for general schemaless graphs, conventional techniques highly rely on graph traversal in running time. The previous lack of more knowledge about graphs also resulted in great difficulties in applying pruning techniques. To address these problems, there is introduced herein a new scoring function while the block is used as an intermediate access level; the result is an opportunity to create sophisticated indexes for keyword search. Also proposed herein is a cost-balanced expansion algorithm to conduct a backward search, which provides a good theoretical guarantee in terms of the search cost.

    摘要翻译: 用于提供在图形结构化数据上有效执行排名关键词搜索的安排和方法。 由于难以直接构建一般无法图的索引,常规技术高度依赖于运行时间的图遍历。 以前缺乏对图形的更多了解也导致了应用修剪技术的巨大困难。 为了解决这些问题,这里引入了一个新的评分功能,而块被用作中间访问级别; 结果是为关键字搜索创建复杂索引的机会。 这里还提出了一种用于进行后向搜索的成本平衡的扩展算法,这在搜索成本方面提供了良好的理论保证。

    Space and time efficient XML graph labeling
    5.
    发明授权
    Space and time efficient XML graph labeling 失效
    空间和时间有效的XML图形标注

    公开(公告)号:US07492727B2

    公开(公告)日:2009-02-17

    申请号:US11396502

    申请日:2006-03-31

    IPC分类号: H04L12/28

    CPC分类号: H04L45/48 H04L45/02

    摘要: There is provided a method for determining reachability between any two nodes within a graph. The inventive method utilizes a dual-labeling scheme. Initially, a spanning tree is defined for a group of nodes within a graph. Each node in the spanning tree is assigned a unique interval-based label, that describes its dependency from an ancestor node. Non-tree labels are then assigned to each node in the spanning tree that is connected to another node in the spanning tree by a non-tree link. From these labels, reachability of any two nodes in the spanning tree is determined by using only the interval-based labels and the non-tree labels.

    摘要翻译: 提供了一种用于确定图中任何两个节点之间的可达性的方法。 本发明的方法利用双标记方案。 最初,为图中的一组节点定义了生成树。 生成树中的每个节点都被分配一个唯一的基于间隔的标签,它描述了从祖先节点的依赖关系。 然后,非树标签被分配给生成树中通过非树形链接连接到生成树中的另一个节点的每个节点。 从这些标签中,生成树中任何两个节点的可达性通过仅使用基于间隔的标签和非树标签来确定。

    System and method for indexing type-annotated web documents
    6.
    发明申请
    System and method for indexing type-annotated web documents 审中-公开
    用于索引类型注释的Web文档的系统和方法

    公开(公告)号:US20090049035A1

    公开(公告)日:2009-02-19

    申请号:US11891921

    申请日:2007-08-14

    IPC分类号: G06F7/06 G06F17/30

    CPC分类号: G06F16/951

    摘要: Methods and apparatus generate an index for use in a document retrieval system where the index is organized by type and keyword. Redundancy in the index is reduced by organizing type entries in a hierarchy of internal and leaf nodes. Determining whether to generate an inverted list for a type is based on the position of the type in the hierarchy; generally inverted lists are generated only for types corresponding to leaf nodes. Redundancy is further reduced by re-using inverted lists generated for keywords for types when there is an overlap between keywords and types. Search performance using the document retrieval index is improved by adding entries corresponding to combinations of keywords and types. The intersections of inverted lists associated with the keywords and types comprising the combinations are determined and added to the index for use in search operations. Determining whether to add an entry for a keyword-type combination is made on a cost-benefit analysis dependent, at least in part, on the proximity of the keyword to type in documents containing the combination.

    摘要翻译: 方法和设备生成用于文档检索系统的索引,其中索引按类型和关键字组织。 通过在内部和叶节点的层次结构中组织类型条目来减少索引中的冗余。 确定是否为类型生成反向列表是基于层次结构中类型的位置; 一般反转的列表仅针对对应于叶节点的类型生成。 当关键字和类型之间存在重叠时,通过重新使用针对关键字生成的反向列表来进一步减少冗余。 通过添加与关键字和类型的组合相对应的条目来提高使用文档检索索引的搜索性能。 确定与包括组合的关键词和类型相关联的倒排列表的交集并将其添加到用于搜索操作的索引中。 确定是否添加关键字类型组合的条目是根据成本效益分析进行的,至少部分是关键字的邻近度来键入包含该组合的文档。

    System and method for scalable processing of multi-way data stream correlations
    7.
    发明申请
    System and method for scalable processing of multi-way data stream correlations 失效
    用于多路数据流相关性的可扩展处理的系统和方法

    公开(公告)号:US20070288635A1

    公开(公告)日:2007-12-13

    申请号:US11417838

    申请日:2006-05-04

    IPC分类号: G06F15/173

    摘要: A computer implemented method, apparatus, and computer usable program code for processing multi-way stream correlations. Stream data are received for correlation. A task is formed for continuously partitioning a multi-way stream correlation workload into smaller workload pieces. Each of the smaller workload pieces may be processed by a single host. The stream data are sent to different hosts for correlation processing.

    摘要翻译: 一种用于处理多路流相关性的计算机实现的方法,装置和计算机可用程序代码。 接收流数据进行相关。 形成一个任务,用于将多路流相关工作负载连续划分成较小的工作负载。 每个较小的工作负载片段可以由单个主机处理。 流数据被发送到不同的主机进行相关处理。

    Systems and methods for sequential modeling in less than one sequential scan
    8.
    发明申请
    Systems and methods for sequential modeling in less than one sequential scan 失效
    在不到一次顺序扫描中进行顺序建模的系统和方法

    公开(公告)号:US20060026110A1

    公开(公告)日:2006-02-02

    申请号:US10903336

    申请日:2004-07-30

    IPC分类号: G06F15/18

    CPC分类号: G06N99/005 Y10S707/99931

    摘要: Most recent research of scalable inductive learning on very large streaming dataset focuses on eliminating memory constraints and reducing the number of sequential data scans. However, state-of-the-art algorithms still require multiple scans over the data set and use sophisticated control mechanisms and data structures. There is discussed herein a general inductive learning framework that scans the dataset exactly once. Then, there is proposed an extension based on Hoeffding's inequality that scans the dataset less than once. The proposed frameworks are applicable to a wide range of inductive learners.

    摘要翻译: 对最大流式数据集的可伸缩归纳学习的最新研究着重于消除记忆限制并减少顺序数据扫描的次数。 然而,最先进的算法仍然需要对数据集进行多次扫描,并使用复杂的控制机制和数据结构。 这里讨论了一般的归纳学习框架,该框架一次扫描数据集。 然后,提出了一种基于Hoeffding不等式的扩展,可以扫描数据集不止一次。 提出的框架适用于广泛的归纳学习者。

    Index Structure for Supporting Structural XML Queries
    9.
    发明申请
    Index Structure for Supporting Structural XML Queries 失效
    支持结构XML查询的索引结构

    公开(公告)号:US20070271243A1

    公开(公告)日:2007-11-22

    申请号:US11780095

    申请日:2007-07-19

    IPC分类号: G06F17/30

    摘要: The present invention provides a ViST (or “virtual suffix tree”), which is a novel index structure for searching XML documents. By representing both XML documents and XML queries in structure-encoded sequences, it is shown that querying XML data is equivalent to finding (non-contiguous) subsequence matches. A variety of XML queries, including those with branches, or wild-cards (‘*’ and ‘//’), can be expressed by structure-encoded sequences. Unlike index methods that disassemble a query into multiple sub-queries, and then join the results of these sub-queries to provide the final answers, ViST uses tree structures as the basic unit of query to avoid expensive join operations. Furthermore, ViST provides a unified index on both content and structure of the XML documents, hence it has a performance advantage over methods indexing either just content or structure. ViST supports dynamic index update, and it relies solely on B+Trees without using any specialized data structures that are not well supported by common database management systems (hereinafter referred to as “DBMSs”).

    摘要翻译: 本发明提供了一种ViST(或“虚拟后缀树”),其是用于搜索XML文档的新型索引结构。 通过在结构编码序列中同时表示XML文档和XML查询,显示查询XML数据等同于查找(非连续)子序列匹配。 各种XML查询(包括具有分支的查询)或通配符('*'和'//')可以由结构编码的序列表示。 不同于将查询反汇编成多个子查询的索引方法,然后加入这些子查询的结果以提供最终答案,ViST使用树结构作为查询的基本单位,以避免昂贵的连接操作。 此外,ViST为XML文档的内容和结构提供了一个统一的索引,因此与仅通过内容或结构索引方法相比,它具有性能优势。 ViST支持动态索引更新,它仅仅依赖于B< +>树,而不使用通用数据库管理系统(以下简称“DBMS”)不能很好支持的任何专门的数据结构。

    System and method for sequencing XML documents for tree structure indexing
    10.
    发明申请
    System and method for sequencing XML documents for tree structure indexing 失效
    用于对树结构索引的XML文档进行排序的系统和方法

    公开(公告)号:US20060161575A1

    公开(公告)日:2006-07-20

    申请号:US11035889

    申请日:2005-01-14

    IPC分类号: G06F7/00

    摘要: Sequence-based XML indexing aims at avoiding expensive join operations in query processing. It transforms structured XML data into sequences so that a structured query can be answered holistically through subsequence matching. Herein, there is addresed the problem of query equivalence with respect to this transformation, and thereis introduced a performance-oriented principle for sequencing tree structures. With query equivalence, XML queries can be performed through subsequence matching without join operations, post-processing, or other special handling for problems such as false alarms. There is identified a class of sequencing methods for this purpose, and there is presented a novel subsequence matching algorithm that observe query equivalence. Also introduced is a performance-oriented principle to guide the sequencing of tree structures. For any given XML dataset, the principle finds an optimal sequencing strategy according to its schema and its data distribution; there is thus presented herein a novel method that realizes this principle.

    摘要翻译: 基于序列的XML索引旨在避免查询处理中的昂贵的联接操作。 它将结构化XML数据转换为序列,以便可以通过子序列匹配整体回答结构化查询。 这里提出了相对于这种转换的查询等价性的问题,并且引入了用于排序树结构的性能导向原理。 通过查询等价,可以通过子序列匹配执行XML查询,无需连接操作,后处理或其他特殊处理,例如虚假警报等问题。 确定了一类用于此目的的测序方法,并提出了一种观察查询等价性的新颖的子序列匹配算法。 还引入了一种以性能为导向的原则来指导树结构的排序。 对于任何给定的XML数据集,该原理根据其模式及其数据分布找到最佳排序策略; 因此在此呈现了实现这一原理的新颖方法。