System and method for ranked keyword search on graphs
    1.
    Granted patent
    System and method for ranked keyword search on graphs (in force)

    Publication number: US07702620B2

    Publication date: 2010-04-20

    Application number: US11693471

    Filing date: 2007-03-29

    IPC classification: G06F17/30

    Abstract: Arrangements and methods for providing for the efficient implementation of ranked keyword searches on graph-structured data. Since it is difficult to directly build indexes for general schemaless graphs, conventional techniques rely heavily on graph traversal at run time. The previous lack of knowledge about graphs also resulted in great difficulties in applying pruning techniques. To address these problems, there is introduced herein a new scoring function, with the block used as an intermediate access level; the result is an opportunity to create sophisticated indexes for keyword search. Also proposed herein is a cost-balanced expansion algorithm to conduct a backward search, which provides a good theoretical guarantee in terms of the search cost.
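
    The abstract above names a backward search but gives no algorithmic detail. As a rough illustration only, the sketch below shows a plain breadth-first backward search: it expands along reversed edges from the nodes that contain each keyword and ranks the nodes that can reach every keyword by total hop count. It is not the patent's cost-balanced expansion algorithm or its block-based scoring function; the function names, the edge-list input, and the hop-count score are all assumptions made for this example.

```python
from collections import deque

def backward_distances(reverse_adj, sources):
    """Breadth-first search along reversed edges; returns hops to the nearest source."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for pred in reverse_adj.get(node, []):
            if pred not in dist:
                dist[pred] = dist[node] + 1
                queue.append(pred)
    return dist

def ranked_roots(edges, keyword_nodes):
    """Rank candidate answer roots for a non-empty keyword query.

    edges: iterable of (src, dst) directed edges.
    keyword_nodes: dict mapping each keyword to the nodes that contain it.
    Returns a sorted list of (total_hops, root); smaller totals rank first.
    """
    reverse_adj = {}
    for src, dst in edges:
        reverse_adj.setdefault(dst, []).append(src)
    per_keyword = [backward_distances(reverse_adj, nodes)
                   for nodes in keyword_nodes.values()]
    # A root qualifies only if it reaches at least one node for every keyword.
    common = set(per_keyword[0]).intersection(*per_keyword[1:])
    return sorted((sum(d[n] for d in per_keyword), n) for n in common)

# Hypothetical toy graph: "r" is the only node that reaches both keywords.
example_edges = [("r", "a"), ("r", "b"), ("a", "x"), ("b", "y"), ("s", "x")]
example_query = {"k1": ["x"], "k2": ["y"]}
print(ranked_roots(example_edges, example_query))
```

    Expanding every keyword frontier to exhaustion, as this sketch does, is the run-time traversal cost that the abstract says indexing and balanced expansion are meant to reduce.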

    2.
    Patent application
    SYSTEM AND METHOD FOR RANKED KEYWORD SEARCH ON GRAPHS (in force)

    Publication number: US20080243811A1

    Publication date: 2008-10-02

    Application number: US11693471

    Filing date: 2007-03-29

    IPC classification: G06F17/30

    Abstract: Arrangements and methods for providing for the efficient implementation of ranked keyword searches on graph-structured data. Since it is difficult to directly build indexes for general schemaless graphs, conventional techniques rely heavily on graph traversal at run time. The previous lack of knowledge about graphs also resulted in great difficulties in applying pruning techniques. To address these problems, there is introduced herein a new scoring function, with the block used as an intermediate access level; the result is an opportunity to create sophisticated indexes for keyword search. Also proposed herein is a cost-balanced expansion algorithm to conduct a backward search, which provides a good theoretical guarantee in terms of the search cost.

    3.
    Granted patent
    Space and time efficient XML graph labeling (lapsed)

    Publication number: US07492727B2

    Publication date: 2009-02-17

    Application number: US11396502

    Filing date: 2006-03-31

    IPC classification: H04L12/28

    CPC classification: H04L45/48 H04L45/02

    Abstract: There is provided a method for determining reachability between any two nodes within a graph. The inventive method utilizes a dual-labeling scheme. Initially, a spanning tree is defined for a group of nodes within a graph. Each node in the spanning tree is assigned a unique interval-based label that describes its dependency from an ancestor node. Non-tree labels are then assigned to each node in the spanning tree that is connected to another node in the spanning tree by a non-tree link. From these labels, reachability of any two nodes in the spanning tree is determined by using only the interval-based labels and the non-tree labels.
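
    The dual-labeling idea in the abstract above can be illustrated with a small sketch. It assigns each spanning-tree node a half-open interval by depth-first numbering, so that ancestry becomes an interval-containment test, and then follows non-tree links whose source lies inside the current node's subtree. The patent's actual non-tree label encoding is not described in the abstract, so the plain list of links, the function names, and the toy graph here are assumptions made for illustration.

```python
def label_tree(tree, root):
    """Assign each node a half-open interval [start, end) by depth-first numbering.

    tree: dict mapping a node to the list of its children in the spanning tree.
    A node's interval contains the start of every node in its subtree.
    """
    labels = {}
    counter = 0

    def visit(node):
        nonlocal counter
        start = counter
        counter += 1
        for child in tree.get(node, []):
            visit(child)
        labels[node] = (start, counter)

    visit(root)
    return labels

def in_subtree(labels, u, v):
    """True if v equals u or is a tree descendant of u."""
    start_u, end_u = labels[u]
    return start_u <= labels[v][0] < end_u

def reachable(labels, links, u, v):
    """Decide whether u reaches v using only interval labels and non-tree links.

    links: list of (src, dst) edges that are not part of the spanning tree.
    """
    used = set()
    frontier = [u]
    while frontier:
        node = frontier.pop()
        if in_subtree(labels, node, v):
            return True
        for i, (src, dst) in enumerate(links):
            # A link is usable when its source lies in the current subtree.
            if i not in used and in_subtree(labels, node, src):
                used.add(i)
                frontier.append(dst)
    return False

# Hypothetical example: a five-node tree plus one non-tree link d -> c.
example_tree = {"a": ["b", "c"], "b": ["d"], "c": ["e"]}
example_labels = label_tree(example_tree, "a")
example_links = [("d", "c")]
print(reachable(example_labels, example_links, "b", "e"))
```

    When a graph has few non-tree links, most queries are settled by the interval test alone, which is where the space and time savings of such a scheme would come from.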

    4.
    Granted patent
    System and method for load shedding in data mining and knowledge discovery from stream data (in force)

    Publication number: US08060461B2

    Publication date: 2011-11-15

    Application number: US12372568

    Filing date: 2009-02-17

    IPC classification: G06F7/00 G06F17/00

    CPC classification: G06K9/6297 H04L43/028

    Abstract: Load shedding schemes for mining data streams. A scoring function is used to rank the importance of stream elements, and those elements with high importance are investigated. In the context of not knowing the exact feature values of a data stream, the use of a Markov model is proposed herein for predicting the feature distribution of a data stream. Based on the predicted feature distribution, one can make classification decisions to maximize the expected benefits. In addition, there is proposed herein the employment of a quality of decision (QoD) metric to measure the level of uncertainty in decisions and to guide load shedding. A load shedding scheme such as presented herein assigns available resources to multiple data streams to maximize the quality of classification decisions. Furthermore, such a load shedding scheme is able to learn and adapt to changing data characteristics in the data streams.

    5.
    Granted patent
    System and method for scalable cost-sensitive learning (in force)

    Publication number: US07904397B2

    Publication date: 2011-03-08

    Application number: US12690502

    Filing date: 2010-01-20

    IPC classification: G06F15/18 G06N3/00 G06N3/12

    CPC classification: G06N99/005

    Abstract: A method (and structure) for processing an inductive learning model for a dataset of examples includes dividing the dataset of examples into a plurality of subsets of data and generating, using a processor on a computer, a learning model using examples of a first subset of data of the plurality of subsets of data. The learning model being generated for the first subset comprises an initial stage of an evolving aggregate learning model (ensemble model) for an entirety of the dataset, the ensemble model thereby providing an evolving estimated learning model for the entirety of the dataset if all the subsets were to be processed. The generating of the learning model using data from a subset includes calculating a value for at least one parameter that provides an objective indication of an adequacy of a current stage of the ensemble model.

    6.
    Granted patent
    System and method for sequence-based subspace pattern clustering (lapsed)

    Publication number: US07565346B2

    Publication date: 2009-07-21

    Application number: US10858541

    Filing date: 2004-05-31

    IPC classification: G06F17/30

    CPC classification: G06K9/6215 Y10S707/99936

    Abstract: Unlike traditional clustering methods that focus on grouping objects with similar values on a set of dimensions, clustering by pattern similarity finds objects that exhibit a coherent pattern of rise and fall in subspaces. Pattern-based clustering extends the concept of traditional clustering and benefits a wide range of applications, including e-Commerce target marketing, bioinformatics (large scale scientific data analysis), and automatic computing (web usage analysis). However, state-of-the-art pattern-based clustering methods (e.g., the pCluster algorithm) can only handle datasets of thousands of records, which makes them inappropriate for many real-life applications. Furthermore, besides the huge data volume, many data sets are also characterized by their sequentiality; for instance, customer purchase records and network event logs are usually modeled as data sequences. Hence, it becomes important to enable pattern-based clustering methods i) to handle large datasets, and ii) to discover pattern similarity embedded in data sequences. There is presented herein a novel method that offers this capability.

    7.
    Granted patent
    System and method for load shedding in data mining and knowledge discovery from stream data (in force)

    Publication number: US07493346B2

    Publication date: 2009-02-17

    Application number: US11058944

    Filing date: 2005-02-16

    IPC classification: G06F12/00 G06F17/30 G06F9/46

    CPC classification: G06K9/6297 H04L43/028

    Abstract: Load shedding schemes for mining data streams. A scoring function is used to rank the importance of stream elements, and those elements with high importance are investigated. In the context of not knowing the exact feature values of a data stream, the use of a Markov model is proposed herein for predicting the feature distribution of a data stream. Based on the predicted feature distribution, one can make classification decisions to maximize the expected benefits. In addition, there is proposed herein the employment of a quality of decision (QoD) metric to measure the level of uncertainty in decisions and to guide load shedding. A load shedding scheme such as presented herein assigns available resources to multiple data streams to maximize the quality of classification decisions. Furthermore, such a load shedding scheme is able to learn and adapt to changing data characteristics in the data streams.

    8.
    Granted patent
    Integrity assurance of query result from database service provider (in force)

    Publication number: US07870398B2

    Publication date: 2011-01-11

    Application number: US11626847

    Filing date: 2007-01-25

    IPC classification: G06F12/14 G06F7/00

    Abstract: A method, system and computer program product for confirming the validity of data returned from a data store. A data store contains a primary data set encrypted using a first encryption and a secondary data set encrypted using a second encryption. The secondary data set is a subset of the primary data set. A client issues a substantive query against the data store to retrieve a primary data result belonging to the primary data set. A query interface issues at least one validating query against the data store. Each validating query returns a secondary data result belonging to the secondary data set. The query interface receives the secondary data result and provides a data invalid notification if data satisfying the substantive query included in an unencrypted form of the secondary data result is not contained in an unencrypted form of the primary data result.
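
    The final check described in the abstract above reduces to a set comparison once both results are decrypted. The sketch below shows only that comparison: every secondary row that satisfies the substantive query must also appear in the primary result. Encryption, key handling, and query execution are left out, and the row format, predicate, and function name are assumptions made for this example.

```python
def validate_result(primary_result, secondary_result, satisfies_query):
    """Compare decrypted query results and report rows the primary result omits.

    primary_result: decrypted rows returned for the substantive query.
    secondary_result: decrypted rows returned by a validating query.
    satisfies_query: predicate that is True for rows matching the substantive query.
    Returns (is_valid, missing_rows).
    """
    expected = {row for row in secondary_result if satisfies_query(row)}
    missing = expected - set(primary_result)
    return (not missing, missing)

# Hypothetical rows of (id, amount); the substantive query asks for amount > 100.
def over_100(row):
    return row[1] > 100

secondary_rows = [(2, 150), (7, 40)]
honest_rows = [(1, 120), (2, 150), (3, 300)]
tampered_rows = [(1, 120), (3, 300)]  # the provider silently dropped row 2

print(validate_result(honest_rows, secondary_rows, over_100))
print(validate_result(tampered_rows, secondary_rows, over_100))
```

    A check of this kind can only detect omissions of rows that happen to be in the secondary set, so its detection rate depends on how that subset is chosen.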

    9.
    Granted patent
    Systems and methods for sequential modeling in less than one sequential scan (lapsed)

    Publication number: US07822730B2

    Publication date: 2010-10-26

    Application number: US11931129

    Filing date: 2007-10-31

    IPC classification: G06F17/30

    CPC classification: G06N99/005 Y10S707/99931

    Abstract: Most recent research on scalable inductive learning over very large streaming datasets focuses on eliminating memory constraints and reducing the number of sequential data scans. However, state-of-the-art algorithms still require multiple scans over the data set and use sophisticated control mechanisms and data structures. There is discussed herein a general inductive learning framework that scans the dataset exactly once. Then, there is proposed an extension based on Hoeffding's inequality that scans the dataset less than once. The proposed frameworks are applicable to a wide range of inductive learners.
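
    Hoeffding's inequality, which the abstract above cites as the basis for scanning less than the full dataset, bounds how far a sample mean can fall from the true mean. For n independent observations with value range R, the one-sided deviation exceeds epsilon = R * sqrt(ln(1/delta) / (2n)) with probability at most delta. The sketch below computes that bound and the sample size needed for a target epsilon. How the patent applies the bound to decide when to stop scanning is not stated in the abstract, so the function names and the example numbers are assumptions.

```python
import math

def hoeffding_epsilon(value_range, delta, n):
    """One-sided Hoeffding bound on the gap between sample mean and true mean.

    The gap exceeds the returned epsilon with probability at most delta,
    given n independent observations that each span at most value_range.
    """
    return value_range * math.sqrt(math.log(1.0 / delta) / (2.0 * n))

def samples_needed(value_range, delta, epsilon):
    """Smallest n for which the bound above is no larger than epsilon."""
    return math.ceil(value_range ** 2 * math.log(1.0 / delta) / (2.0 * epsilon ** 2))

# Hypothetical setting: accuracy estimates in [0, 1] with 95% confidence.
print(round(hoeffding_epsilon(1.0, 0.05, 1000), 4))
print(samples_needed(1.0, 0.05, 0.05))
```

    Because the required sample size depends on the range, confidence, and tolerance but not on the dataset size, a learner that monitors such a bound could in principle stop before reading every record.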

    10.
    Patent application
    SYSTEM AND METHOD FOR LOAD SHEDDING IN DATA MINING AND KNOWLEDGE DISCOVERY FROM STREAM DATA (in force)

    Publication number: US20090187914A1

    Publication date: 2009-07-23

    Application number: US12372568

    Filing date: 2009-02-17

    IPC classification: G06F9/46 G06N5/02

    CPC classification: G06K9/6297 H04L43/028

    Abstract: Load shedding schemes for mining data streams. A scoring function is used to rank the importance of stream elements, and those elements with high importance are investigated. In the context of not knowing the exact feature values of a data stream, the use of a Markov model is proposed herein for predicting the feature distribution of a data stream. Based on the predicted feature distribution, one can make classification decisions to maximize the expected benefits. In addition, there is proposed herein the employment of a quality of decision (QoD) metric to measure the level of uncertainty in decisions and to guide load shedding. A load shedding scheme such as presented herein assigns available resources to multiple data streams to maximize the quality of classification decisions. Furthermore, such a load shedding scheme is able to learn and adapt to changing data characteristics in the data streams.