Identifying and annotating shared hierarchical markup document trees
    11.
    Type: Granted invention patent
    Status: Lapsed

    Publication number: US08108765B2

    Publication date: 2012-01-31

    Application number: US11548325

    Filing date: 2006-10-11

    IPC classification: G06F17/00

    CPC classification: G06F17/30923

    Abstract: Disclosed are a method, information processing system, and a computer readable medium for managing documents. The method includes analyzing a plurality of hierarchical markup documents, wherein each hierarchical markup document is representable by a hierarchical tree structure. A shared hierarchical markup document associated with the plurality of hierarchical markup documents is generated based on the analyzing. Each hierarchical markup document in the plurality of hierarchical markup documents is compared with the shared hierarchical document. A plurality of difference hierarchical markup documents is generated based on the comparing.
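
    Illustration (not from the patent): the analyze / generate-shared / compare / generate-difference steps in the abstract can be sketched as follows. This is a minimal sketch under a strong simplifying assumption: each document is reduced to its set of root-to-element tag paths, the "shared document" is the intersection of those sets, and each "difference document" is what remains. The function names `paths` and `shared_and_differences` are hypothetical.

```python
import xml.etree.ElementTree as ET


def paths(xml_text):
    """Return the set of root-to-element tag paths in one document."""
    root = ET.fromstring(xml_text)
    found = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        found.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return found


def shared_and_differences(docs):
    """Split documents into one shared path set and per-document leftovers."""
    path_sets = [paths(d) for d in docs]
    shared = set.intersection(*path_sets)      # structure common to all documents
    diffs = [p - shared for p in path_sets]    # what each document adds to it
    return shared, diffs


docs = ["<a><b/><c/></a>", "<a><b/><d/></a>"]
shared, diffs = shared_and_differences(docs)
print(sorted(shared))                 # paths present in every document
print([sorted(d) for d in diffs])     # per-document differences
```

    A path-set view ignores element order, repetition, and text content; it is meant only to make the shared/difference split concrete.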

    QUERYING DATA AND AN ASSOCIATED ONTOLOGY IN A DATABASE MANAGEMENT SYSTEM
    12.
    Type: Invention patent application
    Status: Under examination (published)

    Publication number: US20080172360A1

    Publication date: 2008-07-17

    Application number: US11623952

    Filing date: 2007-01-17

    IPC classification: G06F7/06

    Abstract: A method, apparatus, and computer program product for querying data in a database. An ontology is associated with the data in the database. A query containing a query predicate is received. The query predicate is expanded using implications from the ontology to form a modified query. The modified query is rewritten to include subsumption checking.
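
    Illustration (not from the patent): one simple way to picture predicate expansion is to replace an equality test on a term with a membership test over every term the ontology places beneath it. The sketch below assumes the ontology is a plain dictionary mapping a term to its direct subclasses; `subsumed_terms`, `expand_predicate`, and the wine vocabulary are hypothetical.

```python
def subsumed_terms(ontology, term):
    """Return `term` plus every term reachable through subclass links."""
    result, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(ontology.get(current, []))
    return result


def expand_predicate(column, term, ontology):
    """Rewrite `column = term` as an IN list over all subsumed terms."""
    terms = sorted(subsumed_terms(ontology, term))
    quoted = ", ".join("'" + t + "'" for t in terms)
    return f"{column} IN ({quoted})"


# Hypothetical ontology: term -> direct subclasses.
ontology = {"Wine": ["RedWine", "WhiteWine"], "RedWine": ["Merlot"]}
print(expand_predicate("type", "Wine", ontology))
# prints: type IN ('Merlot', 'RedWine', 'WhiteWine', 'Wine')
```

    Building SQL by string concatenation is acceptable only for illustration; real queries should use bound parameters.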

    SEMANTIC QUERY BY EXAMPLE
    13.
    Type: Invention patent application
    Status: Under examination (published)

    Publication number: US20110078187A1

    Publication date: 2011-03-31

    Application number: US12566882

    Filing date: 2009-09-25

    IPC classification: G06F17/30 G06F17/27

    Abstract: A computer-implemented method, system, and computer program product for producing a semantic query by example are provided. The method includes receiving examples of potential results from querying a database table with an associated ontology, and extracting features from the database table and the examples based on the associated ontology. The method further includes training a classifier based on the examples and the extracted features, and applying the classifier to the database table to obtain a semantic query result. The method also includes outputting the semantic query result to a user interface, and requesting user feedback of satisfaction with the semantic query result. The method additionally includes updating the classifier and the semantic query result iteratively in response to the user feedback.

    IDENTIFYING AND ANNOTATING SHARED HIERARCHICAL MARKUP DOCUMENT TREES
    14.
    Type: Invention patent application
    Status: Lapsed

    Publication number: US20080092034A1

    Publication date: 2008-04-17

    Application number: US11548325

    Filing date: 2006-10-11

    IPC classification: G06F17/00 G06F17/30 G06F7/00

    CPC classification: G06F17/30923

    Abstract: Disclosed are a method, information processing system, and a computer readable medium for managing documents. The method includes analyzing a plurality of hierarchical markup documents, wherein each hierarchical markup document is representable by a hierarchical tree structure. A shared hierarchical markup document associated with the plurality of hierarchical markup documents is generated based on the analyzing. Each hierarchical markup document in the plurality of hierarchical markup documents is compared with the shared hierarchical document. A plurality of difference hierarchical markup documents is generated based on the comparing.

    PROCESSING QUERIES ON HIERARCHICAL MARKUP DATA USING SHARED HIERARCHICAL MARKUP TREES
    15.
    Type: Invention patent application
    Status: Lapsed

    Publication number: US20080091649A1

    Publication date: 2008-04-17

    Application number: US11548321

    Filing date: 2006-10-11

    IPC classification: G06F17/30

    CPC classification: G06F17/30929

    Abstract: Disclosed are a method, information processing system, and computer readable medium for processing queries. The method includes receiving a data query for a set of hierarchical markup documents. At least one query path expression is extracted from the data query. The query path expression is processed against at least one shared hierarchical markup document in a plurality of shared hierarchical markup documents. The plurality of shared hierarchical documents is associated with the set of hierarchical markup documents. In response to the shared hierarchical markup document completely matching the query path expression, a query result for the data query is generated. The query result is based on the processing of the query path expression against at least one of the shared hierarchical markup document and the difference hierarchical markup document.

    System and method for load shedding in data mining and knowledge discovery from stream data
    16.
    Type: Granted invention patent
    Status: In force

    Publication number: US08060461B2

    Publication date: 2011-11-15

    Application number: US12372568

    Filing date: 2009-02-17

    IPC classification: G06F7/00 G06F17/00

    CPC classification: G06K9/6297 H04L43/028

    Abstract: Load shedding schemes for mining data streams. A scoring function is used to rank the importance of stream elements, and those elements with high importance are investigated. In the context of not knowing the exact feature values of a data stream, the use of a Markov model is proposed herein for predicting the feature distribution of a data stream. Based on the predicted feature distribution, one can make classification decisions to maximize the expected benefits. In addition, there is proposed herein the employment of a quality of decision (QoD) metric to measure the level of uncertainty in decisions and to guide load shedding. A load shedding scheme such as presented herein assigns available resources to multiple data streams to maximize the quality of classification decisions. Furthermore, such a load shedding scheme is able to learn and adapt to changing data characteristics in the data streams.

    System and method for scalable cost-sensitive learning
    17.
    Type: Granted invention patent
    Status: In force

    Publication number: US07904397B2

    Publication date: 2011-03-08

    Application number: US12690502

    Filing date: 2010-01-20

    IPC classification: G06F15/18 G06N3/00 G06N3/12

    CPC classification: G06N99/005

    Abstract: A method (and structure) for processing an inductive learning model for a dataset of examples, includes dividing the dataset of examples into a plurality of subsets of data and generating, using a processor on a computer, a learning model using examples of a first subset of data of the plurality of subsets of data. The learning model being generated for the first subset comprises an initial stage of an evolving aggregate learning model (ensemble model) for an entirety of the dataset, the ensemble model thereby providing an evolving estimated learning model for the entirety of the dataset if all the subsets were to be processed. The generating of the learning model using data from a subset includes calculating a value for at least one parameter that provides an objective indication of an adequacy of a current stage of the ensemble model.

    System and method for classifying data streams using high-order models
    18.
    Type: Granted invention patent
    Status: In force

    Publication number: US07724784B2

    Publication date: 2010-05-25

    Application number: US11520529

    Filing date: 2006-09-13

    IPC classification: H04J3/04

    CPC classification: H04L65/601 H04L65/607

    Abstract: A computer implemented method, system, and computer usable program code for classifying a data stream using high-order models. The data stream is divided into a plurality of data segments. A classifier is selected for each of the plurality of data segments. Each of a plurality of classifiers is clustered into states. A state transition matrix is computed for the states. The states of the state transition matrix specify one of the high-order models for classifying the data stream.
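
    Illustration (not from the patent): the "state transition matrix" step can be pictured with a standard first-order estimate, where consecutive segment states are counted and each row is normalized to probabilities. The sketch assumes the clustering step has already assigned an integer state to every data segment; `transition_matrix` and the sample sequence are hypothetical.

```python
def transition_matrix(states, n_states):
    """Estimate P(next state | current state) from a sequence of state labels."""
    counts = [[0] * n_states for _ in range(n_states)]
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1

    matrix = []
    for row in counts:
        total = sum(row)
        # A state that is never left gets an all-zero row rather than a division error.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Hypothetical state label of each consecutive data segment.
segment_states = [0, 0, 1, 0, 1, 1]
matrix = transition_matrix(segment_states, 2)
for row in matrix:
    print(row)
```

    With the sample sequence, state 0 is followed by state 0 once and by state 1 twice, so the first row is roughly [0.33, 0.67]; state 1 is followed by each state once, giving [0.5, 0.5].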

    System and method for ranked keyword search on graphs
    19.
    Type: Granted invention patent
    Status: In force

    Publication number: US07702620B2

    Publication date: 2010-04-20

    Application number: US11693471

    Filing date: 2007-03-29

    IPC classification: G06F17/30

    Abstract: Arrangements and methods for providing for the efficient implementation of ranked keyword searches on graph-structured data. Since it is difficult to directly build indexes for general schemaless graphs, conventional techniques highly rely on graph traversal in running time. The previous lack of more knowledge about graphs also resulted in great difficulties in applying pruning techniques. To address these problems, there is introduced herein a new scoring function while the block is used as an intermediate access level; the result is an opportunity to create sophisticated indexes for keyword search. Also proposed herein is a cost-balanced expansion algorithm to conduct a backward search, which provides a good theoretical guarantee in terms of the search cost.

    System and method for sequence-based subspace pattern clustering
    20.
    Type: Granted invention patent
    Status: Lapsed

    Publication number: US07565346B2

    Publication date: 2009-07-21

    Application number: US10858541

    Filing date: 2004-05-31

    IPC classification: G06F17/30

    CPC classification: G06K9/6215 Y10S707/99936

    Abstract: Unlike traditional clustering methods that focus on grouping objects with similar values on a set of dimensions, clustering by pattern similarity finds objects that exhibit a coherent pattern of rise and fall in subspaces. Pattern-based clustering extends the concept of traditional clustering and benefits a wide range of applications, including e-Commerce target marketing, bioinformatics (large scale scientific data analysis), and automatic computing (web usage analysis), etc. However, state-of-the-art pattern-based clustering methods (e.g., the pCluster algorithm) can only handle datasets of thousands of records, which makes them inappropriate for many real-life applications. Furthermore, besides the huge data volume, many data sets are also characterized by their sequentiality, for instance, customer purchase records and network event logs are usually modeled as data sequences. Hence, it becomes important to enable pattern-based clustering methods i) to handle large datasets, and ii) to discover pattern similarity embedded in data sequences. There is presented herein a novel method that offers this capability.
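
    Illustration (not from the patent): "a coherent pattern of rise and fall" can be made concrete with a pairwise check in the style commonly used for pattern-based clustering: two objects agree on a set of dimensions if, for every pair of dimensions, the change in one object differs from the change in the other by at most a tolerance. The sketch below is an assumption-laden toy, not the sequence-based method of the patent; `coherent`, `delta`, and the sample vectors are hypothetical.

```python
from itertools import combinations


def coherent(x, y, dims, delta):
    """True if x and y rise and fall together on `dims`, within tolerance `delta`."""
    return all(
        abs((x[a] - x[b]) - (y[a] - y[b])) <= delta
        for a, b in combinations(dims, 2)
    )


# y equals x shifted by 10 on dimensions 0..2, but diverges on dimension 3.
x = [1, 5, 3, 9]
y = [11, 15, 13, 2]

print(coherent(x, y, [0, 1, 2], delta=0))     # True: same shape, different level
print(coherent(x, y, [0, 1, 2, 3], delta=0))  # False: dimension 3 breaks the pattern
```

    The example shows why value-based distance would miss this pair: the two objects are far apart in absolute terms yet share an identical pattern in the subspace of dimensions 0 to 2.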
