SYSTEM AND METHOD FOR FEATURE BASED LOAD SHEDDING IN CLASSIFICATION
    51.
    Patent application
    Status: under examination (published)

    Publication No.: US20080133438A1

    Publication date: 2008-06-05

    Application No.: US11564885

    Filing date: 2006-11-30

    IPC classification: G06N5/00

    CPC classification: G06N20/00

    Abstract: A system and method for feature based load shedding in classification. The system includes a plurality of data sources. The plurality of data sources being configured to render independent streams of input data, such data being selectively grouped together to form a particular classification task. The system further includes a central classification server configured to analyze and execute multiple tasks, each task consisting of multiple input data. The central classification server further configured to analyze the data for knowledge-based decision-making. The central classification server being communicatively engaged via a network to the plurality of data sources. The method includes rendering independent streams of input data, such data being selectively grouped together to form a particular task. The method further includes analyzing and handling multiple tasks, each task consisting of multiple input data. The method also includes analyzing the data for knowledge-based decision-making.

    Space and time efficient XML graph labeling
    52.
    Patent application
    Status: lapsed

    Publication No.: US20070230488A1

    Publication date: 2007-10-04

    Application No.: US11396502

    Filing date: 2006-03-31

    IPC classification: H04L12/56

    CPC classification: H04L45/48 H04L45/02

    Abstract: There is provided a method for determining reachability between any two nodes within a graph. The inventive method utilizes a dual-labeling scheme. Initially, a spanning tree is defined for a group of nodes within a graph. Each node in the spanning tree is assigned a unique interval-based label that describes its dependency from an ancestor node. Non-tree labels are then assigned to each node in the spanning tree that is connected to another node in the spanning tree by a non-tree link. From these labels, reachability of any two nodes in the spanning tree is determined by using only the interval-based labels and the non-tree labels.
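
    The labeling idea in this abstract can be illustrated with a minimal sketch. It is not the patented method: the function names, the half-open [start, end) interval encoding, and the plain search over non-tree links are all assumptions made for illustration, and every node is assumed to be reachable from the chosen root.

```python
def build_labels(graph, root):
    """Label a DFS spanning tree with [start, end) intervals and
    collect the edges left out of the tree (non-tree links)."""
    interval = {}
    non_tree = []
    counter = 0

    def visit(node):
        nonlocal counter
        start = counter
        counter += 1
        interval[node] = None  # placeholder: marks the node as discovered
        for child in graph.get(node, []):
            if child in interval:
                non_tree.append((node, child))  # edge not in the spanning tree
            else:
                visit(child)
        interval[node] = (start, counter)  # covers the node and its descendants

    visit(root)
    return interval, non_tree


def reachable(u, v, interval, non_tree):
    """Decide reachability using only interval labels and non-tree links."""
    def in_subtree(a, b):
        start, end = interval[a]
        return start <= interval[b][0] < end

    frontier, seen = [u], {u}
    while frontier:
        node = frontier.pop()
        if in_subtree(node, v):
            return True
        # Follow any non-tree link that starts inside this node's subtree.
        for src, dst in non_tree:
            if dst not in seen and in_subtree(node, src):
                seen.add(dst)
                frontier.append(dst)
    return False


# Hypothetical example graph: a diamond with one non-tree link (c -> d).
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
interval, non_tree = build_labels(graph, "a")
```

    Here `reachable("c", "d", interval, non_tree)` succeeds only through the non-tree link, while `reachable("b", "c", interval, non_tree)` fails because neither label type connects the two nodes.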

    System and method for load shedding in data mining and knowledge discovery from stream data

    Publication No.: US20060184527A1

    Publication date: 2006-08-17

    Application No.: US11058944

    Filing date: 2005-02-16

    IPC classification: H04L27/28

    CPC classification: G06K9/6297 H04L43/028

    Abstract: Load shedding schemes for mining data streams. A scoring function is used to rank the importance of stream elements, and those elements with high importance are investigated. In the context of not knowing the exact feature values of a data stream, the use of a Markov model is proposed herein for predicting the feature distribution of a data stream. Based on the predicted feature distribution, one can make classification decisions to maximize the expected benefits. In addition, there is proposed herein the employment of a quality of decision (QoD) metric to measure the level of uncertainty in decisions and to guide load shedding. A load shedding scheme such as presented herein assigns available resources to multiple data streams to maximize the quality of classification decisions. Furthermore, such a load shedding scheme is able to learn and adapt to changing data characteristics in the data streams.
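
    A rough sketch of QoD-guided load shedding follows. The abstract does not define the metric, so the margin-based `quality_of_decision` function, the `budget` parameter, and the stream names are hypothetical; the predicted class probabilities stand in for the output of the Markov model, which is not modeled here.

```python
def quality_of_decision(class_probs):
    """Hypothetical QoD: margin between the two most likely classes.
    A small margin means the decision is uncertain."""
    top = sorted(class_probs, reverse=True)
    return top[0] - top[1]


def choose_streams_to_observe(predicted, budget):
    """Spend the observation budget on the streams whose predicted
    decisions are least certain; the remaining streams are shed."""
    ranked = sorted(predicted, key=lambda name: quality_of_decision(predicted[name]))
    return ranked[:budget]


# Hypothetical predicted class distributions for three streams.
predicted = {
    "s1": [0.90, 0.10],
    "s2": [0.55, 0.45],
    "s3": [0.70, 0.30],
}
selected = choose_streams_to_observe(predicted, budget=2)
print(selected)
```

    With a budget of two, the sketch observes `s2` and `s3` and classifies `s1` from its prediction alone, since its decision is already the most certain.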

    System and method for continuous diagnosis of data streams
    54.
    Patent application
    Status: lapsed

    Publication No.: US20060010093A1

    Publication date: 2006-01-12

    Application No.: US10880913

    Filing date: 2004-06-30

    IPC classification: G06F17/30

    Abstract: In connection with the mining of time-evolving data streams, a general framework that mines changes and reconstructs models from a data stream with unlabeled instances or a limited number of labeled instances. In particular, there are defined herein statistical profiling methods that extend a classification tree in order to guess the percentage of drifts in the data stream without any labeled data. Exact error can be estimated by actively sampling a small number of true labels. If the estimated error is significantly higher than empirical expectations, there are preferably re-sampled a small number of true labels to reconstruct the decision tree from the leaf node level.

    Semantic query by example
    55.
    Granted patent

    Publication No.: US10176245B2

    Publication date: 2019-01-08

    Application No.: US12566882

    Filing date: 2009-09-25

    IPC classification: G06F17/00 G06F17/30

    Abstract: A computer-implemented method, system, and computer program product for producing a semantic query by example are provided. The method includes receiving examples of potential results from querying a database table with an associated ontology, and extracting features from the database table and the examples based on the associated ontology. The method further includes training a classifier based on the examples and the extracted features, and applying the classifier to the database table to obtain a semantic query result. The method also includes outputting the semantic query result to a user interface, and requesting user feedback of satisfaction with the semantic query result. The method additionally includes updating the classifier and the semantic query result iteratively in response to the user feedback.

    Identifying and annotating shared hierarchical markup document trees
    56.
    Granted patent
    Status: lapsed

    Publication No.: US08108765B2

    Publication date: 2012-01-31

    Application No.: US11548325

    Filing date: 2006-10-11

    IPC classification: G06F17/00

    CPC classification: G06F17/30923

    Abstract: Disclosed are a method, information processing system, and a computer readable medium for managing documents. The method includes analyzing a plurality of hierarchical markup documents, wherein each hierarchical markup document is representable by a hierarchical tree structure. A shared hierarchical markup document associated with the plurality of hierarchical markup documents is generated based on the analyzing. Each hierarchical markup document in the plurality of hierarchical markup documents is compared with the shared hierarchical document. A plurality of difference hierarchical markup documents is generated based on the comparing.

    System and Method for Scalable Processing of Multi-Way Data Stream Correlations
    57.
    Patent application
    Status: lapsed

    Publication No.: US20090248749A1

    Publication date: 2009-10-01

    Application No.: US12478627

    Filing date: 2009-06-04

    IPC classification: G06F17/30 G06F15/16

    Abstract: A computer implemented method, apparatus, and computer usable program code for processing multi-way stream correlations. Stream data are received for correlation. A task is formed for continuously partitioning a multi-way stream correlation workload into smaller workload pieces. Each of the smaller workload pieces may be processed by a single host. The stream data are sent to different hosts for correlation processing.

    METHOD AND SYSTEM FOR COMBINING RANKING AND CLUSTERING IN A DATABASE MANAGEMENT SYSTEM
    58.
    Patent application
    Status: under examination (published)

    Publication No.: US20080270374A1

    Publication date: 2008-10-30

    Application No.: US11740090

    Filing date: 2007-04-25

    IPC classification: G06F17/30

    CPC classification: G06F16/24558

    Abstract: A system for combining ranking and clustering in a query. Bit vectors are intersected on Boolean attributes resulting in a vector. Two summary grids are constructed by intersecting bit vectors on clustering and ranking attributes. The vector is intersected with each summary grid to obtain a filtered clustering and ranking grid. An algorithm is applied on the clustering grid to obtain clusters. Vectors associated with buckets in the clusters are intersected resulting in one vector for each cluster. The vector corresponding to each cluster is intersected with the ranking grid to obtain a modified grid. Buckets are pruned according to bounds of each bucket in the modified grid and a predetermined number to obtain candidate buckets containing the predetermined number of data. The data are retrieved and a ranking score is calculated. The top predetermined number of data are sorted according to ranking scores and a result is returned.
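
    The first step of this abstract, intersecting bit vectors on Boolean attributes, can be sketched as below. The sketch covers that step only; the summary grids, clustering, and pruning are not modeled. The table contents, the attribute names, and the use of Python integers as bit vectors are assumptions for illustration.

```python
def bit_vector(rows, predicate):
    """Build a bit vector in which bit i is set when row i satisfies the predicate."""
    bits = 0
    for i, row in enumerate(rows):
        if predicate(row):
            bits |= 1 << i
    return bits


# Hypothetical table with two Boolean attributes.
rows = [
    {"in_stock": True, "on_sale": True},
    {"in_stock": True, "on_sale": False},
    {"in_stock": False, "on_sale": True},
    {"in_stock": True, "on_sale": True},
]

in_stock = bit_vector(rows, lambda r: r["in_stock"])
on_sale = bit_vector(rows, lambda r: r["on_sale"])

# Intersecting the vectors is a single bitwise AND.
matches = in_stock & on_sale
matching_rows = [i for i in range(len(rows)) if (matches >> i) & 1]
print(matching_rows)
```

    Only rows 0 and 3 satisfy both predicates, so only their bits survive the intersection.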

    QUERYING DATA AND AN ASSOCIATED ONTOLOGY IN A DATABASE MANAGEMENT SYSTEM
    59.
    Patent application
    Status: under examination (published)

    Publication No.: US20080172360A1

    Publication date: 2008-07-17

    Application No.: US11623952

    Filing date: 2007-01-17

    IPC classification: G06F7/06

    Abstract: A method, apparatus, and computer program product for querying data in a database. An ontology is associated with the data in the database. A query containing a query predicate is received. The query predicate is expanded using implications from the ontology to form a modified query. The modified query is rewritten to include subsumption checking.
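
    Predicate expansion of the kind described here might look like the following sketch. The ontology is reduced to a hypothetical child-to-parents map, the query is a simple equality predicate on a `type` column, and the rewrite to include subsumption checking is not modeled; none of these names come from the patent.

```python
def expand_predicate(term, subclass_of):
    """Return the term together with every term the ontology places beneath it."""
    expanded, frontier = {term}, [term]
    while frontier:
        parent = frontier.pop()
        for child, parents in subclass_of.items():
            if parent in parents and child not in expanded:
                expanded.add(child)
                frontier.append(child)
    return expanded


# Hypothetical ontology: each key is a subclass of the listed parents.
subclass_of = {
    "Wine": ["Beverage"],
    "Beer": ["Beverage"],
    "RedWine": ["Wine"],
    "WhiteWine": ["Wine"],
    "Merlot": ["RedWine"],
}

# Hypothetical table rows.
rows = [
    {"id": 1, "type": "Merlot"},
    {"id": 2, "type": "Beer"},
    {"id": 3, "type": "WhiteWine"},
]

# The original predicate type = 'Wine' becomes type IN (expanded terms).
terms = expand_predicate("Wine", subclass_of)
result = [row["id"] for row in rows if row["type"] in terms]
print(result)
```

    A literal match on `Wine` would return nothing from this table, whereas the expanded predicate returns rows 1 and 3.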

    Near-neighbor search in pattern distance spaces
    60.
    Patent application
    Status: under examination (published)

    Publication No.: US20050114331A1

    Publication date: 2005-05-26

    Application No.: US10722776

    Filing date: 2003-11-26

    Applicant: Haixun Wang, Philip Yu

    Inventor: Haixun Wang, Philip Yu

    IPC classification: G06F7/00 G06F19/00 G06K9/62

    Abstract: Similarity searching techniques are provided. In one aspect, a method for use in finding near-neighbors in a set of objects comprises the following steps. Subspace pattern similarities that the objects in the set exhibit in multi-dimensional spaces are identified. Subspace correlations are defined between two or more of the objects in the set based on the identified subspace pattern similarities for use in identifying near-neighbor objects. A pattern distance index may be created. A method of performing a near-neighbor search of one or more query objects against a set of objects is also provided.
