SYSTEM AND METHOD FOR FEATURE BASED LOAD SHEDDING IN CLASSIFICATION
    41.
    Patent application
    Legal status: Under examination (published)

    Publication No.: US20080133438A1

    Publication date: 2008-06-05

    Application No.: US11564885

    Filing date: 2006-11-30

    IPC classification: G06N5/00

    CPC classification: G06N20/00

    Abstract: A system and method for feature based load shedding in classification. The system includes a plurality of data sources configured to render independent streams of input data, such data being selectively grouped together to form a particular classification task. The system further includes a central classification server configured to analyze and execute multiple tasks, each task consisting of multiple input data, and to analyze the data for knowledge-based decision-making. The central classification server is communicatively engaged via a network with the plurality of data sources. The method includes rendering independent streams of input data, such data being selectively grouped together to form a particular task. The method further includes analyzing and handling multiple tasks, each task consisting of multiple input data. The method also includes analyzing the data for knowledge-based decision-making.

    System and method for learning models from scarce and skewed training data
    42.
    Patent application
    Legal status: Lapsed

    Publication No.: US20080071721A1

    Publication date: 2008-03-20

    Application No.: US11506226

    Filing date: 2006-08-18

    IPC classification: G06N5/02

    CPC classification: G06N99/005

    Abstract: A system and method for learning models from scarce and/or skewed training data includes partitioning a data stream into a sequence of time windows. A most likely current class distribution to classify portions of the data stream is determined based on observing training data in a current time window and based on concept drift probability patterns using historical information.
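
The first step named in the abstract, partitioning a stream into time windows and looking at the class distribution observed in each, can be sketched roughly as follows. This is only an illustration: the function names, the fixed window width, and the `(timestamp, label)` record shape are assumptions, and the concept drift probability patterns mentioned in the abstract are not modeled here.

```python
from collections import Counter, defaultdict


def partition_into_windows(stream, width):
    """Group (timestamp, label) records into consecutive fixed-width windows.

    `width` is a hypothetical window length in the same unit as the timestamps.
    Returns the label lists of the non-empty windows in time order.
    """
    windows = defaultdict(list)
    for timestamp, label in stream:
        windows[int(timestamp // width)].append(label)
    return [windows[key] for key in sorted(windows)]


def class_distribution(labels):
    """Return the empirical class frequencies observed in one window."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: count / total for cls, count in counts.items()}


# Hypothetical labeled records: (timestamp, class label).
stream = [(0, "a"), (1, "b"), (5, "a"), (6, "a")]
windows = partition_into_windows(stream, width=5)
current = class_distribution(windows[-1])
```

Here `current` holds only what was observed in the latest window; combining it with historical drift patterns, as the abstract describes, would be a further step.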

    Space and time efficient XML graph labeling
    43.
    Patent application
    Legal status: Lapsed

    Publication No.: US20070230488A1

    Publication date: 2007-10-04

    Application No.: US11396502

    Filing date: 2006-03-31

    IPC classification: H04L12/56

    CPC classification: H04L45/48 H04L45/02

    Abstract: There is provided a method for determining reachability between any two nodes within a graph. The inventive method utilizes a dual-labeling scheme. Initially, a spanning tree is defined for a group of nodes within a graph. Each node in the spanning tree is assigned a unique interval-based label that describes its dependency on an ancestor node. Non-tree labels are then assigned to each node in the spanning tree that is connected to another node in the spanning tree by a non-tree link. From these labels, reachability of any two nodes in the spanning tree is determined by using only the interval-based labels and the non-tree labels.
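
The idea of combining interval-based tree labels with non-tree links can be sketched as below. This is a simplified illustration under stated assumptions, not the labeling scheme claimed in the application: the function names and the half-open preorder intervals are assumptions, and non-tree links are followed by a plain search rather than by precomputed non-tree labels.

```python
def interval_labels(tree, root):
    """Assign each node a half-open [start, end) interval by preorder DFS.

    `tree` maps a node to the list of its spanning-tree children.
    A node's interval contains the start of every one of its descendants.
    """
    labels = {}
    counter = 0

    def visit(node):
        nonlocal counter
        start = counter
        counter += 1
        for child in tree.get(node, []):
            visit(child)
        labels[node] = (start, counter)

    visit(root)
    return labels


def tree_reaches(labels, u, v):
    """True if v lies in the spanning-tree subtree rooted at u."""
    start, end = labels[u]
    return start <= labels[v][0] < end


def reaches(labels, non_tree_links, u, v):
    """True if v is reachable from u via tree edges and non-tree links."""
    if tree_reaches(labels, u, v):
        return True
    used, frontier = set(), [u]
    while frontier:
        node = frontier.pop()
        for link in non_tree_links:
            src, dst = link
            # A link is usable once its source is inside the current subtree.
            if link in used or not tree_reaches(labels, node, src):
                continue
            used.add(link)
            if tree_reaches(labels, dst, v):
                return True
            frontier.append(dst)
    return False


# Hypothetical graph: spanning tree a -> (b -> d, c -> e) plus non-tree link d -> c.
tree = {"a": ["b", "c"], "b": ["d"], "c": ["e"]}
labels = interval_labels(tree, "a")
links = [("d", "c")]
```

With these inputs, `reaches(labels, links, "b", "e")` follows the non-tree link from `d` to `c` and then uses the interval test to find `e` under `c`.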

    System and method for load shedding in data mining and knowledge discovery from stream data

    Publication No.: US20060184527A1

    Publication date: 2006-08-17

    Application No.: US11058944

    Filing date: 2005-02-16

    IPC classification: H04L27/28

    CPC classification: G06K9/6297 H04L43/028

    Abstract: Load shedding schemes for mining data streams are disclosed. A scoring function is used to rank the importance of stream elements, and those elements with high importance are investigated. When the exact feature values of a data stream are not known, a Markov model is proposed herein for predicting the feature distribution of the data stream. Based on the predicted feature distribution, one can make classification decisions to maximize the expected benefits. In addition, a quality of decision (QoD) metric is proposed herein to measure the level of uncertainty in decisions and to guide load shedding. A load shedding scheme such as presented herein assigns available resources to multiple data streams to maximize the quality of classification decisions. Furthermore, such a load shedding scheme is able to learn and adapt to changing data characteristics in the data streams.
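
Predicting a feature distribution with a Markov model, as the abstract proposes, amounts to one multiplication of the current distribution by a transition matrix. The sketch below assumes a discrete feature and a known transition matrix; all names are hypothetical. The `decision_margin` helper is only a stand-in for an uncertainty score and is not the QoD metric defined in the application.

```python
def predict_next_distribution(current, transition):
    """One Markov step: next[j] = sum over i of current[i] * transition[i][j]."""
    size = len(current)
    return [
        sum(current[i] * transition[i][j] for i in range(size))
        for j in range(size)
    ]


def decision_margin(distribution):
    """Gap between the two most likely values (hypothetical uncertainty score).

    A small margin means the prediction is uncertain, so observing the real
    feature value of that stream is more likely to change the decision.
    """
    top, second = sorted(distribution, reverse=True)[:2]
    return top - second


# Hypothetical two-state feature; row i gives the transition probabilities out of state i.
transition = [[0.9, 0.1],
              [0.5, 0.5]]
predicted = predict_next_distribution([1.0, 0.0], transition)
margin = decision_margin(predicted)
```

Under this stand-in score, a load shedder would spend its limited observations on the streams with the smallest margin first.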

    System and method for continuous diagnosis of data streams
    45.
    Patent application
    Legal status: Lapsed

    Publication No.: US20060010093A1

    Publication date: 2006-01-12

    Application No.: US10880913

    Filing date: 2004-06-30

    IPC classification: G06F17/30

    Abstract: In connection with the mining of time-evolving data streams, a general framework is provided that mines changes and reconstructs models from a data stream with unlabeled instances or a limited number of labeled instances. In particular, there are defined herein statistical profiling methods that extend a classification tree in order to guess the percentage of drifts in the data stream without any labeled data. Exact error can be estimated by actively sampling a small number of true labels. If the estimated error is significantly higher than empirical expectations, a small number of true labels are preferably re-sampled to reconstruct the decision tree from the leaf node level.

    Semantic query by example
    46.
    Granted patent

    Publication No.: US10176245B2

    Publication date: 2019-01-08

    Application No.: US12566882

    Filing date: 2009-09-25

    IPC classification: G06F17/00 G06F17/30

    Abstract: A computer-implemented method, system, and computer program product for producing a semantic query by example are provided. The method includes receiving examples of potential results from querying a database table with an associated ontology, and extracting features from the database table and the examples based on the associated ontology. The method further includes training a classifier based on the examples and the extracted features, and applying the classifier to the database table to obtain a semantic query result. The method also includes outputting the semantic query result to a user interface, and requesting user feedback of satisfaction with the semantic query result. The method additionally includes updating the classifier and the semantic query result iteratively in response to the user feedback.

    Identifying and annotating shared hierarchical markup document trees
    47.
    Granted patent
    Legal status: Lapsed

    Publication No.: US08108765B2

    Publication date: 2012-01-31

    Application No.: US11548325

    Filing date: 2006-10-11

    IPC classification: G06F17/00

    CPC classification: G06F17/30923

    Abstract: Disclosed are a method, an information processing system, and a computer readable medium for managing documents. The method includes analyzing a plurality of hierarchical markup documents, wherein each hierarchical markup document is representable by a hierarchical tree structure. A shared hierarchical markup document associated with the plurality of hierarchical markup documents is generated based on the analyzing. Each hierarchical markup document in the plurality of hierarchical markup documents is compared with the shared hierarchical markup document. A plurality of difference hierarchical markup documents is generated based on the comparing.

    Systems and methods for sequential modeling in less than one sequential scan
    48.
    Granted patent
    Legal status: Lapsed

    Publication No.: US07822730B2

    Publication date: 2010-10-26

    Application No.: US11931129

    Filing date: 2007-10-31

    IPC classification: G06F17/30

    CPC classification: G06N99/005 Y10S707/99931

    Abstract: Most recent research on scalable inductive learning over very large streaming datasets focuses on eliminating memory constraints and reducing the number of sequential data scans. However, state-of-the-art algorithms still require multiple scans over the dataset and use sophisticated control mechanisms and data structures. A general inductive learning framework that scans the dataset exactly once is discussed herein. An extension based on Hoeffding's inequality that scans the dataset less than once is then proposed. The proposed frameworks are applicable to a wide range of inductive learners.
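
Hoeffding's inequality is what makes stopping before the end of a scan plausible: after n independent observations of a quantity with range R, the sample mean is within epsilon = sqrt(R^2 ln(1/delta) / (2n)) of the true mean with probability at least 1 - delta (one-sided form). The sketch below computes that bound and the number of records needed for a target epsilon. The function names are assumptions, and how the patented framework applies the bound is not stated in the abstract.

```python
import math


def hoeffding_epsilon(value_range, delta, n):
    """Error bound holding with probability at least 1 - delta after n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def samples_needed(value_range, delta, epsilon):
    """Smallest n for which the Hoeffding bound is at most epsilon."""
    return math.ceil(value_range ** 2 * math.log(1.0 / delta) / (2.0 * epsilon ** 2))


# Hypothetical setting: an accuracy estimate in [0, 1] with 95% confidence.
bound_after_1000 = hoeffding_epsilon(1.0, 0.05, 1000)
records_for_5_percent = samples_needed(1.0, 0.05, 0.05)
```

If `records_for_5_percent` is smaller than the dataset, a learner that only needs that precision could in principle stop reading early, which is the sense in which a scan can cover less than the whole dataset.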

    System and method for learning models from scarce and skewed training data
    49.
    Granted patent
    Legal status: Lapsed

    Publication No.: US07630950B2

    Publication date: 2009-12-08

    Application No.: US11506226

    Filing date: 2006-08-18

    IPC classification: G06F17/00 G06N5/02

    CPC classification: G06N99/005

    Abstract: A system and method for learning models from scarce and/or skewed training data includes partitioning a data stream into a sequence of time windows. A most likely current class distribution to classify portions of the data stream is determined based on observing training data in a current time window and based on concept drift probability patterns using historical information.

    System and Method for Scalable Processing of Multi-Way Data Stream Correlations
    50.
    Patent application
    Legal status: Lapsed

    Publication No.: US20090248749A1

    Publication date: 2009-10-01

    Application No.: US12478627

    Filing date: 2009-06-04

    IPC classification: G06F17/30 G06F15/16

    Abstract: A computer implemented method, apparatus, and computer usable program code are provided for processing multi-way stream correlations. Stream data are received for correlation. A task is formed for continuously partitioning a multi-way stream correlation workload into smaller workload pieces. Each of the smaller workload pieces may be processed by a single host. The stream data are sent to different hosts for correlation processing.
