System and method for indexing weighted-sequences in large databases
    51.
    Granted patent
    Legal status: in force

    Publication number: US09009176B2

    Publication date: 2015-04-14

    Application number: US12198717

    Filing date: 2008-08-26

    IPC classification: G06F7/00 G06F17/30

    Abstract: The present invention provides an index structure for managing weighted-sequences in large databases. A weighted-sequence is defined as a two-dimensional structure in which each element in the sequence is associated with a weight. A series of network events, for instance, is a weighted-sequence because each event is associated with a timestamp. Querying a large sequence database by events' occurrence patterns is a first step towards understanding the temporal causal relationships among the events. The index structure proposed herein enables efficient retrieval from the database of all subsequences (contiguous and non-contiguous) that match a given query sequence both by events and by weights. The index structure also takes into consideration the nonuniform frequency distribution of events in the sequence data.
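
The kind of query this abstract describes can be illustrated with a brute-force matcher. This is only a sketch under stated assumptions, not the patented index structure: the name `find_matches` is hypothetical, weights are taken to be integer timestamps, and "matching by weights" is assumed to mean that each matched element has the same offset from the first matched element as in the query.

```python
from typing import List, Tuple

Element = Tuple[str, int]  # (event, weight); weight is assumed to be a timestamp


def find_matches(data: List[Element], query: List[Element]) -> List[List[int]]:
    """Return the index lists of all subsequences of `data` (contiguous or
    not) whose events equal the query's events and whose weight offsets
    from the first matched element equal the query's offsets."""
    matches: List[List[int]] = []

    def extend(q_pos: int, start: int, chosen: List[int]) -> None:
        if q_pos == len(query):
            matches.append(list(chosen))
            return
        for i in range(start, len(data)):
            event, weight = data[i]
            if event != query[q_pos][0]:
                continue
            if chosen:
                # Compare offsets relative to the first matched element.
                want = query[q_pos][1] - query[0][1]
                got = weight - data[chosen[0]][1]
                if got != want:
                    continue
            chosen.append(i)
            extend(q_pos + 1, i + 1, chosen)
            chosen.pop()

    extend(0, 0, [])
    return matches


data = [("a", 0), ("b", 2), ("a", 3), ("c", 5), ("b", 5), ("c", 8)]
query = [("a", 0), ("b", 2), ("c", 5)]
result = find_matches(data, query)
```

Here `result` holds two matches, one contiguous in event order starting at time 0 and one starting at time 3. The exhaustive search is exponential in the worst case, which is the cost an index over events and weights is meant to avoid.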

    Identifying and annotating shared hierarchical markup document trees
    52.
    Granted patent
    Legal status: lapsed

    Publication number: US08108765B2

    Publication date: 2012-01-31

    Application number: US11548325

    Filing date: 2006-10-11

    IPC classification: G06F17/00

    CPC classification: G06F17/30923

    Abstract: Disclosed are a method, an information processing system, and a computer-readable medium for managing documents. The method includes analyzing a plurality of hierarchical markup documents, wherein each hierarchical markup document is representable by a hierarchical tree structure. A shared hierarchical markup document associated with the plurality of hierarchical markup documents is generated based on the analyzing. Each hierarchical markup document in the plurality of hierarchical markup documents is compared with the shared hierarchical markup document. A plurality of difference hierarchical markup documents is generated based on the comparing.

    Integrity assurance of query result from database service provider
    53.
    Granted patent
    Legal status: in force

    Publication number: US07870398B2

    Publication date: 2011-01-11

    Application number: US11626847

    Filing date: 2007-01-25

    IPC classification: G06F12/14 G06F7/00

    Abstract: A method, system and computer program product for confirming the validity of data returned from a data store. A data store contains a primary data set encrypted using a first encryption and a secondary data set encrypted using a second encryption. The secondary data set is a subset of the primary data set. A client issues a substantive query against the data store to retrieve a primary data result belonging to the primary data set. A query interface issues at least one validating query against the data store. Each validating query returns a secondary data result belonging to the secondary data set. The query interface receives the secondary data result and provides a data invalid notification if data satisfying the substantive query included in an unencrypted form of the secondary data result is not contained in an unencrypted form of the primary data result.
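
The containment check at the end of this abstract can be sketched as follows. This is a minimal illustration that assumes both results have already been decrypted into plain rows; the names `is_result_valid` and `satisfies`, and the row layout, are hypothetical and not taken from the patent.

```python
from typing import Callable, Iterable, Tuple

Row = Tuple[int, str]  # hypothetical decrypted row: (id, value)


def is_result_valid(
    primary_result: Iterable[Row],
    secondary_result: Iterable[Row],
    satisfies: Callable[[Row], bool],
) -> bool:
    """Return False when some secondary row that satisfies the substantive
    query is missing from the primary result (the 'data invalid' case)."""
    expected = {row for row in secondary_result if satisfies(row)}
    return expected <= set(primary_result)


def query(row: Row) -> bool:
    # Hypothetical substantive query: id >= 10.
    return row[0] >= 10


secondary = [(12, "y"), (3, "q")]              # rows returned by a validating query
honest = [(10, "x"), (12, "y"), (15, "z")]     # complete primary result
tampered = [(10, "x"), (15, "z")]              # row (12, "y") was dropped

honest_ok = is_result_valid(honest, secondary, query)
tampered_ok = is_result_valid(tampered, secondary, query)
```

With these inputs `honest_ok` is true and `tampered_ok` is false. The check can only detect omissions of rows that happen to be in the secondary data set, so it gives probabilistic rather than absolute assurance.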

    Systems and methods for sequential modeling in less than one sequential scan
    54.
    Granted patent
    Legal status: lapsed

    Publication number: US07822730B2

    Publication date: 2010-10-26

    Application number: US11931129

    Filing date: 2007-10-31

    IPC classification: G06F17/30

    CPC classification: G06N99/005 Y10S707/99931

    Abstract: Most recent research on scalable inductive learning over very large streaming datasets focuses on eliminating memory constraints and reducing the number of sequential data scans. However, state-of-the-art algorithms still require multiple scans over the data set and use sophisticated control mechanisms and data structures. Discussed herein is a general inductive learning framework that scans the dataset exactly once. An extension based on Hoeffding's inequality is then proposed that scans the dataset less than once. The proposed frameworks are applicable to a wide range of inductive learners.
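
For context on why Hoeffding's inequality permits stopping before a full scan, the standard bound can be computed directly. The sketch below shows only the textbook inequality; how the patent applies it is not stated in the abstract, and the function names `hoeffding_epsilon` and `samples_needed` are hypothetical.

```python
import math


def hoeffding_epsilon(value_range: float, delta: float, n: int) -> float:
    """With probability at least 1 - delta, the mean of n independent samples
    of a variable with the given range lies within epsilon of the true mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def samples_needed(value_range: float, delta: float, epsilon: float) -> int:
    """Smallest n for which hoeffding_epsilon(value_range, delta, n) <= epsilon."""
    return math.ceil(value_range ** 2 * math.log(1.0 / delta) / (2.0 * epsilon ** 2))


eps_1000 = hoeffding_epsilon(1.0, 0.05, 1000)   # error bound after 1000 examples
n_for_5pct = samples_needed(1.0, 0.05, 0.05)    # examples needed for a 0.05 bound
```

Because the required sample size depends on the range, the confidence, and the tolerance but not on the size of the dataset, a learner can stop reading once the bound is tight enough, which is one way a scan can cover less than the whole dataset.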

    System and method for learning models from scarce and skewed training data
    55.
    Granted patent
    Legal status: lapsed

    Publication number: US07630950B2

    Publication date: 2009-12-08

    Application number: US11506226

    Filing date: 2006-08-18

    IPC classification: G06F17/00 G06N5/02

    CPC classification: G06N99/005

    Abstract: A system and method for learning models from scarce and/or skewed training data includes partitioning a data stream into a sequence of time windows. A most likely current class distribution to classify portions of the data stream is determined based on observing training data in a current time window and based on concept drift probability patterns using historical information.

    System and Method for Scalable Processing of Multi-Way Data Stream Correlations
    56.
    Patent application
    Legal status: lapsed

    Publication number: US20090248749A1

    Publication date: 2009-10-01

    Application number: US12478627

    Filing date: 2009-06-04

    IPC classification: G06F17/30 G06F15/16

    Abstract: A computer-implemented method, apparatus, and computer usable program code for processing multi-way stream correlations. Stream data are received for correlation. A task is formed for continuously partitioning a multi-way stream correlation workload into smaller workload pieces. Each of the smaller workload pieces may be processed by a single host. The stream data are sent to different hosts for correlation processing.

    SYSTEM AND METHOD FOR LOAD SHEDDING IN DATA MINING AND KNOWLEDGE DISCOVERY FROM STREAM DATA
    57.
    Patent application
    Legal status: in force

    Publication number: US20090187914A1

    Publication date: 2009-07-23

    Application number: US12372568

    Filing date: 2009-02-17

    IPC classification: G06F9/46 G06N5/02

    CPC classification: G06K9/6297 H04L43/028

    Abstract: Load shedding schemes for mining data streams. A scoring function is used to rank the importance of stream elements, and those elements with high importance are investigated. In the context of not knowing the exact feature values of a data stream, the use of a Markov model is proposed herein for predicting the feature distribution of a data stream. Based on the predicted feature distribution, one can make classification decisions to maximize the expected benefits. In addition, a quality of decision (QoD) metric is proposed herein to measure the level of uncertainty in decisions and to guide load shedding. A load shedding scheme such as presented herein assigns available resources to multiple data streams to maximize the quality of classification decisions. Furthermore, such a load shedding scheme is able to learn and adapt to changing data characteristics in the data streams.

    METHOD AND SYSTEM FOR COMBINING RANKING AND CLUSTERING IN A DATABASE MANAGEMENT SYSTEM
    58.
    Patent application
    Legal status: under examination (published)

    Publication number: US20080270374A1

    Publication date: 2008-10-30

    Application number: US11740090

    Filing date: 2007-04-25

    IPC classification: G06F17/30

    CPC classification: G06F16/24558

    Abstract: A system for combining ranking and clustering in a query. Bit vectors are intersected on Boolean attributes, resulting in a vector. Two summary grids are constructed by intersecting bit vectors on clustering and ranking attributes. The vector is intersected with each summary grid to obtain a filtered clustering grid and a filtered ranking grid. An algorithm is applied on the clustering grid to obtain clusters. Vectors associated with buckets in the clusters are intersected, resulting in one vector for each cluster. The vector corresponding to each cluster is intersected with the ranking grid to obtain a modified grid. Buckets are pruned according to bounds of each bucket in the modified grid and a predetermined number to obtain candidate buckets containing the predetermined number of data. The data are retrieved and a ranking score is calculated. The top predetermined number of data are sorted according to ranking scores and a result is returned.
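
The first two steps of this abstract, intersecting bit vectors on Boolean attributes and then filtering grid buckets with the result, can be sketched with Python integers used as bit sets. The table, attribute names, and the one-dimensional "grid" of price buckets are all hypothetical, and the clustering and pruning steps are not shown.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def bit_vector(rows: List[Row], predicate: Callable[[Row], bool]) -> int:
    """Build an integer whose bit i is set when rows[i] satisfies the predicate."""
    bits = 0
    for i, row in enumerate(rows):
        if predicate(row):
            bits |= 1 << i
    return bits


def members(bits: int) -> List[int]:
    """List the row positions whose bits are set."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


rows = [
    {"in_stock": True, "on_sale": True, "price": 5},
    {"in_stock": True, "on_sale": False, "price": 15},
    {"in_stock": True, "on_sale": True, "price": 25},
    {"in_stock": False, "on_sale": True, "price": 8},
]

# Intersect the bit vectors of two Boolean attributes into one filter vector.
filter_vec = bit_vector(rows, lambda r: r["in_stock"]) & bit_vector(rows, lambda r: r["on_sale"])

# A one-dimensional stand-in for a summary grid: one bit vector per price bucket.
grid = {
    "cheap": bit_vector(rows, lambda r: r["price"] < 10),
    "expensive": bit_vector(rows, lambda r: r["price"] >= 10),
}

# Intersect the filter vector with every bucket to get the filtered grid.
filtered_grid = {name: members(vec & filter_vec) for name, vec in grid.items()}
```

Each intersection is a single bitwise AND, so rows that fail the Boolean conditions drop out of every bucket without any row being fetched.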

    SYSTEM AND METHOD OF MINING TIME-CHANGING DATA STREAMS USING A DYNAMIC RULE CLASSIFIER HAVING LOW GRANULARITY
    59.
    Patent application
    Legal status: lapsed

    Publication number: US20080222060A1

    Publication date: 2008-09-11

    Application number: US12121942

    Filing date: 2008-05-16

    IPC classification: G06F15/18

    CPC classification: G06N5/025

    Abstract: A dynamic rule classifier for mining a data stream includes at least one window for viewing data contained in the data stream and a set of rules for mining the data. Rules are added and the set of rules is updated by algorithms when a drift in a concept within the data occurs, causing unacceptable drops in classification accuracy. The dynamic rule classifier is also implemented as a method and a computer program product.

    QUERYING DATA AND AN ASSOCIATED ONTOLOGY IN A DATABASE MANAGEMENT SYSTEM
    60.
    Patent application
    Legal status: under examination (published)

    Publication number: US20080172360A1

    Publication date: 2008-07-17

    Application number: US11623952

    Filing date: 2007-01-17

    IPC classification: G06F7/06

    Abstract: A method, apparatus, and computer program product for querying data in a database. An ontology is associated with the data in the database. A query containing a query predicate is received. The query predicate is expanded using implications from the ontology to form a modified query. The modified query is rewritten to include subsumption checking.
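
One simple reading of "expanding a query predicate using implications from the ontology" is to replace an equality test on a term with a membership test over every term the ontology says it subsumes. The sketch below assumes the ontology is a plain child-to-parent `is_a` mapping; the function names, the example terms, and the generated SQL fragment are hypothetical and not taken from the patent.

```python
from typing import Dict, List, Set, Tuple


def subsumed_terms(term: str, is_a: Dict[str, str]) -> Set[str]:
    """Return `term` plus every term that is transitively a kind of it."""
    children: Dict[str, List[str]] = {}
    for child, parent in is_a.items():
        children.setdefault(parent, []).append(child)
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def expand_predicate(column: str, term: str, is_a: Dict[str, str]) -> Tuple[str, List[str]]:
    """Rewrite `column = term` as a parameterized IN list over subsumed terms."""
    terms = sorted(subsumed_terms(term, is_a))
    placeholders = ", ".join("?" for _ in terms)
    return f"{column} IN ({placeholders})", terms


is_a = {
    "merlot": "red wine",
    "red wine": "wine",
    "riesling": "white wine",
    "white wine": "wine",
    "wine": "beverage",
    "beer": "beverage",
}

sql, params = expand_predicate("type", "wine", is_a)
```

A query for `wine` then also returns rows labeled `merlot` or `riesling`, while `beer` and the broader `beverage` stay excluded. Values are passed as parameters rather than spliced into the SQL text.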
