Apparatus and method for correlating synchronous and asynchronous data streams
    2.
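    Granted invention patent
    Apparatus and method for correlating synchronous and asynchronous data streams (legal status: active)

    Publication (announcement) number: US08131792B1

    Publication (announcement) date: 2012-03-06

    Application number: US12125973

    Filing date: 2008-05-23

    CPC classification: G06K9/00536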

    Abstract: Certain exemplary embodiments provide a method comprising automatically: receiving a plurality of elements for each of a plurality of continuous data streams; treating the plurality of elements as a first data stream matrix that defines a first dimensionality; reducing the first dimensionality of the first data stream matrix to obtain a second data stream matrix; computing a singular value decomposition of the second data stream matrix; and, based on the singular value decomposition of the second data stream matrix, quantifying approximate linear correlations between the plurality of elements.
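The steps named in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented construction: the dimensionality reduction is assumed to be a random ±1 projection, and in place of a full singular value decomposition the code finds only the leading singular value by power iteration. All function names are hypothetical.

```python
import math
import random


def random_projection(matrix, k, seed=0):
    """Reduce an n x m matrix (n time ticks, m streams) to k x m.

    Assumption: a random +/-1 projection stands in for the abstract's
    unspecified dimensionality-reduction step.
    """
    rng = random.Random(seed)
    n, m = len(matrix), len(matrix[0])
    scale = 1.0 / math.sqrt(k)
    reduced = []
    for _ in range(k):
        signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        reduced.append(
            [scale * sum(signs[t] * matrix[t][j] for t in range(n)) for j in range(m)]
        )
    return reduced


def leading_singular_value(matrix, iterations=200):
    """Power iteration on A^T A; returns (sigma_1, v_1).

    Simplification: only the leading singular pair is computed, and the
    uniform start vector must not be orthogonal to v_1.
    """
    m = len(matrix[0])
    gram = [[sum(row[a] * row[b] for row in matrix) for b in range(m)] for a in range(m)]
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        w = [sum(gram[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    rayleigh = sum(v[a] * sum(gram[a][b] * v[b] for b in range(m)) for a in range(m))
    return math.sqrt(max(rayleigh, 0.0)), v


def correlation_strength(matrix):
    """Fraction of the matrix's total energy captured by sigma_1.

    A value near 1 means the streams are close to one linear relationship.
    """
    sigma, _ = leading_singular_value(matrix)
    total = sum(x * x for row in matrix for x in row)  # equals the sum of sigma_i^2
    return sigma * sigma / total if total else 0.0


# Three streams that are exact multiples of one another, 20 ticks each.
ticks = [[float(t), 2.0 * t, 0.5 * t] for t in range(1, 21)]
reduced = random_projection(ticks, k=4)
strength = correlation_strength(reduced)
```

Because the three example streams are exact multiples of one another, the reduced matrix has rank one and `strength` comes out at essentially 1; streams with weaker linear relationships would give a smaller fraction.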

    Apparatus and method for correlating synchronous and asynchronous data streams
    3.
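    Granted invention patent
    Apparatus and method for correlating synchronous and asynchronous data streams (legal status: active)

    Publication (announcement) number: US07437397B1

    Publication (announcement) date: 2008-10-14

    Application number: US10822316

    Filing date: 2004-04-12

    IPC classification: G06F17/15

    CPC classification: G06K9/00536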

    Abstract: Certain exemplary embodiments provide a method comprising automatically: receiving a plurality of elements for each of a plurality of continuous data streams; treating the plurality of elements as a first data stream matrix that defines a first dimensionality; reducing the first dimensionality of the first data stream matrix to obtain a second data stream matrix; computing a singular value decomposition of the second data stream matrix; and, based on the singular value decomposition of the second data stream matrix, quantifying approximate linear correlations between the plurality of elements.

    Method and system for performing queries on data streams
    5.
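    Granted invention patent
    Method and system for performing queries on data streams (legal status: active)

    Publication (announcement) number: US07904444B1

    Publication (announcement) date: 2011-03-08

    Application number: US11411478

    Filing date: 2006-04-26

    IPC classification: G06F7/00

    CPC classification: G06F17/30516 Y10S707/922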

    Abstract: A method and system for performing a data stream query. A data stream query requiring a join operation on multiple data streams is approximated without performing the join operation. The method determines whether the conditions of the query permit an accurate approximation of the join operation and, if they do, approximates it. The join operation is approximated by independently aggregating values of the data streams and comparing the independently aggregated values.
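One way to picture the idea in this abstract is sketched below. The abstract does not say which queries or conditions qualify, so everything concrete here is an assumption made for illustration: the query counts left-stream keys that have no match in the right stream, and the approximation is taken to be valid only when keys are unique per stream and every right-stream key also occurs on the left. `JoinQuery`, `StreamCounter` and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class JoinQuery:
    """Hypothetical query descriptor; the field names are assumptions."""
    aggregate: str            # e.g. "count_unmatched"
    keys_unique: bool         # join key occurs at most once per stream
    right_keys_in_left: bool  # every right-stream key also occurs on the left


def can_approximate(query):
    """Check the (assumed) conditions under which counts alone suffice."""
    return (
        query.aggregate == "count_unmatched"
        and query.keys_unique
        and query.right_keys_in_left
    )


class StreamCounter:
    """Aggregates one stream independently, in constant space."""

    def __init__(self):
        self.count = 0

    def observe(self, _element):
        self.count += 1


def count_unmatched_exact(left_keys, right_keys):
    """Reference answer that really performs the (anti-)join."""
    right = set(right_keys)
    return sum(1 for key in left_keys if key not in right)


def count_unmatched_approx(left_counter, right_counter):
    """Join-free answer: compare the two independently computed counts."""
    return left_counter.count - right_counter.count


# Example: request ids on the left stream, response ids on the right.
requests = [1, 2, 3, 4, 5]
responses = [2, 4]
query = JoinQuery("count_unmatched", keys_unique=True, right_keys_in_left=True)

left, right = StreamCounter(), StreamCounter()
for key in requests:
    left.observe(key)
for key in responses:
    right.observe(key)

if can_approximate(query):
    answer = count_unmatched_approx(left, right)
else:
    answer = count_unmatched_exact(requests, responses)
```

Under the assumed conditions, the difference of the two counts equals the join-based answer while each stream is summarized by a single counter; when the conditions do not hold, the sketch falls back to the exact computation.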

    Routing XML queries
    6.
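    Granted invention patent
    Routing XML queries (legal status: lapsed)

    Publication (announcement) number: US07664806B1

    Publication (announcement) date: 2010-02-16

    Application number: US10830285

    Filing date: 2004-04-22

    IPC classification: G06F7/00 G06F15/16

    CPC classification: G06F17/30929 G06F17/30545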

    Abstract: A vast amount of information currently accessible over the Web and in corporate networks is stored in a variety of databases and is exported as XML data. However, querying this totality of information in a declarative and timely fashion is problematic because the set of databases is dynamic and a common schema is difficult to maintain. The present invention provides a solution to the problem of issuing declarative, ad hoc XPath queries against such a dynamic collection of XML databases and receiving timely answers. Decentralized architectures are proposed, under the open and the agreement cooperation models between a set of sites, for processing queries and updates to XML data. Each site consists of XML data nodes (which export their data as XML and also pose queries) and one XML router node (which manages the query and update interactions between sites). The architectures differ in the degree of knowledge individual router nodes have about data nodes containing specific XML data. There is therefore provided a method for accessing data over a wide area network comprising: providing a decentralized architecture comprising a plurality of data nodes, each having a database, a query processor and a path index, and a plurality of router nodes, each having a routing state; maintaining a routing state in each of the router nodes; broadcasting routing state updates from each of the databases to the router nodes; and routing path queries to each of the databases by accessing the routing state.
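The data-node and router-node roles described in this abstract can be pictured with the small in-process sketch below. It is an assumption-laden illustration rather than the patented architecture: queries are exact root-to-element paths instead of general XPath, broadcasts are plain method calls instead of network messages, and the class and method names (`DataNode`, `RouterNode`, `receive_update`, `route`) are hypothetical.

```python
class DataNode:
    """Holds a database summarized by a path index (path -> values)."""

    def __init__(self, name, documents):
        self.name = name
        self.path_index = {}
        for path, value in documents:
            self.path_index.setdefault(path, []).append(value)

    def routing_update(self):
        """Advertise which paths this node can answer."""
        return self.name, set(self.path_index)

    def query(self, path):
        """Local query processor: exact path lookup only (a simplification)."""
        return list(self.path_index.get(path, []))


class RouterNode:
    """Keeps a routing state (path -> data node names) and forwards queries."""

    def __init__(self):
        self.routing_state = {}
        self.nodes = {}

    def receive_update(self, node):
        name, paths = node.routing_update()
        self.nodes[name] = node
        # Drop stale entries for this node before recording the new ones.
        for owners in self.routing_state.values():
            owners.discard(name)
        for path in paths:
            self.routing_state.setdefault(path, set()).add(name)

    def route(self, path):
        """Send the path query only to nodes the routing state lists for it."""
        return {
            name: self.nodes[name].query(path)
            for name in sorted(self.routing_state.get(path, ()))
        }


books = DataNode("books", [("/catalog/book/title", "Dune")])
music = DataNode("music", [("/catalog/album/title", "Kind of Blue")])

router = RouterNode()
for node in (books, music):
    router.receive_update(node)  # stands in for a broadcast routing update

hits = router.route("/catalog/book/title")
```

Here `hits` contains an answer from the `books` node only, because the routing state lists no other node for that path; the `music` node is never contacted.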

    Text joins for data cleansing and integration in a relational database management system
    7.
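    Invention patent application
    Text joins for data cleansing and integration in a relational database management system (legal status: under examination, published)

    Publication (announcement) number: US20050027717A1

    Publication (announcement) date: 2005-02-03

    Application number: US10828819

    Filing date: 2004-04-21

    IPC classification: G06F7/02 G06F17/30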

    Abstract: An organization's data records are often noisy because of transcription errors, incomplete information, and a lack of standard formats for textual data. A fundamental task during data cleansing and integration is matching strings, perhaps across multiple relations, that refer to the same entity (e.g., organization name or address). Furthermore, it is desirable to perform this matching within an RDBMS, which is where the data is likely to reside. In this paper, we adapt the widely used and established cosine similarity metric from the information retrieval field to the relational database context in order to identify potential string matches across relations. We then use this similarity metric to characterize this key aspect of data cleansing and integration as a join between relations on textual attributes, where the similarity of matches exceeds a specified threshold. Computing an exact answer to the text join can be expensive. For query processing efficiency, we propose an approximate, sampling-based approach to the join problem that can be easily and efficiently executed in a standard, unmodified RDBMS. Therefore the present invention includes a system for string matching across multiple relations in a relational database management system comprising generating a set of strings from a set of characters, decomposing each string into a subset of tokens, establishing at least two relations within the strings, establishing a similarity threshold for the relations, sampling the at least two relations, correlating the relations for the similarity threshold and returning all of the tokens which meet the criteria of the similarity threshold.
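A minimal sketch of a cosine-similarity text join is given below. It runs in plain Python rather than inside an RDBMS, computes the exact join rather than the sampling-based approximation the abstract proposes, and makes its own assumptions about details the abstract leaves open: tokens are character 3-grams, weights are tf-idf with a smoothed idf of log(1 + N/df), and document frequencies are taken over both relations combined. The function names are hypothetical.

```python
import math
from collections import Counter


def qgrams(text, q=3):
    """Decompose a string into overlapping character q-grams (tokens)."""
    text = text.lower()
    if len(text) < q:
        return [text]
    return [text[i:i + q] for i in range(len(text) - q + 1)]


def tfidf_vectors(strings, q=3):
    """Map each distinct string to a unit-length tf-idf vector over its q-grams."""
    token_lists = {s: qgrams(s, q) for s in strings}
    doc_freq = Counter()
    for tokens in token_lists.values():
        doc_freq.update(set(tokens))
    n = len(token_lists)
    vectors = {}
    for s, tokens in token_lists.items():
        # Assumption: smoothed idf so that no token gets weight zero.
        weights = {
            tok: tf * math.log(1.0 + n / doc_freq[tok])
            for tok, tf in Counter(tokens).items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors[s] = {tok: w / norm for tok, w in weights.items()}
    return vectors


def text_join(relation_a, relation_b, threshold):
    """Return (a, b, similarity) for every pair whose cosine similarity >= threshold."""
    vectors = tfidf_vectors(set(relation_a) | set(relation_b))
    matches = []
    for a in relation_a:
        for b in relation_b:
            va, vb = vectors[a], vectors[b]
            similarity = sum(w * vb.get(tok, 0.0) for tok, w in va.items())
            if similarity >= threshold:
                matches.append((a, b, similarity))
    return matches


customers = ["acme corp", "globex"]
suppliers = ["acme corp", "initech"]
matches = text_join(customers, suppliers, threshold=0.5)
```

With these toy relations only the identical strings match, since the other strings share no 3-grams. The nested loop compares every pair, which is the cost the abstract's sampling-based approach is meant to avoid.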

    System, method and computer-readable medium for providing pattern matching
    8.
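    Granted invention patent
    System, method and computer-readable medium for providing pattern matching (legal status: active)

    Publication (announcement) number: US07895194B2

    Publication (announcement) date: 2011-02-22

    Application number: US12615805

    Filing date: 2009-11-10

    IPC classification: G06F17/30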

    Abstract: A system, method and computer-readable medium are disclosed for identifying representative data using sketches. The method embodiment comprises generating a plurality of vectors from a data set, modifying each of the vectors of the plurality of vectors and selecting one of the plurality of generated vectors according to a comparison of a summed distance between a modified vector associated with the selected generated vector and remaining modified vectors. Modifying the generated vectors may involve reducing each generated vector to a lower dimensional vector. The summed distance then represents a summed distance between the lower dimensional vector and remaining lower dimensional vectors.
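The selection step described in this abstract can be sketched as below. The code assumes that the "modification" is a Gaussian random projection to a lower dimension and that the representative is the vector whose sketch has the smallest summed Euclidean distance to all other sketches; both choices, along with the names `sketch` and `select_representative`, are illustrative assumptions rather than details taken from the patent.

```python
import math
import random


def sketch(vector, projection):
    """Reduce a vector to len(projection) dimensions by linear projection."""
    return [sum(p * x for p, x in zip(row, vector)) for row in projection]


def select_representative(vectors, sketch_dim=8, seed=0):
    """Return the index of the vector whose sketch is closest, in summed
    distance, to the sketches of all the other vectors."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # Assumption: a Gaussian random projection serves as the sketch.
    projection = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(sketch_dim)]
    sketches = [sketch(v, projection) for v in vectors]

    def summed_distance(i):
        return sum(
            math.dist(sketches[i], other)
            for j, other in enumerate(sketches)
            if j != i
        )

    return min(range(len(vectors)), key=summed_distance)


# Five 64-dimensional vectors lying along one direction at offsets 0, 1, 2, 3, 10.
base = [float(d % 5 + 1) for d in range(64)]
vectors = [[c * b for b in base] for c in (0.0, 1.0, 2.0, 3.0, 10.0)]
representative = select_representative(vectors)
```

In this example the vector at offset 2 has the smallest summed distance to the rest, so `representative` is index 2. All distances are computed on 8-dimensional sketches rather than the original 64-dimensional vectors.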

    System, method and computer-readable medium for providing pattern matching
    10.
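    Granted invention patent
    System, method and computer-readable medium for providing pattern matching (legal status: active)

    Publication (announcement) number: US07415464B1

    Publication (announcement) date: 2008-08-19

    Application number: US11185091

    Filing date: 2005-07-20

    IPC classification: G06F17/30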

    Abstract: A system, method and computer-readable medium are disclosed for identifying representative data using sketches. The method embodiment comprises generating a plurality of vectors from a data set, modifying each of the vectors of the plurality of vectors and selecting one of the plurality of generated vectors according to a comparison of a summed distance between a modified vector associated with the selected generated vector and remaining modified vectors. Modifying the generated vectors may involve reducing each generated vector to a lower dimensional vector. The summed distance then represents a summed distance between the lower dimensional vector and remaining lower dimensional vectors.
