Detecting duplicate records in databases
    93.
    Patent application
    Legal status: in force

    Publication number: US20050262044A1

    Publication date: 2005-11-24

    Application number: US11182590

    Application date: 2005-07-14

    CPC classification number: G06F17/30303 Y10S707/99931 Y10S707/99942

    Abstract: The invention concerns the detection of duplicate tuples in a database. Previous domain-independent detection of duplicate tuples relied on standard similarity functions (e.g., edit distance, cosine metric) between multi-attribute tuples. However, such prior art approaches produce large numbers of false positives when they are used to identify domain-specific abbreviations and conventions. In accordance with the invention, a duplicate detection process is implemented based on interpreting records from multiple dimensional tables in a data warehouse, which are associated with hierarchies specified through key-foreign key relationships in a snowflake schema. The invention exploits the extra knowledge available from the table hierarchy to develop a high-quality, scalable duplicate detection process.

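    A minimal sketch of how hierarchy information might supplement textual similarity, under stated assumptions. The abstract does not give the actual decision rule; the function names, the thresholds, and the "text OR child overlap" rule below are hypothetical. The idea illustrated is that two parent records (e.g., countries) whose names look different can still be flagged as duplicates when the sets of child records (e.g., states) that reference them through foreign keys overlap heavily.

```python
from difflib import SequenceMatcher


def text_sim(a, b):
    """Textual similarity in [0, 1] from the standard library's SequenceMatcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def child_overlap(children_a, children_b):
    """Jaccard overlap of the child sets that reference two parent records."""
    union = children_a | children_b
    if not union:
        return 0.0
    return len(children_a & children_b) / len(union)


def is_duplicate(a, b, children, text_threshold=0.8, overlap_threshold=0.5):
    """Hypothetical rule: flag a pair if either similarity signal is strong."""
    if text_sim(a, b) >= text_threshold:
        return True
    overlap = child_overlap(children.get(a, set()), children.get(b, set()))
    return overlap >= overlap_threshold


# Toy hierarchy: country name -> set of state names that reference it.
children = {
    "USA": {"WA", "CA", "NY"},
    "United States": {"WA", "CA", "NY", "TX"},
    "Canada": {"BC", "ON"},
}
```

    With this toy data, "USA" and "United States" are flagged because three of their four distinct child values coincide, while "USA" and "Canada" are not.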

    Primitives for workload summarization
    94.
    Patent application
    Legal status: in force

    Publication number: US20050223026A1

    Publication date: 2005-10-06

    Application number: US10815061

    Application date: 2004-03-31

    Abstract: A database object summarization tool is provided that selects a subset of database objects subject to filtering constraints such as a partial order or optimization of some attribute. A dominance primitive filters out tuples that are dominated according to a partial order constraint by another tuple. A representation primitive selects a representative subset of tuples such that an optimization criterion is met.

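    The dominance primitive described above can be sketched as a simple filter. This is an illustration only: the partial order used here (a tuple dominates another if it is at least as large on every attribute and strictly larger on at least one) and the function names are assumptions, not taken from the patent.

```python
def dominates(a, b):
    """Assumed partial order: a >= b on every attribute and a > b on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def dominance_filter(tuples):
    """Keep only the tuples that no other tuple dominates (quadratic scan)."""
    return [t for t in tuples if not any(dominates(o, t) for o in tuples)]


# Example: (1, 1) is dominated by (2, 2); the other three are incomparable.
survivors = dominance_filter([(1, 3), (3, 1), (2, 2), (1, 1)])
```

    The quadratic scan is chosen for clarity; a real tool would likely use a more efficient evaluation strategy.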

    Method and apparatus for exploiting statistics on query expressions for optimization

    Publication number: US06947927B2

    Publication date: 2005-09-20

    Application number: US10191822

    Application date: 2002-07-09

    Abstract: A method for evaluating a user query on a relational database having records stored therein, a workload made up of a set of queries that have been executed on the database, and a query optimizer that generates a query execution plan for the user query. Each query plan includes a plurality of intermediate query plan components that verify a subset of records from the database meeting query criteria. The method accesses the query plan and a set of stored intermediate statistics for records verified by query components, such as histograms that summarize the cardinality of the records that verify the query component. The method forms a transformed query plan based on the selected intermediate statistics (possibly by rewriting the query plan) and estimates the cardinality of the transformed query plan to arrive at a more accurate cardinality estimate for the query. If additional intermediate statistics are necessary, a pool of intermediate statistics may be generated based on the queries in the workload by evaluating the benefit of a given statistic over the workload and adding intermediate statistics to the pool that provide relatively great benefit.
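    A rough sketch of the idea of preferring a stored statistic on an intermediate query expression over an estimate built from base statistics. Everything here is a simplifying assumption: predicates are plain strings, the fallback multiplies per-predicate selectivities (the usual independence assumption), and `expr_stats` is a hypothetical lookup of cardinalities recorded for sets of predicates.

```python
def estimate_cardinality(table_rows, predicates, base_selectivity, expr_stats):
    """Estimate how many rows satisfy all predicates.

    If a statistic was stored for exactly this set of predicates, use it;
    otherwise fall back to multiplying independent per-predicate selectivities.
    """
    key = frozenset(predicates)
    if key in expr_stats:
        return expr_stats[key]
    estimate = table_rows
    for predicate in predicates:
        estimate *= base_selectivity[predicate]
    return estimate


base_selectivity = {"city = 'Seattle'": 0.5, "state = 'WA'": 0.25}
predicates = ["city = 'Seattle'", "state = 'WA'"]

# Independence assumption: 1000 * 0.5 * 0.25 rows.
independent = estimate_cardinality(1000, predicates, base_selectivity, {})

# A stored intermediate statistic captures the correlation between the columns.
stats = {frozenset(predicates): 400}
with_stat = estimate_cardinality(1000, predicates, base_selectivity, stats)
```

    In this toy case the two predicates are strongly correlated, so the stored statistic gives a much larger estimate than the independence assumption does.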

    Database monitoring system
    96.
    Patent application
    Legal status: in force

    Publication number: US20050192921A1

    Publication date: 2005-09-01

    Application number: US10788077

    Application date: 2004-02-26

    Abstract: A framework is provided within a database system for specifying database monitoring rules that will be evaluated as part of the execution code path of database events being monitored. The occurrence of a selected database event triggers a rule that evaluates some parameter of an object related to the event against a condition in the rule. If the condition is met, a specified action is taken that can alter the execution of the database event or database system performance. Lightweight aggregation tables are utilized to enable aggregation of object parameter values so that presently occurring events can be compared to a summary of the object parameter values from previously occurring database events. Signatures are assigned to queries based on the structure of the query plan so that information in the lightweight aggregation tables can be grouped according to query signature.

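    The rule-plus-aggregation idea above might look roughly like the following sketch. The class, the rule ("alert when a query runs longer than a multiple of the average for its signature"), and all names are hypothetical; the abstract does not specify a rule language or table layout.

```python
from collections import defaultdict


class AggregationTable:
    """Keeps a running count and sum of a parameter per query signature."""

    def __init__(self):
        self.count = defaultdict(int)
        self.total = defaultdict(float)

    def add(self, signature, value):
        self.count[signature] += 1
        self.total[signature] += value

    def average(self, signature):
        if self.count.get(signature, 0) == 0:
            return None
        return self.total[signature] / self.count[signature]


def on_query_end(table, signature, duration, factor=2.0):
    """Hypothetical rule fired by a 'query finished' event.

    Compares the current duration against the summary of earlier events
    with the same signature, then folds the new value into the summary.
    """
    avg = table.average(signature)
    triggered = avg is not None and duration > factor * avg
    table.add(signature, duration)
    return "alert" if triggered else None


table = AggregationTable()
table.add("sig-1", 1.0)
table.add("sig-1", 1.0)
first = on_query_end(table, "sig-1", 5.0)   # 5.0 > 2 * average of 1.0
second = on_query_end(table, "sig-1", 1.5)  # average is now 7/3
third = on_query_end(table, "sig-2", 9.0)   # no history for this signature
```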

    Compressing database workloads
    97.
    Granted patent
    Legal status: in force

    Publication number: US06912547B2

    Publication date: 2005-06-28

    Application number: US10180667

    Application date: 2002-06-26

    Abstract: Relational database applications such as index selection, histogram tuning, approximate query processing, and statistics selection have recognized the importance of leveraging workloads. Often these applications are presented with large workloads, i.e., a set of SQL DML statements, as input. A key factor affecting the scalability of such applications is the size of the workload. The invention concerns workload compression which helps improve the scalability of such applications. The exemplary embodiment is broadly applicable to a variety of workload-driven applications, while allowing for incorporation of application specific knowledge. The process is described in detail in the context of two workload-driven applications: index selection and approximate query processing.

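    As a deliberately simplified illustration of workload compression, the sketch below collapses statements that differ only in their literal constants into one weighted representative. This template-grouping technique is a stand-in chosen for brevity and is not claimed to be the method of the patent; the function names and the regular expressions are assumptions.

```python
import re
from collections import Counter


def template(sql):
    """Replace string and numeric literals with '?' and normalize case/spacing."""
    sql = re.sub(r"'[^']*'", "?", sql)
    sql = re.sub(r"\b\d+(\.\d+)?\b", "?", sql)
    return " ".join(sql.lower().split())


def compress(workload):
    """Map each distinct template to the number of statements it represents."""
    return Counter(template(q) for q in workload)


workload = [
    "SELECT * FROM t WHERE a = 1",
    "select * from t  where a = 2",
    "SELECT * FROM t WHERE b = 'x'",
]
compressed = compress(workload)
```

    A downstream application such as index selection could then process the two weighted templates instead of the three original statements.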

    Generalized keyword matching for keyword based searching over relational databases
    98.
    Granted patent
    Legal status: in force

    Publication number: US06792414B2

    Publication date: 2004-09-14

    Application number: US10036348

    Application date: 2001-10-19

    Abstract: Searching by keywords and providing generalized matching capabilities on a relational database is enabled by performing preprocessing operations to construct inverted list lookup tables based on data record components at an interim level of granularity, such as column location. Prefix information is stored in the inverted list for each keyword, keyword sub-string, or stemmed version of the keyword. A keyword search is performed on the lookup tables rather than the database tables to determine database column locations of the keyword. The lookup tables are scanned to identify each prefix associated with the search term. Schema information about the database is used to link the column locations to form database subgraphs that span the keywords. Join tables are generated based on the subgraphs consisting of columns containing the keywords. A query on the database is generated to join the tables and retrieve database rows that contain the keyword and the prefixes associated with the keyword. The retrieved rows are ranked in order of relevance before being output. By preprocessing a relational database to form lookup tables, and initially searching the lookup tables to obtain a targeted subset of the database upon which SQL queries can be performed to collect data records, keyword searching on relational databases is made efficient.

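    The preprocessing step can be pictured with a small sketch that builds a lookup table from word prefixes to column locations. The in-memory data layout, the function names, and the choice to index every prefix of every word are assumptions made for illustration; the later steps (joining via schema information, ranking) are not shown.

```python
from collections import defaultdict


def build_lookup(rows_by_table):
    """Map each word prefix to the (table, column) locations where it occurs."""
    lookup = defaultdict(set)
    for table, rows in rows_by_table.items():
        for row in rows:
            for column, value in row.items():
                for word in str(value).lower().split():
                    for end in range(1, len(word) + 1):
                        lookup[word[:end]].add((table, column))
    return lookup


def search(lookup, keyword):
    """Return the column locations whose values contain a word starting with keyword."""
    return lookup.get(keyword.lower(), set())


# Toy database: table name -> list of rows (column -> value).
database = {
    "authors": [{"name": "Jim Gray"}],
    "books": [{"title": "Transaction Processing", "publisher": "Morgan Kaufmann"}],
}
lookup = build_lookup(database)
```

    Only the columns returned by `search` would then need to be queried with SQL, which is the efficiency argument the abstract makes.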

    Self-tuning histogram and database modeling
    99.
    Granted patent
    Legal status: in force

    Publication number: US06460045B1

    Publication date: 2002-10-01

    Application number: US09268589

    Application date: 1999-03-15

    Abstract: Building histograms by using feedback information about the execution of query workload rather than by examining the data helps reduce the cost of building and maintaining histograms. A method of maintaining self-tuning histograms updates histograms based on feedback about the execution of a user query. A histogram may be initialized using an assumption of uniform distribution of data or by combining existing histograms. A histogram tuner accesses an estimated result in response to a user query generated by using the histogram. The histogram tuner calculates an estimation error based on the result of the user query and the estimated result. The frequencies of histogram buckets are refined based on the estimation error. The bucket bounds of the histogram are restructured based on the refined frequencies. The method may be performed on-line after a user query or off-line by accessing a workload log. By updating a histogram without accessing the database, the cost of building and maintaining histograms is significantly reduced.

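    The frequency-refinement step can be sketched as follows. The update rule (distribute a damped share of the estimation error across the buckets that overlap the query range, in proportion to each bucket's contribution to the estimate) and the `damping` parameter are assumptions for illustration; the restructuring of bucket bounds mentioned in the abstract is omitted.

```python
def estimate(buckets, lo, hi):
    """Estimate rows in [lo, hi) assuming uniform spread inside each bucket.

    Each bucket is a list [low, high, frequency] covering [low, high).
    """
    total = 0.0
    for b_lo, b_hi, freq in buckets:
        overlap = max(0.0, min(hi, b_hi) - max(lo, b_lo))
        total += freq * overlap / (b_hi - b_lo)
    return total


def refine(buckets, lo, hi, actual, damping=0.5):
    """Adjust bucket frequencies in place using the observed result size."""
    est = estimate(buckets, lo, hi)
    if est == 0:
        return  # nothing to apportion the error against in this sketch
    error = actual - est
    for bucket in buckets:
        b_lo, b_hi, freq = bucket
        overlap = max(0.0, min(hi, b_hi) - max(lo, b_lo))
        share = freq * overlap / (b_hi - b_lo)  # this bucket's part of the estimate
        bucket[2] = max(0.0, freq + damping * error * share / est)


# Start from a uniform assumption: 200 rows spread over [0, 20).
histogram = [[0, 10, 100.0], [10, 20, 100.0]]
before = estimate(histogram, 0, 10)
# Feedback: the query over [0, 10) actually returned 60 rows.
refine(histogram, 0, 10, actual=60)
after = estimate(histogram, 0, 10)
```

    Note that the histogram is updated purely from query feedback; the underlying data is never scanned.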

    What-if index analysis utility for database systems
    100.
    Granted patent
    Legal status: in force

    Publication number: US06223171B1

    Publication date: 2001-04-24

    Application number: US09139843

    Application date: 1998-08-25

    Abstract: A what-if index analysis utility provides the ability to analyze the performance of the existing configuration of a database system with respect to one or more workloads of queries and to propose a hypothetical configuration for the database system to analyze its potential impact on the performance of the database system. The utility may be used, for example, to perform an impact analysis of the set of indexes selected by an index selection tool with respect to a workload of queries, and may also be used to explore what-if scenarios for the database system by analyzing the impact of hypothetical sets of indexes with respect to the execution of various workloads over projected sizes of a database. The utility may be used to perform summarizations of workloads, configurations, and the performance of workloads with respect to the existing configuration and hypothetical configurations. The what-if index analysis utility may be used, for example, by a database administrator or a physical database design tool to help improve performance of a database system.

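    A toy sketch of comparing a workload's cost under the existing configuration and a hypothetical one. The cost model is an assumption made purely for illustration (a query scans the whole table unless an index exists on its filter column, in which case it touches only the matching fraction); an actual utility would rely on the database system's own cost estimates.

```python
def query_cost(query, indexes, table_rows):
    """Cost of one query as the number of rows touched under a toy model.

    A query is (table, filter_column, selectivity).
    """
    table, column, selectivity = query
    rows = table_rows[table]
    if (table, column) in indexes:
        return rows * selectivity
    return rows


def workload_cost(workload, indexes, table_rows):
    """Total cost of a workload under a given (possibly hypothetical) index set."""
    return sum(query_cost(q, indexes, table_rows) for q in workload)


table_rows = {"orders": 1000}
workload = [("orders", "customer_id", 0.25), ("orders", "status", 0.5)]

existing = workload_cost(workload, set(), table_rows)
hypothetical = workload_cost(workload, {("orders", "customer_id")}, table_rows)

# Projected database size: the same comparison with ten times as many rows.
projected = workload_cost(workload, {("orders", "customer_id")}, {"orders": 10000})
```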
