Duplicate data elimination system
    21.
    Type: Granted patent
    Status: In force

    Publication No.: US07287019B2

    Publication date: 2007-10-23

    Application No.: US10453992

    Filing date: 2003-06-04

    Abstract: A process for finding similar data records in a set of data records. One or more database tables provide a number of data records from which one or more canonical data records are identified. Tokens are identified within the data records and classified according to attribute field. A similarity score is assigned to data records in relation to other data records based on the similarity between the tokens of the data records. Data records whose similarity score with respect to each other is greater than a threshold form one or more groups of data records. The records, or tuples, form nodes of a graph in which edges between nodes represent a similarity score between records of a group. Within each group, a canonical record is identified based on the similarity of the data records to each other within the group.
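
The grouping described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented method: the abstract does not name a similarity measure, so Jaccard similarity over (field, token) pairs is assumed, groups are taken to be connected components of the threshold graph, and the canonical record is assumed to be the member with the highest total similarity to the rest of its group. All function names are hypothetical.

```python
from itertools import combinations

def tokens(record):
    """Split each attribute value into tokens, tagged with the attribute field."""
    return {(field, tok) for field, value in record.items()
            for tok in value.lower().split()}

def jaccard(a, b):
    """Assumed similarity measure: shared tokens divided by all tokens."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def group_records(records, threshold):
    """Link records whose similarity exceeds the threshold; return groups and scores."""
    toks = [tokens(r) for r in records]
    sim = {}
    parent = list(range(len(records)))  # union-find forest over record indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        sim[(i, j)] = jaccard(toks[i], toks[j])
        if sim[(i, j)] > threshold:
            parent[find(i)] = find(j)  # an edge joins the two groups

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values()), sim

def canonical(group, sim):
    """Assumed rule: the record most similar, in total, to the others in its group."""
    def total(i):
        return sum(sim[(min(i, j), max(i, j))] for j in group if j != i)
    return max(group, key=total)
```

Calling `group_records(records, 0.45)` on a list of dictionaries (one per record, keyed by attribute name) returns the groups as lists of record indices, and `canonical(group, sim)` picks one index per group.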

    Techniques for estimating progress of database queries
    23.
    Type: Patent application
    Status: In force

    Publication No.: US20060282404A1

    Publication date: 2006-12-14

    Application No.: US11149968

    Filing date: 2005-06-10

    Abstract: Techniques for estimating the progress of database queries are described herein. In a first implementation, a respective lower-bound parameter is associated with each node in an operator tree that represents a given database query, and the progress of the database query at a given point is estimated based upon the lower-bound parameters. In a second implementation, the progress of the query is estimated by associating respective lower-bound and upper-bound parameters with each node in the operator tree. The progress of the query at the given point is then estimated based on the lower-bound and upper-bound parameters. The progress estimate is computed by dividing the work done so far by the sums of the above averages for each node in the tree.
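
A minimal sketch of the two estimators might look as follows. The abstract does not define "work" or "the above averages", so this assumes that work is a per-node tuple count and that the average for a node is the midpoint of its lower and upper bounds. The `OpNode` class and its field names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class OpNode:
    name: str
    done: int    # tuples processed so far at this operator (assumed unit of work)
    lower: int   # lower bound on the total tuples this operator will process
    upper: int   # upper bound on the same quantity
    children: list = field(default_factory=list)

def walk(node):
    """Yield every node of the operator tree, parent before children."""
    yield node
    for child in node.children:
        yield from walk(child)

def estimate_progress(root, use_upper=True):
    """Work done so far divided by the expected total work over all nodes."""
    nodes = list(walk(root))
    done = sum(n.done for n in nodes)
    if use_upper:
        # Second implementation (assumed): midpoint of the two bounds per node.
        expected = sum((n.lower + n.upper) / 2 for n in nodes)
    else:
        # First implementation: lower bounds only.
        expected = sum(n.lower for n in nodes)
    return min(1.0, done / expected) if expected else 0.0
```

With lower bounds only, the estimate can overshoot once actual work passes the bound, which is why the result is clamped to 1.0 here; that clamp is a choice made for this sketch.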

    String predicate selectivity estimation
    24.
    Type: Granted patent
    Status: Lapsed

    Publication No.: US07149735B2

    Publication date: 2006-12-12

    Application No.: US10603035

    Filing date: 2003-06-24

    CPC classification number: G06F17/30985 Y10S707/99936

    Abstract: A method of estimating the selectivity of a given string predicate in a database query. In the method, selectivities of substrings of various lengths are estimated. For example, the selectivity of substrings from length 1 (or some constant q) up to the length of the given string predicate may be estimated. The method then selects a candidate substring for each substring length based on the estimated selectivities of the substrings. The estimated selectivities of the candidate substrings are combined. The combined estimated selectivity of the candidate substrings is returned as the estimated selectivity of the given string predicate.
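
The per-length candidate selection can be illustrated with a short Python sketch. Several details are assumptions, since the abstract leaves them open: substring selectivity is estimated here as the fraction of a non-empty sample of column values containing the substring, the candidate for each length is the substring with the lowest estimate, and the candidates are combined by taking their minimum (the `combine` parameter lets another rule be swapped in). Function names are hypothetical.

```python
def substrings(s, length):
    """All distinct substrings of s with the given length."""
    return {s[i:i + length] for i in range(len(s) - length + 1)}

def estimate_selectivity(sub, sample):
    """Assumed estimator: fraction of sampled values that contain sub."""
    return sum(sub in value for value in sample) / len(sample)

def predicate_selectivity(pattern, sample, q=1, combine=min):
    """Estimate the selectivity of a predicate matching values that contain pattern."""
    candidates = []
    for length in range(q, len(pattern) + 1):
        # Candidate for this length: the most selective (rarest) substring.
        best = min(estimate_selectivity(sub, sample)
                   for sub in substrings(pattern, length))
        candidates.append(best)
    return combine(candidates)
```

For example, `predicate_selectivity("eat", ["seattle", "settle", "seat", "boston"])` considers substrings of lengths 1 through 3 and combines one candidate estimate per length.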

    Relaxation-based approach to automatic physical database tuning
    25.
    Type: Patent application
    Status: Under examination (published)

    Publication No.: US20060242102A1

    Publication date: 2006-10-26

    Application No.: US11111015

    Filing date: 2005-04-21

    CPC classification number: G06F16/22

    Abstract: A system that facilitates automatic selection of a physical configuration of a database comprises an optimizer component that determines simulated physical structures and creates a hypothetical configuration based thereon. A reduction component progressively reduces the size of the hypothetical configuration until it is associated with a size below a threshold. For example, the simulated physical structures can be based at least in part upon a workload.
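
The reduction loop can be sketched as follows. The abstract does not say how the configuration is shrunk at each step, so this assumes a greedy rule that drops the structure with the least estimated benefit per unit of size; the patent may instead merge or otherwise transform structures. The input format, the names, and the assumption that every size is positive are all hypothetical.

```python
def relax(configuration, size_limit):
    """Shrink a hypothetical configuration until its total size is below size_limit.

    configuration maps a structure name to (size, estimated_benefit);
    sizes are assumed to be positive.
    """
    config = dict(configuration)
    while config and sum(size for size, _ in config.values()) >= size_limit:
        # Assumed relaxation step: drop the structure with the lowest
        # benefit per unit of size.
        victim = min(config, key=lambda name: config[name][1] / config[name][0])
        del config[victim]
    return config
```

Starting from the largest configuration and relaxing it means every intermediate result is a valid configuration, so the loop can stop as soon as the size constraint is met.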

    Database tuning advisor
    26.
    Type: Patent application
    Status: Under examination (published)

    Publication No.: US20060085484A1

    Publication date: 2006-04-20

    Application No.: US10966563

    Filing date: 2004-10-15

    CPC classification number: G06F16/2282 G06F16/2272

    Abstract: An automated physical database design tool may provide an integrated physical design recommendation for horizontal partitioning, indexes and indexed views, all three features being tuned together (in concert). Manageability requirements may be specified when optimizing for performance. User-specified configuration may enable the specification of a partial physical design without materialization of the physical design. The tuning process may be performed for a production server but may be conducted substantially on a test server. Secondary indexes may be suggested for XML columns. Tuning of a database may be invoked by any owner of a database. Usage of objects may be evaluated and a recommendation for dropping unused objects may be issued. Reports may be provided concerning the count and percentage of queries in the workload that reference a particular database, and/or the count and percentage of queries in the workload that reference a particular table or column. A feature may be provided whereby a weight may be associated with each statement in the workload, enabling relative importance of particular statements to be specified. An in-row length for a column may be specified. If a value for the column exceeds the specified in-row length for that column, the portion of the value not exceeding the specified in-row length may be stored in the row while the portion of the value exceeding the specified in-row length may be stored in an overflow area. Rebuild and reorganization recommendations may be generated.
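
Of the features listed in this abstract, the in-row length rule is the easiest to make concrete. The sketch below is a hypothetical illustration only: it assumes column values are byte strings and that the split is a plain prefix/suffix cut, which the abstract does not confirm.

```python
def split_value(value: bytes, in_row_length: int):
    """Return (in_row_part, overflow_part) for one column value.

    The first in_row_length bytes stay in the row; anything beyond
    that goes to the overflow area.
    """
    if len(value) <= in_row_length:
        return value, b""
    return value[:in_row_length], value[in_row_length:]
```

A value that fits within the specified in-row length is stored entirely in the row, and its overflow part is empty.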

    Sampling for queries
    27.
    Type: Patent application
    Status: In force

    Publication No.: US20060085410A1

    Publication date: 2006-04-20

    Application No.: US11296036

    Filing date: 2005-12-07

    Abstract: Results of a database query are estimated by performing a sampling of weighted tuples in a database, based on a probability of usage of the tuples required in executing a workload. A probability is associated with each tuple sampled, and an aggregate is computed over values in each sampled tuple while multiplying by the inverses of the probabilities associated with each tuple sampled.
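
Inverse-probability weighting of this kind can be sketched with the standard library alone. This is an illustration under assumptions, not the patented procedure: tuples are reduced to single numeric values, sampling is done with replacement, the usage weights are supplied by the caller, and the aggregate is a SUM estimated as the mean of value divided by selection probability. Function names are hypothetical.

```python
import random

def weighted_sample(values, weights, n, rng):
    """Draw n tuples with replacement; return (value, selection probability) pairs."""
    total = sum(weights)
    probs = [w / total for w in weights]
    picks = rng.choices(range(len(values)), weights=probs, k=n)
    return [(values[i], probs[i]) for i in picks]

def estimate_sum(sample):
    """Estimate SUM(value) by scaling each sampled value by 1 / probability."""
    return sum(value / p for value, p in sample) / len(sample)
```

Dividing each sampled value by its selection probability keeps the estimate unbiased even though frequently used tuples are sampled more often. In the special case where the weights are exactly proportional to the values, every sampled term equals the true total.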

    Transformation tool for mapping XML to relational database
    28.
    Type: Patent application
    Status: In force

    Publication No.: US20050203933A1

    Publication date: 2005-09-15

    Application No.: US10796435

    Filing date: 2004-03-09

    CPC classification number: G06F17/30917 Y10S707/99943 Y10S707/99944

    Abstract: An XML transformation tool that constructs a relational database with associated physical structures that can be populated with shredded XML data. A mapping transformation enumerator examines queries in the workload and enumerates mapping transformations that use XSD-specific constraints and statistics on XML data. These transformations can be used to generate mappings from XSD to relational database schema that may lead to better performance in the presence of physical design. A design tuner searches mappings generated from a default mapping using the enumerated transformations, together with the physical design structures associated with those mappings, and selects a preferred mapping and its physical design structures. Cost estimates for performing queries in the workload are determined for the relational database implementing the mapping and associated physical design structures.

    Merging materialized view pairs for database workload materialized view selection
    30.
    Type: Granted patent
    Status: In force

    Publication No.: US06356890B1

    Publication date: 2002-03-12

    Application No.: US09629353

    Filing date: 2000-08-01

    Abstract: An index and materialized view selection wizard produces a fast and reasonable recommendation for a configuration of indexes, materialized views, and indexes on materialized views that are beneficial given a specified workload for a given database and database server. Candidate materialized views and indexes are obtained, and a joint enumeration of the combined materialized views and indexes is performed to obtain a recommended configuration. The configuration includes indexes, materialized views, and indexes on materialized views. Candidate materialized views are obtained by first determining which subsets of tables are referenced in queries in the workload and then finding interesting table subsets. Next, interesting subsets are considered on a per-query basis to determine which are syntactically relevant for a query. Materialized views that are likely to be used for the workload are then generated along with a set of merged materialized views. Clustered indexes and non-clustered indexes on materialized views are then generated. The indexes, materialized views, and indexes on materialized views are then enumerated together to form the recommended configuration.
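
The first step, finding interesting table subsets, might be sketched as below. The abstract does not define "interesting", so this assumes a subset qualifies when the total cost of the workload queries referencing all of its tables reaches a caller-supplied threshold. The workload format, the function name, and the threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

def interesting_table_subsets(workload, min_weight):
    """Return table subsets whose accumulated query cost is at least min_weight.

    workload is a list of (tables_referenced, query_cost) pairs.
    """
    weight = Counter()
    for tables, cost in workload:
        tables = sorted(set(tables))
        # Every non-empty subset of the referenced tables is credited with
        # the cost of the query that references it.
        for k in range(1, len(tables) + 1):
            for subset in combinations(tables, k):
                weight[subset] += cost
    return {subset for subset, w in weight.items() if w >= min_weight}
```

Enumerating all subsets is exponential in the number of tables per query, which is acceptable for a sketch but would need pruning for queries that join many tables.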
