Approximating a database statistic
    1.
    Patent application
    Status: in force

    Publication number: US20080120274A1

    Publication date: 2008-05-22

    Application number: US11796102

    Filing date: 2007-04-25

    IPC classification: G06F7/00

    Abstract: A method and apparatus for approximating a database statistic, such as the number of distinct values (NDV), is provided. To approximate the NDV for a portion of a table, a synopsis of distinct values is constructed. Each value in the portion is mapped to a domain of values. The mapping function is implemented with a uniform hash function, in one embodiment. If the resultant domain value does not exist in the synopsis, the domain value is added to the synopsis. If the synopsis reaches its capacity, a portion of the domain values is discarded from the synopsis. The statistic is approximated based on the number (N) of domain values in the synopsis and the portion of the domain that is represented in the synopsis relative to the size of the domain.
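The synopsis scheme in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the 64-bit BLAKE2b hash, the halving rule (keep only hashes whose low `splits` bits are zero), and the names `Synopsis`, `hash_to_domain`, and `capacity` are all assumptions chosen for the example.

```python
import hashlib


def hash_to_domain(value):
    """Map a value to a 64-bit domain value (assumed hash: 8-byte BLAKE2b)."""
    digest = hashlib.blake2b(repr(value).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class Synopsis:
    """Bounded set of domain values covering 1 / 2**splits of the domain."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self.splits = 0      # how many times the represented domain was halved
        self.values = set()  # distinct domain values currently retained

    def _in_range(self, h):
        # A hash is retained only if its lowest `splits` bits are all zero.
        return h & ((1 << self.splits) - 1) == 0

    def add(self, value):
        h = hash_to_domain(value)
        if not self._in_range(h):
            return  # falls in a part of the domain that was already discarded
        self.values.add(h)  # a set ignores domain values already present
        while len(self.values) > self.capacity:
            # Capacity exceeded: halve the represented domain and drop the rest.
            self.splits += 1
            self.values = {v for v in self.values if self._in_range(v)}

    def estimate(self):
        # N retained values, scaled by domain size / represented portion.
        return len(self.values) * (1 << self.splits)
```

With fewer distinct values than `capacity` the estimate is exact (barring hash collisions); beyond that it is a scaled sample whose accuracy depends on how many values the synopsis retains.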

    Merging synopses to determine number of distinct values in large databases
    2.
    Granted patent
    Status: in force

    Publication number: US07603339B2

    Publication date: 2009-10-13

    Application number: US11796110

    Filing date: 2007-04-25

    IPC classification: G06F7/00 G06F17/30 G06F17/00

    Abstract: A method and apparatus for merging synopses to determine a database statistic, e.g., a number of distinct values (NDV), is disclosed. The merging can be used to determine an initial database statistic or to perform incremental statistics maintenance. For example, each synopsis can pertain to a different partition, such that merging the synopses generates a global statistic. When performing incremental maintenance, only those synopses whose partitions have changed need to be updated. Each synopsis contains domain values that summarize the statistic. However, the synopses may initially contain domain values that are not compatible with each other. Prior to merging the synopses, the domain values in each synopsis are made compatible with the domain values in the other synopses. The adjustment is made such that each synopsis represents the same range of domain values, in one embodiment. After “compatible synopses” are formed, the synopses are merged by taking the union of the compatible synopses.
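A small sketch of the merge step, under the assumption that each per-partition synopsis records how many times its domain was halved (`splits`) and keeps only hashes whose low `splits` bits are zero. The `Synopsis` dataclass and the helper names are hypothetical; the abstract does not prescribe this representation.

```python
from dataclasses import dataclass


@dataclass
class Synopsis:
    splits: int   # the synopsis covers 1 / 2**splits of the hash domain
    hashes: set   # retained domain values (integers)


def restrict(synopsis, splits):
    """Keep only the hashes that fall in the narrower range given by `splits`."""
    mask = (1 << splits) - 1
    return {h for h in synopsis.hashes if h & mask == 0}


def merge(synopses):
    """Make all synopses cover the same range, then take their union."""
    target = max(s.splits for s in synopses)  # narrowest range any synopsis covers
    merged = set()
    for s in synopses:
        merged |= restrict(s, target)
    return Synopsis(target, merged)


def estimate_ndv(synopsis):
    """Scale the retained count by the inverse of the represented fraction."""
    return len(synopsis.hashes) * (1 << synopsis.splits)


# Two partitions whose synopses cover different ranges of the domain.
part_a = Synopsis(splits=0, hashes={1, 2, 3, 4, 8})
part_b = Synopsis(splits=1, hashes={2, 4, 6})
combined = merge([part_a, part_b])   # hashes {2, 4, 6, 8} at splits=1
global_ndv = estimate_ndv(combined)  # 4 * 2 = 8
```

For incremental maintenance, only the synopsis of a changed partition would be rebuilt before calling `merge` again on the stored synopses.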

    Approximating a database statistic
    3.
    Granted patent
    Status: in force

    Publication number: US07636731B2

    Publication date: 2009-12-22

    Application number: US11796102

    Filing date: 2007-04-25

    IPC classification: G06F7/00 G06F17/30 G06F17/00

    Abstract: A method and apparatus for approximating a database statistic, such as the number of distinct values (NDV), is provided. To approximate the NDV for a portion of a table, a synopsis of distinct values is constructed. Each value in the portion is mapped to a domain of values. The mapping function is implemented with a uniform hash function, in one embodiment. If the resultant domain value does not exist in the synopsis, the domain value is added to the synopsis. If the synopsis reaches its capacity, a portion of the domain values is discarded from the synopsis. The statistic is approximated based on the number (N) of domain values in the synopsis and the portion of the domain that is represented in the synopsis relative to the size of the domain.

    Merging synopses to determine number of distinct values in large databases
    4.
    Patent application
    Status: in force

    Publication number: US20080120275A1

    Publication date: 2008-05-22

    Application number: US11796110

    Filing date: 2007-04-25

    IPC classification: G06F17/30

    Abstract: A method and apparatus for merging synopses to determine a database statistic, e.g., a number of distinct values (NDV), is disclosed. The merging can be used to determine an initial database statistic or to perform incremental statistics maintenance. For example, each synopsis can pertain to a different partition, such that merging the synopses generates a global statistic. When performing incremental maintenance, only those synopses whose partitions have changed need to be updated. Each synopsis contains domain values that summarize the statistic. However, the synopses may initially contain domain values that are not compatible with each other. Prior to merging the synopses, the domain values in each synopsis are made compatible with the domain values in the other synopses. The adjustment is made such that each synopsis represents the same range of domain values, in one embodiment. After “compatible synopses” are formed, the synopses are merged by taking the union of the compatible synopses.

    Parallel partition-wise aggregation
    5.
    Granted patent
    Status: in force

    Publication number: US07779008B2

    Publication date: 2010-08-17

    Application number: US11060260

    Filing date: 2005-02-16

    IPC classification: G06F17/30

    CPC classification: G06F9/4494

    Abstract: Techniques are provided for performing a parallel aggregation operation on data that resides in a container, such as a relational table. During generation of the execution plan for the operation, it is determined whether partition-wise aggregation should be performed, based on the grouping keys involved in the aggregation and the partition keys used to partition the container. If partition-wise aggregation is to be performed, then the assignments given to the slave processes that are assigned to scan a container are made on a partition-wise basis. The scan slaves themselves may perform full or partial aggregation (depending on whether they are the only scan slaves assigned to the partition). If the scan slaves perform no aggregation, or only partial aggregation, then the scan slaves redistribute the data items to aggregation slaves that are local to the scan slaves.
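The idea can be illustrated with a short sketch. It assumes one plausible eligibility test, namely that every partition key is also a grouping key, so no group spans two partitions; the abstract only says the decision is based on both key sets. Threads stand in for scan slaves, one per partition, and all function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_wise_ok(grouping_keys, partition_keys):
    """Assumed test: each group lies inside a single partition."""
    return set(partition_keys) <= set(grouping_keys)


def aggregate_partition(rows, grouping_keys, measure):
    """Full aggregation (SUM) of one partition by a single worker."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[k] for k in grouping_keys)
        totals[key] += row[measure]
    return dict(totals)


def parallel_sum(partitions, grouping_keys, partition_keys, measure):
    if not partition_wise_ok(grouping_keys, partition_keys):
        raise ValueError("groups may span partitions; redistribution needed")
    result = {}
    with ThreadPoolExecutor() as pool:
        partials = pool.map(
            lambda rows: aggregate_partition(rows, grouping_keys, measure),
            partitions,
        )
        for partial in partials:
            # Groups never overlap across partitions, so no re-aggregation.
            result.update(partial)
    return result


sales = [
    [{"region": "east", "product": "a", "amount": 1},
     {"region": "east", "product": "a", "amount": 2}],
    [{"region": "west", "product": "a", "amount": 5}],
]
totals = parallel_sum(sales, ["region", "product"], ["region"], "amount")
```

The sketch covers only the case where each partition has a single worker doing full aggregation; the partial-aggregation and redistribution paths mentioned in the abstract are not modeled.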

    Parallel partition-wise aggregation
    6.
    Patent application
    Status: in force

    Publication number: US20060182046A1

    Publication date: 2006-08-17

    Application number: US11060260

    Filing date: 2005-02-16

    IPC classification: H04L12/16 H04Q11/00

    CPC classification: G06F9/4494

    Abstract: Techniques are provided for performing a parallel aggregation operation on data that resides in a container, such as a relational table. During generation of the execution plan for the operation, it is determined whether partition-wise aggregation should be performed, based on the grouping keys involved in the aggregation and the partition keys used to partition the container. If partition-wise aggregation is to be performed, then the assignments given to the slave processes that are assigned to scan a container are made on a partition-wise basis. The scan slaves themselves may perform full or partial aggregation (depending on whether they are the only scan slaves assigned to the partition). If the scan slaves perform no aggregation, or only partial aggregation, then the scan slaves redistribute the data items to aggregation slaves that are local to the scan slaves.

    Techniques for pruning a data object during operations that join multiple data objects
    7.
    Granted patent
    Status: in force

    Publication number: US07020661B1

    Publication date: 2006-03-28

    Application number: US10193620

    Filing date: 2002-07-10

    IPC classification: G06F17/30

    Abstract: Techniques for eliminating one or more portions of a data object from any join step of an operation that joins multiple data objects include determining that an operation joins a first data object and a second data object. The second data object includes multiple portions. Each of multiple data units of the first data object is scanned. Based on data in the data units of the first data object, information is generated. The information indicates a portion of the second data object for exclusion. The indicated portion is excluded from an output of the operation. Only one or more portions of the second data object that are not indicated for exclusion in the information are included in a particular join step involving the second data object. By pruning a large second table, such as a fact table, the computational resources consumed by the joins are substantially reduced.
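A toy version of this pruning, assuming the second data object is a fact table split into named partitions and that a caller-supplied function `partition_of` maps a join-key value to its partition. Both the data layout and the function names are illustrative assumptions, not taken from the patent.

```python
from collections import defaultdict


def prune_and_join(dim_rows, fact_partitions, partition_of, key):
    """Scan the first table, then join only fact partitions it can match."""
    # Scan every row of the first data object and note which partitions of
    # the second data object could contain matching join keys.
    needed = {partition_of(row[key]) for row in dim_rows}
    excluded = set(fact_partitions) - needed  # the generated exclusion info

    dim_by_key = defaultdict(list)
    for row in dim_rows:
        dim_by_key[row[key]].append(row)

    joined = []
    for part_id, rows in fact_partitions.items():
        if part_id in excluded:
            continue  # pruned: this partition is never read by the join
        for fact in rows:
            for dim in dim_by_key.get(fact[key], []):
                joined.append({**dim, **fact})
    return joined, excluded


months = [{"month": 1, "name": "Jan"}]  # first data object, already filtered
facts = {
    "q1": [{"month": 1, "sales": 10}, {"month": 2, "sales": 20}],
    "q2": [{"month": 4, "sales": 40}],
}
rows, pruned = prune_and_join(
    months, facts, lambda m: "q1" if m <= 3 else "q2", "month"
)
```

Here only partition `q1` is scanned by the join, because no row of the first table maps to `q2`.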

    Join factorization of union/union all queries
    8.
    Patent application
    Status: in force

    Publication number: US20070219969A1

    Publication date: 2007-09-20

    Application number: US11716010

    Filing date: 2007-03-08

    IPC classification: G06F17/30

    Abstract: Under a type of query transformation referred to herein as join factorization, the branches of a UNION/UNION ALL query that join a common table are combined to reduce accesses to the common table. The transformation can be expressed as (T1 join T2) union all (T1 join T3) = T1 join (T2 union all T3), where T1, T2 and T3 are three tables. A given query may be rewritten in many alternate ways using join factorization. Evaluating each alternative can be expensive. Therefore, the alternatives are generated and evaluated in a way that minimizes the cost of evaluating the alternatives.
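The identity can be checked on small in-memory tables. The sketch below assumes both branches join to T1 on the same key and compares results as multisets, since UNION ALL keeps duplicates and row order is not significant; the helper names are invented for the example.

```python
from collections import Counter


def join(left, right, key):
    """Naive inner equi-join of two lists of dict rows."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def union_all(a, b):
    """UNION ALL: concatenate, keeping duplicates."""
    return a + b


def as_multiset(rows):
    """Order-insensitive, duplicate-preserving view of a result set."""
    return Counter(tuple(sorted(row.items())) for row in rows)


t1 = [{"k": 1, "a": "x"}, {"k": 2, "a": "y"}]
t2 = [{"k": 1, "b": 10}]
t3 = [{"k": 1, "b": 20}, {"k": 2, "b": 30}]

# Original form: T1 is accessed once per branch.
original = union_all(join(t1, t2, "k"), join(t1, t3, "k"))
# Factorized form: T1 is accessed once.
factored = join(t1, union_all(t2, t3), "k")

same_result = as_multiset(original) == as_multiset(factored)  # True
```

The cost-based choice among alternative rewrites described in the abstract is outside the scope of this sketch.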

    Join factorization of union/union all queries
    9.
    Granted patent
    Status: in force

    Publication number: US07644062B2

    Publication date: 2010-01-05

    Application number: US11716010

    Filing date: 2007-03-08

    IPC classification: G06F7/00 G06F17/30

    Abstract: Under a type of query transformation referred to herein as join factorization, the branches of a UNION/UNION ALL query that join a common table are combined to reduce accesses to the common table. The transformation can be expressed as (T1 join T2) union all (T1 join T3) = T1 join (T2 union all T3), where T1, T2 and T3 are three tables. A given query may be rewritten in many alternate ways using join factorization. Evaluating each alternative can be expensive. Therefore, the alternatives are generated and evaluated in a way that minimizes the cost of evaluating the alternatives.

    COMPUTING SELECTIVITIES FOR GROUP OF COLUMNS AND EXPRESSIONS
    10.
    Patent application
    Status: under examination (published)

    Publication number: US20100030728A1

    Publication date: 2010-02-04

    Application number: US12181994

    Filing date: 2008-07-29

    IPC classification: G06F17/30

    CPC classification: G06F16/24545

    Abstract: Techniques are described herein for estimating selectivities of query predicates that reference more than one column and predicates that include column expressions. Virtual columns are defined based on column groups and column expressions. Statistics are gathered on the virtual columns and are used to estimate the selectivities of query predicates that include column groups or expressions. Query predicates that include column groups are mapped to virtual columns on which statistics are gathered, based on similarities between the column groups in the query predicates and the column groups on which the virtual columns are defined. Virtual columns are defined for column groups and expressions that are specified by users or by a workload analyzer that analyzes query predicates.
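A brief sketch of why statistics on a column group help. It assumes the common simplification that an equality predicate's selectivity is 1/NDV under a uniform distribution, and it models the virtual column as the tuple of the grouped columns; the table, column names, and helper functions are hypothetical.

```python
from fractions import Fraction


def ndv(rows, columns):
    """Number of distinct values of a column or a column group."""
    return len({tuple(row[c] for c in columns) for row in rows})


def selectivity_independent(rows, columns):
    """Estimate that multiplies per-column selectivities (assumes independence)."""
    result = Fraction(1)
    for column in columns:
        result *= Fraction(1, ndv(rows, [column]))
    return result


def selectivity_column_group(rows, columns):
    """Estimate from statistics gathered on the column-group virtual column."""
    return Fraction(1, ndv(rows, columns))


addresses = [
    {"country": "US", "city": "NYC"},
    {"country": "US", "city": "LA"},
    {"country": "FR", "city": "Paris"},
    {"country": "FR", "city": "Lyon"},
]

# Predicate: country = ? AND city = ?
naive = selectivity_independent(addresses, ["country", "city"])     # 1/8
grouped = selectivity_column_group(addresses, ["country", "city"])  # 1/4
```

Because city determines country in this data, the independence estimate undercounts matching rows by a factor of two, while the column-group estimate matches the actual fraction of one row in four.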
