Inverse distribution function operations in a parallel relational database
    1.
    Granted invention patent (in force)

    Publication No.: US08880481B1

    Publication date: 2014-11-04

    Application No.: US13434442

    Filing date: 2012-03-29

    IPC classification: G06F17/30

    Abstract: Inverse distribution operations are performed on a large distributed parallel database comprising a plurality of distributed data segments to determine a data value at a predetermined percentile of a sorted dataset formed on one segment. Data elements from across the segments may first be grouped, either by partitioning keys or by hashing; the groups are sorted into a predetermined order, and the data value corresponding to the desired percentile is picked up at the row location of the corresponding data element in each group. For a global dataset that is spread across the database segments, a local sort of data elements is performed on each segment, and the data elements from the local sorts are streamed in overall sorted order to one segment to form the sorted dataset.
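The global-dataset case in this abstract (local sort per segment, then a stream in overall sorted order to one place) can be illustrated with a k-way merge. This is a minimal sketch, not the patented implementation: segments are modelled as plain Python lists, the function name `percentile_disc` is hypothetical, and the row position is assumed to follow the discrete-percentile convention (the first row whose cumulative fraction reaches the requested value).

```python
import heapq
import math
from itertools import islice


def percentile_disc(segments, fraction):
    """Return the value at the given percentile of data spread over segments.

    segments: list of lists, one per (simulated) database segment.
    fraction: desired percentile as a fraction, 0 < fraction <= 1.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    # Step 1: each segment sorts only its own data (the "local sort").
    local_sorts = [sorted(segment) for segment in segments]

    total_rows = sum(len(rows) for rows in local_sorts)
    if total_rows == 0:
        return None

    # Step 2: 1-based row position of the requested percentile
    # (assumed convention: smallest row with cumulative fraction >= fraction).
    row = math.ceil(fraction * total_rows)

    # Step 3: stream the local sorts in overall sorted order to one consumer.
    # heapq.merge is lazy, so only `row` elements are ever pulled.
    merged = heapq.merge(*local_sorts)
    return next(islice(merged, row - 1, None))


segments = [[7, 1, 9], [4, 8], [2, 6, 3, 5]]
median = percentile_disc(segments, 0.5)
```

Because the merge is lazy, the consumer stops as soon as the target row is reached and never materialises the full sorted dataset.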

    Query plan management in shared distributed data stores
    2.
    Granted invention patent (in force)

    Publication No.: US09002824B1

    Publication date: 2015-04-07

    Application No.: US13529501

    Filing date: 2012-06-21

    IPC classification: G06F7/00 G06F17/00 G06F17/30

    Abstract: In a shared-nothing distributed data store, the invention identifies and caches query plans that are unlikely to become invalid because they do not reference objects that are likely to be changed or deleted. Plans that are likely to become invalid, and are therefore not cached, are those that reference data partitioned across segment/query execution nodes of the data store, plans that are complex, and plans that reference objects that are not “built-in” (primitive) objects. The effect is that most plans generated on a query dispatch (master) node are not cached, whereas most plans generated on an execution (segment) node are cached.
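The three exclusion rules in this abstract can be read as a simple predicate over a plan. The sketch below is only an illustration under stated assumptions: the `Plan` fields, the set `BUILTIN_OBJECTS`, and the threshold `MAX_CACHEABLE_NODES` are all hypothetical, since the abstract does not say how complexity or built-in status is measured.

```python
from dataclasses import dataclass

# Hypothetical set of "built-in" (primitive) object names.
BUILTIN_OBJECTS = frozenset({"int4", "text", "int4pl", "texteq"})

# Hypothetical complexity limit; the abstract gives no concrete measure.
MAX_CACHEABLE_NODES = 8


@dataclass(frozen=True)
class Plan:
    node_count: int                 # assumed proxy for plan complexity
    referenced_objects: frozenset   # names of objects the plan depends on
    touches_partitioned_data: bool  # True if data is spread across segments


def is_cacheable(plan):
    """Return True if the plan is unlikely to become invalid."""
    if plan.touches_partitioned_data:
        return False  # rule 1: references partitioned data
    if plan.node_count > MAX_CACHEABLE_NODES:
        return False  # rule 2: plan is complex
    # rule 3: every referenced object must be built-in
    return plan.referenced_objects <= BUILTIN_OBJECTS


segment_plan = Plan(3, frozenset({"int4", "int4pl"}), False)
master_plan = Plan(3, frozenset({"int4"}), True)
```

Under these assumptions a small expression plan evaluated on a segment passes all three checks, while a plan built on the master over partitioned tables fails the first one, which matches the effect the abstract describes.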

    Systematic verification of database metadata upgrade
    3.
    Granted invention patent (in force)

    Publication No.: US08738569B1

    Publication date: 2014-05-27

    Application No.: US13371337

    Filing date: 2012-02-10

    IPC classification: G06F17/00

    CPC classification: G06F17/303

    Abstract: A script is run on a database to transform the metadata and produce an upgraded database. A new database corresponding to the upgraded database is initialized, and the metadata in the new database catalog is verified by comparing it to the upgraded database metadata. A fast verification is performed on a partial upgrade by dumping the catalogs of master nodes and comparing the results, and a thorough verification is performed on a full upgrade by querying and comparing both master node catalogs and segment node catalogs.
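The two verification modes can be pictured as the same comparison applied to different sets of catalogs. This is a rough sketch only: catalogs are modelled as dictionaries keyed by a hypothetical `"table.row"` string, and the function names are illustrative rather than taken from the patent.

```python
_MISSING = object()  # sentinel so a missing entry never equals a stored None


def catalog_diff(upgraded, fresh):
    """Return the sorted keys whose entries differ between two catalog dumps."""
    keys = upgraded.keys() | fresh.keys()
    return sorted(
        key for key in keys
        if upgraded.get(key, _MISSING) != fresh.get(key, _MISSING)
    )


def fast_verify(upgraded_master, fresh_master):
    """Fast check: compare only the master catalog dumps."""
    return not catalog_diff(upgraded_master, fresh_master)


def thorough_verify(upgraded_nodes, fresh_nodes):
    """Thorough check: compare the master and every segment catalog.

    Both arguments map a node name to that node's catalog dictionary.
    """
    if upgraded_nodes.keys() != fresh_nodes.keys():
        return False
    return all(
        not catalog_diff(upgraded_nodes[node], fresh_nodes[node])
        for node in upgraded_nodes
    )


fresh = {"master": {"pg_type.int4": "v2"}, "seg0": {"pg_type.int4": "v2"}}
upgraded_ok = {"master": {"pg_type.int4": "v2"}, "seg0": {"pg_type.int4": "v2"}}
upgraded_bad = {"master": {"pg_type.int4": "v2"}, "seg0": {"pg_type.int4": "v1"}}
```

In this toy data the stale segment entry is invisible to the fast check and is caught only by the thorough one.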

    Method and apparatus for acid validation within a distributed relational database under controlled concurrent workloads
    6.
    Granted invention patent (in force)

    Publication No.: US09081772B1

    Publication date: 2015-07-14

    Application No.: US13457153

    Filing date: 2012-04-26

    Applicant: Caleb E. Welton

    Inventor: Caleb E. Welton

    IPC classification: G06F17/30

    Abstract: Testing a database is disclosed. A test description defining a test of a database is received. The test description specifies a relative timing between an issuance of a first command via a first session of the database when the test is conducted and an issuance of a second command via a second session of the database when the test is conducted. The database is configured to conduct the test defined in the test description.
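One way to picture a test description that fixes the relative timing of commands across sessions is a small script in which every line carries an order number and a session name. The line format, the helper names, and the `issue` callback below are all hypothetical; the abstract does not define a concrete syntax.

```python
# Hypothetical description format: "<order> <session>: <command>"
TEST_DESCRIPTION = """
1 s1: BEGIN
2 s2: BEGIN
3 s1: UPDATE t SET x = 1
4 s2: SELECT x FROM t
5 s1: COMMIT
"""


def parse_description(text):
    """Parse the description into (order, session, command) tuples, in order."""
    steps = []
    for line in text.strip().splitlines():
        head, command = line.split(":", 1)
        order, session = head.split()
        steps.append((int(order), session, command.strip()))
    return sorted(steps)


def run_test(steps, issue):
    """Issue each command on its session in the prescribed relative order.

    issue: callable (session, command) standing in for a real database client.
    """
    log = []
    for _order, session, command in steps:
        issue(session, command)
        log.append((session, command))
    return log


steps = parse_description(TEST_DESCRIPTION)
issued = []
log = run_test(steps, lambda session, command: issued.append((session, command)))
```

Here the second session's `SELECT` is guaranteed to be issued after the first session's uncommitted `UPDATE`, which is the kind of interleaving an isolation test needs to control.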

    Custom user parallel data import and export
    7.
    Granted invention patent (in force)

    Publication No.: US08838634B1

    Publication date: 2014-09-16

    Application No.: US13436419

    Filing date: 2012-03-30

    IPC classification: G06F7/00

    CPC classification: G06F17/30569

    Abstract: Formatting data is disclosed. An indication of specified data to be formatted between a format of a database and a format external to the database is received. A formatter of the database is used to format the specified data between the format of the database and the format external to the database. The formatter has been integrated with the database using formatter code defined external to the database.
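The idea of a formatter supplied from outside the database and then used for both import and export can be sketched as a registry of paired conversion functions. Everything named here (`FORMATTERS`, `register_formatter`, the `"jsonl"` format) is an assumption made for illustration; the abstract does not describe the integration mechanism.

```python
import json

# Registry of user-supplied formatters, keyed by format name.
# Each entry is a pair: (database row -> external record, external record -> row).
FORMATTERS = {}


def register_formatter(name, to_external, from_external):
    """Integrate externally defined formatter code under a format name."""
    FORMATTERS[name] = (to_external, from_external)


def export_rows(rows, fmt):
    """Convert database rows to the external format."""
    to_external, _ = FORMATTERS[fmt]
    return [to_external(row) for row in rows]


def import_rows(records, fmt):
    """Convert external records back to database rows."""
    _, from_external = FORMATTERS[fmt]
    return [from_external(record) for record in records]


# A user-defined formatter: one JSON document per row.
register_formatter(
    "jsonl",
    lambda row: json.dumps(row, sort_keys=True),
    json.loads,
)

rows = [{"id": 1, "name": "a"}]
exported = export_rows(rows, "jsonl")
```

Keeping both directions in one registry entry means the same formatter name works for import and export, and in a parallel setting each segment could apply it independently to its own rows.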

    Methods and apparatus for aggregating sparse data

    Publication No.: US06606621B2

    Publication date: 2003-08-12

    Application No.: US09870271

    Filing date: 2001-05-30

    IPC classification: G06F17/30

    Abstract: A method for aggregating sparse data in a multidimensional array by using a composite join hierarchy created by segmenting the data so that each segment of the hierarchy processed is smaller and more likely to fit in memory. The method employs a recursive sub-cubing mechanism wherein an n-dimensional cube is broken into a number of (n−1)-dimensional cubes, each of those cubes is solved as (n−2)-dimensional cubes, and so on. Within each division, the processing is segmented by hierarchy level, so a dimension with three hierarchy levels (for example, month-quarter-year) would form three separate subcubes with one less dimension. This algorithm produces one ‘worklist’ for every combination of hierarchy levels in the cube. Each of these worklists is represented as a bitmap of the cells contained within it and may be used as a basis for generating more aggregate worklists. To minimize the need for input-output data transfers, all the derived worklists of a single worklist are generated at the same time. This is accomplished without keeping more than n worklists active at any given time, which reduces the number of input-output data transfers needed without requiring substantially larger memory space.
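The statement that one worklist is produced for every combination of hierarchy levels can be made concrete on a tiny sparse cube. The sketch below is a deliberate simplification: it recomputes every level combination directly from the base cells instead of deriving worklists from one another by recursive sub-cubing, it uses dictionaries rather than bitmaps, and the hierarchies and member names are made up.

```python
from collections import defaultdict
from itertools import product

# Hypothetical hierarchies: each base member maps to its ancestor at every level.
MONTH_LEVELS = {
    "2001-01": ("2001-01", "2001-Q1", "2001"),
    "2001-02": ("2001-02", "2001-Q1", "2001"),
    "2001-04": ("2001-04", "2001-Q2", "2001"),
}
CITY_LEVELS = {
    "Boston": ("Boston", "US"),
    "Austin": ("Austin", "US"),
}
HIERARCHIES = (MONTH_LEVELS, CITY_LEVELS)


def aggregate_all_levels(cells):
    """Return {level_combination: {coordinates: total}} for a sparse cube.

    cells: dict mapping (month, city) base coordinates to a numeric value;
    only populated cells are stored, which is what makes the cube sparse.
    """
    depths = [len(next(iter(h.values()))) for h in HIERARCHIES]
    worklists = {}
    # One worklist per combination of hierarchy levels, e.g. (quarter, country).
    for levels in product(*(range(depth) for depth in depths)):
        totals = defaultdict(float)
        for coords, value in cells.items():
            key = tuple(
                hierarchy[member][level]
                for hierarchy, member, level in zip(HIERARCHIES, coords, levels)
            )
            totals[key] += value
        worklists[levels] = dict(totals)
    return worklists


cells = {
    ("2001-01", "Boston"): 10.0,
    ("2001-02", "Austin"): 5.0,
    ("2001-04", "Boston"): 2.0,
}
worklists = aggregate_all_levels(cells)
```

With three time levels and two geography levels this yields six worklists; the method in the abstract would instead generate the coarser worklists from finer ones to limit input-output transfers.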

    Mirrored database upgrade using state machine
    9.
    Granted invention patent (in force)

    Publication No.: US08745445B1

    Publication date: 2014-06-03

    Application No.: US13371342

    Filing date: 2012-02-10

    IPC classification: G06F11/00

    Abstract: A process for upgrading a mirrored shared-nothing database system comprises a sequence of short, well-defined idempotent steps, and at least one non-idempotent step involving transforming a master catalog. The upgrade process is managed and controlled by a state machine that has a persistent memory running on the master node. In the event of a failure or crash during an idempotent step, the process stops the database in the current state and repeats the step. If a failure or crash occurs during a non-idempotent step, the upgrade process is rolled back to the beginning and repeated.
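The retry and rollback rules in this abstract map naturally onto a small state machine that persists its position between runs. The following is a sketch under assumptions: the class name, the JSON state file, and the `max_attempts` limit are hypothetical, and undoing the partial effects of a failed non-idempotent step is left out entirely.

```python
import json
import os
import tempfile


class UpgradeStateMachine:
    """Run upgrade steps in order, persisting progress to a state file."""

    def __init__(self, steps, state_path):
        # steps: list of (name, callable, is_idempotent) tuples.
        self.steps = steps
        self.state_path = state_path

    def _load(self):
        # Resume from the persisted position, or start at the first step.
        try:
            with open(self.state_path) as handle:
                return json.load(handle)["next_step"]
        except FileNotFoundError:
            return 0

    def _save(self, index):
        with open(self.state_path, "w") as handle:
            json.dump({"next_step": index}, handle)

    def run(self, max_attempts=5):
        index = self._load()
        failures = 0
        while index < len(self.steps):
            _name, action, is_idempotent = self.steps[index]
            try:
                action()
            except Exception:
                failures += 1
                if failures >= max_attempts:
                    raise
                if not is_idempotent:
                    # Non-idempotent step failed: restart from the beginning.
                    index = 0
                    self._save(index)
                # Idempotent step failed: stay on the same step and repeat it.
                continue
            index += 1
            self._save(index)


calls = []


def make_step(name, fail_times):
    """Build a demo step that fails `fail_times` times before succeeding."""
    remaining = {"count": fail_times}

    def step():
        calls.append(name)
        if remaining["count"] > 0:
            remaining["count"] -= 1
            raise RuntimeError(name)

    return step


state_path = os.path.join(tempfile.mkdtemp(), "upgrade_state.json")
machine = UpgradeStateMachine(
    [
        ("copy_files", make_step("copy_files", 1), True),
        ("transform_catalog", make_step("transform_catalog", 1), False),
        ("restart", make_step("restart", 0), True),
    ],
    state_path,
)
machine.run()
```

Because the position is written to disk after every step, a process that crashes outright can be restarted and will resume from the recorded step instead of guessing where it stopped.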

    Multidimensional database storage and retrieval system
    10.
    Granted invention patent (in force)

    Publication No.: US07165065B1

    Publication date: 2007-01-16

    Application No.: US09616643

    Filing date: 2000-07-14

    IPC classification: G06F17/30

    Abstract: In a multidimensional database, an aggregation operation is performed in an optimal manner by storing the values included in the aggregation operation on the same disk page. A sparsity manager determines aggregate values that are computed from other data values during the aggregation operation. Each aggregate value is associated with one or more data values that are used during the aggregation operation to compute the aggregate value. The sparsity manager stores the associated data values in proximity to each other, such as on the same disk page, so that multiple disk page fetches may not be required for the same set of data values during the aggregation operation. The data values used in the aggregation operation can therefore be fetched once from a disk page, and thereafter are found in memory, such as on a cache page corresponding to the disk page. In this manner, multiple fetches for the same disk page during the aggregation operation are avoided.
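The benefit of storing the operands of one aggregate on the same disk page can be shown by counting page fetches under a small cache. This is a toy model, not the sparsity manager itself: the layouts, the member names, and the least-recently-used cache of `cache_pages` pages are assumptions.

```python
from collections import OrderedDict


def count_page_fetches(layout, operands, cache_pages=1):
    """Count disk page fetches needed to read the operands in order.

    layout: dict mapping a data value's id to the disk page that stores it.
    operands: ids of the values read while computing one aggregate.
    cache_pages: number of pages the (LRU) cache can hold.
    """
    cache = OrderedDict()
    fetches = 0
    for value_id in operands:
        page = layout[value_id]
        if page in cache:
            cache.move_to_end(page)  # cache hit: no disk access
            continue
        fetches += 1                 # cache miss: fetch the page from disk
        cache[page] = True
        if len(cache) > cache_pages:
            cache.popitem(last=False)  # evict the least recently used page
    return fetches


operands = ["jan", "feb", "mar"]  # inputs to a hypothetical quarter total
scattered = {"jan": 0, "feb": 7, "mar": 3}
colocated = {"jan": 0, "feb": 0, "mar": 0}

scattered_fetches = count_page_fetches(scattered, operands)
colocated_fetches = count_page_fetches(colocated, operands)
```

With the co-located layout the first read brings the page into the cache and the remaining operands are served from memory, which is the effect the abstract attributes to the sparsity manager.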
