Scalable index build techniques for column stores

    Publication No.: US10216777B2

    Publication date: 2019-02-26

    Application No.: US15407110

    Filing date: 2017-01-16

    IPC classification: G06F17/30 G06F9/50 H04L12/911

    Abstract: An architecture includes an index creation algorithm that uses available resources and adjusts dynamically so that it scales with added resources, regardless of data distribution. The resources can be processing resources, memory, and/or input/output, for example. A finer level of granularity, called a segment, is used to process tuples in a partition while creating an index. The segment also aligns with compression techniques for the index. By choosing an appropriate segment size and using load balancing, the overall time for index creation can be reduced. Each segment can then be processed by a single thread, thereby limiting segment skew. Skew is further limited by breaking down the work done by a thread into parallelizable stages.
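
    The segment idea in this abstract can be illustrated with a minimal sketch. It is not the patented algorithm: the segment size, the dictionary encoding used as the "compression technique", and every function name below are assumptions made for illustration. It only shows one partition's column being cut into fixed-size segments, with each segment handled by a single worker thread drawn from a shared pool.

```python
from concurrent.futures import ThreadPoolExecutor

SEGMENT_SIZE = 4  # hypothetical value, kept tiny so the example is easy to follow


def split_into_segments(values, segment_size=SEGMENT_SIZE):
    """Cut one partition's column values into fixed-size segments."""
    return [values[i:i + segment_size] for i in range(0, len(values), segment_size)]


def build_segment_index(segment):
    """Dictionary-encode one segment: sorted distinct values plus integer codes.

    Dictionary encoding is an assumed stand-in for the index compression step.
    """
    dictionary = sorted(set(segment))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in segment]


def build_column_index(values, workers=4, segment_size=SEGMENT_SIZE):
    """Build per-segment indexes, one task per segment."""
    segments = split_into_segments(values, segment_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each segment is one task run by a single worker thread; idle workers
        # pick up the next pending segment, which spreads the load across them.
        # pool.map returns results in segment order.
        return list(pool.map(build_segment_index, segments))


column = ["b", "a", "b", "c", "a", "a", "d", "d", "e"]
index = build_column_index(column)
```

    Because every segment has a bounded size, no single task can grow much larger than the others, which is the intuition behind limiting skew. In CPython the threads here show the task structure rather than a real speed-up for CPU-bound work.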

    NORMALIZING DATA FOR FAST SUPERSCALAR PROCESSING

    Document type: Invention application

    Legal status: In force

    Publication No.: US20150234778A1

    Publication date: 2015-08-20

    Application No.: US14702749

    Filing date: 2015-05-03

    IPC classification: G06F15/82 G06F17/30

    Abstract: A data normalization system is described herein that represents multiple data types that are common within database systems in a normalized form that can be processed uniformly to achieve faster processing of data on superscalar CPU architectures. The data normalization system includes changes to internal data representations of a database system as well as functional processing changes that leverage normalized internal data representations for a high density of independently executable CPU instructions. Because most data in a database is small, a majority of data can be represented by the normalized format. Thus, the data normalization system allows for fast superscalar processing in a database system in a variety of common cases, while maintaining compatibility with existing data sets.
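
    The abstract does not say what the normalized form looks like. As a hedged illustration only, the sketch below assumes a signed 64-bit integer as the normalized format and maps booleans, integers, and dates onto it. Values that do not fit take a slower fallback path, which stands in for the abstract's compatibility with existing data. The names `normalize` and `count_less_than` are hypothetical.

```python
from datetime import date

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1  # assumed normalized range


def normalize(value):
    """Return an integer in the int64 range for supported small values, else None."""
    if isinstance(value, bool):
        return int(value)
    if isinstance(value, int):
        return value if INT64_MIN <= value <= INT64_MAX else None
    if type(value) is date:  # plain dates only; datetimes would lose their time part
        return value.toordinal()
    return None


def count_less_than(column, bound):
    """Count values below bound, using the normalized form when every value fits."""
    normalized_bound = normalize(bound)
    normalized = [normalize(v) for v in column]
    if normalized_bound is not None and all(n is not None for n in normalized):
        # Fast path: the same integer comparison for every value, whatever its type.
        return sum(n < normalized_bound for n in normalized)
    # Fallback path: compare the original values directly.
    return sum(v < bound for v in column)


dates = [date(2015, 5, 3), date(2015, 8, 20), date(2014, 1, 1)]
small = count_less_than(dates, date(2015, 6, 1))
large = count_less_than([1, 5, 10**30], 6)
```

    In a real engine the uniform integer loop is what would give a superscalar CPU many independent instructions to execute; Python only shows the control flow, not the performance effect.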

    Scalable index build techniques for column stores

    Publication No.: US10860556B2

    Publication date: 2020-12-08

    Application No.: US16244825

    Filing date: 2019-01-10

    Abstract: An architecture includes an index creation algorithm that uses available resources and adjusts dynamically so that it scales with added resources, regardless of data distribution. The resources can be processing resources, memory, and/or input/output, for example. A finer level of granularity, called a segment, is used to process tuples in a partition while creating an index. The segment also aligns with compression techniques for the index. By choosing an appropriate segment size and using load balancing, the overall time for index creation can be reduced. Each segment can then be processed by a single thread, thereby limiting segment skew. Skew is further limited by breaking down the work done by a thread into parallelizable stages.

    Scalable index build techniques for column stores

    Document type: Granted invention patent

    Legal status: In force

    Publication No.: US09547677B2

    Publication date: 2017-01-17

    Application No.: US14662108

    Filing date: 2015-03-18

    IPC classification: G06F17/30 G06F9/50 H04L12/911

    Abstract: An architecture includes an index creation algorithm that uses available resources and adjusts dynamically so that it scales with added resources, regardless of data distribution. The resources can be processing resources, memory, and/or input/output, for example. A finer level of granularity, called a segment, is used to process tuples in a partition while creating an index. The segment also aligns with compression techniques for the index. By choosing an appropriate segment size and using load balancing, the overall time for index creation can be reduced. Each segment can then be processed by a single thread, thereby limiting segment skew. Skew is further limited by breaking down the work done by a thread into parallelizable stages.
