Data transfers in columnar data systems

    Publication No.: US09813502B1

    Publication date: 2017-11-07

    Application No.: US15584320

    Filing date: 2017-05-02

    Abstract: A computer-implemented method includes receiving a request to transmit column group data to a target node, the column group data comprising C columns within a column-oriented data table, and determining a transmission row count R for transmitting the column group data to the target node. The method may also include transmitting a transmission packet comprising R sequentially-ordered data elements for each of the C columns to the target node. The R data elements for each column may be sequentially retrieved from memory. A corresponding method includes receiving, at a target node, a request to receive the column group data, determining the transmission row count R for receiving the column group data, and receiving a transmission packet comprising R sequentially-ordered data elements for each of the C columns and storing the sequentially-ordered data elements within memory. A corresponding computer system and corresponding computer program products are also disclosed herein.
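    The packet layout described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names build_packets and receive_packets are hypothetical, the row count R is simply passed in by the caller (the abstract does not say how R is determined), and letting the final packet carry fewer than R rows is an assumption.

```python
from typing import Dict, Iterable, Iterator, List, Sequence


def build_packets(columns: Dict[str, Sequence], row_count: int) -> Iterator[Dict[str, list]]:
    """Yield packets holding up to `row_count` consecutive elements per column.

    `columns` maps a column name to its values; all columns are assumed
    to have the same length (hypothetical in-memory layout).
    """
    if row_count <= 0:
        raise ValueError("row_count must be positive")
    total_rows = len(next(iter(columns.values()))) if columns else 0
    for start in range(0, total_rows, row_count):
        # Each column contributes one contiguous slice, read sequentially.
        yield {name: list(values[start:start + row_count])
               for name, values in columns.items()}


def receive_packets(packets: Iterable[Dict[str, list]]) -> Dict[str, List]:
    """Target-node side: append each column slice to that column's storage."""
    stored: Dict[str, List] = {}
    for packet in packets:
        for name, chunk in packet.items():
            stored.setdefault(name, []).extend(chunk)
    return stored


source = {"id": [1, 2, 3, 4, 5], "price": [10.0, 20.0, 30.0, 40.0, 50.0]}
packets = list(build_packets(source, row_count=2))
restored = receive_packets(packets)
```

    With C = 2 columns and R = 2, the five rows travel as three packets, and each packet keeps a column's elements contiguous so that both sides read and write memory sequentially.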

    Updating of in-memory synopsis metadata for inserts in database table
    2.
    Granted patent (legal status: in force)

    Publication No.: US09519676B1

    Publication date: 2016-12-13

    Application No.: US14967346

    Filing date: 2015-12-13

    IPC classification: G06F17/30

    Abstract: In updating a synopsis table of a database system, a database management unit performs a transaction to insert row(s) in a section of the base table and determines whether a synopsis entry for the section is stored in the memory. If stored in the memory, the in-memory synopsis entry is retrieved and metadata values in the in-memory synopsis entry are updated with data from the row(s) to be inserted. If not stored in the memory, the in-memory synopsis entry is generated and the metadata values in the in-memory synopsis entry are updated with data from the row(s). The insert transaction is then committed. On-disk updates of synopsis entries are thus avoided, significantly reducing the cost of updating the synopsis entries from the insert transaction. This yields enhanced performance, especially for inserts of a small number of rows, while the benefits of synopsis entries are still available.
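    The insert path in this abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented method: the abstract only speaks of "metadata values", so the choice of minimum, maximum, and row count is an assumption, and the names SynopsisEntry, Table, and synopsis_cache are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class SynopsisEntry:
    """Per-section metadata; min/max/row count are assumed example values."""
    min_value: Optional[int] = None
    max_value: Optional[int] = None
    row_count: int = 0

    def absorb(self, values: Iterable[int]) -> None:
        for v in values:
            self.min_value = v if self.min_value is None else min(self.min_value, v)
            self.max_value = v if self.max_value is None else max(self.max_value, v)
            self.row_count += 1


class Table:
    def __init__(self) -> None:
        self.sections: Dict[int, List[int]] = {}              # base-table rows per section
        self.synopsis_cache: Dict[int, SynopsisEntry] = {}    # in-memory synopsis entries

    def insert(self, section_id: int, values: List[int]) -> None:
        entry = self.synopsis_cache.get(section_id)
        if entry is None:
            # Not in memory yet: generate the in-memory entry.
            entry = SynopsisEntry()
            self.synopsis_cache[section_id] = entry
        # Update the metadata from the rows being inserted; no on-disk synopsis write.
        entry.absorb(values)
        # Stand-in for committing the insert transaction.
        self.sections.setdefault(section_id, []).extend(values)


table = Table()
table.insert(0, [5, 3])
table.insert(0, [9])
entry = table.synopsis_cache[0]
```

    The second insert finds the entry already cached and only widens its range, which is where the saving for small inserts would come from.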


    Data Encoding and Processing Columnar Data
    4.
    Patent application (legal status: under examination, published)

    Publication No.: US20160070730A1

    Publication date: 2016-03-10

    Application No.: US14945502

    Filing date: 2015-11-19

    IPC classification: G06F17/30

    Abstract: The embodiments described herein relate to accessing a plurality of data elements. A page of column data is compressed and stored in a format that includes a collection of data elements. A tuple map is stored, and the collection of data elements is indexed via the tuple map. A query is processed based on the compressed page by identifying a set of tuple identifiers mapping to stored data in support of the query. Each tuple identifier corresponds to a location of a respective tuple of the compressed page.


    Dynamically determining join order
    5.
    Granted patent (legal status: in force)

    Publication No.: US09171043B2

    Publication date: 2015-10-27

    Application No.: US13755784

    Filing date: 2013-01-31

    IPC classification: G06F17/30

    CPC classification: G06F17/30466 G06F17/30498

    Abstract: A weight is determined for each of a plurality of join predicates for a join between one or more first database objects and one or more second database objects based on a join selectivity for each of the plurality of join predicates. The plurality of join predicates are sorted based on the determined weights. The join operation is performed by joining the one or more first database objects with the one or more second database objects in accordance with an order of the sorted plurality of join predicates.
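    The weight, sort, and join sequence in this abstract can be sketched in Python as below. The sketch assumes that the weight is simply the estimated selectivity and that the most selective predicate is evaluated first; the abstract does not state either detail. JoinPredicate, order_predicates, and nested_loop_join are hypothetical names, and a nested-loop join stands in for whatever join method the system actually uses.

```python
from typing import Callable, List, NamedTuple, Tuple


class JoinPredicate(NamedTuple):
    name: str
    selectivity: float                    # estimated fraction of row pairs that pass
    test: Callable[[dict, dict], bool]


def order_predicates(predicates: List[JoinPredicate]) -> List[JoinPredicate]:
    """Sort by weight; here the weight is assumed to equal the selectivity."""
    return sorted(predicates, key=lambda p: p.selectivity)


def nested_loop_join(left: List[dict], right: List[dict],
                     predicates: List[JoinPredicate]) -> List[Tuple[dict, dict]]:
    ordered = order_predicates(predicates)
    # all() short-circuits, so a failing selective predicate skips the rest.
    return [(l, r) for l in left for r in right
            if all(p.test(l, r) for p in ordered)]


left = [{"id": 1, "region": "EU"}, {"id": 2, "region": "US"}]
right = [{"id": 1, "region": "EU"}, {"id": 2, "region": "EU"}]
predicates = [
    JoinPredicate("same_region", 0.5, lambda l, r: l["region"] == r["region"]),
    JoinPredicate("same_id", 0.1, lambda l, r: l["id"] == r["id"]),
]
ordered_names = [p.name for p in order_predicates(predicates)]
joined = nested_loop_join(left, right, predicates)
```

    Evaluating the predicate that rejects the most row pairs first means that later, possibly costlier predicates run on fewer candidates.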


    PARTITIONING DATA FOR PARALLEL PROCESSING
    6.
    Patent application (legal status: in force)

    Publication No.: US20150032780A1

    Publication date: 2015-01-29

    Application No.: US14483841

    Filing date: 2014-09-11

    IPC classification: G06F17/30

    CPC classification: G06F17/30339 G06F17/30584

    Abstract: According to one embodiment of the present invention, a system partitions data for parallel processing and comprises one or more computer systems with at least one processor. The system partitions data of a data object into a plurality of data partitions within a data structure based on a plurality of keys. The data structure includes a plurality of dimensions and each key is associated with a corresponding different dimension of the data structure. Portions of the data structure representing different data partitions are assigned to the computer systems for parallel processing, and the assigned data structure portions are processed in parallel to perform an operation. Embodiments of the present invention further include a method and computer program product for partitioning data for parallel processing in substantially the same manner described above.
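    A minimal Python sketch of multi-key partitioning follows, assuming that the multi-dimensional data structure can be modeled as a dictionary keyed by one coordinate per partitioning key, and that worker threads stand in for the separate computer systems. The names partition and parallel_sum and the summing operation are hypothetical choices for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence, Tuple


def partition(rows: List[dict], keys: Sequence[str]) -> Dict[Tuple, List[dict]]:
    """Place each row in the grid cell addressed by its value for every key.

    Each key corresponds to one dimension of the (sparse) grid.
    """
    grid: Dict[Tuple, List[dict]] = defaultdict(list)
    for row in rows:
        grid[tuple(row[k] for k in keys)].append(row)
    return dict(grid)


def parallel_sum(grid: Dict[Tuple, List[dict]], field: str, workers: int = 2) -> Dict[Tuple, float]:
    """Process every cell independently; threads stand in for separate systems."""
    cells = list(grid.items())
    with ThreadPoolExecutor(max_workers=workers) as pool:
        totals = pool.map(lambda cell: sum(r[field] for r in cell[1]), cells)
        return {coords: total for (coords, _), total in zip(cells, totals)}


rows = [
    {"region": "EU", "year": 2013, "amount": 10},
    {"region": "EU", "year": 2014, "amount": 20},
    {"region": "US", "year": 2013, "amount": 5},
    {"region": "EU", "year": 2013, "amount": 7},
]
grid = partition(rows, keys=("region", "year"))
totals = parallel_sum(grid, "amount")
```

    Because no row belongs to more than one cell, the cells can be processed without coordination and the per-cell results merged afterwards.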


    Data Encoding and Processing Columnar Data
    7.
    Patent application (legal status: in force)

    Publication No.: US20140372389A1

    Publication date: 2014-12-18

    Application No.: US13918832

    Filing date: 2013-06-14

    IPC classification: G06F17/30

    Abstract: Aspects of the invention are provided for accessing a plurality of data elements. A page of column data is stored in a format that includes compressed and/or non-compressed elements, with the format including a plurality of arrays and a vector. Each of the arrays stores elements with common characteristics, with the vector functioning as a mapping to the stored data elements. The vector is leveraged to identify an array and determine an offset to support access to one or more of the data elements.
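    One possible reading of the array-plus-vector page format is sketched below in Python. It assumes two arrays (dictionary codes for common values, raw storage for the rest) and derives the offset by counting earlier vector entries that point at the same array; the abstract fixes neither choice. ColumnPage and its members are hypothetical names.

```python
from typing import List, Sequence


class ColumnPage:
    """A page with two arrays and a vector mapping tuple position to an array."""

    DICT_ARRAY = 0   # holds dictionary codes (compressed elements)
    RAW_ARRAY = 1    # holds values stored as-is (non-compressed elements)

    def __init__(self, values: Sequence[str], dictionary: Sequence[str]) -> None:
        self.dictionary: List[str] = list(dictionary)
        code_of = {v: i for i, v in enumerate(self.dictionary)}
        self.arrays = {self.DICT_ARRAY: [], self.RAW_ARRAY: []}
        self.vector: List[int] = []          # one array id per tuple
        for v in values:
            if v in code_of:
                self.arrays[self.DICT_ARRAY].append(code_of[v])
                self.vector.append(self.DICT_ARRAY)
            else:
                self.arrays[self.RAW_ARRAY].append(v)
                self.vector.append(self.RAW_ARRAY)

    def get(self, tuple_index: int) -> str:
        array_id = self.vector[tuple_index]
        # Offset = number of earlier tuples stored in the same array.
        # A linear count keeps the sketch short; a real page would precompute it.
        offset = self.vector[:tuple_index].count(array_id)
        stored = self.arrays[array_id][offset]
        return self.dictionary[stored] if array_id == self.DICT_ARRAY else stored


values = ["red", "blue", "chartreuse", "red"]
page = ColumnPage(values, dictionary=["red", "blue"])
decoded = [page.get(i) for i in range(len(values))]
```

    Grouping elements with the same encoding into one array keeps each array uniform, while the vector preserves the original tuple order.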


    Technology for join processing
    8.
    Granted patent

    Publication No.: US10810201B2

    Publication date: 2020-10-20

    Application No.: US14965737

    Filing date: 2015-12-10

    Abstract: Performing a join of first and second database tables for a query includes applying a predicate of the query to the first table as a first predicate and determining how many distinct join key values the first table has that survive the applying of the first predicate, wherein a join key value of the first table that survives the applying of the first predicate is a surviving join key value for a second predicate. A selection includes selecting among applying the second predicate to the second table, probing the second table with the second predicate, and neither applying the second predicate to the second table nor probing the second table with the second predicate, wherein the selecting is responsive to the number of distinct, surviving join key values.
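    The three-way selection in this abstract can be sketched in Python as follows. The abstract only says that the choice responds to the number of distinct surviving join key values; the two thresholds and the direction of the rule (few keys favor probing, very many keys favor doing neither) are assumptions made for illustration, and all names are hypothetical.

```python
from enum import Enum
from typing import Callable, Hashable, List, Set


class SecondPredicatePlan(Enum):
    PROBE = "probe the second table with the surviving keys"
    APPLY = "apply the second predicate as a filter on the second table"
    SKIP = "neither apply the second predicate nor probe"


def distinct_surviving_keys(rows: List[dict], join_key: str,
                            first_predicate: Callable[[dict], bool]) -> Set[Hashable]:
    """Apply the first predicate and collect the distinct join keys that survive."""
    return {row[join_key] for row in rows if first_predicate(row)}


def choose_plan(distinct_key_count: int, probe_limit: int = 8,
                apply_limit: int = 1000) -> SecondPredicatePlan:
    """Pick a plan from the surviving-key count (thresholds are assumed values)."""
    if distinct_key_count <= probe_limit:
        return SecondPredicatePlan.PROBE
    if distinct_key_count <= apply_limit:
        return SecondPredicatePlan.APPLY
    return SecondPredicatePlan.SKIP


orders = [
    {"customer_id": 1, "total": 250},
    {"customer_id": 1, "total": 400},
    {"customer_id": 2, "total": 30},
    {"customer_id": 3, "total": 120},
]
keys = distinct_surviving_keys(orders, "customer_id", lambda r: r["total"] > 100)
plan = choose_plan(len(keys))
```

    In this example two distinct customer ids survive the first predicate, so the sketch would probe the second table rather than filter it.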