Joining two data tables on a join attribute

    Publication No.: US11163769B2

    Publication date: 2021-11-02

    Application No.: US16443958

    Filing date: 2019-06-18

    IPC classification: G06F16/00 G06F16/2453

    Abstract: A computer-implemented method for joining two data tables on a join attribute, where the data tables have at least a first and a second attribute and the second attribute is the join attribute. The method provides a function for associating a computing node with a given record. The function may be used to determine the associated computing node. The records of the two data tables may be distributed to the respective determined computing nodes. The relationship between the values of the first and second attributes may be modelled using a predefined dataset. For each record of the two data tables, the values of the first attribute may be re-determined using the corresponding values of the second attribute. The function may then be used to re-determine the associated computing node.
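
One possible reading of this abstract, sketched in Python: if the node-assignment function is keyed on the first attribute, and the first attribute is re-derived from the join attribute before redistribution, then records with equal join values end up on the same node and can be joined locally. Everything named here (`NUM_NODES`, `node_for`, `model_first_attr`, the integer-division model, the record layout) is a hypothetical stand-in, not taken from the patent.

```python
from collections import defaultdict

NUM_NODES = 4  # hypothetical cluster size


def node_for(first_attr):
    """Hypothetical node-assignment function keyed on the first attribute."""
    return first_attr % NUM_NODES


def model_first_attr(join_value):
    """Assumed model of the first attribute as a function of the join attribute."""
    return join_value // 100


def redistribute(records):
    """Re-derive the first attribute from the join attribute, then place each record.

    records: iterable of (first_attr, join_attr, payload) tuples.
    """
    nodes = defaultdict(list)
    for _first, join_value, payload in records:
        derived = model_first_attr(join_value)
        nodes[node_for(derived)].append((derived, join_value, payload))
    return nodes


def distributed_join(left, right):
    """Join node by node; matching records are co-located after redistribution."""
    left_nodes, right_nodes = redistribute(left), redistribute(right)
    result = []
    for node, left_part in left_nodes.items():
        by_key = defaultdict(list)
        for _first, join_value, payload in right_nodes.get(node, []):
            by_key[join_value].append(payload)
        for _first, join_value, payload in left_part:
            for other in by_key.get(join_value, []):
                result.append((join_value, payload, other))
    return sorted(result)


LEFT = [(7, 120, "a"), (3, 250, "b"), (9, 120, "c")]
RIGHT = [(1, 120, "x"), (5, 310, "y"), (2, 250, "z")]
JOINED = distributed_join(LEFT, RIGHT)
```

Because the derived first attribute depends only on the join value, no cross-node comparison is needed in this sketch; the original first-attribute values (7, 3, 9, ...) play no role in placement.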

    Method for processing a database query

    Publication No.: US09953065B2

    Publication date: 2018-04-24

    Application No.: US14621466

    Filing date: 2015-02-13

    IPC classification: G06F17/30 G06F15/16

    Abstract: The invention relates to a computer-implemented method for processing a query in a database, the query comprising a search value. The database comprises a plurality of datasets, the datasets comprising entries, wherein distance statistics are assigned to the datasets. The distance statistics describe the minimum and maximum distance between the values of the entries of a dataset of the plurality of datasets and a reference value. The method comprises determining the distance between the search value and the reference value, said determination resulting in a search distance; determining a subset of datasets from the plurality of datasets for which the search distance is within the limits given by the minimum and maximum distances described by the respective distance statistics; and searching for the search value in the subset of datasets.
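
The pruning step described in this abstract can be illustrated with a small Python sketch. It assumes numeric values and absolute difference as the distance measure; the abstract does not fix either, and the names `distance`, `distance_stats`, `search`, and the sample data are made up for illustration.

```python
def distance(value, reference):
    """Distance measure; absolute difference is an assumption of this sketch."""
    return abs(value - reference)


def distance_stats(dataset, reference):
    """Minimum and maximum distance of a (non-empty) dataset's entries to the reference."""
    distances = [distance(v, reference) for v in dataset]
    return min(distances), max(distances)


def search(datasets, stats, reference, search_value):
    """Scan only datasets whose [min, max] distance range contains the search distance."""
    search_distance = distance(search_value, reference)
    candidates = [
        i for i, (lo, hi) in enumerate(stats) if lo <= search_distance <= hi
    ]
    hits = [
        (i, pos)
        for i in candidates
        for pos, v in enumerate(datasets[i])
        if v == search_value
    ]
    return candidates, hits


REFERENCE = 50
DATASETS = [[10, 12, 15], [40, 42, 48], [88, 90, 95]]
STATS = [distance_stats(d, REFERENCE) for d in DATASETS]  # computed once, ahead of queries

CANDIDATES, HITS = search(DATASETS, STATS, REFERENCE, 90)
# CANDIDATES == [0, 2], HITS == [(2, 1)]
```

Dataset 0 is a false positive here: the value 10 lies at the same distance from the reference as 90, so the statistics cannot rule it out. The filter only guarantees that skipped datasets (dataset 1 in this run) cannot contain the search value.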

    Efficient processing of data extents
    Patent application

    Publication No.: US20180060386A1

    Publication date: 2018-03-01

    Application No.: US15249509

    Filing date: 2016-08-29

    IPC classification: G06F17/30

    CPC classification: G06F17/30448 G06F17/30395

    Abstract: The present disclosure relates to a computer-implemented method, computer program product, and computer system for optimizing query processing over a set of data extents on which a table is stored. Attribute value information may be maintained for each data extent. The attribute value information indicates, as a range, the minimum and maximum values of an attribute of the entries stored in the respective extent. A first metric of a first data extent of the set may determine that splitting the first data extent into sub-extents increases query processing efficiency. A second metric of a second data extent and a third data extent may determine that merging the second data extent and the third data extent increases query processing efficiency.
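
A minimal Python sketch of per-extent min/max ranges and of split and merge decisions follows. The abstract does not define the metrics, so the two used here (range width for splitting; range overlap plus a capacity limit for merging) are assumptions chosen only to make the idea concrete. `Extent`, `can_skip`, `should_split`, and `should_merge` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    values: list  # attribute values of the entries stored in this extent

    @property
    def lo(self):
        return min(self.values)

    @property
    def hi(self):
        return max(self.values)


def can_skip(extent, q_lo, q_hi):
    """A range query [q_lo, q_hi] can skip an extent whose range does not intersect it."""
    return extent.hi < q_lo or extent.lo > q_hi


def should_split(extent, max_width):
    """Assumed first metric: a wide value range makes the extent hard to skip."""
    return len(extent.values) > 1 and extent.hi - extent.lo > max_width


def split(extent):
    """Split at the median so each sub-extent covers a narrower range."""
    ordered = sorted(extent.values)
    mid = len(ordered) // 2
    return Extent(ordered[:mid]), Extent(ordered[mid:])


def should_merge(a, b, capacity):
    """Assumed second metric: overlapping ranges that fit together in one extent."""
    overlap = a.lo <= b.hi and b.lo <= a.hi
    return overlap and len(a.values) + len(b.values) <= capacity


def merge(a, b):
    return Extent(a.values + b.values)


wide = Extent([1, 95, 3, 90])
left, right = split(wide)          # ranges [1, 3] and [90, 95]
small_a, small_b = Extent([10, 12]), Extent([11, 13])
```

With these sample values, a query for the range 40 to 60 must scan `wide` but can skip both sub-extents after the split, which is the kind of efficiency gain the abstract refers to.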

    Avoidance of intermediate data skew in a massive parallel processing environment
    Granted patent (status: in force)

    Publication No.: US09569493B2

    Publication date: 2017-02-14

    Application No.: US14144893

    Filing date: 2013-12-31

    IPC classification: G06F17/30

    Abstract: A computer-implemented method for minimizing join operation processing time within a database system based on the estimated joined table spread of the database system is provided. The computer-implemented method includes estimating the value distribution of data in a joined table, wherein the joined table is the result of a join operation between two instances of tables of a database system. The computer-implemented method further includes determining boundaries for partitioning at least one range of attributes of the estimated value distribution, wherein the boundaries for partitioning the at least one range of attributes of the estimated value distribution correspond to a same number of rows of the joined table. The computer-implemented method further includes determining at least one assignment of the determined partition of the at least one range of attributes to processing units of the database system.
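
The three steps in this abstract (estimate the joined table's distribution, pick boundaries that balance joined rows, assign ranges to processing units) can be sketched in Python as follows. The estimator used here, exact per-key counts multiplied together, is an assumption for illustration; a real system would more likely work from histograms or samples. All function names are hypothetical.

```python
from bisect import bisect_left
from collections import Counter


def estimate_join_histogram(left_keys, right_keys):
    """Estimated joined rows per key: left count times right count (equi-join)."""
    left, right = Counter(left_keys), Counter(right_keys)
    return {k: left[k] * right[k] for k in sorted(left) if k in right}


def equal_row_boundaries(hist, num_units):
    """Choose inclusive upper bounds so each key range holds roughly the same joined rows."""
    target = sum(hist.values()) / num_units
    boundaries, running = [], 0
    for key, rows in hist.items():  # keys are in ascending order
        running += rows
        if len(boundaries) < num_units - 1 and running >= target * (len(boundaries) + 1):
            boundaries.append(key)
    return boundaries


def unit_for(key, boundaries):
    """Processing unit whose key range contains the given key."""
    return bisect_left(boundaries, key)


LEFT_KEYS = [1, 1, 1, 2, 3, 4]
RIGHT_KEYS = [1, 1, 2, 3, 3, 4]
HIST = estimate_join_histogram(LEFT_KEYS, RIGHT_KEYS)   # {1: 6, 2: 1, 3: 2, 4: 1}
BOUNDS = equal_row_boundaries(HIST, num_units=2)        # [1]
```

Partitioning the input keys evenly (keys 1 and 2 versus 3 and 4) would give one unit 7 joined rows and the other 3. Balancing on the estimated join output instead gives 6 and 4, the closest split available without dividing a single key across units.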
