摘要:
According to some embodiments, a system and method for a parallel join of relational data tables may be provided by calculating, by a plurality of concurrently executing execution threads, hash values for join columns of a first input table and a second input table; storing the calculated hash values in a set of disjoint thread-local hash maps for each of the first input table and the second input table; merging the set of thread-local hash maps of the first input table, by a second plurality of execution threads operating concurrently, to produce a set of merged hash maps; comparing each entry of the merged hash maps to each entry of the set of thread-local hash maps for the second input table to determine whether there is a match, according to a join type; and generating an output table including matches as determined by the comparing.
摘要:
According to some embodiments, a system and method for a parallel join of relational data tables may be provided by calculating, by a plurality of concurrently executing execution threads, hash values for join columns of a first input table and a second input table; storing the calculated hash values in a set of disjoint thread-local hash maps for each of the first input table and the second input table; merging the set of thread-local hash maps of the first input table, by a second plurality of execution threads operating concurrently, to produce a set of merged hash maps; comparing each entry of the merged hash maps to each entry of the set of thread-local hash maps for the second input table to determine whether there is a match, according to a join type; and generating an output table including matches as determined by the comparing.
摘要:
According to some embodiments, a system and method for a parallel join of relational data tables may be provided by calculating, by a plurality of concurrently executing execution threads, hash values for join columns of a first input table and a second input table; storing the calculated hash values in a set of disjoint thread-local hash maps for each of the first input table and the second input table; merging the set of thread-local hash maps of the first input table, by a second plurality of execution threads operating concurrently, to produce a set of merged hash maps; comparing each entry of the merged hash maps to each entry of the set of thread-local hash maps for the second input table to determine whether there is a match, according to a join type; and generating an output table including matches as determined by the comparing.
摘要:
According to some embodiments, a data structure may be provided by separating an input table into a plurality of partitions; generating, by each of a first plurality of execution threads operating concurrently, a local hash table for each of the threads, each local hash table storing key—index pairs; and merging the local hash tables, by a second plurality of execution threads operating concurrently, to produce a set of disjoint result hash tables. An overall result may be obtained from the result set of disjoint result hash tables. The data structure may used in a parallel computing environment to determine an aggregation.
摘要:
A system, method and medium may provide determination of a first plurality of a plurality of data records assigned to a first processing unit, identification of a first record of the first plurality of data records, the first record associated with a first key value, generation of a first dictionary entry of a first dictionary for the first key value, storage of a first identifier of the first record as a tail identifier and as a head identifier in the first dictionary entry, storage an end flag in a first shared memory location, the first shared memory location associated with the first record, identification of a second record of the first plurality of data records, the second record associated with the first key value, replacement of the tail identifier in the first dictionary entry with a second identifier of the second record, and storage of the first identifier in a second shared memory location, the second shared memory location associated with the second record.
摘要:
A system, method and medium may provide determination of a first plurality of a plurality of data records assigned to a first processing unit, identification of a first record of the first plurality of data records, the first record associated with a first key value, generation of a first dictionary entry of a first dictionary for the first key value, storage of a first identifier of the first record as a tail identifier and as a head identifier in the first dictionary entry, storage an end flag in a first shared memory location, the first shared memory location associated with the first record, identification of a second record of the first plurality of data records, the second record associated with the first key value, replacement of the tail identifier in the first dictionary entry with a second identifier of the second record, and storage of the first identifier in a second shared memory location, the second shared memory location associated with the second record.
摘要:
A new dictionary can be created for a result column in a query plan operation executed on a database. The result column can be generated by multiple worker jobs running in parallel to read tasks from a shared queue as part of a query plan operation that includes a group-by column within an input set of input columns. The group-by column can include an original dictionary for all values contained within the group-by column If the new dictionary has fewer entries than the original dictionary for the group-by column such that mapping is required between old value identifiers within the group-by column and new value identifiers within the result column, the old value identifiers are renamed to the new value identifiers using a mapping vector.
摘要:
A new dictionary can be created for a result column in a query plan operation executed on a database. The result column can be generated by multiple worker jobs running in parallel to read tasks from a shared queue as part of a query plan operation that includes a group-by column within an input set of input columns. The group-by column can include an original dictionary for all values contained within the group-by column If the new dictionary has fewer entries than the original dictionary for the group-by column such that mapping is required between old value identifiers within the group-by column and new value identifiers within the result column, the old value identifiers are renamed to the new value identifiers using a mapping vector.