Abstract:
There is disclosed a system and method for executing multiple distinct aggregate queries. In an embodiment, the method comprises: providing at least one Counting Bloom Filter for each distinct column of an input data stream; reviewing count values in the at least one Counting Bloom Filter for the existence of duplicates in each distinct column; and, if necessary, using a distinct hash operator to remove duplicates from each distinct column of the input data stream, thereby removing the need to replicate the input data stream and minimizing distinct hash operator processing. In addition, using Counting Bloom Filters to monitor data streams allows duplicates to be removed early from the input data stream, saving computation time and memory resources.
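The following minimal Python sketch illustrates the idea. It assumes a query that computes COUNT(DISTINCT ...) over several columns and evaluates it in two passes. The class `CountingBloomFilter`, the function `count_distinct`, the SHA-256-based hashing, and the Python `set` that stands in for the distinct hash operator are all illustrative assumptions, not the disclosed implementation. A counting Bloom filter's minimum counter never underestimates how often a value was inserted. A minimum of 1 therefore proves that a value is unique in its column, and only the remaining values need to pass through the hash operator.

```python
import hashlib
from typing import Dict, Hashable, List, Sequence, Set


class CountingBloomFilter:
    """Counters indexed by k salted hashes; counts can only overestimate."""

    def __init__(self, num_counters: int = 1024, num_hashes: int = 3) -> None:
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes

    def _positions(self, value: Hashable) -> Set[int]:
        # Derive up to k distinct counter positions from salted SHA-256 digests.
        data = repr(value).encode("utf-8")
        return {
            int.from_bytes(hashlib.sha256(bytes([i]) + data).digest()[:8], "big")
            % len(self.counters)
            for i in range(self.num_hashes)
        }

    def add(self, value: Hashable) -> None:
        for pos in self._positions(value):
            self.counters[pos] += 1

    def min_count(self, value: Hashable) -> int:
        # Upper bound on how many times `value` has been added.
        return min(self.counters[pos] for pos in self._positions(value))


def count_distinct(rows: Sequence[tuple], distinct_cols: List[int]) -> Dict[int, int]:
    """COUNT(DISTINCT col) for several columns without replicating `rows`."""
    # Pass 1: build one counting Bloom filter per distinct column.
    filters = {c: CountingBloomFilter() for c in distinct_cols}
    for row in rows:
        for c in distinct_cols:
            filters[c].add(row[c])

    # Pass 2: a minimum count of 1 proves the value occurs exactly once, so it
    # bypasses the distinct hash operator (modelled here as a Python set).
    unique = {c: 0 for c in distinct_cols}
    hash_tables: Dict[int, Set[Hashable]] = {c: set() for c in distinct_cols}
    for row in rows:
        for c in distinct_cols:
            value = row[c]
            if filters[c].min_count(value) == 1:
                unique[c] += 1
            else:
                hash_tables[c].add(value)
    return {c: unique[c] + len(hash_tables[c]) for c in distinct_cols}


rows = [(1, "a"), (2, "a"), (3, "b"), (1, "c")]
print(count_distinct(rows, [0, 1]))  # {0: 3, 1: 3}
```

The result is exact. A hash collision can only send a unique value through the hash operator unnecessarily; it can never let a duplicate bypass it.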
Abstract:
Disclosed is a data processing system, and an article of manufacture for use with the data processing system. The data processing system joins rows associated with a column of a source table with rows associated with a column of a target table. The data processing system includes a source node containing the source table and a target node containing the target table. The data processing system further includes: a generating module for generating a reduced representation of selected rows associated with the column of the source table and for generating a representation of the column of the target table; a filtering module for filtering the generated reduced representation of selected rows through the generated representation of the column of the target table, the filtered reduced representation identifying source table rows that do not have to be joined with the target table; and a joining module for joining, to the rows associated with the column of the target table, the rows associated with the column of the source table minus the source rows identified by the filtered reduced representation.
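The module structure above can be sketched in Python under assumptions the abstract does not state. Both representations are taken to be fixed-size bitmaps built with a single hash function. The join is an equality join on one column, and the two nodes are simulated in one process. The function names (`bucket`, `build_bitmap`, `rows_not_needed`, `hash_join`, `filtered_join`) are hypothetical. A bit that is set in the source bitmap but clear in the target bitmap marks source values that cannot match any target row, so the corresponding source rows are removed before the join.

```python
import hashlib
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Row = Tuple[Hashable, ...]


def bucket(key: Hashable, num_bits: int) -> int:
    # One hash function mapping a join-column value to a bit position.
    digest = hashlib.sha256(repr(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_bits


def build_bitmap(rows: Sequence[Row], col: int, num_bits: int) -> int:
    # Generating module: bitmap over one column, stored in a Python int.
    bitmap = 0
    for row in rows:
        bitmap |= 1 << bucket(row[col], num_bits)
    return bitmap


def rows_not_needed(source: Sequence[Row], col: int, source_bits: int,
                    target_bits: int, num_bits: int) -> Set[int]:
    # Filtering module: bits set for the source but not for the target mark
    # source values that cannot match any target row.
    no_match = source_bits & ~target_bits
    return {i for i, row in enumerate(source)
            if (no_match >> bucket(row[col], num_bits)) & 1}


def hash_join(source: Sequence[Row], target: Sequence[Row],
              s_col: int, t_col: int) -> List[Row]:
    # Joining module: ordinary in-memory hash join on equal column values.
    index: Dict[Hashable, List[Row]] = {}
    for t in target:
        index.setdefault(t[t_col], []).append(t)
    return [s + t for s in source for t in index.get(s[s_col], [])]


def filtered_join(source: Sequence[Row], target: Sequence[Row],
                  s_col: int, t_col: int, num_bits: int = 64) -> List[Row]:
    src_bits = build_bitmap(source, s_col, num_bits)  # source-node representation
    tgt_bits = build_bitmap(target, t_col, num_bits)  # target-node representation
    drop = rows_not_needed(source, s_col, src_bits, tgt_bits, num_bits)
    # Join the source rows minus the rows identified as not needing a join.
    remaining = [row for i, row in enumerate(source) if i not in drop]
    return hash_join(remaining, target, s_col, t_col)
```

A collision can keep a source row that has no match, but the filter never discards a row that does have one. The filtered join therefore returns the same rows as an unfiltered hash join; only the number of source rows that reach the joining step changes.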
Abstract:
Disclosed is a method implemented by a data processing system, a data processing system, and an article of manufacture for use with the data processing system. The method directs the data processing system to join rows associated with a column of a source table with rows associated with a column of a target table. The data processing system includes a source node containing the source table and a target node containing the target table. The method includes: generating a reduced representation of selected rows associated with the column of the source table and generating a representation of the column of the target table; filtering the generated reduced representation of selected rows through the generated representation of the column of the target table, the filtered reduced representation identifying source table rows that do not have to be joined with the target table; and joining, to the rows associated with the column of the target table, the rows associated with the column of the source table minus the source rows identified by the filtered reduced representation.
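Consider how much the filtering step can save, under an assumption the abstract does not make: the representation of the target column is an m-bit bitmap built with a single uniform hash function over n_T distinct target column values. A source row whose value has no match in the target table escapes the filter only if its value hashes to a bit that some target value has already set:

```latex
\Pr[\text{non-matching source row survives}]
  = 1 - \left(1 - \frac{1}{m}\right)^{n_T}
  \approx 1 - e^{-n_T/m}
```

For example, with m = 8 n_T bits, about 1 - e^{-1/8} ≈ 0.118 of the non-matching source rows still reach the joining step. Source rows that do have a match are never removed.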