-
Publication No.: US10936594B2
Publication date: 2021-03-02
Application No.: US15860115
Filing date: 2018-01-02
Applicant: International Business Machines Corporation
Inventor: Felix O. Beier , Andreas Brodt , Namik Hrle , Oliver Schiller
IPC: G06F16/00 , G06F16/2455 , G06F16/23
Abstract: A method, a computer program product and a computer system are provided. Attribute value information contains at least a minimum value representing a smallest value of a first attribute and a maximum value representing a largest value of the first attribute, thereby defining a first range of values of the first attribute. A received query against a data table requests one or more values of at least the first attribute that are covered by the first range of values. The attribute value information may be used for selecting a data block of the data table as a candidate potentially including at least part of the requested one or more values and scanning the data block. In response to determining that the data block does not include the one or more requested values, the attribute value information may be updated accordingly.
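The select, scan, and update cycle described in this abstract can be illustrated with a minimal Python sketch. The `Block` class and `scan_with_feedback` function are hypothetical names, not taken from the patent, and the sketch assumes the stored range has become wider than the actual data (for example after deletions), so that "updating accordingly" means recomputing the minimum and maximum from the rows still present.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    rows: List[int]   # values of the first attribute stored in this block
    min_value: int    # attribute value information: recorded smallest value
    max_value: int    # attribute value information: recorded largest value


def scan_with_feedback(blocks: List[Block], lo: int, hi: int) -> List[int]:
    """Return values v with lo <= v <= hi, tightening stale ranges on a miss."""
    hits = []
    for block in blocks:
        # Skip blocks whose recorded range cannot overlap the query range.
        if block.max_value < lo or block.min_value > hi:
            continue
        # The block is a candidate, so it has to be scanned.
        found = [v for v in block.rows if lo <= v <= hi]
        if not found and block.rows:
            # False positive: assumed update rule is to recompute the range
            # from the rows that are actually stored.
            block.min_value = min(block.rows)
            block.max_value = max(block.rows)
        hits.extend(found)
    return hits
```

After a miss, a later query over the same range can skip the block without scanning it, provided the recomputed range no longer overlaps the query.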
-
Publication No.: US10915533B2
Publication date: 2021-02-09
Application No.: US16276790
Filing date: 2019-02-15
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Felix O. Beier , Thomas F. Boehme , Andreas Brodt , Oliver Schiller
IPC: G06F16/20 , G06F16/2455 , G06F16/242
Abstract: The method may include providing a plurality of synopsis techniques for determining a plurality of attribute value information indicative of the at least one attribute. The method may include determining a data characteristic describing the plurality of data rows of the current data block. The method may include selecting, based on the determined data characteristic, at least one synopsis technique of the provided plurality of synopsis techniques suitable for generating the plurality of attribute value information for the at least one attribute of the current data block. The method may include determining the plurality of attribute value information for the at least one attribute of the plurality of data rows of the current data block using the at least one selected synopsis technique. The method may include storing the determined plurality of attribute value information for the current data block to be used for query processing against the data table.
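One way to picture choosing a synopsis technique from a data characteristic is the sketch below. It assumes only two techniques (a set of distinct values and a min/max range) and uses the number of distinct values in the block as the data characteristic. The function names and the `max_distinct` threshold are illustrative assumptions, not details from the patent.

```python
def build_synopsis(values, max_distinct=4):
    """Pick a synopsis technique for one block based on its distinct-value count."""
    distinct = set(values)
    if len(distinct) <= max_distinct:
        # Few distinct values: storing them exactly prunes better than a range.
        return ("distinct", frozenset(distinct))
    # Many distinct values: fall back to a compact min/max range.
    return ("minmax", (min(values), max(values)))


def may_contain(synopsis, value):
    """Return False only if the synopsis proves the value is absent from the block."""
    kind, info = synopsis
    if kind == "distinct":
        return value in info
    lo, hi = info
    return lo <= value <= hi
```

For a block holding only the values 1 and 100, the distinct-value synopsis rules out a lookup for 50, which a min/max range of 1 to 100 could not.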
-
Publication No.: US10884704B2
Publication date: 2021-01-05
Application No.: US15710902
Filing date: 2017-09-21
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Felix Beier , Andreas Brodt , Oliver Schiller , Knut Stolze
IPC: G06F7/24
Abstract: A computer-implemented method, a system, and a computer program product for sorting a data table by an attribute of the data table are provided. Each data block of the data table is provided with attribute value information indicative of distinct values and/or ranges of values of the attribute in that data block. Distinct ranges and/or distinct values of the attribute of the data table are derived from the attribute value information. For each determined distinct range and/or distinct value, a bucket may be created. For each created bucket, it may be determined, using the attribute value information, which data blocks of the data table are to be scanned. Each scanned record is distributed to a corresponding bucket. The records in each bucket having more than one record may be sorted by the attribute.
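The bucket-based sort can be sketched as follows. The sketch assumes the attribute value information is a (min, max) pair per block and derives the distinct ranges by merging overlapping block ranges. The function name and the tuple layout are hypothetical.

```python
def sort_by_block_ranges(blocks):
    """blocks: list of (min_value, max_value, rows). Returns all rows in sorted order."""
    # Derive disjoint value ranges by merging overlapping block ranges.
    ranges = []
    for lo, hi, _ in sorted(blocks, key=lambda b: b[0]):
        if ranges and lo <= ranges[-1][1]:
            ranges[-1][1] = max(ranges[-1][1], hi)
        else:
            ranges.append([lo, hi])

    result = []
    for lo, hi in ranges:  # one bucket per distinct range, in ascending order
        bucket = []
        for b_lo, b_hi, rows in blocks:
            # Use the block's range to decide whether it must be scanned.
            if b_hi < lo or b_lo > hi:
                continue
            bucket.extend(v for v in rows if lo <= v <= hi)
        if len(bucket) > 1:
            bucket.sort()  # only buckets with more than one record need sorting
        result.extend(bucket)
    return result
```

Because the buckets cover disjoint, ascending ranges, concatenating them yields the fully sorted output without a global sort.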
-
Publication No.: US10783115B2
Publication date: 2020-09-22
Application No.: US16005839
Filing date: 2018-06-12
Applicant: International Business Machines Corporation
Inventor: Thomas F. Boehme , Andreas Brodt , Namik Hrle , Oliver Schiller
IPC: G06F11/14 , G06F11/00 , G06F3/06 , G06F16/16 , G06F7/36 , G06F16/22 , G06F16/27 , G06F16/245 , G06F16/28
Abstract: Sorting and storing a dataset, the dataset comprising at least one attribute. The method includes defining a set of data blocks and assigning to each data block a predefined maximum number of entries or a predefined maximum amount of storage, dividing the dataset into a sequence of multiple sub-datasets each having one value or a range of values of the attribute, wherein each pair of successive sub-datasets of the sequence are non-overlapping or overlapping at their respective extremum value of the attribute, for each sub-dataset of the multiple sub-datasets: in case the sub-dataset fully or partially fits into a data block of the defined data blocks storing the sub-dataset into at least the data block, the sub-dataset that partially fits into the data block comprising a number of entries that is smaller than a predefined maximum threshold.
-
Publication No.: US10678784B2
Publication date: 2020-06-09
Application No.: US15861746
Filing date: 2018-01-04
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Thomas F. Boehme , Andreas Brodt , Oliver Koeth , Oliver Schiller
IPC: G06F17/00 , G06F16/242 , G06F16/22
Abstract: A method, computer program product, and system for processing attribute value information for a data set. The method, computer program product, and system include receiving a first data query on the data set. The first data query includes a condition on at least one attribute of the data set. While processing the first data query, data blocks containing records of the data set may be scanned. The data blocks include first data blocks that are full. Attribute value information may be generated for the at least one attribute for the first data blocks. The attribute value information may be stored, and a second data query involving a condition on at least one of the at least one attribute may be processed using the stored attribute value information.
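A rough sketch of generating attribute value information as a side effect of query processing is shown below. It assumes the information is a (min, max) pair, that a block is "full" when it holds `capacity` rows, and that the stored information lives in a plain dictionary. All of these names and choices are illustrative assumptions.

```python
def run_query(blocks, predicate_range, synopses, capacity):
    """Scan blocks for values in [lo, hi], recording min/max for full blocks on the way.

    blocks:    list of lists of attribute values, one list per data block
    synopses:  dict mapping block index -> (min, max), updated in place
    capacity:  number of rows at which a block counts as full
    """
    lo, hi = predicate_range
    hits = []
    for block_id, rows in enumerate(blocks):
        known = synopses.get(block_id)
        if known is not None and (known[1] < lo or known[0] > hi):
            continue  # stored information proves the block cannot match
        hits.extend(v for v in rows if lo <= v <= hi)
        # The block was scanned anyway, so summarising it is nearly free.
        # Only full blocks are summarised, since they will not receive new rows.
        if known is None and len(rows) == capacity:
            synopses[block_id] = (min(rows), max(rows))
    return hits
```

The first query pays the full scan cost and populates `synopses`; a second query on the same attribute can then skip the full blocks whose stored range does not overlap its condition.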
-
Publication No.: US10528680B2
Publication date: 2020-01-07
Application No.: US15452778
Filing date: 2017-03-08
Applicant: International Business Machines Corporation
Inventor: Thomas Boehme , Andreas Brodt , Oliver Koeth , Oliver Schiller
IPC: G06F17/30
Abstract: A first data table and a second data table to be joined are determined. The first data table and the second data table have a join attribute. Data blocks of the first data table are stored on a storage device. Attribute value information for the join attribute is determined for the data blocks of the first data table. At least one partition for the first data table and the second data table is defined using at least the attribute value information on the join attribute. Each partition of the at least one partition has a respective partition range of values of the join attribute. A pairwise partition join is processed on a first partition range of the determined partition ranges.
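The idea of deriving join partitions from block-level attribute value information can be sketched as follows. The sketch assumes (min, max) pairs per block of the first table, an equality join, and a simple hash join within each partition. The function names and row layout are hypothetical and do not come from the patent.

```python
from collections import defaultdict


def partition_ranges(block_ranges):
    """Merge overlapping (min, max) block ranges into disjoint partition ranges."""
    merged = []
    for lo, hi in sorted(block_ranges):
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def partition_wise_join(first_blocks, second_rows):
    """first_blocks: list of (min, max, rows); every row is a (join_key, payload) tuple."""
    out = []
    for lo, hi in partition_ranges([(b[0], b[1]) for b in first_blocks]):
        # Build side: read only the first-table blocks that fall in this partition.
        build = defaultdict(list)
        for b_lo, b_hi, rows in first_blocks:
            if b_hi < lo or b_lo > hi:
                continue
            for key, payload in rows:
                build[key].append(payload)
        # Probe side: second-table rows whose key lies in the partition range.
        for key, payload in second_rows:
            if lo <= key <= hi:
                for left in build.get(key, []):
                    out.append((key, left, payload))
    return out
```

Each partition is joined independently, so only one partition's build table needs to be held in memory at a time.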
-
Publication No.: US10331670B2
Publication date: 2019-06-25
Application No.: US15711216
Filing date: 2017-09-21
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Thomas F. Boehme , Andreas Brodt , James L. Finnie , Oliver Schiller
IPC: G06F16/22 , G06F16/28 , G06F16/2455
Abstract: The method may include providing, in accordance with a column-oriented storage technique, the data table as columns corresponding to the plurality of attributes, whereby each column includes a plurality of separate data blocks. The method may also include determining the plurality of records of the provided data table for which a plurality of attribute values of at least one selected column is contained in a plurality of predetermined data blocks. The method may further include determining, for each column of at least a part of the plurality of columns within the determined plurality of records, a plurality of attribute value information descriptive of an associated attribute within the column and providing an indication of the one or more data blocks for which the plurality of attribute value information is determined. The method may also include storing the determined plurality of attribute value information for enabling query processing.
-
Publication No.: US10262033B2
Publication date: 2019-04-16
Application No.: US15073890
Filing date: 2016-03-18
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Andreas Brodt , Oliver Schiller , Marc Schwind , Mathias Trumpp
IPC: G06F17/30
Abstract: The present disclosure provides a computer-implemented method and system for processing queries. A first data table comprises a set of data blocks. Each of the set of data blocks may be assigned respective attribute value information. A query involving a query condition on at least a first attribute of the first data table may be received. A subset of the set of data blocks to be accessed may be selected based on the query condition and using the attribute value information. Furthermore, a guaranteed bound may be determined for a statistical metric on the first attribute based on at least one of the number of data blocks of the subset of data blocks and the attribute value information of the subset of data blocks. The guaranteed bound for the statistical metric may be used when determining a query execution plan for the received query.
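As an illustration of a guaranteed bound, the sketch below assumes each block's attribute value information holds a minimum, a maximum, and a row count, and takes the statistical metrics to be the number of matching rows and the range of matching values. The dictionary keys and function names are assumptions made for this example.

```python
def select_blocks(blocks, lo, hi):
    """Keep only blocks whose [min, max] range overlaps the query range [lo, hi]."""
    return [b for b in blocks if not (b["max"] < lo or b["min"] > hi)]


def guaranteed_bounds(selected, lo, hi):
    """Bounds that hold for any data consistent with the selected blocks' metadata."""
    if not selected:
        return {"max_rows": 0, "value_lower": None, "value_upper": None}
    return {
        # No more rows can match than the selected blocks contain in total.
        "max_rows": sum(b["row_count"] for b in selected),
        # Matching values lie inside both the query range and the block ranges.
        "value_lower": max(lo, min(b["min"] for b in selected)),
        "value_upper": min(hi, max(b["max"] for b in selected)),
    }
```

Unlike sampled statistics, these bounds cannot be violated by the actual data, so a planner could rely on `max_rows` as a hard upper limit when sizing buffers or choosing a join strategy.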
-
Publication No.: US10248695B2
Publication date: 2019-04-02
Application No.: US15161396
Filing date: 2016-05-23
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Felix O. Beier , Thomas F. Boehme , Andreas Brodt , Oliver Schiller
IPC: G06F17/30
Abstract: The method may include providing a plurality of synopsis techniques for determining a plurality of attribute value information indicative of the at least one attribute. The method may include determining a data characteristic describing the plurality of data rows of the current data block. The method may include selecting, based on the determined data characteristic, at least one synopsis technique of the provided plurality of synopsis techniques suitable for generating the plurality of attribute value information for the at least one attribute of the current data block. The method may include determining the plurality of attribute value information for the at least one attribute of the plurality of data rows of the current data block using the at least one selected synopsis technique. The method may include storing the determined plurality of attribute value information for the current data block to be used for query processing against the data table.
-
Publication No.: US20190065495A1
Publication date: 2019-02-28
Application No.: US15691960
Filing date: 2017-08-31
Applicant: International Business Machines Corporation
Inventor: Daniel Martin , Andreas Brodt , Oliver Schiller , Felix Beier , Knut Stolze
CPC classification number: G06F16/2386 , G06F9/466 , G06F16/23 , G06F16/2365 , G06F16/2379
Abstract: The present disclosure relates to a method for enforcing constraints on data in a data processing system. The method comprises providing a set of constraints on the data. A first data update request may be received at the transactional engine and executed on the first dataset. A second data update request associated with the received data update request is determined and sent by the transactional engine to the analytical engine. The analytical engine executes the second data update request, resulting in a set of changes in the second dataset. The transactional engine commits the update of the first dataset before or after receiving the results of the checking of the set of constraints. The update on the first dataset is aborted by the transactional engine in response to receiving the results of the checking of the set of constraints, wherein the results indicate that the set of constraints is not met.