Adaptive Approach to Lazy Materialization in Database Scans using Pushed Filters

    公开(公告)号:US20240256543A1

    公开(公告)日:2024-08-01

    申请号:US18160861

    申请日:2023-01-27

    CPC classification number: G06F16/24545 G06F11/3409 G06F16/221

    Abstract: Disclosed herein is a method for determining whether to apply a lazy materialization technique to a query run. A data processing service receives a request to perform a query identifying a filter column and a non-filter column in a columnar database. The data processing service accesses a first task of contiguous rows in the filter column from a cloud-based object storage. The data processing service applies a filter defined by the query to the first task. The data processing service generates filter results for the first task that may include a percentage of the first task discarded and a run-time. The data processing service determines, based on the filter results for the first task, a likelihood value that indicates a likelihood of gaining a performance benefit by applying the lazy materialization technique to a second task of the query.

    STATIC APPROACH TO LAZY MATERIALIZATION IN DATABASE SCANS USING PUSHED FILTERS

    公开(公告)号:US20240256539A1

    公开(公告)日:2024-08-01

    申请号:US18160850

    申请日:2023-01-27

    CPC classification number: G06F16/24539 G06F16/221

    Abstract: Disclosed herein is a method for determining whether to apply a lazy materialization technique to a query run. The method includes receiving a request to perform a new query in a columnar database containing a plurality of columns. A step in the method includes accessing a set of data in a column of the plurality of columns based on the query. The method includes generating an input to a machine-learned model comprising characteristics of the set of data in the column. From the machine-learned model, the method includes generating a likelihood value indicative of whether a filter of a first portion of the set of data in the column has greater efficiency than a download followed by a filter of the set of data in the column. The method further includes comparing the likelihood value to a threshold value. Based on the comparison, the method includes filtering the first portion of the set of data before downloading the set of data if the likelihood value is equal to or above the threshold value.

    Dictionary filtering and evaluation in columnar databases

    公开(公告)号:US12242485B2

    公开(公告)日:2025-03-04

    申请号:US18162616

    申请日:2023-01-31

    Abstract: Disclosed herein is a method, system, or non-transitory computer readable medium for evaluating a query on a columnar dataset comprising one or more dictionaries associated with columns in the dataset. The method includes receiving a request to perform a query comprising at least a operator and a request to return information about a value of interest in a columnar dataset stored on cloud storage. At least one column in the columnar dataset is based on a dictionary. The dictionary maps one or more values for a column to one or more respective identifiers. The method determines whether to perform dictionary filtering for the query by calculating a metric based on one or more factors. Responsive to the metric being below a threshold, which may be predetermined, the method performs the dictionary filtering.

    Evaluating expressions over dictionary data

    公开(公告)号:US12210528B2

    公开(公告)日:2025-01-28

    申请号:US18162607

    申请日:2023-01-31

    Abstract: Disclosed herein is a method, system, or non-transitory computer readable medium for evaluating a query on a columnar dataset comprising one or more dictionaries associated with columns in the dataset. The method includes receiving a request to perform a query comprising at least an operator for a columnar dataset on cloud storage. At least one column in the dataset is based on a dictionary, and the dictionary maps one or more values for a column to one or more respective identifiers. The method evaluates the operator on one or more values of the dictionary to generate an updated dictionary comprising updated values. The method may decode the updated dictionary into an updated column comprising updated data values.

    Scan parsing
    5.
    发明授权

    公开(公告)号:US12189628B2

    公开(公告)日:2025-01-07

    申请号:US18162366

    申请日:2023-01-31

    Abstract: The present application discloses a method, system, and computer system for parsing files. The method includes receiving an indication that a first file is to be processed, determining to begin processing the first file using a first processing engine based at least in part on one or more predefined heuristics, indicating to process the first file using a first processing engine, determining whether a particular error in processing the first file using the first processing engine has been detected, in response to determining that the particular error has been detected, indicate to stop processing the first file using the first processing engine and indicate to continue processing using a second processing engine, and storing in memory information obtained based on processing the first file by one or more of the first processing engine and the second processing engine.

    Adaptive approach to lazy materialization in database scans using pushed filters

    公开(公告)号:US12124450B2

    公开(公告)日:2024-10-22

    申请号:US18160861

    申请日:2023-01-27

    CPC classification number: G06F16/24545 G06F11/3409 G06F16/221

    Abstract: Disclosed herein is a method for determining whether to apply a lazy materialization technique to a query run. A data processing service receives a request to perform a query identifying a filter column and a non-filter column in a columnar database. The data processing service accesses a first task of contiguous rows in the filter column from a cloud-based object storage. The data processing service applies a filter defined by the query to the first task. The data processing service generates filter results for the first task that may include a percentage of the first task discarded and a run-time. The data processing service determines, based on the filter results for the first task, a likelihood value that indicates a likelihood of gaining a performance benefit by applying the lazy materialization technique to a second task of the query.

    Scan parsing
    7.
    发明授权

    公开(公告)号:US12072880B2

    公开(公告)日:2024-08-27

    申请号:US17892376

    申请日:2022-08-22

    CPC classification number: G06F16/24542 G06F16/285

    Abstract: The present application discloses a method, system, and computer system for parsing files. The method includes receiving an indication that a first file is to be processed, determining to begin processing the first file using a first processing engine based at least in part on one or more predefined heuristics, indicating to process the first file using a first processing engine, determining whether a particular error in processing the first file using the first processing engine has been detected, in response to determining that the particular error has been detected, indicate to stop processing the first file using the first processing engine and indicate to continue processing using a second processing engine, and storing in memory information obtained based on processing the first file by one or more of the first processing engine and the second processing engine.

    Dictionary Filtering and Evaluation in Columnar Databases

    公开(公告)号:US20240256550A1

    公开(公告)日:2024-08-01

    申请号:US18162616

    申请日:2023-01-31

    CPC classification number: G06F16/24558 G06F11/3409 G06F16/221

    Abstract: Disclosed herein is a method, system, or non-transitory computer readable medium for evaluating a query on a columnar dataset comprising one or more dictionaries associated with columns in the dataset. The method includes receiving a request to perform a query comprising at least a operator and a request to return information about a value of interest in a columnar dataset stored on cloud storage. At least one column in the columnar dataset is based on a dictionary. The dictionary maps one or more values for a column to one or more respective identifiers. The method determines whether to perform dictionary filtering for the query by calculating a metric based on one or more factors. Responsive to the metric being below a threshold, which may be predetermined, the method performs the dictionary filtering.

    Evaluating Expressions Over Dictionary Data
    9.
    发明公开

    公开(公告)号:US20240256549A1

    公开(公告)日:2024-08-01

    申请号:US18162607

    申请日:2023-01-31

    CPC classification number: G06F16/24558 G06F11/3409 G06F16/221

    Abstract: Disclosed herein is a method, system, or non-transitory computer readable medium for evaluating a query on a columnar dataset comprising one or more dictionaries associated with columns in the dataset. The method includes receiving a request to perform a query comprising at least an operator for a columnar dataset on cloud storage. At least one column in the dataset is based on a dictionary, and the dictionary maps one or more values for a column to one or more respective identifiers. The method evaluates the operator on one or more values of the dictionary to generate an updated dictionary comprising updated values. The method may decode the updated dictionary into an updated column comprising updated data values.

    Scan Parsing
    10.
    发明公开
    Scan Parsing 审中-公开

    公开(公告)号:US20240061840A1

    公开(公告)日:2024-02-22

    申请号:US18162366

    申请日:2023-01-31

    CPC classification number: G06F16/24542 G06F16/285

    Abstract: The present application discloses a method, system, and computer system for parsing files. The method includes receiving an indication that a first file is to be processed, determining to begin processing the first file using a first processing engine based at least in part on one or more predefined heuristics, indicating to process the first file using a first processing engine, determining whether a particular error in processing the first file using the first processing engine has been detected, in response to determining that the particular error has been detected, indicate to stop processing the first file using the first processing engine and indicate to continue processing using a second processing engine, and storing in memory information obtained based on processing the first file by one or more of the first processing engine and the second processing engine.

Patent Agency Ranking