TRANSFORMING QUERIES USING BITVECTOR AWARE OPTIMIZATION

    Publication No.: US20210319023A1

    Publication date: 2021-10-14

    Application No.: US16917489

    Application date: 2020-06-30

    Abstract: The present disclosure relates to systems, methods, and computer-readable media for optimizing and implementing operator trees based on a received query. For example, systems disclosed herein may generate an operator tree based on a received query. The systems described herein may systematically analyze the impact of bitvector filters when optimizing the join order of the operator tree to generate an optimized operator tree. The systems described herein may further implement the bitvector-aware operator tree by providing the optimized operator tree to an execution engine for further processing.
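
The abstract does not say how a bitvector filter operates. As a hedged illustration (our own sketch, not the patented method), the Python below builds a single-hash bitvector from the build side of a hash join and applies it to the probe side before the join runs. Because the filter shrinks the probe input, it changes the intermediate cardinalities that a join-order optimizer compares, which is the effect the disclosure proposes to account for. The names `BitvectorFilter` and `hash_join_with_bitvector`, the single hash function, and the 64-bit filter size are illustrative assumptions.

```python
from typing import Dict, Hashable, List, Tuple

Row = Tuple[int, str]


class BitvectorFilter:
    """Single-hash bitvector: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int) -> None:
        self.num_bits = num_bits
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _slot(self, key: Hashable) -> int:
        return hash(key) % self.num_bits

    def add(self, key: Hashable) -> None:
        self.bits[self._slot(key)] = 1

    def might_contain(self, key: Hashable) -> bool:
        return self.bits[self._slot(key)] == 1


def hash_join_with_bitvector(
    build: List[Row], probe: List[Row], num_bits: int = 64
) -> Tuple[List[Tuple[int, str, str]], int]:
    """Equi-join build and probe on column 0.

    A bitvector built from the build-side keys is applied to the probe side
    before the join, so most non-matching probe rows are discarded early.
    Returns the joined rows and the number of probe rows that passed the filter.
    """
    bv = BitvectorFilter(num_bits)
    hash_table: Dict[int, List[str]] = {}
    for key, value in build:
        bv.add(key)
        hash_table.setdefault(key, []).append(value)

    # Filter pushed below the join: shrinks the probe input the join must handle.
    survivors = [row for row in probe if bv.might_contain(row[0])]

    joined = [
        (key, build_value, probe_value)
        for key, probe_value in survivors
        for build_value in hash_table.get(key, [])
    ]
    return joined, len(survivors)


build = [(1, "a"), (2, "b")]
probe = [(k, f"p{k}") for k in range(100)]
joined, survivors = hash_join_with_bitvector(build, probe)
print(joined)     # [(1, 'a', 'p1'), (2, 'b', 'p2')]
print(survivors)  # 4 in CPython (keys 65 and 66 collide with 1 and 2), not 100
```

The false positives (65 and 66) are harmless for correctness because the hash-table probe still rejects them; what matters to an optimizer is that only a handful of rows, not all 100, reach the join.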

    EFFICIENTLY CONSTRUCTING REGRESSION MODELS FOR SELECTIVITY ESTIMATION

    Publication No.: US20210406744A1

    Publication date: 2021-12-30

    Application No.: US16917857

    Application date: 2020-06-30

    Abstract: A model generator constructs a model for estimating selectivity of database operations by determining a number of training examples necessary for the model to achieve a target accuracy and by generating approximate selectivity labels for the training examples. The model generator may train the model on an initial number of training examples using cross-validation. The model generator may determine whether the model satisfies the target accuracy and iteratively and geometrically increase the number of training examples based on an optimized geometric step size (which may minimize model construction time) until the model achieves the target accuracy based on a defined confidence level. The model generator may generate labels using a subset of tuples from an intermediate query expression. The model generator may iteratively increase a size of the subset of tuples used until a relative error of the generated labels is below a target threshold.
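
The two loops described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented procedure: the step size is fixed rather than optimized, `train_and_score` is a hypothetical callback standing in for cross-validated training, no confidence level is modeled, and the relative error of a sampled label is estimated with the binomial standard error divided by the estimate. All function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def grow_training_set(
    train_and_score: Callable[[int], float],
    initial_size: int,
    step: float,
    target_error: float,
    max_size: int,
) -> Tuple[int, float, List[Tuple[int, float]]]:
    """Geometrically increase the number of training examples until the
    validation error reported by train_and_score meets target_error."""
    if step <= 1.0:
        raise ValueError("step must be greater than 1 for geometric growth")
    history: List[Tuple[int, float]] = []
    n = initial_size
    while True:
        err = train_and_score(n)  # e.g. cross-validated error on n examples
        history.append((n, err))
        if err <= target_error or n >= max_size:
            return n, err, history
        n = min(max_size, math.ceil(n * step))


def approx_selectivity_label(
    rows: Sequence[T],
    predicate: Callable[[T], bool],
    start: int,
    step: float,
    rel_tol: float,
) -> Tuple[float, int]:
    """Estimate a selectivity label from a growing prefix of rows, stopping once
    the estimated relative error (standard error / estimate) is <= rel_tol.
    Assumes rows are randomly ordered, so a prefix behaves like a random sample."""
    if not rows:
        raise ValueError("rows must be non-empty")
    if step <= 1.0:
        raise ValueError("step must be greater than 1 for geometric growth")
    s = max(1, start)
    while True:
        s = min(s, len(rows))
        hits = sum(1 for r in rows[:s] if predicate(r))
        p = hits / s
        if s == len(rows):
            return p, s  # every row examined: the label is exact
        if hits > 0 and math.sqrt(p * (1 - p) / s) / p <= rel_tol:
            return p, s
        s = math.ceil(s * step)


# Toy error curve standing in for real training runs.
n, err, history = grow_training_set(
    lambda size: 1 / math.sqrt(size),
    initial_size=50, step=2.0, target_error=0.06, max_size=10_000,
)
print(n, [size for size, _ in history])  # 400 [50, 100, 200, 400]

label, used = approx_selectivity_label(
    list(range(1000)), lambda r: r % 2 == 0, start=10, step=2.0, rel_tol=0.1
)
print(label, used)  # 0.5 160
```

Geometric growth keeps the number of retraining rounds logarithmic in the final sample size; a larger step means fewer rounds but a greater risk of overshooting the number of examples actually needed, which is the trade-off an optimized step size would balance.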

    EXTENSIBLE DATA TRANSFORMATIONS

    Publication No.: US20240184798A1

    Publication date: 2024-06-06

    Application No.: US18075365

    Application date: 2022-12-05

    CPC classification number: G06F16/258

    Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a set of example values is received. A repository of transformation tools is searched to identify a new transformation tool that is relevant to a data transformation associated with the received set of example values. The repository includes annotations associated with the new transformation tool. The new transformation tool is used to generate a transformation program that produces transformed output values. Additional annotations are generated for the new transformation tool based on the transformed output values.
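
A minimal sketch of the search-then-annotate cycle described above, assuming a repository of plain Python functions. Two simplifications are our own assumptions: relevance is decided by running each tool on the example values (the abstract says the repository's annotations inform the search, but not how), and the "additional annotations" are simply records of the observed input/output pairs. `TransformationTool`, `find_relevant_tool`, and `apply_and_annotate` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class TransformationTool:
    name: str
    fn: Callable[[str], str]
    annotations: List[str] = field(default_factory=list)


def find_relevant_tool(
    repository: Sequence[TransformationTool], examples: Sequence[Tuple[str, str]]
) -> Optional[TransformationTool]:
    """Return the first tool whose output matches every (input, output) example."""
    for tool in repository:
        try:
            if all(tool.fn(inp) == out for inp, out in examples):
                return tool
        except (IndexError, ValueError):
            continue  # a tool that fails on an example input is not relevant
    return None


def apply_and_annotate(tool: TransformationTool, inputs: Sequence[str]) -> List[str]:
    """Run the tool, then record its observed behavior as new annotations."""
    outputs = [tool.fn(x) for x in inputs]
    tool.annotations.extend(f"maps {x!r} -> {y!r}" for x, y in zip(inputs, outputs))
    return outputs


repository = [
    TransformationTool("uppercase", str.upper, ["case conversion"]),
    TransformationTool("first word", lambda s: s.split()[0], ["tokenize"]),
    TransformationTool("initials", lambda s: "".join(w[0] for w in s.split()), ["names"]),
]
tool = find_relevant_tool(repository, [("Ada Lovelace", "AL"), ("Alan Turing", "AT")])
if tool is not None:
    print(tool.name)                                   # initials
    print(apply_and_annotate(tool, ["Grace Hopper"]))  # ['GH']
```

Accumulating annotations on each use is what makes the repository "extensible" in this sketch: later searches could match against the recorded behavior instead of re-executing every tool.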

    FACILITATING DATA TRANSFORMATIONS

    Publication No.: US20240028607A1

    Publication date: 2024-01-25

    Application No.: US18374490

    Application date: 2023-09-28

    CPC classification number: G06F16/258 G06F16/245 G06F16/211 G06N5/025

    Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a set of example values is received, including example input values that indicate the data values to be transformed and example output values that indicate the desired form into which the data is to be transformed. Based on the set of example values, a data transformation function that is relevant to the set of example values is identified. The data transformation function is used to generate a transformation program that transforms the example input values into the desired form. A suggestion of the transformation program can be provided to a user device, where selection of the transformation program suggestion results in a data transformation.
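
To make "identify a relevant function, then generate a program from it" concrete, here is a hedged programming-by-example sketch. It assumes a single hypothetical parameterized function, `split_take(delim, index)`, and generates a program by searching its parameters for a setting consistent with every input/output example; the returned description plays the role of the suggestion shown to the user. None of these names come from the source, and a real system would search over many functions.

```python
from typing import Callable, List, Optional, Tuple

Example = Tuple[str, str]
Program = Callable[[str], str]


def split_take(delim: str, index: int) -> Program:
    """Parameterized transformation function: split on delim and keep one part."""
    def program(s: str) -> str:
        return s.split(delim)[index]
    return program


def suggest_program(examples: List[Example]) -> Optional[Tuple[str, Program]]:
    """Search split_take's parameters for a program consistent with all examples.

    Returns (human-readable suggestion, program), or None if nothing fits."""
    delimiters = sorted({ch for inp, _ in examples for ch in inp if not ch.isalnum()})
    for delim in delimiters:
        for index in (0, 1, 2, -1, -2, -3):
            program = split_take(delim, index)
            try:
                if all(program(inp) == out for inp, out in examples):
                    return f"split on {delim!r} and keep part {index}", program
            except IndexError:
                continue  # that part does not exist for some example input
    return None


suggestion = suggest_program([("ada@example.com", "example.com"), ("bob@test.org", "test.org")])
if suggestion is not None:
    description, program = suggestion
    print(description)                   # split on '@' and keep part 1
    print(program("carol@contoso.com"))  # contoso.com
```

Candidate delimiters are drawn only from characters that appear in the example inputs, which keeps the search small; the '.' delimiter is tried first and rejected because no part of "ada@example.com" split on '.' equals "example.com".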

    USING QUERY LOGS TO OPTIMIZE EXECUTION OF PARAMETRIC QUERIES

    Publication No.: US20220414099A1

    Publication date: 2022-12-29

    Application No.: US17361016

    Application date: 2021-06-28

    Abstract: The present disclosure relates to systems, methods, and computer-readable media for optimizing selection of a cached execution plan to use in processing a parametric query. For example, systems described herein involve training a plan selection model that uses machine learning to identify an execution plan from a set of pre-selected execution plans based on the predicted cost of executing a query instance in accordance with the selected execution plan (e.g., relative to the predicted costs of executing the query instance using the other pre-selected execution plans). This application describes features related to lowering the costs associated with selecting the execution plan in a way that continues to become more accurate over time as the plan selection model is trained and refined.
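
The selection step can be sketched as "predict each cached plan's cost for this parameter binding, then take the cheapest". In the hedged sketch below, k-nearest-neighbor regression over logged executions stands in for the patent's unspecified machine-learning model; the class name `PlanSelector`, the one-dimensional parameter feature, and the plan names and costs in the usage example are all hypothetical, synthetic values.

```python
import math
from typing import Dict, List, Tuple

Params = Tuple[float, ...]


class PlanSelector:
    """Pick a cached plan for a parametric query instance by predicting each
    plan's cost with k-nearest-neighbor regression over logged executions."""

    def __init__(self, k: int = 3) -> None:
        self.k = k
        self.log: Dict[str, List[Tuple[Params, float]]] = {}

    def record(self, plan: str, params: Params, cost: float) -> None:
        """Add one logged execution (e.g. from a query log) for a plan."""
        self.log.setdefault(plan, []).append((params, cost))

    def predict(self, plan: str, params: Params) -> float:
        """Average cost of the k logged runs whose parameters are closest."""
        runs = self.log.get(plan)
        if not runs:
            return math.inf  # no evidence for this plan
        nearest = sorted(runs, key=lambda run: math.dist(run[0], params))[: self.k]
        return sum(cost for _, cost in nearest) / len(nearest)

    def select(self, params: Params) -> str:
        """Return the logged plan with the lowest predicted cost."""
        return min(self.log, key=lambda plan: self.predict(plan, params))


selector = PlanSelector(k=2)
for p in (1.0, 10.0, 100.0, 1000.0):
    selector.record("index_seek", (p,), cost=p)    # cost grows with rows matched
    selector.record("full_scan", (p,), cost=200.0)  # cost roughly constant
print(selector.select((5.0,)))    # index_seek
print(selector.select((900.0,)))  # full_scan
```

Recording the observed cost after each new execution refines later predictions, a simple analogue of the model becoming more accurate over time; `select` requires at least one logged plan, since `min` over an empty log raises `ValueError`.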
