Collecting and annotating transformation tools for use in generating transformation programs

    Publication number: US11809223B2

    Publication date: 2023-11-07

    Application number: US17520926

    Application date: 2021-11-08

    CPC classification number: G06F16/258

    Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a plurality of remote sources is searched to identify candidate transformation tools relevant for performing data transformations. The candidate transformation tools are analyzed to identify tool examples corresponding with each of the candidate transformation tools. For each of the candidate transformation tools, the tool examples are stored in association with the corresponding candidate transformation tool. Based on a comparison of tool examples with example values, a transformation tool is identified as relevant to facilitate transforming example input values to the desired form in which to transform data.

    Machine-learned predictive models and systems for data preparation recommendations

    Publication number: US11488068B2

    Publication date: 2022-11-01

    Application number: US16886155

    Application date: 2020-05-28

    Inventors: Yeye He, Cong Yan

    Abstract: Systems are provided for facilitating the building and use of models used to make data preparation recommendations. The systems identify ground truth from a plurality of notebooks and utilize the ground truth to generate the corresponding data preparation recommendation models. The data preparation recommendation models are used to predict accurate (e.g., useful and relevant) data preparation steps based on user input and user notebook data. The data preparation computing system generates a recommendation prompt based on output from the data preparation recommendation model that can be viewed and/or selected by the user to be applied to the user's notebook data.

    Collecting and annotating transformation tools for use in generating transformation programs

    Publication number: US11170020B2

    Publication date: 2021-11-09

    Application number: US15343720

    Application date: 2016-11-04

    Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a plurality of remote sources is searched to identify candidate transformation tools relevant for performing data transformations. The candidate transformation tools are analyzed to identify tool examples corresponding with each of the candidate transformation tools. For each of the candidate transformation tools, the tool examples are stored in association with the corresponding candidate transformation tool. Based on a comparison of tool examples with example values, a transformation tool is identified as relevant to facilitate transforming example input values to the desired form in which to transform data.

    Generating and ranking transformation programs

    Publication number: US11163788B2

    Publication date: 2021-11-02

    Application number: US15343704

    Application date: 2016-11-04

    Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a set of example values is received. An index to identify a plurality of data transformation tools that are relevant to the set of example values is referenced, wherein each of the data transformation tools corresponds with one or more tool examples. The data transformation tools are ranked based on an extent of similarity between the set of example values and the tool examples. For data transformation tools associated with the extent of similarity that exceeds a similarity threshold, a transformation program is generated that uses the data transformation tool and a supplemental transformation tool to transform the one or more example input values to the desired form in which to transform data.
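    The ranking step in this abstract can be illustrated with a minimal sketch. It is not the patented method: the string-similarity measure (`difflib.SequenceMatcher`), the names `rank_tools` and `tool_index`, and the default threshold of 0.5 are all assumptions chosen for illustration, and the later step of generating a transformation program is not shown.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for the patent's unspecified measure."""
    return SequenceMatcher(None, a, b).ratio()


def rank_tools(example_values, tool_index, threshold=0.5):
    """Rank tools by how closely their stored examples resemble the user's example values.

    tool_index is a hypothetical index: {tool_name: [tool_example, ...]}.
    Only tools whose best similarity exceeds the threshold are kept.
    """
    scored = []
    for tool, tool_examples in tool_index.items():
        # Score a tool by its best-matching (example value, tool example) pair.
        score = max(
            similarity(value, example)
            for value in example_values
            for example in tool_examples
        )
        if score > threshold:
            scored.append((tool, score))
    # Highest similarity first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical index of two tools and their stored examples.
tool_index = {
    "date_parser": ["2016-11-04", "2021-11-08"],
    "phone_formatter": ["(425) 555-0100"],
}
ranked = rank_tools(["2017-04-06"], tool_index)
```

    Taking the maximum over all pairs is one simple design choice; an average over pairs would penalize tools whose examples only partly resemble the input.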

    Machine-learned predictive models and systems for data preparation recommendations

    Publication number: US20210319357A1

    Publication date: 2021-10-14

    Application number: US16886155

    Application date: 2020-05-28

    Inventors: Yeye He, Cong Yan

    Abstract: Systems are provided for facilitating the building and use of models used to make data preparation recommendations. The systems identify ground truth from a plurality of notebooks and utilize the ground truth to generate the corresponding data preparation recommendation models. The data preparation recommendation models are used to predict accurate (e.g., useful and relevant) data preparation steps based on user input and user notebook data. The data preparation computing system generates a recommendation prompt based on output from the data preparation recommendation model that can be viewed and/or selected by the user to be applied to the user's notebook data.

    Synthesizing mapping relationships using table corpus

    Publication number: US10650050B2

    Publication date: 2020-05-12

    Application number: US15480926

    Application date: 2017-04-06

    Inventors: Yeye He, Yue Wang

    Abstract: Methods and systems for synthesizing mapping tables using a table corpus are provided. A functional dependency between at least two items of an input table is determined. A plurality of two-column tables are extracted from the table corpus. The extracted plurality of two-column tables are synthesized to determine at least one mapping table having a first column having the functional dependency with a second column. A next item of the input table is provided from the determined at least one mapping table.
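    As a rough illustration of the synthesis idea, the sketch below merges two-column tables that agree with a few seed pairs from the input table and uses the result to fill in a missing item. The function names, the compatibility rule, and the sample data are assumptions made for this example, not the procedure claimed in the patent.

```python
def is_functional(pairs):
    """True if every left value maps to exactly one right value (a functional dependency)."""
    seen = {}
    for left, right in pairs:
        if seen.setdefault(left, right) != right:
            return False
    return True


def synthesize_mapping(seed_pairs, corpus_tables):
    """Grow a mapping table from seed pairs by merging compatible two-column tables.

    A corpus table is merged only if it is itself functional, shares at least
    one key with the mapping built so far, and never contradicts it.
    The result depends on table order; this is a simplification.
    """
    mapping = dict(seed_pairs)
    for table in corpus_tables:
        if not is_functional(table):
            continue
        table_map = dict(table)
        overlap = [key for key in table_map if key in mapping]
        if not overlap or any(table_map[k] != mapping[k] for k in overlap):
            continue
        mapping.update(table_map)
    return mapping


# Hypothetical input: two known (state, abbreviation) pairs and a small corpus.
seed = [("Washington", "WA"), ("Oregon", "OR")]
corpus = [
    [("Washington", "WA"), ("Texas", "TX")],          # agrees with the seed: merged
    [("Washington", "Olympia"), ("Oregon", "Salem")],  # different relation: skipped
    [("Texas", "TX"), ("Ohio", "OH")],                # agrees after first merge: merged
    [("A", "1"), ("A", "2")],                         # not functional: skipped
]
mapping = synthesize_mapping(seed, corpus)
next_item = mapping.get("Ohio")  # value suggested for a new input row keyed "Ohio"
```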

    Automated database schema annotation

    Publication number: US10452661B2

    Publication date: 2019-10-22

    Application number: US14743510

    Application date: 2015-06-18

    Abstract: Techniques and constructs are described that improve the annotation of target columns of a target database by performing automated annotation of the target columns using sources. The techniques include calculating a similarity score between a target column and columns extracted from a table that is included in a source. The similarity score is calculated based at least in part on a similarity between a value in the target column of the target database and a column value of the extracted column from the table and on a similarity between an identity of the target column of the target database and column identities of the extracted columns from the table. In some examples, the techniques calculate similarity scores for one or more extracted columns and annotate the target column based on the similarity scores.
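    The two-part similarity score described here (value similarity plus column-identity similarity) can be sketched as follows. The specific measures (Jaccard overlap for values, `difflib.SequenceMatcher` for names), the 0.7 weight, and the function names are assumptions for illustration; the abstract does not specify them.

```python
from difflib import SequenceMatcher


def jaccard(a, b):
    """Overlap between two collections of values, in [0, 1]."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def column_score(target_name, target_values, source_name, source_values,
                 value_weight=0.7):
    """Weighted blend of value similarity and column-name similarity."""
    value_sim = jaccard(target_values, source_values)
    name_sim = SequenceMatcher(None, target_name.lower(), source_name.lower()).ratio()
    return value_weight * value_sim + (1 - value_weight) * name_sim


def annotate(target_name, target_values, source_columns):
    """Return the annotation of the best-scoring source column.

    source_columns is a hypothetical list of (name, values, annotation) tuples.
    """
    best = max(
        source_columns,
        key=lambda col: column_score(target_name, target_values, col[0], col[1]),
    )
    return best[2]


# Hypothetical source columns extracted from tables.
sources = [
    ("state", ["WA", "OR", "CA", "TX"], "US state code"),
    ("city", ["Seattle", "Portland"], "City name"),
]
label = annotate("st", ["WA", "OR", "TX"], sources)
```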

    Joining semantically-related data using big table corpora

    Publication number: US10198471B2

    Publication date: 2019-02-05

    Application number: US14726547

    Application date: 2015-05-31

    Abstract: Examples of the disclosure enable performing semantic joins using a big table corpus. Pairs of values from at least two data sets are identified. The pairs of values include one value from a first one of the data sets and one value from a second one of the data sets. Statistical co-occurrence scores for the identified pairs of values are determined based on historical co-occurrence data. The determined statistical co-occurrence scores are used for predicting a semantic relationship between the at least two data sets. The predicted semantic relationship is used for joining the at least two data sets.
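    A minimal sketch of joining by co-occurrence statistics is shown below. The Jaccard-style score, the per-value matching rule, the 0.5 threshold, and all names are assumptions for illustration; the abstract only states that statistical co-occurrence scores from historical data drive the predicted relationship.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(corpus_rows):
    """Count how often each value, and each pair of values, appears in a corpus row."""
    single, pair = Counter(), Counter()
    for row in corpus_rows:
        values = set(row)
        single.update(values)
        pair.update(frozenset(p) for p in combinations(sorted(values), 2))
    return single, pair


def score(u, v, single, pair):
    """Rows containing both values divided by rows containing either value."""
    both = pair[frozenset((u, v))]
    either = single[u] + single[v] - both
    return both / either if either else 0.0


def semantic_join(left, right, corpus_rows, threshold=0.5):
    """Pair each left value with its best-scoring right value, if the score is high enough."""
    if not right:
        return []
    single, pair = cooccurrence_counts(corpus_rows)
    joined = []
    for u in left:
        best = max(right, key=lambda v: score(u, v, single, pair))
        if score(u, best, single, pair) >= threshold:
            joined.append((u, best))
    return joined


# Hypothetical historical corpus: each tuple is one row of some table.
rows = [
    ("Washington", "WA"),
    ("Washington", "WA"),
    ("Oregon", "OR"),
    ("Oregon", "WA"),  # a noisy row
]
result = semantic_join(["Washington", "Oregon"], ["WA", "OR"], rows)
```

    Normalizing by the rows that contain either value keeps a frequent value such as "WA" from matching everything it happens to appear beside.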

    Determining a hierarchical concept tree using a large corpus of table values

    Publication number: US20180357262A1

    Publication date: 2018-12-13

    Application number: US15621767

    Application date: 2017-06-13

    Abstract: This disclosure provides for a system, method, and computer-readable medium for implementing a table corpus processing server that identifies concepts within enterprise domain data. The table corpus processing server is configured to iteratively group values in a table corpus based on co-occurrence statistics to produce a candidate hierarchical tree. The candidate hierarchical tree is then summarized by selecting nodes that can best “describe” the original corpus, which leads to a small tree that often corresponds to desired concept hierarchies. The table corpus processing server employs a parallel dynamic programming approach that allows the disclosed embodiments to scale with the amount of enterprise domain data being analyzed.
