-
31.
Publication number: US11809223B2
Publication date: 2023-11-07
Application number: US17520926
Application date: 2021-11-08
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventor: Yeye He, Kris Ganjam, Vivek Ravindranath Narasayya, Surajit Chaudhuri, Xu Chu
CPC classification number: G06F16/258
Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a plurality of remote sources is searched to identify candidate transformation tools relevant for performing data transformations. The candidate transformation tools are analyzed to identify tool examples corresponding with each of the candidate transformation tools. For each of the candidate transformation tools, the tool examples are stored in association with the corresponding candidate transformation tool. Based on a comparison of tool examples with example values, a transformation tool is identified as relevant to facilitate transforming example input values to the desired form in which to transform data.
-
Publication number: US11714790B2
Publication date: 2023-08-01
Application number: US17490908
Application date: 2021-09-30
Applicant: Microsoft Technology Licensing, LLC
Inventor: Meiyalagan Balasubramanian, Lengning Liu, Aditya Kuppa, Kirk Hartmann Freiheit, Kalen Wong, Paula Budig Greve, Patrick Clinton Little, Lucas Pritz, Yue Wang, Vivek Ravindranath Narasayya, Katchaguy Areekijseree, Yeye He, Surajit Chaudhuri, Gaurav Ghosh
IPC: G06F16/21, G06F16/215, G06F16/2455
CPC classification number: G06F16/215, G06F16/24556
Abstract: Solutions for data unification include: receiving a data record, the data record comprising a plurality of data fields; selecting, from among the plurality of data fields, a subset of the data fields, the subset of the data fields being fewer in number than the plurality of data fields, wherein selecting the subset of the data fields comprises: applying a first rule to select at least a first one of the data fields within the data record for inclusion in the subset of the data fields; using content of the subset of the data fields, generating a stable identifier (stableID) for the data record; and inserting the stableID into a primary key data field of the data record.
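The stable-identifier idea in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the SHA-256 hash, the field-selection predicate `rule`, the `id` primary-key field name, and the sample record are all assumptions chosen for illustration.

```python
import hashlib


def select_fields(record, rule):
    """Return the sorted names of the fields picked by `rule` (a hypothetical predicate)."""
    return sorted(name for name in record if rule(name, record[name]))


def stable_id(record, rule):
    """Hash the content of the selected subset of fields into a stable identifier."""
    names = select_fields(record, rule)
    # Serialize name=value pairs in a fixed order so identical content
    # always produces the same digest. SHA-256 is an assumed choice.
    payload = "\x1f".join(f"{name}={record[name]}" for name in names)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def with_stable_id(record, rule, key_field="id"):
    """Return a copy of the record with the stableID stored in the primary-key field."""
    keyed = dict(record)
    keyed[key_field] = stable_id(record, rule)
    return keyed


# Hypothetical record: only `name` and `email` feed the identifier,
# so volatile fields such as `last_login` do not change it.
record = {"name": "Ada Lovelace", "email": "ada@example.com", "last_login": "2021-09-30"}
rule = lambda name, value: name in {"name", "email"}
keyed = with_stable_id(record, rule)
```

Because the identifier depends only on the selected subset, two versions of the record that differ in an unselected field receive the same key, while a change to a selected field yields a different key.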
-
Publication number: US11488068B2
Publication date: 2022-11-01
Application number: US16886155
Application date: 2020-05-28
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Abstract: Systems are provided for facilitating the building and use of models used to make data preparation recommendations. The systems identify ground truth from a plurality of notebooks and utilize the ground truth to generate the corresponding data preparation recommendation models. The data preparation recommendation models are used to predict accurate (e.g., useful and relevant) data preparation steps based on user input and user notebook data. The data preparation computing system generates a recommendation prompt based on output from the data preparation recommendation model that can be viewed and/or selected by the user to be applied to the user's notebook data.
-
34.
Publication number: US11170020B2
Publication date: 2021-11-09
Application number: US15343720
Application date: 2016-11-04
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventor: Yeye He, Kris Ganjam, Vivek Ravindranath Narasayya, Surajit Chaudhuri, Xu Chu
Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a plurality of remote sources is searched to identify candidate transformation tools relevant for performing data transformations. The candidate transformation tools are analyzed to identify tool examples corresponding with each of the candidate transformation tools. For each of the candidate transformation tools, the tool examples are stored in association with the corresponding candidate transformation tool. Based on a comparison of tool examples with example values, a transformation tool is identified as relevant to facilitate transforming example input values to the desired form in which to transform data.
-
Publication number: US11163788B2
Publication date: 2021-11-02
Application number: US15343704
Application date: 2016-11-04
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventor: Yeye He, Kris Ganjam, Vivek Ravindranath Narasayya, Surajit Chaudhuri, Xu Chu
Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a set of example values is received. An index to identify a plurality of data transformation tools that are relevant to the set of example values is referenced, wherein each of the data transformation tools corresponds with one or more tool examples. The data transformation tools are ranked based on an extent of similarity between the set of example values and the tool examples. For data transformation tools associated with the extent of similarity that exceeds a similarity threshold, a transformation program is generated that uses the data transformation tool and a supplemental transformation tool to transform the one or more example input values to the desired form in which to transform data.
-
Publication number: US20210319357A1
Publication date: 2021-10-14
Application number: US16886155
Application date: 2020-05-28
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Abstract: Systems are provided for facilitating the building and use of models used to make data preparation recommendations. The systems identify ground truth from a plurality of notebooks and utilize the ground truth to generate the corresponding data preparation recommendation models. The data preparation recommendation models are used to predict accurate (e.g., useful and relevant) data preparation steps based on user input and user notebook data. The data preparation computing system generates a recommendation prompt based on output from the data preparation recommendation model that can be viewed and/or selected by the user to be applied to the user's notebook data.
-
Publication number: US10650050B2
Publication date: 2020-05-12
Application number: US15480926
Application date: 2017-04-06
Applicant: Microsoft Technology Licensing, LLC
IPC: G06F16/21, G06F16/901, G06F16/28, G06F16/22
Abstract: Methods and systems for synthesizing mapping tables using a table corpus are provided. A functional dependency between at least two items of an input table is determined. A plurality of two-column tables are extracted from the table corpus. The extracted plurality of two-column tables are synthesized to determine at least one mapping table having a first column having the functional dependency with a second column. A next item of the input table is provided from the determined at least one mapping table.
-
Publication number: US10452661B2
Publication date: 2019-10-22
Application number: US14743510
Application date: 2015-06-18
Applicant: Microsoft Technology Licensing, LLC
Inventor: Philip A. Bernstein, Yeye He, Eli Cortez Custodio Vilarinho, Lev Novik
IPC: G06F16/00, G06F16/2457, G06F17/24, G06F16/20
Abstract: Techniques and constructs that improve annotating target columns of a target database by performing automated annotation of the target columns using sources. The techniques include calculating a similarity score between a target column and columns extracted from a table that is included in a source. The similarity score is calculated based at least in part on a similarity between a value in the target column of the target database and a column value of the extracted column from the table and on a similarity between an identity of the target column of the target database and column identities of the extracted columns from the table. In some examples, the techniques calculate similarity scores for one or more extracted columns and annotate the target column based on the similarity scores.
-
Publication number: US10198471B2
Publication date: 2019-02-05
Application number: US14726547
Application date: 2015-05-31
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventor: Yeye He, Kris Kuppuswamy Ganjam, Xu Chu
IPC: G06F17/30
Abstract: Examples of the disclosure enable performing semantic joins using a big table corpus. Pairs of values from at least two data sets are identified. The pairs of values include one value from a first one of the data sets and one value from a second one of the data sets. Statistical co-occurrence scores for the identified pairs of values are determined based on historical co-occurrence data. The determined statistical co-occurrence scores are used for predicting a semantic relationship between the at least two data sets. The predicted semantic relationship is used for joining the at least two data sets.
-
Publication number: US20180357262A1
Publication date: 2018-12-13
Application number: US15621767
Application date: 2017-06-13
Applicant: Microsoft Technology Licensing, LLC
Inventor: Yeye He, Kris K. Ganjam, Li Keqian
IPC: G06F17/30
CPC classification number: G06F16/2246, G06F16/2282, G06F16/282, G06F16/285, G06F2216/03
Abstract: This disclosure provides for a system, method, and computer-readable medium for implementing a table corpus processing server that identifies concepts within enterprise domain data. The table corpus processing server is configured to iteratively group values in a table corpus based on co-occurrence statistics to produce a candidate hierarchical tree. The candidate hierarchical tree is then summarized by selecting nodes that can best “describe” the original corpus, which leads to a small tree that often corresponds to desired concept hierarchies. The table corpus processing server employs a parallel dynamic programming approach that allows the disclosed embodiments to scale with the amount of enterprise domain data being analyzed.
-