-
Publication No.: US20230043015A1
Publication date: 2023-02-09
Application No.: US17969377
Filing date: 2022-10-19
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Abstract: Systems are provided for facilitating the building and use of models used to make data preparation recommendations. The systems identify ground truth from a plurality of notebooks and utilize the ground truth to generate the corresponding data preparation recommendation models. The data preparation recommendation models are used to predict accurate (e.g., useful and relevant) data preparation steps based on user input and user notebook data. The data preparation computing system generates a recommendation prompt based on output from the data preparation recommendation model that can be viewed and/or selected by the user to be applied to the user's notebook data.
-
Publication No.: US10795667B2
Publication date: 2020-10-06
Application No.: US15850283
Filing date: 2017-12-21
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
IPC: G06F8/70 , G06F16/23 , G06F16/2457 , G06F8/30
Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data type detection, according to embodiments of the present invention. In one embodiment, existing code is searched to identify a set of functions related to a target data type. Such functions can be executed using positive example values and negative example values. For each executed function, a logical explanation is generated that represents a distinction in execution of the positive example values from the negative example values. The executed functions can then be ranked based on the extent to which the corresponding logical explanations distinguish execution of the positive example values from the negative example values. A function suggestion corresponding with at least a highest ranked function can then be provided, for example to a user, to indicate a function for use in detecting the target data type.
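The ranking step in this abstract can be illustrated with a minimal sketch. It is not the patented method: instead of generating logical explanations from execution traces, it scores each candidate function only by whether it accepts a value (returns a truthy result without raising). The candidate functions and example values are hypothetical.

```python
import ipaddress
from typing import Callable, Dict, List, Tuple


def is_ipv4(value: str) -> bool:
    # ipaddress.ip_address raises ValueError for invalid input.
    return isinstance(ipaddress.ip_address(value), ipaddress.IPv4Address)


def is_float(value: str) -> bool:
    float(value)  # raises ValueError if the string is not a number
    return True


def is_nonempty(value: str) -> bool:
    return len(value) > 0


def accepts(func: Callable[[str], bool], value: str) -> bool:
    """Treat an exception or a falsy result as a rejection."""
    try:
        return bool(func(value))
    except Exception:
        return False


def rank_functions(
    candidates: Dict[str, Callable[[str], bool]],
    positives: List[str],
    negatives: List[str],
) -> List[Tuple[str, float]]:
    """Score = acceptance rate on positives minus acceptance rate on negatives."""
    scored = []
    for name, func in candidates.items():
        pos_rate = sum(accepts(func, v) for v in positives) / len(positives)
        neg_rate = sum(accepts(func, v) for v in negatives) / len(negatives)
        scored.append((name, pos_rate - neg_rate))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical candidates and examples for the target type "IPv4 address".
candidates = {"is_ipv4": is_ipv4, "is_float": is_float, "is_nonempty": is_nonempty}
positives = ["192.168.0.1", "10.0.0.255"]
negatives = ["999.1.1.1", "hello", "12.5"]

ranked = rank_functions(candidates, positives, negatives)
print(ranked[0][0])  # the highest ranked function is the one to suggest
```

A function that accepts everything scores zero under this measure, so it ranks below one that accepts the positives and rejects the negatives.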
-
Publication No.: US10789229B2
Publication date: 2020-09-29
Application No.: US15621767
Filing date: 2017-06-13
Applicant: Microsoft Technology Licensing, LLC
Inventor: Yeye He , Kris K. Ganjam , Keqian Li
Abstract: A table corpus processing server identifies concepts within enterprise domain data. The table corpus processing server is configured to iteratively group values in a table corpus based on co-occurrence statistics to produce a candidate hierarchical tree. The candidate hierarchical tree is then summarized by selecting nodes that can best “describe” the original corpus, which leads to a small tree that often corresponds to desired concept hierarchies. The table corpus processing server employs a parallel dynamic programming approach that allows the disclosed embodiments to scale with the amount of enterprise domain data being analyzed.
-
Publication No.: US10769140B2
Publication date: 2020-09-08
Application No.: US14754318
Filing date: 2015-06-29
Applicant: Microsoft Technology Licensing, LLC
Inventor: Philip A. Bernstein , Kaushik Chakrabarti , Zhimin Chen , Yeye He , Chi Wang , Kris K. Ganjam
IPC: G06F16/245 , G06F16/901
Abstract: Concept expansion using tables, such as web tables, can return entities belonging to a concept based on an input of the concept and at least one seed entity that belongs to the concept. A concept expansion frontend can receive the concept and seed entity and provide them to a concept expansion framework. The concept expansion framework can expand the coverage of entities for concepts, including tail concepts, using tables by leveraging rich content signals corresponding to concept names. Such content signals can include content matching the concept that appears in captions, early headings, page titles, surrounding text, anchor text, and queries for which the page has been clicked. The concept expansion framework can use the structured entities in tables to infer exclusive tables. Such inference differs from previous label propagation methods and involves modeling a table-entity relationship. The table-entity relationship reduces semantic drift without using a reference ontology.
-
Publication No.: US10706066B2
Publication date: 2020-07-07
Application No.: US15295858
Filing date: 2016-10-17
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventor: Kris Ganjam , Yeye He , Vivek Ravindranath Narasayya , Surajit Chaudhuri
Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a set of example values is received. A repository of transformation tools is searched to identify a new transformation tool as relevant to a data transformation associated with the received set of example values. The repository includes annotations associated with the new transformation tool. The new transformation tool is used to generate a transformation program that produces transformed output values. Additional annotations are generated for the new transformation tool based on the transformed output values.
-
Publication No.: US20190325046A1
Publication date: 2019-10-24
Application No.: US15957378
Filing date: 2018-04-19
Applicant: Microsoft Technology Licensing, LLC
Inventor: Lev Novik , Surajit Chaudhuri , Yeye He
IPC: G06F17/30
Abstract: Systems, methods, and computer-executable instructions for partitioning a data set include receiving anchor attributes of a data set. The data set includes records, with each record including attributes. A set of filter attributes that are not mutually exclusive with any of the anchor attributes is determined. A set of candidate attributes that includes each unique attribute from the data set excluding the anchor attributes and the filter attributes is determined. For each of the anchor attributes and the candidate attributes, an attribute context is determined. For each of the candidate attributes, a context similarity between each of the anchor attributes is determined. A new anchor attribute is selected from the set of candidate attributes based on the context similarity.
-
Publication No.: US20180081954A1
Publication date: 2018-03-22
Application No.: US15271154
Filing date: 2016-09-20
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventor: Yeye He , Kris Ganjam , Vivek Ravindranath Narasayya , Surajit Chaudhuri
IPC: G06F17/30
CPC classification number: G06F16/258 , G06F16/245
Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a set of example values is received, including example input values that indicate data values to be transformed and example output values that indicate a desired form in which to transform data. Based on the set of example values, a data transformation function that is relevant to the set of example values is identified. The data transformation function is used to generate a transformation program to transform the example input values to the desired form in which to transform data. A suggestion of the transformation program can be provided to a user device, wherein selection of the transformation program suggestion results in a data transformation.
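As a rough illustration of transformation by example, the sketch below searches a small repository for a function that maps every example input to its example output. It is a simplification under stated assumptions: the repository contents and function names are hypothetical, and the matching function is returned directly rather than being used to synthesize a larger transformation program as the abstract describes.

```python
from datetime import datetime
from typing import Callable, Dict, List, Optional, Tuple


def us_to_iso(value: str) -> str:
    # Raises ValueError if the value is not in MM/DD/YYYY form.
    return datetime.strptime(value, "%m/%d/%Y").strftime("%Y-%m-%d")


# Hypothetical repository of known transformation functions.
REPOSITORY: Dict[str, Callable[[str], str]] = {
    "upper": str.upper,
    "us_to_iso": us_to_iso,
    "strip": str.strip,
}


def find_transformation(
    examples: List[Tuple[str, str]],
    repository: Dict[str, Callable[[str], str]],
) -> Optional[str]:
    """Return the name of the first function consistent with all examples."""
    for name, func in repository.items():
        try:
            if all(func(src) == dst for src, dst in examples):
                return name
        except Exception:
            # A function that fails on an example input is not a match.
            continue
    return None


examples = [("01/31/2020", "2020-01-31"), ("12/05/2019", "2019-12-05")]
match = find_transformation(examples, REPOSITORY)
print(match)

if match is not None:
    # Applying the suggested function to a new value performs the transformation.
    print(REPOSITORY[match]("07/04/2021"))
```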
-
Publication No.: US11928564B2
Publication date: 2024-03-12
Application No.: US17969377
Filing date: 2022-10-19
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Abstract: Systems are provided for facilitating the building and use of models used to make data preparation recommendations. The systems identify ground truth from a plurality of notebooks and utilize the ground truth to generate the corresponding data preparation recommendation models. The data preparation recommendation models are used to predict accurate (e.g., useful and relevant) data preparation steps based on user input and user notebook data. The data preparation computing system generates a recommendation prompt based on output from the data preparation recommendation model that can be viewed and/or selected by the user to be applied to the user's notebook data.
-
Publication No.: US11698892B2
Publication date: 2023-07-11
Application No.: US17510327
Filing date: 2021-10-25
Applicant: Microsoft Technology Licensing, LLC
IPC: G06F16/00 , G06F16/22 , G06F16/215 , G06N20/00 , G06F17/18
CPC classification number: G06F16/2282 , G06F16/215 , G06F17/18 , G06N20/00
Abstract: The present disclosure relates to systems, methods, and computer-readable media for using a variety of hypothesis tests to identify errors within tables and other structured datasets. For example, systems disclosed herein can generate a modified table from an input table by removing one or more entries from the input table. The systems disclosed herein can further leverage a collection of training tables to determine probabilities associated with whether the input table and modified table are drawn from the collection of training tables. The systems disclosed herein can additionally compare the probabilities to accurately determine whether the one or more entries include errors therein. The systems disclosed herein may apply to a variety of different sizes and types of tables to identify different types of common errors within input tables.
-
Publication No.: US10970271B2
Publication date: 2021-04-06
Application No.: US16161695
Filing date: 2018-10-16
Applicant: Microsoft Technology Licensing, LLC
Inventor: Kris Kuppuswamy Ganjam , Yeye He , Anja Gruenheid
IPC: G06F16/23 , G06F16/215 , G06F16/28 , G06F16/35 , G06F16/2457
Abstract: Correcting data in a dataset. A set of data tokens from a tabular data store are grouped into a plurality of different clusters based on similarity of tokens. A reference cluster is selected from among the plurality of different clusters such that the plurality of clusters includes a reference cluster and one or more other clusters. One or more tokens in the one or more other clusters are transformed. The effect on the reference cluster of adding the transformed tokens to the reference cluster is determined. Using this information, a correction for a token in the dataset is identified. The data store is updated to correct the token using the identified correction.
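The general idea of correcting rare tokens against a trusted reference group can be sketched as follows. This is a simplified stand-in, not the claimed procedure: it treats frequently occurring tokens as the reference set and uses `difflib` string similarity to propose corrections, rather than clustering tokens and measuring the effect of adding transformed tokens to a reference cluster. The function name, thresholds, and sample column are assumptions made for illustration.

```python
import difflib
from collections import Counter
from typing import Dict, List


def suggest_corrections(
    tokens: List[str], min_count: int = 2, cutoff: float = 0.8
) -> Dict[str, str]:
    """Map each rare token to its closest frequent token, if one is similar enough."""
    counts = Counter(tokens)
    # Assumption: tokens seen at least min_count times are treated as correct.
    reference = [tok for tok, n in counts.items() if n >= min_count]
    corrections = {}
    for token, n in counts.items():
        if n >= min_count:
            continue
        close = difflib.get_close_matches(token, reference, n=1, cutoff=cutoff)
        if close:
            corrections[token] = close[0]
    return corrections


# Hypothetical column of city names containing two misspellings.
column = [
    "Seattle", "Seattle", "Seattle", "Seatle",
    "Portland", "Portland", "Portlnd", "Boise",
]

corrections = suggest_corrections(column)
cleaned = [corrections.get(tok, tok) for tok in column]
print(corrections)
print(cleaned)
```

Rare tokens with no sufficiently similar reference token, such as "Boise" here, are left unchanged rather than forced into a correction.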
-