-
Publication No.: US20200210389A1
Publication Date: 2020-07-02
Application No.: US16235441
Filing Date: 2018-12-28
Applicant: Microsoft Technology Licensing, LLC
Inventor: Arun Narasimha Swami , Sriram Vasudevan
IPC: G06F16/215 , G06F16/23 , G06F16/2455 , G06F16/2458 , G06F16/335
Abstract: The disclosed embodiments provide a system for performing profile-driven data validation. During operation, the system obtains a validation configuration containing declarative specifications of fields in a data set and validation rules to be applied to the data set. Next, the system analyzes the data set based on the validation configuration to produce a set of metrics related to the data set and stores the metrics in a profile for the data set. The system also matches a metric in the profile to the type of validation associated with a validation rule in the validation configuration. Finally, the system applies the validation rule to a value of the metric in the profile to produce a validation result for the validation rule.
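Illustrative sketch of the flow in the abstract above: a declarative configuration drives profiling, and each rule is then checked against a stored metric rather than against the raw rows. The configuration layout, the metric names (`null_fraction`, `distinct_count`, `mean`) and the function names are assumptions made for this example and are not taken from the patent.

```python
from statistics import mean

# Hypothetical validation configuration: field declarations plus rules.
# Each rule names a field, a validation type (used here as the name of the
# metric it is matched against), and optional bounds on that metric.
CONFIG = {
    "fields": {"age": "int", "email": "str"},
    "rules": [
        {"field": "age", "type": "null_fraction", "max": 0.25},
        {"field": "age", "type": "mean", "min": 18, "max": 65},
        {"field": "email", "type": "distinct_count", "min": 4},
    ],
}


def build_profile(rows, config):
    """Compute per-field metrics once and store them in a profile."""
    profile = {}
    for field, kind in config["fields"].items():
        values = [row.get(field) for row in rows]
        present = [v for v in values if v is not None]
        metrics = {
            "null_fraction": 1 - len(present) / len(values) if values else 0.0,
            "distinct_count": len(set(present)),
        }
        if present and kind == "int":
            metrics["mean"] = mean(present)
        profile[field] = metrics
    return profile


def validate(profile, config):
    """Match each rule to a metric in the profile and check its bounds."""
    results = []
    for rule in config["rules"]:
        value = profile.get(rule["field"], {}).get(rule["type"])
        passed = (
            value is not None
            and rule.get("min", float("-inf")) <= value <= rule.get("max", float("inf"))
        )
        results.append({"rule": rule, "value": value, "passed": passed})
    return results


ROWS = [
    {"age": 30, "email": "a@example.com"},
    {"age": 40, "email": "b@example.com"},
    {"age": None, "email": "a@example.com"},
    {"age": 50, "email": "c@example.com"},
]
PROFILE = build_profile(ROWS, CONFIG)
RESULTS = validate(PROFILE, CONFIG)
```

In this sketch the rules never touch `ROWS` directly; they read only from `PROFILE`, so new rules on already-computed metrics need no second pass over the data.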
-
Publication No.: US20220358398A1
Publication Date: 2022-11-10
Application No.: US17313560
Filing Date: 2021-05-06
Applicant: Microsoft Technology Licensing, LLC
Inventor: Meng MENG , Daniel Sairom Krishnan Hewlett , Sriram Vasudevan , Vitaly Abdrashitov
IPC: G06N20/00 , G06K9/62 , G06F16/951 , G06F16/955 , H04L29/08
Abstract: Techniques for incorporating sequence encoders into machine-learned models where the sequence encoders operate on bag of words (BOW) input are provided. Tokens that are associated with online activities of an entity are identified. Machine-learned embeddings that correspond to the tokens are identified. Based on one or more ordering criteria that are independent of the temporal occurrence of the online activities of the entity, an order of the machine-learned embeddings is determined. Based on the order, the machine-learned embeddings are inputted to a sequence encoder that generates output. Based on the output, a machine-learned model that includes the sequence encoder generates a score. A content item is selected based on the score. The content item is transmitted over a computer network to a computing device.
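Illustrative sketch of the ordering idea in the abstract above: an order-sensitive encoder receives a bag of tokens that has first been put into a fixed order unrelated to when the activities occurred. The embedding table, the popularity-rank ordering key and the toy recurrent update are all assumptions made for this example; the patent text does not specify them.

```python
import math

# Hypothetical embedding table and ordering key (a precomputed global
# popularity rank). Neither comes from the patent text.
EMBEDDINGS = {
    "python": [0.2, 0.7],
    "sql": [0.5, -0.1],
    "spark": [-0.3, 0.4],
}
POPULARITY_RANK = {"sql": 0, "python": 1, "spark": 2}


def order_tokens(tokens):
    """Order a bag of tokens by a criterion unrelated to when they occurred."""
    return sorted(tokens, key=lambda token: POPULARITY_RANK[token])


def encode(sequence):
    """Toy order-sensitive encoder: h <- tanh(0.5 * h + x), elementwise."""
    hidden = [0.0, 0.0]
    for vector in sequence:
        hidden = [math.tanh(0.5 * h + x) for h, x in zip(hidden, vector)]
    return hidden


def score(tokens):
    """Order the bag, encode it, and reduce the encoder output to a score."""
    ordered = order_tokens(tokens)
    hidden = encode([EMBEDDINGS[token] for token in ordered])
    return sum(hidden)  # stand-in for the downstream model's scoring layer
```

Because the order is derived from the tokens themselves, the same bag yields the same score regardless of the order in which the activities were logged, even though `encode` on its own is sensitive to order.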
-
Publication No.: US20200210401A1
Publication Date: 2020-07-02
Application No.: US16235347
Filing Date: 2018-12-28
Applicant: Microsoft Technology Licensing, LLC
Inventor: Arun Narasimha Swami , Sriram Vasudevan
IPC: G06F16/23
Abstract: The disclosed embodiments provide a system for processing data. During operation, the system obtains a validation configuration containing declarative specifications of fields in a data set and validation rules to be applied to the data set, wherein the validation rules include a field in the data set, a type of validation to be applied to the field, and a parameter for managing a validation failure during evaluation of the validation rules with the data set. Next, the system automatically applies the validation rules to the data set within a workflow for generating the data set to produce validation results indicating passing or failing of the validation rules by the data set. The system then outputs the validation results for use in managing the data set.
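Illustrative sketch of the rule structure in the abstract above: each rule carries a field, a validation type, and a parameter that controls what happens on failure, and the rules run inside the step that generates the data set. The `on_failure` values (`"warn"`, `"abort"`), the check names and the function names are assumptions made for this example.

```python
class ValidationError(Exception):
    """Raised when a failing rule is configured to stop the workflow."""


# Hypothetical checks keyed by validation type.
CHECKS = {
    "not_null": lambda values: all(v is not None for v in values),
    "unique": lambda values: len(values) == len(set(values)),
}


def run_validations(rows, rules):
    """Apply each rule and honor its failure-handling parameter."""
    results = []
    for rule in rules:
        values = [row.get(rule["field"]) for row in rows]
        passed = CHECKS[rule["type"]](values)
        results.append((rule["field"], rule["type"], passed))
        if not passed and rule["on_failure"] == "abort":
            raise ValidationError(f"{rule['type']} failed on {rule['field']}")
    return results


def generate_dataset(produce_rows, rules):
    """Workflow step: build the data set, then validate it before returning."""
    rows = produce_rows()
    results = run_validations(rows, rules)
    return rows, results
```

A rule marked `"warn"` records the failure and lets the workflow continue, while a rule marked `"abort"` stops the step before the data set is handed downstream.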
-
Publication No.: US12190209B2
Publication Date: 2025-01-07
Application No.: US17313560
Filing Date: 2021-05-06
Applicant: Microsoft Technology Licensing, LLC
Inventor: Meng Meng , Daniel Sairom Krishnan Hewlett , Sriram Vasudevan , Vitaly Abdrashitov
IPC: G06N20/00 , G06F16/951 , G06F16/955 , G06F18/2113 , H04L67/02
Abstract: Techniques for incorporating sequence encoders into machine-learned models where the sequence encoders operate on bag of words (BOW) input are provided. Tokens that are associated with online activities of an entity are identified. Machine-learned embeddings that correspond to the tokens are identified. Based on one or more ordering criteria that are independent of the temporal occurrence of the online activities of the entity, an order of the machine-learned embeddings is determined. Based on the order, the machine-learned embeddings are inputted to a sequence encoder that generates output. Based on the output, a machine-learned model that includes the sequence encoder generates a score. A content item is selected based on the score. The content item is transmitted over a computer network to a computing device.
-
Publication No.: US20220284028A1
Publication Date: 2022-09-08
Application No.: US17195261
Filing Date: 2021-03-08
Applicant: Microsoft Technology Licensing, LLC
Inventor: Meng Meng , Daniel Sairom Krishnan Hewlett , Sriram Vasudevan
IPC: G06F16/2457 , G06Q10/10 , G06N3/08
Abstract: Described herein is a machine learning model comprising a neural network that is trained to generate a ranking score for an online job posting. The neural network takes as input a variety of input features, including at least a first input feature that is an encoded representation of a search query as generated by a first Transformer encoder, an encoded representation of a job title as generated by a second Transformer encoder, and an encoded representation of a company name as generated by a third Transformer encoder. Once a plurality of online job postings are ranked, some subset of the plurality are presented in a user interface, ordered based on their respective ranking scores.
-
Publication No.: US20230418841A1
Publication Date: 2023-12-28
Application No.: US17847755
Filing Date: 2022-06-23
Applicant: Microsoft Technology Licensing, LLC
Inventor: Sriram Vasudevan
IPC: G06F16/28 , G06F16/2455 , G06F16/2457 , G06N20/00
CPC classification number: G06F16/285 , G06F16/24556 , G06F16/24564 , G06F16/24578 , G06N20/00
Abstract: Methods, systems, and computer programs are presented for labeling datasets. An example method can include generating rules for labeling data records within a first dataset. The rules can indicate an extent to which a data record matches query criteria. The method can further include generating an aggregated label for the corresponding data record based on the rules and training a machine learning model using the first dataset and the aggregated label. The method can include receiving an indication of user engagement and combining the indication of user engagement with the aggregated label to generate a score.
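Illustrative sketch of the labeling steps in the abstract above: rules score how well a record matches the query criteria, the rule outputs are aggregated into one label, and that label is combined with an engagement signal. The two rules, the unweighted mean and the 50/50 blend are assumptions made for this example, and the model-training step is omitted.

```python
# Hypothetical labeling rules: each returns how well a record matches the
# query criteria, as a value between 0.0 and 1.0.
def title_rule(record, query):
    return 1.0 if query["keyword"] in record["title"].lower() else 0.0


def location_rule(record, query):
    return 1.0 if record["location"] == query["location"] else 0.0


RULES = [title_rule, location_rule]


def aggregated_label(record, query, rules=RULES):
    """Aggregate the rule outputs into one label (unweighted mean here)."""
    votes = [rule(record, query) for rule in rules]
    return sum(votes) / len(votes)


def combined_score(label, engagement, engagement_weight=0.5):
    """Blend the aggregated label with an observed engagement signal."""
    return (1 - engagement_weight) * label + engagement_weight * engagement


QUERY = {"keyword": "engineer", "location": "Seattle"}
RECORD = {"title": "Data Engineer", "location": "Austin"}
LABEL = aggregated_label(RECORD, QUERY)          # title matches, location does not
SCORE = combined_score(LABEL, engagement=1.0)    # engagement 1.0 = user engaged
```

In a fuller version, `LABEL` would serve as the training target for a model over the first data set; here it is only computed and blended.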