Techniques for improving standardized data accuracy

    Publication No.: US12229669B2

    Publication date: 2025-02-18

    Application No.: US17340607

    Filing date: 2021-06-07

    Abstract: Described herein is a technique for mapping the raw text of a job title of an online job posting to an entity embedding associated with an entity or entry of a title taxonomy. The raw text of the job title is first encoded to generate a multilingual word embedding in a multilingual word embedding space. Then, the vector representation of the job title, as represented in the multilingual word embedding space, is translated, using a neural network, to a vector representation of the job title in the entity embedding space. Finally, a nearest neighbor search is performed to identify the entity embedding, associated with an entity or entry in the title taxonomy, whose vector representation is closest in distance to the vector output by the neural network.
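
The three steps in this abstract (encode, translate, nearest neighbor search) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent: the keyword-counting `encode_title` stands in for a real multilingual encoder, the single linear layer `W` stands in for the neural network, and the two-entry `ENTITY_EMBEDDINGS` table stands in for a title taxonomy.

```python
import math

# Hypothetical entity embedding space: taxonomy entry -> 2-d vector.
ENTITY_EMBEDDINGS = {
    "software engineer": [1.0, 0.0],
    "registered nurse": [0.0, 1.0],
}

def encode_title(raw_title):
    """Toy stand-in for a multilingual encoder: returns a 3-d vector."""
    text = raw_title.lower()
    tech = sum(w in text for w in ("software", "engineer", "ingeniero", "entwickler"))
    care = sum(w in text for w in ("nurse", "enfermera", "krankenschwester"))
    return [float(tech), float(care), 1.0]

# Toy stand-in for the neural network: one linear layer, 3-d -> 2-d.
W = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]

def translate(vec):
    """Map a vector from the word embedding space to the entity embedding space."""
    return [sum(w * x for w, x in zip(row, vec)) for row in W]

def nearest_entity(vec):
    """Exhaustive nearest neighbor search by Euclidean distance."""
    return min(ENTITY_EMBEDDINGS, key=lambda name: math.dist(vec, ENTITY_EMBEDDINGS[name]))

def standardize_title(raw_title):
    return nearest_entity(translate(encode_title(raw_title)))

print(standardize_title("Senior Software Engineer"))
print(standardize_title("Enfermera titulada"))
```

A production system would likely replace the exhaustive `min` with an approximate nearest neighbor index once the taxonomy grows large; the brute-force search is used here only to keep the sketch self-contained.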

    Machine learning techniques for analyzing textual content

    Publication No.: US11487947B2

    Publication date: 2022-11-01

    Application No.: US16716402

    Filing date: 2019-12-16

    Abstract: Techniques are provided for using machine learning techniques to analyze textual content. In one technique, a potential item is identified within a document. An analysis of the potential item is performed at multiple levels of granularity that include two or more of a sentence level, a segment level, or a document level. The analysis produces multiple outputs, one for each level of granularity in the multiple levels of granularity. The outputs are input into a machine-learned model to generate a score for the potential item. Based on the score, the potential item is presented on a computing device. In response to user selection of the potential item, an association between the potential item and the document is created. The association may be used later to identify a set of users to which the document (or data thereof) is to be presented.
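
As a rough illustration of scoring a potential item from outputs at several levels of granularity, the sketch below computes one signal per level and feeds them to a logistic model. The per-level signals (mention frequencies), the `WEIGHTS` and `BIAS` values, and the blank-line segment splitting are all hypothetical choices, not details from the patent; a real system would learn the weights from data.

```python
import math

# Hypothetical model parameters; a trained model would supply these.
WEIGHTS = {"sentence": 1.0, "segment": 1.0, "document": 1.0}
BIAS = -1.0

def level_outputs(item, document):
    """Produce one output per level of granularity for a potential item."""
    item = item.lower()
    text = document.lower()
    # Assumption: segments are separated by blank lines, sentences by periods.
    segments = [seg for seg in text.split("\n\n") if seg.strip()]
    sentences = [s for seg in segments for s in seg.split(".") if s.strip()]
    return {
        "sentence": sum(item in s for s in sentences) / max(len(sentences), 1),
        "segment": sum(item in seg for seg in segments) / max(len(segments), 1),
        "document": float(item in text),
    }

def score_item(item, document):
    """Combine the per-level outputs into a single score in (0, 1)."""
    outputs = level_outputs(item, document)
    z = BIAS + sum(WEIGHTS[level] * value for level, value in outputs.items())
    return 1.0 / (1.0 + math.exp(-z))

DOC = "We need Python skills. Python is used daily.\n\nBenefits include lunch."
for candidate in ("Python", "Java"):
    score = score_item(candidate, DOC)
    print(candidate, round(score, 3), "present" if score > 0.5 else "skip")
```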

    Monitoring and correction system for improved laser display systems

    Publication No.: US11233980B2

    Publication date: 2022-01-25

    Application No.: US16440597

    Filing date: 2019-06-13

    Abstract: Techniques for improving laser image quality are disclosed herein. An ultra-compact illumination module includes multiple illuminators, photodetectors, and color filters. The illuminators each emit a different spectrum of light. Because of the compact nature of the module and the positioning of the illuminators relative to one another, the different spectrums of light overlap one another prior to being detected by the photodetectors. Each of the photodetectors is associated with a corresponding one of the illuminators, and each of the color filters is associated with a corresponding one of the photodetectors. Each color filter is positioned in-between its corresponding illuminator and photodetector and passes a particular spectrum of light while filtering out other spectrums of light. Consequently, the photodetectors each receive spectrally filtered light having passed through at least one of the color filters. The power output of the illuminators can also be corrected based on output from the photodetectors.

    SEMANTIC MATCHING AND RETRIEVAL OF STANDARDIZED ENTITIES

    Publication No.: US20210303638A1

    Publication date: 2021-09-30

    Application No.: US16836546

    Filing date: 2020-03-31

    Abstract: The disclosed embodiments provide a system for processing user-generated input. During operation, the system obtains a first embedding produced by an embedding model from an input string representing an entity and a hierarchy of clusters of embeddings generated by the embedding model from a set of standardized entities. Next, the system searches the hierarchy of clusters for a subset of the embeddings that are within a threshold proximity to the first embedding in a vector space. The system then calculates embedding match scores between the input string and a first subset of the standardized entities represented by the subset of the embeddings based on distances between the subset of the embeddings and the first embedding in the vector space. Finally, the system modifies, based on the embedding match scores, content outputted in response to the input string within a user interface of an online system.
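
The cluster search and match scoring described here might look roughly like the following sketch. It assumes a two-level hierarchy (centroids over member embeddings), a known bound `cluster_radius` on how far any member lies from its centroid, and a match score of `1 / (1 + distance)`; none of these specifics come from the patent, and the `HIERARCHY` data is invented.

```python
import math

# Hypothetical two-level hierarchy of standardized-entity embeddings.
HIERARCHY = [
    {"centroid": [0.0, 0.0],
     "members": {"accountant": [0.1, 0.0], "auditor": [0.0, 0.3]}},
    {"centroid": [5.0, 5.0],
     "members": {"welder": [5.0, 5.1], "machinist": [4.8, 5.0]}},
]

def search(query, threshold, cluster_radius=1.0):
    """Return {entity: match score} for embeddings within `threshold` of `query`.

    Assumes every member lies within `cluster_radius` of its centroid, so a
    cluster whose centroid is farther than threshold + cluster_radius cannot
    contain a match (triangle inequality) and is skipped without scanning it.
    """
    scores = {}
    for cluster in HIERARCHY:
        if math.dist(query, cluster["centroid"]) > threshold + cluster_radius:
            continue
        for name, embedding in cluster["members"].items():
            distance = math.dist(query, embedding)
            if distance <= threshold:
                # Assumed scoring rule: closer embeddings score nearer to 1.
                scores[name] = 1.0 / (1.0 + distance)
    # Highest-scoring entities first.
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))

print(search([0.0, 0.0], threshold=0.5))
```

The resulting scores could then drive what the user interface shows, for example by ranking typeahead suggestions for the input string.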

    FLEXIBLE CONFIGURATION OF MODEL TRAINING PIPELINES

    Publication No.: US20190228343A1

    Publication date: 2019-07-25

    Application No.: US15878186

    Filing date: 2018-01-23

    Abstract: The disclosed embodiments provide a system for processing data. During operation, the system obtains a model definition and a training configuration for a machine-learning model, wherein the training configuration includes a set of required features, a training technique, and a scoring function. Next, the system uses the model definition and the training configuration to load the machine-learning model and the set of required features into a training pipeline without requiring a user to manually identify the set of required features. The system then uses the training pipeline and the training configuration to update a set of parameters for the machine-learning model. Finally, the system stores mappings containing the updated set of parameters and the set of required features in a representation of the machine-learning model.
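
A configuration-driven training pipeline of this kind can be sketched as follows. The dictionary layout of `model_definition` and `training_config`, the in-memory `FEATURE_STORE`, the `LABELS`, and the registry names `TECHNIQUES` and `SCORERS` are all hypothetical; the sketch only shows the idea that the required features are read from the configuration rather than wired up by hand, and that the stored result maps features to updated parameters.

```python
# Hypothetical feature store and labels (labels follow y = 2*x1 + 1*x2).
FEATURE_STORE = {
    "years_experience": [1.0, 2.0, 3.0, 4.0],
    "has_degree": [1.0, 0.0, 1.0, 0.0],
    "unused_feature": [9.0, 9.0, 9.0, 9.0],
}
LABELS = [3.0, 4.0, 7.0, 8.0]

def predict(params, row):
    return sum(w * x for w, x in zip(params, row))

def gradient_descent(rows, labels, params, steps=2000, lr=0.05):
    """Plain batch gradient descent on mean squared error for a linear model."""
    n = len(rows)
    for _ in range(steps):
        errors = [predict(params, row) - y for row, y in zip(rows, labels)]
        grads = [2.0 / n * sum(e * row[j] for e, row in zip(errors, rows))
                 for j in range(len(params))]
        params = [w - lr * g for w, g in zip(params, grads)]
    return params

def mean_squared_error(rows, labels, params):
    return sum((predict(params, row) - y) ** 2 for row, y in zip(rows, labels)) / len(rows)

TECHNIQUES = {"gradient_descent": gradient_descent}
SCORERS = {"mse": mean_squared_error}

def train(model_definition, training_config):
    # The pipeline loads exactly the features named in the configuration.
    features = training_config["required_features"]
    rows = list(zip(*(FEATURE_STORE[name] for name in features)))
    params = TECHNIQUES[training_config["technique"]](rows, LABELS, [0.0] * len(features))
    score = SCORERS[training_config["scoring_function"]](rows, LABELS, params)
    # Store the mapping from required features to updated parameters.
    return {"name": model_definition["name"],
            "parameters": dict(zip(features, params)),
            "score": score}

model = train(
    {"name": "salary_model", "form": "linear"},
    {"required_features": ["years_experience", "has_degree"],
     "technique": "gradient_descent",
     "scoring_function": "mse"},
)
print(model["parameters"])
```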

    TECHNIQUES FOR IMPROVING STANDARDIZED DATA ACCURACY

    Publication No.: US20220391690A1

    Publication date: 2022-12-08

    Application No.: US17340607

    Filing date: 2021-06-07

    Abstract: Described herein is a technique for mapping the raw text of a job title of an online job posting to an entity embedding associated with an entity or entry of a title taxonomy. The raw text of the job title is first encoded to generate a multilingual word embedding in a multilingual word embedding space. Then, the vector representation of the job title, as represented in the multilingual word embedding space, is translated, using a neural network, to a vector representation of the job title in the entity embedding space. Finally, a nearest neighbor search is performed to identify the entity embedding, associated with an entity or entry in the title taxonomy, whose vector representation is closest in distance to the vector output by the neural network.

    Semantic matching and retrieval of standardized entities

    Publication No.: US11481448B2

    Publication date: 2022-10-25

    Application No.: US16836546

    Filing date: 2020-03-31

    Abstract: During operation, the system obtains a first embedding produced by an embedding model from an input string representing an entity and a hierarchy of clusters of embeddings generated by the embedding model from a set of standardized entities. Next, the system searches the hierarchy of clusters for a subset of the embeddings that are within a threshold proximity to the first embedding in a vector space. The system then calculates embedding match scores between the input string and a first subset of the standardized entities represented by the subset of the embeddings based on distances between the subset of the embeddings and the first embedding in the vector space. Finally, the system modifies, based on the embedding match scores, content outputted in response to the input string within a user interface of an online system.

    OPTIMIZING FEATURE EVALUATION IN MACHINE LEARNING

    Publication No.: US20190325352A1

    Publication date: 2019-10-24

    Application No.: US15959023

    Filing date: 2018-04-20

    Abstract: The disclosed embodiments provide a system for processing data. During operation, the system obtains a feature dependency graph of features for a machine learning model and an operator dependency graph comprising operators to be applied to the features. Next, the system generates feature values of the features according to an evaluation order associated with the operator dependency graph and feature dependencies from the feature dependency graph. During evaluation of an operator in the evaluation order, the system updates a list of calculated features with one or more features that have been calculated for use with the operator. During evaluation of a subsequent operator in the evaluation order, the system uses the list of calculated features to omit recalculation of the feature(s) for use with the subsequent operator.
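
The idea of evaluating operators in dependency order while reusing already-calculated features can be sketched as below. The `FEATURES` and `OPERATORS` tables, their tuple layout, and the feature names are hypothetical; the operator order comes from the standard-library `graphlib.TopologicalSorter` (Python 3.9+), and a dictionary plays the role of the list of calculated features.

```python
from graphlib import TopologicalSorter

# Hypothetical feature dependency graph: feature -> (dependencies, function).
FEATURES = {
    "clicks": ((), lambda: 30.0),
    "views": ((), lambda: 120.0),
    "ctr": (("clicks", "views"), lambda clicks, views: clicks / views),
}

# Hypothetical operator dependency graph: operator -> (upstream operators, features used).
OPERATORS = {
    "normalize": ((), ("ctr",)),
    "bucketize": (("normalize",), ("ctr", "views")),
}

def evaluate():
    calculated = {}   # features already calculated, shared across operators
    compute_log = []  # records each feature the first (and only) time it is computed

    def value_of(name):
        if name not in calculated:
            deps, fn = FEATURES[name]
            calculated[name] = fn(*(value_of(d) for d in deps))
            compute_log.append(name)
        return calculated[name]

    # Evaluation order: every operator runs after the operators it depends on.
    graph = {op: set(upstream) for op, (upstream, _) in OPERATORS.items()}
    order = list(TopologicalSorter(graph).static_order())
    for op in order:
        for feature in OPERATORS[op][1]:
            value_of(feature)  # later operators hit the cache instead of recomputing
    return order, calculated, compute_log

order, calculated, compute_log = evaluate()
print(order)
print(compute_log)
```

Because `bucketize` runs after `normalize`, the features it needs are already in `calculated`, so each feature appears in `compute_log` only once.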
