MACHINE LEARNING TECHNIQUES FOR ANALYZING TEXTUAL CONTENT

    Publication No.: US20210182496A1

    Publication date: 2021-06-17

    Application No.: US16716402

    Application date: 2019-12-16

    Abstract: Techniques are provided for using machine learning techniques to analyze textual content. In one technique, a potential item is identified within a document. An analysis of the potential item is performed at multiple levels of granularity that includes two or more of a sentence level, a segment level, or a document level. The analysis produces multiple outputs, one for each level of granularity in the multiple levels of granularity. The outputs are input into a machine-learned model to generate a score for the potential item. Based on the score, the potential item is presented on a computing device. In response to user selection of the potential item, an association between the potential item and the document is created. The association may be used later to identify a set of users to which the document (or data thereof) is to be presented.

    TECHNIQUES FOR IMPROVING STANDARDIZED DATA ACCURACY

    Publication No.: US20220391690A1

    Publication date: 2022-12-08

    Application No.: US17340607

    Application date: 2021-06-07

    Abstract: Described herein is a technique for mapping the raw text of a job title of an online job posting to an entity embedding associated with an entity or entry of a title taxonomy. The raw text of the job title is first encoded to generate a multilingual word embedding in a multilingual word embedding space. Then, the vector representation of the job title, as represented in the multilingual word embedding space, is translated, using a neural network, to a vector representation of the job title in the entity embedding space. Finally, a nearest neighbor search is performed to identify an entity embedding associated with an entity or entry in the title taxonomy that has a vector representation that is closest in distance to the vector output by the neural network.
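
The three steps in this abstract (encode, translate, nearest neighbor search) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the toy vectors, the weight matrix `W` standing in for a trained neural network, averaging word vectors as the encoder, and the helper names `encode_title`, `translate`, `nearest_entity`, and `standardize_title`.

```python
import math

# Hypothetical toy data: every vector and weight below is invented for illustration.
WORD_EMBEDDINGS = {
    "software": [1.0, 0.0],
    "engineer": [0.8, 0.2],
    "ingeniero": [0.8, 0.2],  # shares the multilingual space with "engineer"
    "nurse": [0.0, 1.0],
}
ENTITY_EMBEDDINGS = {
    "Software Engineer": [0.9, 0.1, 0.0],
    "Registered Nurse": [0.0, 0.1, 0.9],
}
# One linear layer (2 -> 3 dimensions) standing in for the trained translation network.
W = [[1.0, 0.0], [0.1, 0.1], [0.0, 1.0]]


def encode_title(raw_title):
    """Step 1: average the word vectors of the tokens we know (assumed encoder)."""
    tokens = raw_title.lower().split()
    vectors = [WORD_EMBEDDINGS[t] for t in tokens if t in WORD_EMBEDDINGS]
    if not vectors:
        raise ValueError("no known tokens in title")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def translate(word_vector):
    """Step 2: map a word-space vector into the entity embedding space."""
    return [sum(w * x for w, x in zip(row, word_vector)) for row in W]


def nearest_entity(entity_vector):
    """Step 3: brute-force nearest neighbor search by Euclidean distance."""
    return min(
        ENTITY_EMBEDDINGS,
        key=lambda name: math.dist(ENTITY_EMBEDDINGS[name], entity_vector),
    )


def standardize_title(raw_title):
    return nearest_entity(translate(encode_title(raw_title)))
```

With these toy values, `standardize_title("Ingeniero de software")` resolves to the "Software Engineer" entry, because the Spanish and English tokens share one word embedding space before translation.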

    Semantic matching and retrieval of standardized entities

    Publication No.: US11481448B2

    Publication date: 2022-10-25

    Application No.: US16836546

    Application date: 2020-03-31

    Abstract: During operation, the system obtains a first embedding produced by an embedding model from an input string representing an entity and a hierarchy of clusters of embeddings generated by the embedding model from a set of standardized entities. Next, the system searches the hierarchy of clusters for a subset of the embeddings that are within a threshold proximity to the first embedding in a vector space. The system then calculates embedding match scores between the input string and a first subset of the standardized entities represented by the subset of the embeddings based on distances between the subset of the embeddings and the first embedding in the vector space. Finally, the system modifies, based on the embedding match scores, content outputted in response to the input string within a user interface of an online system.
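
The cluster search and scoring described in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented method: the hierarchy is reduced to two levels (cluster centroids over member embeddings), the entity vectors are invented, and the score `1 / (1 + distance)` is just one possible distance-based match score. The names `build_hierarchy`, `search`, and `match_scores` are hypothetical.

```python
import math

# Hypothetical standardized entities with invented 2-d embeddings, already grouped.
CLUSTERED_ENTITIES = [
    {
        "Software Engineer": [0.9, 0.1],
        "Software Developer": [1.0, -0.1],
        "Data Engineer": [1.2, 0.2],
    },
    {
        "Registered Nurse": [0.0, 1.0],
        "Nurse Practitioner": [0.1, 1.1],
    },
]


def build_hierarchy(clustered_entities):
    """Summarize each cluster by its centroid and the radius enclosing its members."""
    hierarchy = []
    for members in clustered_entities:
        vectors = list(members.values())
        centroid = [sum(column) / len(vectors) for column in zip(*vectors)]
        radius = max(math.dist(centroid, v) for v in vectors)
        hierarchy.append({"centroid": centroid, "radius": radius, "members": members})
    return hierarchy


def search(hierarchy, query, threshold):
    """Return {entity name: distance} for embeddings within `threshold` of `query`."""
    hits = {}
    for cluster in hierarchy:
        # Triangle inequality: if the centroid is farther than threshold + radius,
        # no member of this cluster can be within the threshold, so skip it.
        if math.dist(query, cluster["centroid"]) > threshold + cluster["radius"]:
            continue
        for name, vector in cluster["members"].items():
            distance = math.dist(query, vector)
            if distance <= threshold:
                hits[name] = distance
    return hits


def match_scores(hierarchy, query, threshold):
    """Turn distances into scores (assumed formula), best match first."""
    hits = search(hierarchy, query, threshold)
    scored = [(name, 1.0 / (1.0 + distance)) for name, distance in hits.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

For a query embedding of `[0.9, 0.05]` and a threshold of `0.2`, the nursing cluster is pruned at the centroid level and only the two software titles receive scores.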

    FILTERING RECOMMENDATIONS

    Publication No.: US20210012267A1

    Publication date: 2021-01-14

    Application No.: US16505306

    Application date: 2019-07-08

    Abstract: The disclosed embodiments provide a system for processing data. During operation, the system obtains a set of rules for filtering job recommendations, wherein the rules are selected to maximize a reduction in negative outcomes associated with the job recommendations. Next, the system generates a label for a set of candidate-job pairs that match one or more of the rules and inputs the label with a set of candidate-job features for the set of candidate-job pairs as training data for a filtering model. The system then applies the filtering model to additional candidate-job features associated with a candidate and a set of jobs to produce a set of scores, wherein each score represents a likelihood that the candidate perceives a corresponding job as an undesirable recommendation. Finally, the system outputs a subset of the jobs as recommendations to the candidate based on the set of scores.
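
A minimal sketch of the pipeline in this abstract: rules label candidate-job pairs, the labeled pairs train a filtering model, and the model's scores decide which jobs are output. All specifics are assumptions for illustration only: the single rule in `matches_rule`, the two features, the training pairs, the `0.5` cutoff, and the choice of a small logistic regression trained by stochastic gradient descent as the filtering model.

```python
import math


def matches_rule(pair):
    """Hypothetical rule: the job asks for 5+ more years than the candidate has."""
    return pair["job_min_years"] - pair["candidate_years"] >= 5


def features(pair):
    """Hypothetical candidate-job features: experience gap and skill overlap in [0, 1]."""
    gap = float(pair["job_min_years"] - pair["candidate_years"])
    return [gap, pair["skill_overlap"]]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(model, pair):
    """Estimated likelihood that the candidate sees this job as undesirable."""
    weights, bias = model
    return sigmoid(sum(w * x for w, x in zip(weights, features(pair))) + bias)


def train_filter_model(pairs, epochs=500, lr=0.1):
    """Fit logistic regression with SGD, using rule matches as the labels."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for pair in pairs:
            label = 1.0 if matches_rule(pair) else 0.0
            error = score((weights, bias), pair) - label
            weights = [w - lr * error * x for w, x in zip(weights, features(pair))]
            bias -= lr * error
    return weights, bias


def recommend(model, candidate_years, jobs, cutoff=0.5):
    """Keep the jobs whose undesirability score falls below the cutoff."""
    kept = []
    for job in jobs:
        pair = {"candidate_years": candidate_years, **job}
        if score(model, pair) < cutoff:
            kept.append(job["title"])
    return kept


# Invented training pairs; the first three do not match the rule, the last three do.
TRAINING_PAIRS = [
    {"candidate_years": 4, "job_min_years": 2, "skill_overlap": 0.9},
    {"candidate_years": 3, "job_min_years": 3, "skill_overlap": 0.6},
    {"candidate_years": 2, "job_min_years": 3, "skill_overlap": 0.5},
    {"candidate_years": 1, "job_min_years": 7, "skill_overlap": 0.4},
    {"candidate_years": 2, "job_min_years": 9, "skill_overlap": 0.3},
    {"candidate_years": 2, "job_min_years": 10, "skill_overlap": 0.2},
]
```

Calling `recommend(train_filter_model(TRAINING_PAIRS), 3, jobs)` with a list of job dictionaries (each carrying `title`, `job_min_years`, and `skill_overlap`) returns the titles that survive filtering. Because the labels come from the rule itself, the model here mostly relearns the rule; a real system would presumably rely on richer features to generalize beyond it.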

    Techniques for improving standardized data accuracy

    Publication No.: US12229669B2

    Publication date: 2025-02-18

    Application No.: US17340607

    Application date: 2021-06-07

    Abstract: Described herein is a technique for mapping the raw text of a job title of an online job posting to an entity embedding associated with an entity or entry of a title taxonomy. The raw text of the job title is first encoded to generate a multilingual word embedding in a multilingual word embedding space. Then, the vector representation of the job title, as represented in the multilingual word embedding space, is translated, using a neural network, to a vector representation of the job title in the entity embedding space. Finally, a nearest neighbor search is performed to identify an entity embedding associated with an entity or entry in the title taxonomy that has a vector representation that is closest in distance to the vector output by the neural network.

    Machine learning techniques for analyzing textual content

    Publication No.: US11487947B2

    Publication date: 2022-11-01

    Application No.: US16716402

    Application date: 2019-12-16

    Abstract: Techniques are provided for using machine learning techniques to analyze textual content. In one technique, a potential item is identified within a document. An analysis of the potential item is performed at multiple levels of granularity that includes two or more of a sentence level, a segment level, or a document level. The analysis produces multiple outputs, one for each level of granularity in the multiple levels of granularity. The outputs are input into a machine-learned model to generate a score for the potential item. Based on the score, the potential item is presented on a computing device. In response to user selection of the potential item, an association between the potential item and the document is created. The association may be used later to identify a set of users to which the document (or data thereof) is to be presented.

    SEMANTIC MATCHING AND RETRIEVAL OF STANDARDIZED ENTITIES

    Publication No.: US20210303638A1

    Publication date: 2021-09-30

    Application No.: US16836546

    Application date: 2020-03-31

    Abstract: The disclosed embodiments provide a system for processing user-generated input. During operation, the system obtains a first embedding produced by an embedding model from an input string representing an entity and a hierarchy of clusters of embeddings generated by the embedding model from a set of standardized entities. Next, the system searches the hierarchy of clusters for a subset of the embeddings that are within a threshold proximity to the first embedding in a vector space. The system then calculates embedding match scores between the input string and a first subset of the standardized entities represented by the subset of the embeddings based on distances between the subset of the embeddings and the first embedding in the vector space. Finally, the system modifies, based on the embedding match scores, content outputted in response to the input string within a user interface of an online system.
