Language agnostic machine learning model for title standardization

    Publication number: US11610109B2

    Publication date: 2023-03-21

    Application number: US16142441

    Filing date: 2018-09-26

    Abstract: In an example embodiment, a system is provided whereby a machine learning model is trained to predict a standardization for a given raw title. A neural network may be trained that takes as input a raw title (such as a query string) and a list of candidate titles (either title identifications in a taxonomy, or English strings), and produces a probability that the raw title and each candidate belong to the same title. The model is able to standardize titles in any language included in the training data without first having to perform language identification or normalization of the title. Additionally, the model is able to benefit from the existence of “loan words” (words adopted from a foreign language with little or no modification) and relations between languages.

    Identifying duplicate entities
    Granted patent

    Publication number: US11436532B2

    Publication date: 2022-09-06

    Application number: US16703386

    Filing date: 2019-12-04

    Abstract: The disclosed embodiments provide a system that identifies duplicate entities. During operation, the system selects training data for a first machine learning model based on confidence scores representing likelihoods that pairs of entities in an online system are duplicates. Next, the system updates parameters of the first machine learning model based on features and labels in the training data. The system then identifies a first subset of additional pairs of the entities as duplicate entities based on scores generated by the first machine learning model from values of the features for the additional pairs and a first threshold associated with the scores. The system also determines a canonical entity in each of the duplicate entities based on additional features. Finally, the system updates content outputted in a user interface of the online system based on the identified first subset of the additional pairs.
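The thresholding and canonical-entity steps described in this abstract can be illustrated with a minimal sketch. It assumes the model scores are already computed; the function names, the `completeness` feature, and the sample data are hypothetical and not taken from the patent.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def find_duplicates(scored_pairs: Dict[Pair, float], threshold: float) -> List[Pair]:
    """Keep the pairs whose model score meets or exceeds the threshold."""
    return [pair for pair, score in scored_pairs.items() if score >= threshold]


def pick_canonical(pair: Pair, completeness: Dict[str, float]) -> str:
    """Pick the canonical entity of a duplicate pair.

    Assumption: a single 'completeness' value stands in for the
    'additional features' mentioned in the abstract.
    """
    return max(pair, key=lambda entity: completeness[entity])


# Hypothetical scores, as if produced by a trained duplicate-detection model.
scores = {("acme", "acme inc"): 0.93, ("acme", "apex"): 0.12}
completeness = {"acme": 0.4, "acme inc": 0.9, "apex": 0.7}

duplicates = find_duplicates(scores, threshold=0.8)
canonical = [pick_canonical(pair, completeness) for pair in duplicates]
```

Raising the threshold trades recall for precision: fewer pairs are merged, but those that are merged are more likely to be true duplicates.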

    IDENTIFYING DUPLICATE ENTITIES
    Patent application

    Publication number: US20210173825A1

    Publication date: 2021-06-10

    Application number: US16703386

    Filing date: 2019-12-04

    IPC classification: G06F16/23 G06N20/00

    Abstract: The disclosed embodiments provide a system that identifies duplicate entities. During operation, the system selects training data for a first machine learning model based on confidence scores representing likelihoods that pairs of entities in an online system are duplicates. Next, the system updates parameters of the first machine learning model based on features and labels in the training data. The system then identifies a first subset of additional pairs of the entities as duplicate entities based on scores generated by the first machine learning model from values of the features for the additional pairs and a first threshold associated with the scores. The system also determines a canonical entity in each of the duplicate entities based on additional features. Finally, the system updates content outputted in a user interface of the online system based on the identified first subset of the additional pairs.

    Multi-dimensional job title logical models for social network members

    Publication number: US10339612B2

    Publication date: 2019-07-02

    Application number: US15195562

    Filing date: 2016-06-28

    IPC classification: G06F17/30 G06Q50/00

    Abstract: An online social networking system extracts terms from an unstructured job title record. The system searches a job role taxonomy database with the extracted terms to identify job roles. For each job role identified, the system extracts a plurality of additional terms appearing in the unstructured job title record. For each additional term, the system maps the additional term to a standardized modifier, thereby identifying a job seniority modifier, a job specialty modifier, a job accreditation modifier, and a job status modifier for each additional term. The system creates a multi-dimensional standardized job title for the member profile or job posting by writing the job role, the job seniority modifier, the job specialty modifier, the job accreditation modifier, and the job status modifier to a standardization record in a standardization database.
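As a rough illustration of the mapping this abstract describes, the sketch below splits a raw title into terms, looks up a role, and assigns the remaining terms to modifier dimensions. The `ROLES` and `MODIFIERS` tables and the function name are hypothetical stand-ins for the taxonomy and standardization databases.

```python
import re
from typing import Dict, Optional

# Hypothetical lookup tables standing in for the taxonomy database.
ROLES = {"engineer": "Engineer", "nurse": "Nurse"}
MODIFIERS = {
    "senior": ("seniority", "Senior"),
    "sr": ("seniority", "Senior"),
    "software": ("specialty", "Software"),
    "registered": ("accreditation", "Registered"),
    "retired": ("status", "Retired"),
}


def standardize_multi_dim(raw_title: str) -> Dict[str, Optional[str]]:
    """Build a multi-dimensional record from an unstructured job title."""
    record: Dict[str, Optional[str]] = {
        "role": None,
        "seniority": None,
        "specialty": None,
        "accreditation": None,
        "status": None,
    }
    # Lowercase and split on anything that is not a letter.
    for term in re.findall(r"[a-z]+", raw_title.lower()):
        if term in ROLES and record["role"] is None:
            record["role"] = ROLES[term]
        elif term in MODIFIERS:
            dimension, value = MODIFIERS[term]
            record[dimension] = value
    return record


record = standardize_multi_dim("Sr. Software Engineer")
```

Keeping each dimension in its own field lets downstream consumers compare, for example, seniority across members without reparsing the title string.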

    Training a neural network using another neural network

    Publication number: US11188823B2

    Publication date: 2021-11-30

    Application number: US15168750

    Filing date: 2016-05-31

    IPC classification: G06N3/08 G06N3/04

    Abstract: In an example embodiment, a first DCNN is trained to output a value for a first metric by inputting a plurality of sample documents to the first DCNN, with each of the sample documents having been labeled with a value for the first metric. Then a plurality of possible transformations of a first input document are fed to the first DCNN, obtaining a value for the first metric for each of the plurality of possible transformations. A first transformation is selected from the plurality of possible transformations based on the values for the first metric for each of the plurality of possible transformations. Then a second DCNN is trained to output a transformation for a document by inputting the selected first transformation to the second DCNN. A second input document is then fed to the second DCNN, obtaining a second transformation of the second input document.

    AUTOMATICALLY IDENTIFYING ADDITIONAL ENTITIES FOR CONTENT DELIVERY

    Publication number: US20200160398A1

    Publication date: 2020-05-21

    Application number: US16192565

    Filing date: 2018-11-15

    IPC classification: G06Q30/02

    Abstract: Technologies for associating an entity with a content delivery campaign are provided. Disclosed techniques include determining a first value of a profile attribute of the entity. A particular node that matches the first value is identified from a value tree of nodes. A parent node of the particular node is identified from the value tree. Child nodes of the parent node are identified, where the child nodes do not include the particular node. Values from the child nodes are then associated with the profile attribute of the entity. A particular value is received for a particular targeting criterion of the content delivery campaign. It is determined whether the particular value matches a value of the child nodes, where the particular value does not match the first value. In response to determining that the particular value matches a value of the child nodes, the entity is associated with the content delivery campaign.
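The sibling lookup in the value tree can be sketched as follows. The tree is represented as a child-to-parent map; the `PARENT_OF` data and both function names are hypothetical and only illustrate the matching logic stated in the abstract.

```python
from typing import Dict, Set

# Hypothetical value tree, stored as child value -> parent value.
PARENT_OF: Dict[str, str] = {
    "Software Engineer": "Engineering",
    "Data Engineer": "Engineering",
    "QA Engineer": "Engineering",
    "Recruiter": "Human Resources",
}


def sibling_values(value: str, parent_of: Dict[str, str]) -> Set[str]:
    """Return the other children of the parent of `value` (excluding `value`)."""
    parent = parent_of.get(value)
    if parent is None:
        return set()
    return {v for v, p in parent_of.items() if p == parent and v != value}


def matches_campaign(profile_value: str, target_value: str,
                     parent_of: Dict[str, str]) -> bool:
    """True when the targeting value matches a sibling of the profile value.

    Because siblings exclude the profile value itself, a match here always
    means the target differs from the entity's own first value.
    """
    return target_value in sibling_values(profile_value, parent_of)


siblings = sibling_values("Software Engineer", PARENT_OF)
```

A campaign that targets "Data Engineer" would therefore also reach an entity whose profile says "Software Engineer", since both sit under the same parent node.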

    TITLE STANDARDIZATION THROUGH ITERATIVE PROCESSING

    Publication number: US20190205376A1

    Publication date: 2019-07-04

    Application number: US15885004

    Filing date: 2018-01-31

    IPC classification: G06F17/27

    Abstract: Example methods and systems are directed to determining a standardized job title corresponding to an input job title. The input job title may be normalized according to various normalization rules to produce a normalized input job title. The normalized input job title may then be tokenized into one or more n-grams, and synonyms may be identified from the various n-grams. A title taxonomy may then be searched using the normalized input job title, the tokenized n-grams, and the identified synonyms, where the search results correspond to standardized job titles that match the various inputs. Each of the candidate job titles may then be scored using congruence type features and information quality features. The highest scoring candidate job title is then selected as the standardized job title for the input job title. An association is then established between the standardized job title and the input job title.
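The pipeline in this abstract (normalize, tokenize into n-grams, expand synonyms, search, score, select) can be sketched in a few lines. The `TAXONOMY` and `SYNONYMS` tables are hypothetical, and the token-overlap score is a simple stand-in for the congruence and information quality features, which the abstract does not specify.

```python
import re
from typing import List, Optional, Set

TAXONOMY = ["software engineer", "sales manager", "registered nurse"]  # hypothetical
SYNONYMS = {"developer": "engineer", "programmer": "engineer"}  # hypothetical


def normalize(title: str) -> str:
    """Lowercase and drop punctuation (an assumed normalization rule)."""
    return " ".join(re.findall(r"[a-z0-9]+", title.lower()))


def ngrams(tokens: List[str], max_n: int = 2) -> Set[str]:
    """All word n-grams of length 1..max_n."""
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def standardize_title(raw_title: str) -> Optional[str]:
    """Return the best-scoring taxonomy title, or None if nothing overlaps."""
    grams = ngrams(normalize(raw_title).split())
    terms = grams | {SYNONYMS[g] for g in grams if g in SYNONYMS}

    def score(candidate: str) -> float:
        # Stand-in score: fraction of the candidate's words found in the input terms.
        words = candidate.split()
        return sum(w in terms for w in words) / len(words)

    best = max(TAXONOMY, key=score)
    return best if score(best) > 0 else None


result = standardize_title("Sr. Software Developer")
```

Here "developer" is expanded to "engineer", so the candidate "software engineer" is fully covered and wins over the other taxonomy entries.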

    Deriving multi-level seniority of social network members

    Publication number: US10255586B2

    Publication date: 2019-04-09

    Application number: US15199423

    Filing date: 2016-06-30

    Abstract: An online social networking system receives an unstructured job title record from a profile of a member or a job posting. The system extracts a raw job title from the unstructured job title record, and extracts a first seniority level from the raw job title. The first seniority level is a seniority modifier associated with the raw job title. The system determines a second seniority level. The second seniority level is a company seniority within the company associated with the unstructured job title record. The system determines a third seniority level. The third seniority level is a seniority score for the member or the job posting. The system compares the seniority score with a second seniority score, and communicates with the member, or transmits the job posting to the member, based on the comparison of the seniority score and the second seniority score.

    TECHNIQUES FOR IMPROVING STANDARDIZED DATA ACCURACY

    Publication number: US20220391690A1

    Publication date: 2022-12-08

    Application number: US17340607

    Filing date: 2021-06-07

    Abstract: Described herein is a technique for mapping the raw text of a job title of an online job posting to an entity embedding associated with an entity or entry of a title taxonomy. The raw text of the job title is first encoded to generate a multilingual word embedding in a multilingual word embedding space. Then, the vector representation of the job title in the multilingual word embedding space is translated, using a neural network, to a vector representation of the job title in the entity embedding space. Finally, a nearest neighbor search is performed to identify an entity embedding associated with an entity or entry in the title taxonomy that has a vector representation that is closest in distance to the vector output by the neural network.
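The last two steps of this abstract, translating a vector into the entity embedding space and then running a nearest neighbor search, can be sketched with toy data. The single linear layer is only a stand-in for the neural network, and the embeddings, weights, and function names are hypothetical.

```python
import math
from typing import Dict, List

# Hypothetical 2-dimensional entity embeddings for two taxonomy entries.
ENTITY_EMBEDDINGS: Dict[str, List[float]] = {
    "software engineer": [1.0, 0.0],
    "nurse": [0.0, 1.0],
}

# Hypothetical weights mapping a 3-dimensional multilingual word embedding
# into the 2-dimensional entity space.
WEIGHTS = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]


def translate(vec: List[float], weights: List[List[float]]) -> List[float]:
    """Stand-in for the neural network: one matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def nearest_entity(vec: List[float], entities: Dict[str, List[float]]) -> str:
    """Brute-force nearest neighbor by Euclidean distance."""
    return min(entities, key=lambda name: math.dist(vec, entities[name]))


word_embedding = [0.9, 0.2, 0.5]  # as if produced by a multilingual encoder
match = nearest_entity(translate(word_embedding, WEIGHTS), ENTITY_EMBEDDINGS)
```

Brute-force search is fine for a toy taxonomy; with a large taxonomy an approximate nearest neighbor index would normally replace the linear scan.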

    Inferring appropriate courses for recommendation based on member characteristics

    Publication number: US11188992B2

    Publication date: 2021-11-30

    Application number: US15366728

    Filing date: 2016-12-01

    IPC classification: G06Q10/00 G06Q50/20 G06Q50/00

    Abstract: A system and method for inferring appropriate courses for recommendation based on member characteristics is disclosed. A social networking system receives a request for recommended courses, wherein the request is associated with a member of the social networking system. The social networking system identifies a group of members who are similar to the member. The social networking system creates a list of skills recently learned by members of that group. For a particular skill in the list of skills, the social networking system determines whether the member possesses the particular skill. In accordance with a determination that the member does not possess the particular skill, the social networking system identifies at least one course that teaches the particular skill from a list of courses. The social networking system transmits the identified course to the client device for display as a recommended course.
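The skill-gap logic in this abstract reduces to a set difference followed by a course lookup. The sketch below assumes the group of similar members has already been identified; the function name, parameter names, and sample data are hypothetical.

```python
from typing import Dict, List, Set


def recommend_courses(
    member_skills: Set[str],
    recent_skills_of_similar_members: List[List[str]],
    courses_by_skill: Dict[str, List[str]],
) -> List[str]:
    """Recommend courses for skills that similar members recently learned
    but that the requesting member does not yet have."""
    recommended: List[str] = []
    for skills in recent_skills_of_similar_members:
        for skill in skills:
            if skill in member_skills:
                continue  # the member already possesses this skill
            for course in courses_by_skill.get(skill, []):
                if course not in recommended:  # avoid duplicate recommendations
                    recommended.append(course)
    return recommended


courses = recommend_courses(
    member_skills={"python"},
    recent_skills_of_similar_members=[["python", "sql"], ["sql", "spark"]],
    courses_by_skill={"sql": ["SQL Basics"], "spark": ["Intro to Spark"]},
)
```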