Language agnostic machine learning model for title standardization

    Publication No.: US11610109B2

    Publication Date: 2023-03-21

    Application No.: US16142441

    Filing Date: 2018-09-26

    Abstract: In an example embodiment, a system is provided whereby a machine learning model is trained to predict a standardization for a given raw title. A neural network may be trained that takes as input a raw title (such as a query string) and a list of candidate titles (either title identifications in a taxonomy, or English strings) and produces, for each candidate, a probability that the raw title and that candidate belong to the same title. The model is able to standardize titles in any language included in the training data without first having to perform language identification or normalization of the title. Additionally, the model is able to benefit from the existence of “loan words” (words adopted from a foreign language with little or no modification) and relations between languages.
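
The candidate-scoring idea in this abstract can be illustrated with a minimal sketch. It is not the patented neural network: a character-trigram cosine similarity stands in for the learned model, and a softmax over the candidates is an assumed way of turning scores into probabilities. The function names, the `temperature` parameter, and the example titles are all hypothetical. The point it shows is that no language identification step is needed, and that a loan word such as "software" lets a Spanish raw title match an English candidate.

```python
import math
from collections import Counter
from typing import Dict, List


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count character n-grams; no language detection or normalization beyond lowercasing."""
    padded = f" {text.lower().strip()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def score_candidates(raw_title: str, candidates: List[str], temperature: float = 0.1) -> Dict[str, float]:
    """Return a softmax-normalized score per candidate (assumed stand-in for the model's probabilities)."""
    raw = char_ngrams(raw_title)
    logits = [cosine(raw, char_ngrams(c)) / temperature for c in candidates]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return {c: e / total for c, e in zip(candidates, exps)}


# Hypothetical example: a Spanish raw title scored against English candidate titles.
probs = score_candidates(
    "Ingeniero de Software",
    ["Software Engineer", "Registered Nurse", "Accountant"],
)
best = max(probs, key=probs.get)
```

In a trained system the trigram similarity would be replaced by a learned encoder, and each candidate might receive an independent probability rather than a share of a softmax; both choices here are simplifications.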

    Language Agnostic Machine Learning Model for Title Standardization

    Publication No.: US20200097812A1

    Publication Date: 2020-03-26

    Application No.: US16142441

    Filing Date: 2018-09-26

    IPC Classification: G06N3/08 G06Q10/06 G06F17/30

    Abstract: In an example embodiment, a system is provided whereby a machine learning model is trained to predict a standardization for a given raw title. A neural network may be trained that takes as input a raw title (such as a query string) and a list of candidate titles (either title identifications in a taxonomy, or English strings) and produces, for each candidate, a probability that the raw title and that candidate belong to the same title. The model is able to standardize titles in any language included in the training data without first having to perform language identification or normalization of the title. Additionally, the model is able to benefit from the existence of “loan words” (words adopted from a foreign language with little or no modification) and relations between languages.

    MACHINE LEARNING MODEL FOR SPECIALTY KNOWLEDGE BASE

    Publication No.: US20230077840A1

    Publication Date: 2023-03-16

    Application No.: US17477302

    Filing Date: 2021-09-16

    IPC Classification: G06N5/02 G06F16/901 G06Q10/06

    Abstract: Techniques for predicting specialty data for a knowledge base using a machine learning model are disclosed herein. In some embodiments, a computer-implemented method comprises: for each skill in a plurality of skills, computing a skill-to-specialty distribution for a plurality of specialties using a first machine learning model; for each skill in the plurality of skills, computing a user-to-skill distribution for the plurality of skills based on feature data of a first user of an online service using a second machine learning model; computing a user-to-specialty distribution for the plurality of specialties based on the skill-to-specialty distribution and the user-to-skill distribution, the user-to-specialty distribution comprising a corresponding user-to-specialty probability value for each specialty in the plurality of specialties given the first user; and using the user-to-specialty distribution in an application of the online service.
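
One plausible reading of how the two distributions combine is marginalization over skills: P(specialty | user) = Σ over skills of P(specialty | skill) · P(skill | user). The abstract only says the result is "based on" the two distributions, so this formula is an assumption, and the function name, skill names, and toy probabilities below are hypothetical. In the described method both input distributions would come from trained models; here they are hard-coded.

```python
from typing import Dict


def user_to_specialty(
    user_to_skill: Dict[str, float],
    skill_to_specialty: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Marginalize over skills: sum of P(specialty | skill) * P(skill | user)."""
    result: Dict[str, float] = {}
    for skill, p_skill in user_to_skill.items():
        # Skills with no known specialty distribution contribute nothing.
        for specialty, p_spec in skill_to_specialty.get(skill, {}).items():
            result[specialty] = result.get(specialty, 0.0) + p_skill * p_spec
    return result


# Hypothetical toy inputs standing in for the outputs of the two models.
skill_to_specialty = {
    "python": {"data science": 0.5, "backend": 0.5},
    "statistics": {"data science": 1.0},
}
user_to_skill = {"python": 0.5, "statistics": 0.5}

dist = user_to_specialty(user_to_skill, skill_to_specialty)
# data science: 0.5 * 0.5 + 0.5 * 1.0 = 0.75; backend: 0.5 * 0.5 = 0.25
```

Because each input distribution sums to one, the combined distribution does as well, which gives one probability value per specialty for the given user.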