Systems and methods for semi-supervised extraction of text classification information

    Publication No.: US12093646B2

    Publication date: 2024-09-17

    Application No.: US17151088

    Filing date: 2021-01-15

    Applicant: Recruit Co., Ltd.

    Abstract: Disclosed embodiments relate to extracting classification information from input text. Techniques can include obtaining input text; identifying a plurality of tokens in the input text; pre-training a machine learning model; determining tagging information of the plurality of tokens using a first classification layer of the machine learning model; pairing sequences of tokens using the tagging information associated with the plurality of tokens, wherein the paired sequences of tokens are determined by a second classification layer; determining one or more attribute classifiers to apply to the one or more paired sequences, wherein the attribute classifiers are determined by a third classification layer of the machine learning model; evaluating sentiments of the paired sequences, wherein the sentiments of the paired sequences are determined by a fourth classification layer of the machine learning model; aggregating sentiments of the paired sequences associated with an attribute classifier; and storing the aggregated sentiments.
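The final two steps of the abstract (aggregating the sentiments of paired sequences per attribute, then storing them) can be illustrated with a minimal sketch. It assumes the four classification layers have already run and that their outputs are available as plain records. The `PairedSequence` type, its `aspect`/`opinion` field names, the integer sentiment encoding, and the `aggregate_sentiments` function are all hypothetical names chosen for illustration and do not come from the patent text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PairedSequence:
    """One paired sequence of tokens with its predicted labels (hypothetical record)."""
    aspect: str      # first token sequence of the pair (assumed naming)
    opinion: str     # second token sequence of the pair (assumed naming)
    attribute: str   # label assumed to come from the third classification layer
    sentiment: int   # +1, 0, or -1; assumed encoding of the fourth layer's output


def aggregate_sentiments(pairs):
    """Count positive, neutral, and negative sentiments for each attribute."""
    names = {1: "positive", 0: "neutral", -1: "negative"}
    totals = defaultdict(lambda: {"positive": 0, "neutral": 0, "negative": 0})
    for pair in pairs:
        totals[pair.attribute][names[pair.sentiment]] += 1
    # Convert to a plain dict so the result can be stored or serialized.
    return {attribute: dict(counts) for attribute, counts in totals.items()}


# Made-up example records standing in for model output.
pairs = [
    PairedSequence("staff", "friendly", "service", 1),
    PairedSequence("waiter", "slow", "service", -1),
    PairedSequence("pasta", "delicious", "food", 1),
]
summary = aggregate_sentiments(pairs)
```

Counting per attribute is only one possible aggregation; an average score or a ratio would fit the same structure.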

    SYSTEMS AND METHODS FOR SEMI-SUPERVISED EXTRACTION OF TEXT CLASSIFICATION INFORMATION

    Publication No.: US20220229984A1

    Publication date: 2022-07-21

    Application No.: US17151088

    Filing date: 2021-01-15

    Abstract: Disclosed embodiments relate to extracting classification information from input text. Techniques can include obtaining input text; identifying a plurality of tokens in the input text; pre-training a machine learning model; determining tagging information of the plurality of tokens using a first classification layer of the machine learning model; pairing sequences of tokens using the tagging information associated with the plurality of tokens, wherein the paired sequences of tokens are determined by a second classification layer; determining one or more attribute classifiers to apply to the one or more paired sequences, wherein the attribute classifiers are determined by a third classification layer of the machine learning model; evaluating sentiments of the paired sequences, wherein the sentiments of the paired sequences are determined by a fourth classification layer of the machine learning model; aggregating sentiments of the paired sequences associated with an attribute classifier; and storing the aggregated sentiments.

    META-LEARNING DATA AUGMENTATION FRAMEWORK

    Publication No.: US20220351071A1

    Publication date: 2022-11-03

    Application No.: US17246354

    Filing date: 2021-04-30

    Abstract: Disclosed embodiments relate to generating training data for a machine learning model. Techniques can include accessing a machine learning model from a machine learning model repository and identifying a data set associated with the machine learning model. The identified data set is used to generate a set of data augmentation operators. The data augmentation operators are applied to a selected sequence of tokens associated with the machine learning model to generate sequences of tokens. A subset of the generated sequences of tokens is selected and stored in a training data repository. The stored sequences of tokens are provided to the machine learning model as training data.
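The flow described in this abstract (apply augmentation operators to a token sequence, then keep a subset of the generated sequences) might look roughly like the sketch below. It makes two simplifying assumptions: the operators here are hand-written token deletion and adjacent-token swap rather than operators generated from a data set, and the subset selection is a placeholder filter (drop duplicates and unchanged copies) rather than any learned selection. All function names are hypothetical.

```python
import random


def delete_token(tokens, rng):
    """Return a copy of tokens with one randomly chosen token removed."""
    if len(tokens) < 2:
        return list(tokens)
    i = rng.randrange(len(tokens))
    return tokens[:i] + tokens[i + 1:]


def swap_adjacent(tokens, rng):
    """Return a copy of tokens with one randomly chosen adjacent pair swapped."""
    if len(tokens) < 2:
        return list(tokens)
    i = rng.randrange(len(tokens) - 1)
    out = list(tokens)
    out[i], out[i + 1] = out[i + 1], out[i]
    return out


def augment(tokens, operators, n_candidates, keep, seed=0):
    """Generate candidate sequences, then select at most `keep` of them."""
    rng = random.Random(seed)
    candidates = [rng.choice(operators)(tokens, rng) for _ in range(n_candidates)]
    # Placeholder selection step (an assumption, not the patented method):
    # keep distinct candidates that differ from the original sequence.
    selected = []
    for candidate in candidates:
        if candidate != list(tokens) and candidate not in selected:
            selected.append(candidate)
    return selected[:keep]


tokens = ["the", "room", "was", "clean"]
selected = augment(tokens, [delete_token, swap_adjacent], n_candidates=8, keep=3)
```

In a fuller pipeline, the returned sequences would be written to a training data repository and later fed to the model; that storage step is omitted here.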