Cross-lingual regularization for multilingual generalization

    Publication (Announcement) No.: US11829727B2

    Publication (Announcement) Date: 2023-11-28

    Application No.: US17239297

    Application Date: 2021-04-23

    CPC classification number: G06F40/58 G06F40/51 G06N3/08 G06N20/00

    Abstract: Approaches for cross-lingual regularization for multilingual generalization include a method for training a natural language processing (NLP) deep learning module. The method includes accessing a first dataset having a first training data entry that includes one or more natural language input text strings in a first language; translating at least one of those text strings from the first language to a second language; creating a second training data entry by starting with the first training data entry and replacing the translated first-language text string(s) with the second-language translation(s); adding the second training data entry to a second dataset; and training the deep learning module using the second dataset.
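
    The augmentation step described in the abstract can be illustrated with a short sketch. The following Python is a minimal, hypothetical rendering of that loop: the entry layout (an "inputs" list plus a label), the translate() stub, and the choice to keep the original entries alongside the translated ones are all assumptions, not the patent's actual implementation.

        import copy
        import random

        def translate(text: str, src: str, tgt: str) -> str:
            # Stand-in for a machine translation backend (hypothetical);
            # the abstract leaves the translation mechanism open.
            raise NotImplementedError("plug in an MT system here")

        def make_second_entry(entry: dict, src: str, tgt: str) -> dict:
            # Start with a copy of the first entry and replace at least one
            # input text string with its second-language translation.
            second = copy.deepcopy(entry)
            idx = random.randrange(len(second["inputs"]))
            second["inputs"][idx] = translate(second["inputs"][idx], src, tgt)
            return second

        def build_second_dataset(first_dataset, src="en", tgt="de"):
            second_dataset = []
            for entry in first_dataset:
                second_dataset.append(entry)  # optionally keep the original
                second_dataset.append(make_second_entry(entry, src, tgt))
            return second_dataset

    Training on the mixed dataset then exposes the module to the same supervision signal expressed in two languages, which is the regularizing effect the title refers to.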

    PARAMETER UTILIZATION FOR LANGUAGE PRE-TRAINING

    Publication (Announcement) No.: US20220391640A1

    Publication (Announcement) Date: 2022-12-08

    Application No.: US17532851

    Application Date: 2021-11-22

    Abstract: Embodiments are directed to pre-training a transformer model using more parameters for sophisticated patterns (PSP++). The transformer model is divided into a held-out model and a main model. A forward pass and a backward pass are performed on the held-out model, where the forward pass determines the self-attention hidden states of the held-out model and the backward pass determines the loss of the held-out model. A forward pass on the main model is performed to determine the self-attention hidden states of the main model. The self-attention hidden states of the main model are concatenated with the self-attention hidden states of the held-out model. A backward pass is performed on the main model to determine the loss of the main model. The parameters of the held-out model are updated to reflect the loss of the held-out model, and the parameters of the main model are updated to reflect the loss of the main model.
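
    A rough PyTorch-style sketch of one such training step may help. Everything about the interfaces here is assumed: the two sub-models are taken to return (hidden_states, loss), and the loss_head on the concatenated states is hypothetical; the abstract specifies only the pass ordering and the per-model updates, not this API.

        import torch

        def psp_training_step(held_out, main, batch, opt_held, opt_main):
            # Forward and backward pass on the held-out model.
            held_hidden, held_loss = held_out(batch)      # assumed interface
            opt_held.zero_grad()
            held_loss.backward()

            # Forward pass on the main model.
            main_hidden, _ = main(batch)

            # Concatenate the two models' self-attention hidden states
            # (here along the feature dimension) and compute the main loss.
            combined = torch.cat([main_hidden, held_hidden.detach()], dim=-1)
            main_loss = main.loss_head(combined, batch)   # hypothetical head
            opt_main.zero_grad()
            main_loss.backward()

            # Each model is updated to reflect its own loss.
            opt_held.step()
            opt_main.step()

    Detaching the held-out hidden states keeps the main model's backward pass from reaching the held-out parameters, so each model is driven only by its own loss, as the abstract requires.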

    Cross-Lingual Regularization for Multilingual Generalization

    Publication (Announcement) No.: US20200285706A1

    Publication (Announcement) Date: 2020-09-10

    Application No.: US16399429

    Application Date: 2019-04-30

    Abstract: Approaches for cross-lingual regularization for multilingual generalization include a method for training a natural language processing (NLP) deep learning module. The method includes accessing a first dataset having a first training data entry that includes one or more natural language input text strings in a first language; translating at least one of those text strings from the first language to a second language; creating a second training data entry by starting with the first training data entry and replacing the translated first-language text string(s) with the second-language translation(s); adding the second training data entry to a second dataset; and training the deep learning module using the second dataset.
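
    This pre-grant publication shares its abstract with US11829727B2 above. To complete the picture begun by the augmentation sketch there, the final step, training the module on the second dataset, might look like the following; the model interface, field names, and loss function are illustrative assumptions.

        def train_on_second_dataset(model, second_dataset, optimizer,
                                    loss_fn, epochs=1):
            # Train the NLP module on the augmented dataset, so that
            # first- and second-language variants of the same entry
            # contribute to the same objective.
            model.train()
            for _ in range(epochs):
                for entry in second_dataset:
                    optimizer.zero_grad()
                    pred = model(entry["inputs"])          # assumed interface
                    loss = loss_fn(pred, entry["label"])   # assumed field
                    loss.backward()
                    optimizer.step()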

    SEQUENCE-TO-SEQUENCE PREDICTION USING A NEURAL NETWORK MODEL

    Publication (Announcement) No.: US20190130273A1

    Publication (Announcement) Date: 2019-05-02

    Application No.: US15884125

    Application Date: 2018-01-30

    Abstract: A method for sequence-to-sequence prediction using a neural network model includes generating an encoded representation based on an input sequence using an encoder of the neural network model and predicting an output sequence based on the encoded representation using a decoder of the neural network model. The neural network model includes a plurality of model parameters learned according to a machine learning process. At least one of the encoder or the decoder includes a branched attention layer. Each branch of the branched attention layer includes an interdependent scaling node configured to scale an intermediate representation of the branch by a learned scaling parameter. The learned scaling parameter depends on one or more other learned scaling parameters of one or more other interdependent scaling nodes of one or more other branches of the branched attention layer.
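
    The interdependent scaling can be made concrete with a small sketch. One plausible reading, and it is only an assumption here, is that the branch scales are coupled through a softmax, so that each branch's effective scale depends on the learned scaling parameters of every other branch.

        import torch
        import torch.nn as nn

        class BranchedAttention(nn.Module):
            # A branched attention layer in which each branch's output is
            # scaled by a learned parameter, and the scales are made
            # interdependent by normalizing across branches.
            def __init__(self, d_model: int, num_branches: int, num_heads: int):
                super().__init__()
                self.branches = nn.ModuleList(
                    nn.MultiheadAttention(d_model, num_heads, batch_first=True)
                    for _ in range(num_branches)
                )
                # One learned scaling parameter per branch.
                self.kappa = nn.Parameter(torch.zeros(num_branches))

            def forward(self, x: torch.Tensor) -> torch.Tensor:
                # Softmax couples the scales: raising one branch's parameter
                # necessarily lowers the others' effective contributions.
                scale = torch.softmax(self.kappa, dim=0)
                outputs = [attn(x, x, x)[0] for attn in self.branches]
                return sum(s * out for s, out in zip(scale, outputs))

    Because the scales sum to one under this coupling, the layer's output is a convex combination of the branches' intermediate representations.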
