KEYWORD DATA AUGMENTATION TOOL FOR NATURAL LANGUAGE PROCESSING

    Publication No.: US20220171930A1

    Publication Date: 2022-06-02

    Application No.: US17452742

    Application Date: 2021-10-28

    Abstract: Techniques for keyword data augmentation for training chatbot systems in natural language processing. In one particular aspect, a method is provided that includes receiving a training set of utterances for training a machine-learning model to identify one or more intents for one or more utterances, and augmenting the training set of utterances with out-of-domain (OOD) examples. The augmenting includes: identifying keywords within utterances of the training set of utterances, generating a set of OOD examples with the identified keywords, filtering out, from the set of OOD examples, those OOD examples whose context is substantially similar to the context of the utterances of the training set of utterances, and incorporating the set of OOD examples without the filtered OOD examples into the training set of utterances to generate an augmented training set of utterances. Thereafter, the machine-learning model is trained using the augmented training set of utterances.
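
    The augmentation steps in this abstract can be illustrated with a toy Python sketch. Everything concrete in it is an assumption, not the patented method: keywords are taken to be the most frequent non-stopword tokens, OOD candidates come from a small hypothetical template list (`OOD_TEMPLATES`), "context" is the set of non-keyword content tokens, similarity is a Jaccard overlap, and the `threshold` value is arbitrary.

```python
from collections import Counter

# Hypothetical stopword list; a production system would use a proper NLP toolkit.
STOPWORDS = {"a", "an", "the", "i", "my", "me", "to", "for", "of", "is", "what", "how", "do", "please"}

# Hypothetical templates that drop a keyword into an unrelated (out-of-domain) context.
OOD_TEMPLATES = [
    "write a poem about {kw}",
    "what is the history of {kw}",
    "tell me a joke about {kw}",
]


def tokens(text):
    return text.lower().split()


def identify_keywords(utterances, top_k=3):
    """Assumption: the most frequent non-stopword tokens are the keywords."""
    counts = Counter(t for u in utterances for t in tokens(u) if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_k)]


def context(text, keywords):
    """The context of an utterance: its content tokens minus the keywords."""
    return {t for t in tokens(text) if t not in STOPWORDS and t not in keywords}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def augment(training_set, top_k=3, threshold=0.5, ood_label="OOD"):
    """training_set: list of (utterance, intent) pairs. Returns the augmented list."""
    utterances = [u for u, _ in training_set]
    keywords = identify_keywords(utterances, top_k)
    # 1. Generate candidate OOD examples that contain the identified keywords.
    candidates = [tpl.format(kw=kw) for kw in keywords for tpl in OOD_TEMPLATES]
    # 2. Filter out candidates whose context is too similar to an in-domain utterance.
    train_contexts = [context(u, keywords) for u in utterances]
    kept = [
        (cand, ood_label)
        for cand in candidates
        if max((jaccard(context(cand, keywords), tc) for tc in train_contexts), default=0.0) < threshold
    ]
    # 3. Incorporate the surviving OOD examples into the training set.
    return training_set + kept
```

    With a banking-style training set that also contains "tell me a joke about money", every "tell me a joke about {kw}" candidate shares that utterance's context and is filtered out, while the poem and history candidates are added under the `OOD` label.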

    USING A GENERATIVE ADVERSARIAL NETWORK TO TRAIN A SEMANTIC PARSER OF A DIALOG SYSTEM

    Publication No.: US20210074274A1

    Publication Date: 2021-03-11

    Application No.: US16992306

    Application Date: 2020-08-13

    Abstract: Disclosed herein are techniques for using a generative adversarial network (GAN) to train a semantic parser of a dialog system. A method described herein involves accessing seed data that includes seed tuples. Each seed tuple includes a respective seed utterance and a respective seed logical form corresponding to the respective seed utterance. The method further includes training a semantic parser and a discriminator in a GAN. The semantic parser learns to map utterances to logical forms based on output from the discriminator, and the discriminator learns to recognize authentic logical forms based on output from the semantic parser. The semantic parser may then be integrated into a dialog system.
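
    A toy Python sketch of the adversarial loop, assuming tabular models over a tiny fixed set of candidate logical forms; `train_gan`, `parse`, `candidate_forms`, and the example logical-form strings are hypothetical. The discriminator is trained to score seed tuples as authentic and the parser's own outputs as fake, and the parser is trained to maximize the discriminator's score of its outputs. Because the candidate set is tiny, both updates use exact expectations instead of sampling; a neural parser with an open output space would typically need a sampled, REINFORCE-style policy-gradient estimate.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_gan(seed_tuples, candidate_forms, steps=500, lr=0.5):
    """seed_tuples: (utterance, logical_form) pairs; candidate_forms: the finite
    output space of this toy parser (an assumption made for illustration)."""
    gold = dict(seed_tuples)
    n = len(candidate_forms)
    # Parser: one logit per (utterance, logical form); softmax gives P(form | utterance).
    parser = {u: [0.0] * n for u in gold}
    # Discriminator: one score per (utterance, logical form); sigmoid gives P(authentic).
    disc = {u: [0.0] * n for u in gold}
    for _ in range(steps):
        for u in gold:
            p = softmax(parser[u])
            d = [sigmoid(s) for s in disc[u]]
            real = candidate_forms.index(gold[u])
            # Discriminator step: ascend log D(real) + E_{lf ~ parser}[log(1 - D(lf))].
            for k in range(n):
                grad = (1.0 - d[k]) if k == real else 0.0
                grad -= p[k] * d[k]
                disc[u][k] += lr * grad
            # Parser step: ascend E_{lf ~ parser}[D(lf)] using the exact policy gradient.
            d = [sigmoid(s) for s in disc[u]]
            baseline = sum(pk * dk for pk, dk in zip(p, d))
            for k in range(n):
                parser[u][k] += lr * p[k] * (d[k] - baseline)
    return parser, disc


def parse(parser, candidate_forms, utterance):
    """Return the most probable logical form for an utterance seen in training."""
    logits = parser[utterance]
    return candidate_forms[logits.index(max(logits))]
```

    Note that the parser never sees the gold logical form directly: it improves only because the discriminator rates authentic forms higher than the parser's other outputs.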

    USING BACKPROPAGATION TO TRAIN A DIALOG SYSTEM

    Publication No.: US20210074269A1

    Publication Date: 2021-03-11

    Application No.: US17002229

    Application Date: 2020-08-25

    Abstract: Techniques described herein use backpropagation to train one or more machine learning (ML) models of a dialog system. For instance, a method includes accessing seed data that includes training tuples, where each training tuple comprises a respective logical form. The method includes converting the logical form of a training tuple to a converted logical form by applying to the logical form a text-to-speech (TTS) subsystem, an automatic speech recognition (ASR) subsystem, and a semantic parser of a dialog system. The method includes determining a training signal by using an objective function to compare the converted logical form to the logical form. The method further includes training the TTS subsystem, the ASR subsystem, and the semantic parser via backpropagation based on the training signal. As a result of the training by backpropagation, the machine learning models are tuned to work effectively together within a pipeline of the dialog system.
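
    The end-to-end idea can be shown with a deliberately tiny Python example that collapses each subsystem (TTS, ASR, semantic parser) to a single learnable scale factor and each logical form to a number. These simplifications are assumptions for illustration only; real TTS and ASR stages exchange audio and text, and the sketch does not model how gradients cross those boundaries. What it does show is one objective comparing the converted logical form with the original, and the chain rule carrying that single training signal back through all three stages.

```python
def forward(weights, x):
    """Run one logical form (a number here) through the three-stage toy pipeline."""
    w_tts, w_asr, w_parser = weights
    audio = w_tts * x             # stand-in for the TTS subsystem
    text = w_asr * audio          # stand-in for the ASR subsystem
    converted = w_parser * text   # stand-in for the semantic parser
    return audio, text, converted


def loss(weights, data):
    """Objective: mean squared error between converted and original logical forms."""
    return sum((forward(weights, x)[2] - x) ** 2 for x in data) / len(data)


def train(weights, data, lr=0.01, epochs=200):
    """Full-batch gradient descent; gradients flow back through all three stages."""
    w_tts, w_asr, w_parser = weights
    for _ in range(epochs):
        g_tts = g_asr = g_parser = 0.0
        for x in data:
            audio, text, converted = forward((w_tts, w_asr, w_parser), x)
            d_converted = 2.0 * (converted - x) / len(data)  # dLoss/dconverted
            g_parser += d_converted * text
            d_text = d_converted * w_parser                 # backprop into ASR output
            g_asr += d_text * audio
            d_audio = d_text * w_asr                        # backprop into TTS output
            g_tts += d_audio * x
        # Update all three subsystems from the same training signal.
        w_tts -= lr * g_tts
        w_asr -= lr * g_asr
        w_parser -= lr * g_parser
    return w_tts, w_asr, w_parser
```

    Starting from weights whose product is 0.6, training drives the product of the three factors toward 1, the point at which every converted logical form matches its original; no stage is tuned in isolation.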

    IMPLEMENTING A CORRECTION MODEL TO REDUCE PROPAGATION OF AUTOMATIC SPEECH RECOGNITION ERRORS

    Publication No.: US20210074262A1

    Publication Date: 2021-03-11

    Application No.: US16992291

    Application Date: 2020-08-13

    Abstract: Some techniques described herein determine a correction model for a dialog system, such that the correction model corrects output from an automatic speech recognition (ASR) subsystem in the dialog system. A method described herein includes accessing training data. A first tuple of the training data includes an utterance, where the utterance is a textual representation of speech. The method further includes using an ASR subsystem of a dialog system to convert the utterance to an output utterance. The method further includes storing the output utterance in corrective training data that is based on the training data. The method further includes training a correction model based on the corrective training data, such that the correction model is configured to correct output from the ASR subsystem during operation of the dialog system.
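
    The data flow for building corrective training data can be sketched in Python. The ASR subsystem is replaced by a hypothetical, deterministic word-confusion table (`ASR_CONFUSIONS`), and the correction model is a simple word-level majority vote over aligned (ASR output, original) token pairs; both are assumptions, far simpler than a learned sequence-to-sequence corrector.

```python
from collections import Counter, defaultdict

# Hypothetical confusion table standing in for a real ASR subsystem's errors.
ASR_CONFUSIONS = {"flight": "fright", "boston": "bastin", "tickets": "tickers"}


def simulated_asr(utterance):
    """Stand-in ASR: substitutes confusable words in the textual utterance."""
    return " ".join(ASR_CONFUSIONS.get(w, w) for w in utterance.split())


def build_corrective_data(training_utterances, asr=simulated_asr):
    """Store each ASR output alongside the original utterance it came from."""
    return [(asr(u), u) for u in training_utterances]


def train_correction_model(corrective_data):
    """Learn a noisy-token -> clean-token mapping by majority vote."""
    counts = defaultdict(Counter)
    for noisy, clean in corrective_data:
        noisy_toks, clean_toks = noisy.split(), clean.split()
        if len(noisy_toks) != len(clean_toks):
            continue  # this toy aligner only handles one-to-one substitutions
        for n_tok, c_tok in zip(noisy_toks, clean_toks):
            counts[n_tok][c_tok] += 1
    return {n_tok: c.most_common(1)[0][0] for n_tok, c in counts.items()}


def correct(model, asr_output):
    """Apply the correction model to ASR output at inference time."""
    return " ".join(model.get(w, w) for w in asr_output.split())
```

    Unseen words pass through unchanged, so the correction model only rewrites tokens for which the corrective training data showed a systematic ASR error.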

    TECHNIQUES FOR OUT-OF-DOMAIN (OOD) DETECTION

    Publication No.: US12299402B2

    Publication Date: 2025-05-13

    Application No.: US18659606

    Application Date: 2024-05-09

    Abstract: The present disclosure relates to techniques for identifying out-of-domain utterances. One particular technique includes receiving an utterance and a target domain of a chatbot, generating a sentence embedding for the utterance, obtaining an embedding representation for each cluster of in-domain utterances associated with the target domain, predicting, using a metric learning model, a first probability that the utterance belongs to the target domain based on a similarity or difference between the sentence embedding and each embedding representation for each cluster, predicting, using an outlier detection model, a second probability that the utterance belongs to the target domain based on a determined distance or density deviation between the sentence embedding and embedding representations for neighboring clusters, evaluating the first probability and the second probability to determine a final probability, and classifying the utterance as in-domain or out-of-domain for the chatbot based on the final probability.
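
    A minimal Python sketch of the two-score decision follows. It assumes sentence embeddings and per-cluster embedding representations are already computed, and it replaces both trained models with hand-written stand-ins: the "metric learning" probability is the best cosine similarity to a cluster representation rescaled to [0, 1], and the "outlier detection" probability decays with the mean Euclidean distance to the `k` nearest cluster representations. The equal weighting and the 0.5 threshold are also arbitrary assumptions.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def metric_probability(embedding, cluster_reps):
    """Stand-in for the metric learning model: best cosine similarity mapped to [0, 1]."""
    best = max(cosine(embedding, rep) for rep in cluster_reps)
    return (best + 1.0) / 2.0


def outlier_probability(embedding, cluster_reps, k=2, scale=1.0):
    """Stand-in for the outlier detection model: distance to the k nearest clusters."""
    dists = sorted(math.dist(embedding, rep) for rep in cluster_reps)[:k]
    return math.exp(-(sum(dists) / len(dists)) / scale)


def classify(embedding, cluster_reps, weight=0.5, threshold=0.5):
    """Combine both probabilities into a final probability and a label."""
    p1 = metric_probability(embedding, cluster_reps)
    p2 = outlier_probability(embedding, cluster_reps)
    final = weight * p1 + (1.0 - weight) * p2
    return ("in-domain" if final >= threshold else "out-of-domain"), final
```

    The two signals are complementary: an utterance pointing in a cluster's direction but lying far from every cluster still gets a low outlier-based probability, which pulls the final probability down.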
