DISTRIBUTED TRAINING OF NEURAL NETWORK MODELS

    Publication Number: US20210182660A1

    Publication Date: 2021-06-17

    Application Number: US16716461

    Application Date: 2019-12-16

    Abstract: Systems and methods for distributed training of a neural network model are described. Various embodiments include a master device and a slave device. The master device has a first version of the neural network model. The slave device is communicatively coupled to a first data source and the master device, and the first data source is inaccessible by the master device, in accordance with one embodiment. The slave device is remote from the master device. The master device is configured to output first configuration data for the neural network model based on the first version of the neural network model. The slave device is configured to use the first configuration data to instantiate a second version of the neural network model. The slave device is configured to train the second version of the neural network model using data from the first data source and to output second configuration data for the neural network model. The master device is configured to use the second configuration data to update parameters for the first version of the neural network model.
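
The master/slave exchange described in this abstract can be pictured with a small sketch. The following Python code is illustrative only: it assumes in-memory message passing and a flat NumPy parameter vector, and the class and method names are hypothetical, not taken from the patent.

```python
import numpy as np

class MasterDevice:
    """Holds the first version of the model and merges updates from a slave."""
    def __init__(self, num_params):
        self.params = np.zeros(num_params)       # first version of the model

    def export_configuration(self):
        return self.params.copy()                # first configuration data

    def apply_update(self, slave_params, weight=0.5):
        # Update master parameters using the slave's second configuration data.
        self.params = (1 - weight) * self.params + weight * slave_params

class SlaveDevice:
    """Trains a local copy of the model on data the master cannot access."""
    def __init__(self, local_data):
        self.local_data = local_data             # first data source, private to the slave
        self.params = None

    def instantiate(self, configuration):
        self.params = configuration.copy()       # second version of the model

    def train_locally(self, lr=0.1, epochs=5):
        # Toy gradient step: pull parameters toward the mean of the local data.
        for _ in range(epochs):
            grad = self.params - self.local_data.mean(axis=0)
            self.params -= lr * grad
        return self.params                       # second configuration data

master = MasterDevice(num_params=4)
slave = SlaveDevice(local_data=np.random.randn(100, 4) + 1.0)
slave.instantiate(master.export_configuration())
master.apply_update(slave.train_locally())
print(master.params)
```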

    Custom acoustic models
    Invention Grant

    Publication Number: US11011162B2

    Publication Date: 2021-05-18

    Application Number: US15996393

    Application Date: 2018-06-01

    Abstract: The technology disclosed relates to performing speech recognition for a plurality of different devices or devices in a plurality of conditions. This includes storing a plurality of acoustic models associated with different devices or device conditions, receiving speech audio including natural language utterances, receiving metadata indicative of a device type or device condition, selecting an acoustic model from the plurality in dependence upon the received metadata, and employing the selected acoustic model to recognize speech from the natural language utterances included in the received speech audio. Both speech recognition and the storage of acoustic models can be performed locally by devices or on a network-connected server. Also provided are a platform and interface used by device developers to select, configure, and/or train acoustic models for particular devices and/or conditions.
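
As one way to picture the model-selection step, the sketch below maps device metadata to an acoustic-model identifier. The registry contents, model identifiers, and function names are assumptions made for illustration, not details from the patent.

```python
# Hypothetical registry keyed by (device_type, condition).
ACOUSTIC_MODELS = {
    ("smart_speaker", "far_field"): "am_speaker_far_v2",
    ("headset", "quiet"): "am_headset_quiet_v1",
    ("car", "noisy"): "am_car_noise_v3",
}

def select_acoustic_model(metadata, default="am_generic_v1"):
    """Pick an acoustic model identifier based on the received device metadata."""
    key = (metadata.get("device_type"), metadata.get("condition"))
    return ACOUSTIC_MODELS.get(key, default)

def recognize(audio, metadata):
    model_id = select_acoustic_model(metadata)
    # A real system would load the selected model and decode the audio here.
    return f"decoded {len(audio)} samples with {model_id}"

print(recognize([0.0] * 16000, {"device_type": "car", "condition": "noisy"}))
```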

    System and Method for Voice Morphing

    Publication Number: US20210089626A1

    Publication Date: 2021-03-25

    Application Number: US16578386

    Application Date: 2019-09-22

    Inventor: Dylan H. Ross

    Abstract: A system and method for masking an identity of a speaker of natural language speech, such as speech clips to be labeled by humans in a system generating voice transcriptions for training an automatic speech recognition model. The natural language speech is morphed prior to being presented to the human for labeling. In one embodiment, morphing comprises pitch shifting the speech randomly either up or down, then frequency shifting the speech, then pitch shifting the speech in a direction opposite the first pitch shift.
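
The three-stage morphing named in this embodiment (random pitch shift, frequency shift, then a pitch shift in the opposite direction) could be sketched as follows, assuming librosa for pitch shifting and a single-sideband trick for the frequency shift. The shift amounts and function name are illustrative assumptions, not values from the patent.

```python
import numpy as np
import librosa
from scipy.signal import hilbert

def morph_voice(signal, sample_rate, semitones=3.0, freq_shift_hz=150.0, rng=None):
    """Mask speaker identity: pitch shift, frequency shift, then opposite pitch shift."""
    rng = rng or np.random.default_rng()
    direction = rng.choice([-1.0, 1.0])          # randomly shift up or down

    # 1) First pitch shift, in a random direction.
    shifted = librosa.effects.pitch_shift(signal, sr=sample_rate,
                                          n_steps=direction * semitones)

    # 2) Frequency shift via single-sideband modulation of the analytic signal.
    t = np.arange(len(shifted)) / sample_rate
    analytic = hilbert(shifted)
    shifted = np.real(analytic * np.exp(2j * np.pi * freq_shift_hz * t))

    # 3) Second pitch shift, opposite in direction to the first.
    return librosa.effects.pitch_shift(shifted, sr=sample_rate,
                                       n_steps=-direction * semitones)
```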

    DYNAMIC INTERPOLATION FOR HYBRID LANGUAGE MODELS

    Publication Number: US20210035569A1

    Publication Date: 2021-02-04

    Application Number: US16529730

    Application Date: 2019-08-01

    Abstract: In order to improve the accuracy of ASR, an utterance is transcribed using a plurality of language models, such as for example, an N-gram language model and a neural language model. The language models are trained separately. They each output a probability score or other figure of merit for a partial transcription hypothesis. Model scores are interpolated to determine a hybrid score. While recognizing an utterance, interpolation weights are chosen or updated dynamically, in the specific context of processing. The weights are based on dynamic variables associated with the utterance, the partial transcription hypothesis, or other aspects of context.
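
A minimal sketch of interpolating an N-gram score and a neural-model score with a dynamically chosen weight is shown below. The weighting rule tied to hypothesis length is invented purely for illustration; the patent describes weights derived from dynamic variables of the utterance and partial hypothesis, not this specific rule.

```python
import math

def interpolate_scores(ngram_logprob, neural_logprob, weight):
    """Hybrid score: log of the weighted mixture of the two model probabilities."""
    p = weight * math.exp(ngram_logprob) + (1.0 - weight) * math.exp(neural_logprob)
    return math.log(p)

def dynamic_weight(hypothesis_tokens, base=0.5):
    """Toy weighting rule: lean on the N-gram model early in the hypothesis,
    shifting toward the neural model as more context becomes available."""
    return max(0.1, base - 0.05 * len(hypothesis_tokens))

hypothesis = ["the", "cat", "sat"]
w = dynamic_weight(hypothesis)
print(interpolate_scores(math.log(0.02), math.log(0.05), w))
```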

    Unified embeddings for translation
    Invention Grant

    Publication Number: US10796107B2

    Publication Date: 2020-10-06

    Application Number: US16232984

    Application Date: 2018-12-26

    Inventor: Terry Kong

    Abstract: A method of training word embeddings is provided. The method includes determining anchors, each comprising a first word in a first domain and a second word in a second domain, training word embeddings for the first and second domains, and training a transform for transforming word embedding vectors in the first domain to word embedding vectors in the second domain, wherein the training minimizes a loss function that includes an anchor loss for each anchor, such that for each anchor, the anchor loss is based on a distance between the anchor's second word's embedding vector and the transform of the anchor's first word's embedding vector, and for each anchor, the anchor loss for the respective anchor is zero when the distance between the respective anchor's second word's embedding vector and the transform of the respective anchor's first word's embedding vector is less than a specific tolerance.
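
An anchor loss that is zero when the transformed source embedding lies within a tolerance of the target embedding could take a hinge-like form, as in the sketch below. The hinge formulation, the toy embeddings, and all names are assumptions made for illustration.

```python
import numpy as np

def anchor_loss(src_vec, tgt_vec, transform, tolerance=0.1):
    """Hinge-style anchor loss: zero when the transformed source embedding is
    within `tolerance` of the target embedding, otherwise the excess distance."""
    distance = np.linalg.norm(transform @ src_vec - tgt_vec)
    return max(0.0, distance - tolerance)

def total_anchor_loss(anchors, src_emb, tgt_emb, transform, tolerance=0.1):
    """Sum the anchor loss over all (source_word, target_word) anchor pairs."""
    return sum(anchor_loss(src_emb[s], tgt_emb[t], transform, tolerance)
               for s, t in anchors)

rng = np.random.default_rng(0)
src_emb = {"hund": rng.normal(size=8)}   # hypothetical source-domain embedding
tgt_emb = {"dog": rng.normal(size=8)}    # hypothetical target-domain embedding
W = np.eye(8)                            # transform between domains, to be learned
print(total_anchor_loss([("hund", "dog")], src_emb, tgt_emb, W))
```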

    System and methods for a virtual assistant to manage and use context in a natural language dialog

    Publication Number: US10418032B1

    Publication Date: 2019-09-17

    Application Number: US15163485

    Application Date: 2016-05-24

    Abstract: A dialog with a conversational virtual assistant includes a sequence of user queries and systems responses. Queries are received and interpreted by a natural language understanding system. Dialog context information gathered from user queries and system responses is stored in a layered context data structure. Incomplete queries, which do not have sufficient information to result in an actionable interpretation, become actionable with use of context data. The system recognizes the need to access context data, and retrieves from context layers information required to transform the query into an executable one. The system may then act on the query and provide an appropriate response to the user. Context data buffers forget information, perhaps selectively, with the passage of time, and after a sufficient number and type of intervening queries.
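
One way to picture a layered context store that forgets entries with the passage of time and with intervening turns is sketched below. The class name, slot scheme, and expiry rules are hypothetical, chosen only to make the idea concrete.

```python
import time

class LayeredContext:
    """Layered dialog-context store whose layers fade with time and turn count."""
    def __init__(self, max_age_seconds=300.0, max_turns=5):
        self.layers = []                     # newest layer first
        self.max_age = max_age_seconds
        self.max_turns = max_turns

    def push(self, slots):
        """Store context slots (e.g. {'city': 'Paris'}) from a query or response."""
        self.layers.insert(0, {"slots": slots, "time": time.time(), "turns": 0})

    def tick(self):
        """Call once per dialog turn; forget layers that are too old or too stale."""
        now = time.time()
        for layer in self.layers:
            layer["turns"] += 1
        self.layers = [l for l in self.layers
                       if now - l["time"] < self.max_age and l["turns"] <= self.max_turns]

    def resolve(self, slot):
        """Fill a missing slot in an incomplete query from the most recent layer."""
        for layer in self.layers:
            if slot in layer["slots"]:
                return layer["slots"][slot]
        return None

ctx = LayeredContext()
ctx.push({"city": "Paris"})                  # "What's the weather in Paris?"
ctx.tick()
print(ctx.resolve("city"))                   # "And tomorrow?" -> reuses Paris
```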

    USER-PROVIDED TRANSCRIPTION FEEDBACK AND CORRECTION

    Publication Number: US20190035385A1

    Publication Date: 2019-01-31

    Application Number: US16147889

    Application Date: 2018-10-01

    Abstract: A system, method, and non-transitory computer readable medium provide for a visual display of a user interface for a voice-based virtual assistant system. After displaying a transcription of user speech and performing requested actions, the system allows the user to provide, by speech or manual input, an indication of satisfaction or dissatisfaction. For transcription errors, the user is presented an opportunity to correct the transcription text. The system can present several transcription hypotheses to the user, and allow the user to choose among them, or to edit one of them, as the intended transcription. A back-end server system uses the corrected transcription to train a machine learning model to perform more accurate speech recognition or provide more useful actions for future users. A system can save one or more speech recognition transcription hypotheses and check corrected results against the other transcriptions to further improve models.
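
The choose-or-edit correction flow might be sketched as below. The function and variable names are hypothetical; in a full system the (corrected, original) pair would be sent to the back-end server as a supervised example for model training.

```python
def choose_correction(hypotheses, user_choice=None, user_edit=None):
    """Return the transcription the user confirms: the top hypothesis by default,
    an alternative hypothesis selected by index, or a free-form edited string."""
    if user_edit:                            # user typed a correction
        return user_edit
    if user_choice is not None:              # user picked an alternative hypothesis
        return hypotheses[user_choice]
    return hypotheses[0]                     # user accepted the top result

hypotheses = ["play some jazz", "play sun jazz", "play some jams"]
corrected = choose_correction(hypotheses, user_edit="play smooth jazz")
# The (corrected, hypotheses[0]) pair would be logged for retraining the model.
print(corrected)
```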
