System command processing
    31.
    Granted invention patent

    Publication No.: US10600419B1

    Publication date: 2020-03-24

    Application No.: US15712676

    Filing date: 2017-09-22

    Abstract: Techniques for performing command processing are described. A system receives, from a device, input data corresponding to a command. The system determines NLU processing results associated with multiple applications. The system also determines NLU confidences for the NLU processing results for each application. The system sends NLU processing results to a portion of the multiple applications, and receives output data or instructions from the portion of the applications. The system ranks the portion of the applications based at least in part on the NLU processing results associated with the portion of the applications as well as the output data or instructions provided by the portion of the applications. The system may also rank the portion of the applications using other data. The system causes content corresponding to output data or instructions provided by the highest ranked application to be output to a user.
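
The flow in this abstract (shortlist applications by NLU confidence, collect their output, then re-rank) can be sketched as follows. This is a minimal illustration, not the patented method: `NluResult`, `process_command`, `top_n`, and the additive `output_bonus` scoring rule are all hypothetical names and choices, since the abstract does not say how the ranking signals are combined.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class NluResult:
    app: str           # application the NLU hypothesis belongs to
    intent: str
    confidence: float  # NLU confidence in [0, 1]


# Hypothetical handler type: returns output text, or None if the app cannot respond.
Handler = Callable[[NluResult], Optional[str]]


def process_command(
    results: List[NluResult],
    handlers: Dict[str, Handler],
    top_n: int = 2,
    output_bonus: float = 0.2,
) -> Optional[Tuple[str, Optional[str]]]:
    # Step 1: keep only the top_n hypotheses (the "portion" of applications).
    shortlist = sorted(results, key=lambda r: r.confidence, reverse=True)[:top_n]
    if not shortlist:
        return None

    # Step 2: ask each shortlisted application for output data.
    scored = []
    for result in shortlist:
        output = handlers[result.app](result)
        # Assumed scoring rule: reward apps that produced output, penalize the rest.
        adjustment = output_bonus if output is not None else -output_bonus
        scored.append((result.confidence + adjustment, result.app, output))

    # Step 3: re-rank and return the highest-ranked application and its output.
    _, best_app, best_output = max(scored, key=lambda item: item[0])
    return best_app, best_output
```

With this rule, an application with a slightly lower NLU confidence can still win if it is the only shortlisted one able to produce output.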

    Speech based user recognition
    32.
    Granted invention patent

    Publication No.: US10522134B1

    Publication date: 2019-12-31

    Application No.: US15388458

    Filing date: 2016-12-22

    Abstract: Systems, methods, and devices for verifying a user are disclosed. A speech-controlled device captures a spoken command, and sends audio data corresponding thereto to a server. The server performs ASR on the audio data to determine ASR confidence data. The server, in parallel, performs user verification on the audio data to determine user verification confidence data. The server may modify the user verification confidence data using the ASR confidence data. In addition or alternatively, the server may modify the user verification confidence data using at least one of a location of the speech-controlled device within a building, a type of the speech-controlled device, or a geographic location of the speech-controlled device.
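
One way to picture "modifying the user verification confidence using the ASR confidence" is a simple scaling rule. The sketch below is an assumption for illustration only: the function name, the `floor` parameter, the `device_weight` factor (standing in for device type or location signals), and the multiplicative formula are hypothetical and are not taken from the patent.

```python
def adjust_verification_confidence(
    verification_conf: float,
    asr_conf: float,
    device_weight: float = 1.0,
    floor: float = 0.5,
) -> float:
    """Scale a user verification confidence by how reliable the audio appears.

    Assumed rule: when ASR confidence is 1.0 the verification score is kept
    as-is; when ASR confidence is 0.0 it is multiplied by `floor`.
    """
    for name, value in (("verification_conf", verification_conf),
                        ("asr_conf", asr_conf),
                        ("floor", floor)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")

    scale = floor + (1.0 - floor) * asr_conf
    adjusted = verification_conf * scale * device_weight
    # Clamp so that a device_weight above 1.0 cannot push the score past 1.0.
    return max(0.0, min(1.0, adjusted))
```

The intuition is that poorly recognized audio (noisy or clipped) is also less trustworthy for voice matching, so the verification score is discounted.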

    Self-supervised federated learning
    34.
    Granted invention patent

    Publication No.: US12039998B1

    Publication date: 2024-07-16

    Application No.: US17665129

    Filing date: 2022-02-04

    CPC classification number: G10L25/78 G06N3/045 G06N3/08 G10L25/21

    Abstract: An acoustic event detection system may employ self-supervised federated learning to update encoder and/or classifier machine learning models. In an example operation, an encoder may be pre-trained to extract audio feature data from an audio signal. A decoder may be pre-trained to predict a subsequent portion of audio data (e.g., a subsequent frame of audio data represented by log filterbank energies). The encoder and decoder may be trained using self-supervised learning to improve the decoder's predictions and, by extension, the quality of the audio feature data generated by the encoder. The system may apply federated learning to share encoder updates across user devices. The system may fine-tune the classifier to improve inferences based on the improved audio feature data. The system may distribute classifier updates to the user device(s) to update the on-device classifier.
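
The abstract does not state how encoder updates from different devices are combined. A common choice in federated learning is federated averaging, where each device's parameters are weighted by the number of local examples it trained on. The sketch below assumes that rule and represents encoder parameters as flat lists of floats; `federated_average` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def federated_average(updates: Sequence[Tuple[List[float], int]]) -> List[float]:
    """Combine per-device parameter vectors into one global vector.

    Each update is (parameters, num_local_examples). Devices with more local
    examples contribute proportionally more to the average.
    """
    total_examples = sum(count for _, count in updates)
    if total_examples <= 0:
        raise ValueError("at least one update with a positive example count is required")

    dim = len(updates[0][0])
    averaged = [0.0] * dim
    for params, count in updates:
        if len(params) != dim:
            raise ValueError("all parameter vectors must have the same length")
        for i, value in enumerate(params):
            averaged[i] += value * count / total_examples
    return averaged
```

Only the parameter vectors leave the devices in this scheme; the raw audio used for local self-supervised training stays on each device.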

    Speech based user recognition
    35.
    Granted invention patent

    Publication No.: US11893999B1

    Publication date: 2024-02-06

    Application No.: US16055755

    Filing date: 2018-08-06

    CPC classification number: G10L17/22 G06F40/20 G10L17/04 G10L17/10

    Abstract: Techniques for enrolling a user in a system's user recognition functionality without requiring the user to speak particular speech are described. The system may determine characteristics unique to a user input. The system may generate an implicit voice profile from user inputs having similar characteristics. After an implicit voice profile is generated, the system may receive a user input having speech characteristics similar to those of the implicit voice profile. The system may ask the user if the user wants the system to associate the implicit voice profile with a particular user identifier. If the user responds affirmatively, the system may request an identifier of a user profile (e.g., a user name). In response to receiving the user's name, the system may identify a user profile associated with the name and associate the implicit voice profile with the user profile, thereby converting the implicit voice profile into an explicit voice profile.
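
Grouping "user inputs having similar characteristics" into an implicit profile can be illustrated with a greedy clustering of voice embeddings. Everything in this sketch is an assumption made for illustration: the class `ImplicitProfiles`, the use of cosine similarity, the 0.9 threshold, and the running-mean centroid are not described in the abstract.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ImplicitProfiles:
    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold
        self.profiles: List[Dict] = []  # each: centroid, count, user_id

    def add_input(self, embedding: List[float]) -> int:
        """Assign an input to the most similar profile, or start a new one."""
        best_idx, best_sim = None, self.threshold
        for idx, profile in enumerate(self.profiles):
            sim = cosine(profile["centroid"], embedding)
            if sim >= best_sim:
                best_idx, best_sim = idx, sim
        if best_idx is None:
            self.profiles.append(
                {"centroid": list(embedding), "count": 1, "user_id": None}
            )
            return len(self.profiles) - 1
        profile = self.profiles[best_idx]
        n = profile["count"]
        # Update the centroid as a running mean of the member embeddings.
        profile["centroid"] = [
            (c * n + e) / (n + 1) for c, e in zip(profile["centroid"], embedding)
        ]
        profile["count"] = n + 1
        return best_idx

    def make_explicit(self, idx: int, user_id: str) -> None:
        """Attach a user identifier once the user confirms the association."""
        self.profiles[idx]["user_id"] = user_id
```

A profile with `user_id` still set to `None` corresponds to an implicit profile; calling `make_explicit` after the user confirms and names themselves corresponds to the conversion step in the abstract.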

    Wakeword and acoustic event detection

    Publication No.: US11670299B2

    Publication date: 2023-06-06

    Application No.: US17321999

    Filing date: 2021-05-17

    CPC classification number: G10L15/22 G10L15/16

    Abstract: A system processes audio data to detect when it includes a representation of a wakeword or of an acoustic event. The system may receive or determine acoustic features for the audio data, such as log-filterbank energy (LFBE). The acoustic features may be used by a first, wakeword-detection model to detect the wakeword; the output of this model may be further processed using a softmax function, to smooth it, and to detect spikes. The same acoustic features may also be used by a second, acoustic-event-detection model to detect the acoustic event; the output of this model may be further processed using a sigmoid function and a classifier. Another model may be used to extract additional features from the LFBE data; these additional features may be used by the other models.
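
The post-processing steps named in this abstract (softmax, smoothing, spike detection, and sigmoid) are standard operations and can be sketched in plain Python. The moving-average smoother, its window size, and the local-maximum spike rule are assumptions chosen for illustration; the abstract does not specify them.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    # Two branches avoid overflow in math.exp for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def moving_average(values: List[float], window: int) -> List[float]:
    """Smooth per-frame scores with a trailing window (assumed smoother)."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def detect_spikes(values: List[float], threshold: float) -> List[int]:
    """Return indices that exceed the threshold and are local maxima."""
    return [
        i
        for i in range(1, len(values) - 1)
        if values[i] >= threshold
        and values[i] > values[i - 1]
        and values[i] >= values[i + 1]
    ]
```

For a wakeword model emitting `[background, wakeword]` logits per frame, one would take `softmax(frame)[1]` for each frame, smooth the resulting sequence, and look for spikes. For the acoustic-event model, `sigmoid` would be applied to each event logit independently, since several events can be present at once.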

    Processing complex utterances for natural language understanding

    Publication No.: US11410646B1

    Publication date: 2022-08-09

    Application No.: US16368399

    Filing date: 2019-03-28

    Abstract: A system is described that is capable of performing natural language understanding (NLU) on utterances including complex command structures such as sequential commands (e.g., multiple commands in a single utterance), conditional commands (e.g., commands that are only executed if a condition is satisfied), and/or repetitive commands (e.g., commands that are executed until a condition is satisfied). Audio data may be processed using automatic speech recognition (ASR) techniques to obtain text. The text may then be processed using machine learning models that are trained to parse text of incoming utterances. The models may identify complex utterance structures and may identify which command portions of an utterance go with which conditional statements. Machine learning models may also identify what data is needed to determine when the conditionals are true so the system may cause the commands to be executed (and stopped) at the appropriate times.
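
The three command structures named in this abstract (sequential, conditional, and repetitive) can be represented with a small data structure once an utterance has been parsed. The sketch below covers only that representation and its execution, not the machine-learning parsing itself; `ParsedCommand`, `execute`, `run_if`, `repeat_until`, and `max_iterations` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

State = Dict[str, float]


@dataclass
class ParsedCommand:
    action: str
    run_if: Optional[Callable[[State], bool]] = None        # conditional command
    repeat_until: Optional[Callable[[State], bool]] = None  # repetitive command


def execute(
    commands: List[ParsedCommand],
    state: State,
    step: Callable[[str, State], None],
    max_iterations: int = 100,
) -> List[str]:
    """Run commands in order and return a log of the actions performed."""
    log: List[str] = []
    for cmd in commands:  # sequential: one utterance, several commands
        if cmd.run_if is not None and not cmd.run_if(state):
            continue  # condition not satisfied, so the command is skipped
        if cmd.repeat_until is None:
            step(cmd.action, state)
            log.append(cmd.action)
            continue
        # Repeat until the stop condition holds; the cap guards against
        # conditions that never become true.
        for _ in range(max_iterations):
            if cmd.repeat_until(state):
                break
            step(cmd.action, state)
            log.append(cmd.action)
    return log
```

An utterance such as "if it is raining close the windows, raise the heat until it is 70 degrees, then play music" would map to three `ParsedCommand` entries: one with `run_if`, one with `repeat_until`, and one plain command.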

    SPEECH BASED USER RECOGNITION
    40.
    Invention patent application

    Publication No.: US20220189458A1

    Publication date: 2022-06-16

    Application No.: US17584489

    Filing date: 2022-01-26

    Abstract: Systems, methods, and devices for verifying a user are disclosed. A speech-controlled device captures a spoken command, and sends audio data corresponding thereto to a server. The server performs ASR on the audio data to determine ASR confidence data. The server, in parallel, performs user verification on the audio data to determine user verification confidence data. The server may modify the user verification confidence data using the ASR confidence data. In addition or alternatively, the server may modify the user verification confidence data using at least one of a location of the speech-controlled device within a building, a type of the speech-controlled device, or a geographic location of the speech-controlled device.
