Audio pipeline for simultaneous keyword spotting, transcription, and real time communications

    Publication No.: US11049496B2

    Publication date: 2021-06-29

    Application No.: US16203963

    Filing date: 2018-11-29

    Abstract: Disclosed in some examples are methods, systems, and machine-readable mediums for preventing unintended activation of voice command processing of a voice-activated device. A first audio signal may be an audio signal that is to be output to a speaker communicatively coupled to the computing device. A second audio signal may be input from a microphone or other audio capture device. Both audio signals are input to a keyword detector to check for the presence of activation keywords. If the activation keyword(s) are detected in the second audio signal but not in the first audio signal, the voice command processing of the device is activated, because the keyword is then likely a command from the user and not feedback from the loudspeaker.
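
The gating rule in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: `contains_keyword` is a hypothetical stand-in that matches text rather than audio, and the names `should_activate`, `speaker_signal`, and `mic_signal` are assumptions made for the example.

```python
from typing import Callable, Sequence


def contains_keyword(signal: str, keywords: Sequence[str]) -> bool:
    """Hypothetical keyword detector; a real one would operate on audio frames."""
    text = signal.lower()
    return any(keyword.lower() in text for keyword in keywords)


def should_activate(
    speaker_signal: str,
    mic_signal: str,
    keywords: Sequence[str],
    detect: Callable[[str, Sequence[str]], bool] = contains_keyword,
) -> bool:
    """Activate only if the keyword is in the microphone signal but not in
    the signal that is about to be played through the loudspeaker."""
    in_mic = detect(mic_signal, keywords)
    in_speaker = detect(speaker_signal, keywords)
    return in_mic and not in_speaker
```

When the same keyword appears in both signals, the microphone is most likely picking up the loudspeaker output, so the sketch declines to activate.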

    PROTECTING DEEP LEARNED MODELS
    Patent application

    Publication No.: US20210133577A1

    Publication date: 2021-05-06

    Application No.: US16828889

    Filing date: 2020-03-24

    Abstract: Apparatus and methods are disclosed for using machine learning models with private and public domains. Operations can be applied to transform input to a machine learning model in a private domain that is kept secret or otherwise made unavailable to third parties. In one example of the disclosed technology, a method includes applying a private transform to produce transformed input, providing the transformed input to a machine learning model that was trained using a training set modified by the private transform, and generating inferences with the machine learning model using the transformed input. Examples of suitable transforms include matrix multiplication, time- or spatial-domain to frequency-domain transforms, and partitioning a neural network model such that an input and at least one hidden layer form part of the private domain, while the remaining layers form part of the public domain.
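
As a rough illustration of the matrix-multiplication variant, the sketch below trains a toy nearest-centroid classifier on privately transformed data and then runs inference on transformed input. The classifier, the 2x2 rotation matrix `SECRET`, and all function names are assumptions chosen for the example; the abstract does not specify a model or a particular matrix.

```python
# Assumed secret: a 90-degree rotation, kept in the private domain.
SECRET = [[0.0, -1.0], [1.0, 0.0]]


def private_transform(x, matrix=SECRET):
    """Multiply the input vector by the secret matrix."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]


def train_centroids(samples, labels):
    """Toy public-domain model: one mean vector per class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Return the label of the nearest centroid (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], x)),
    )


# The training set is modified by the private transform before training.
raw_samples = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
raw_labels = ["a", "a", "b", "b"]
model = train_centroids([private_transform(x) for x in raw_samples], raw_labels)
```

At inference time the caller applies the same transform first, for example `predict(model, private_transform([0.2, 0.5]))`. The model only ever sees transformed vectors, so it is of limited use to anyone who does not hold `SECRET`.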

    SYNCHRONIZED JITTER BUFFERS TO HANDLE CODEC SWITCHES

    Publication No.: US20200244584A1

    Publication date: 2020-07-30

    Application No.: US16260771

    Filing date: 2019-01-29

    Abstract: Techniques are described for managing synchronized jitter buffers for streaming data (e.g., for real-time audio and/or video communications). A separate jitter buffer can be maintained for each codec. For example, as data is received in network packets, the data is added to the jitter buffer corresponding to the codec that is associated with the received data. When data needs to be read, the same amount of data is read from each of the jitter buffers. In other words, at each instance where data needs to be obtained (e.g., for decoding and playback), the same amount of data is obtained from each of the jitter buffers. In addition, the multiple jitter buffers use the same playout timestamp, which is synchronized across all of the jitter buffers.
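
A minimal sketch of the idea, assuming samples are indexed by integer timestamps and that a missing sample is reported as `None`. The class name, method names, and the codec labels used in the usage note are illustrative assumptions, not taken from the patent.

```python
class SynchronizedJitterBuffers:
    """One buffer per codec, all sharing a single playout timestamp."""

    def __init__(self, codecs):
        self.buffers = {codec: {} for codec in codecs}
        self.playout_ts = 0  # shared across every buffer

    def push(self, codec, timestamp, samples):
        """Store received samples in the buffer that matches their codec."""
        buffer = self.buffers[codec]
        for offset, sample in enumerate(samples):
            ts = timestamp + offset
            if ts >= self.playout_ts:  # drop data that is already too late
                buffer[ts] = sample

    def read(self, count):
        """Read the same amount from every buffer, then advance the shared clock."""
        window = range(self.playout_ts, self.playout_ts + count)
        out = {
            codec: [buffer.pop(ts, None) for ts in window]
            for codec, buffer in self.buffers.items()
        }
        self.playout_ts += count
        return out
```

For example, after `push("opus", 0, [1, 2, 3])` and `push("silk", 3, [4, 5])`, a call to `read(3)` returns three entries for both codecs, with real samples only for `"opus"`. Because both buffers advance together, a switch from one codec to the other needs no re-alignment.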

    Handling timestamp inaccuracies for streaming network protocols

    Publication No.: US10701124B1

    Publication date: 2020-06-30

    Application No.: US16216513

    Filing date: 2018-12-11

    IPC classification: H04L29/06

    Abstract: Techniques are described for determining corrected timestamps for streaming data that is encoded using frames with a variable frame size. The streaming data is encoded into frames and transmitted in network packets in which the network packets or frames are associated with timestamps incremented in fixed steps. When a network packet is received after a lost packet, a corrected timestamp range can be calculated for the received packet based at least in part on the received timestamp value and attributes of the received network packet along with buffering characteristics.
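
The abstract does not give the formula, so the following is only one hypothetical reading of how such a range might be bounded. It assumes that each packet advances the timestamp by a fixed `step`, that each frame actually carries between `min_dur` and `max_dur` units of media, and that the receiver knows where the last received frame ended (`last_end`). All names and the model itself are assumptions, and buffering characteristics are not modelled here.

```python
def corrected_timestamp_range(last_ts, last_end, received_ts, step, min_dur, max_dur):
    """Bound the true start time of a packet that arrives after a loss.

    Assumed model: the number of lost packets is inferred from the
    fixed-step timestamps, and each lost frame lasted between min_dur
    and max_dur, so the true start lies somewhere in the returned range.
    """
    if step <= 0 or min_dur > max_dur:
        raise ValueError("step must be positive and min_dur <= max_dur")
    lost = (received_ts - last_ts) // step - 1
    if lost < 0:
        raise ValueError("received packet is not newer than the last one")
    return (last_end + lost * min_dur, last_end + lost * max_dur)
```

With no loss the range collapses to the single value `last_end`; each lost packet widens it by `max_dur - min_dur`.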

    Phase reconstruction in a speech decoder

    Publication No.: US11817107B2

    Publication date: 2023-11-14

    Application No.: US17875237

    Filing date: 2022-07-27

    Abstract: Innovations in phase quantization during speech encoding and phase reconstruction during speech decoding are described. For example, to encode a set of phase values, a speech encoder omits higher-frequency phase values and/or represents at least some of the phase values as a weighted sum of basis functions. Or, as another example, to decode a set of phase values, a speech decoder reconstructs at least some of the phase values using a weighted sum of basis functions and/or reconstructs lower-frequency phase values then uses at least some of the lower-frequency phase values to synthesize higher-frequency phase values. In many cases, the innovations improve the performance of a speech codec in low bitrate scenarios, even when encoded data is delivered over a network that suffers from insufficient bandwidth or transmission quality problems.
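
The "weighted sum of basis functions" idea can be illustrated with a small sketch. The choice of a sine basis (the DST-I family, which is orthogonal on this grid) is an assumption made for the example; the abstract does not say which basis functions are used, and phase wrapping and quantization of the weights are ignored here.

```python
import math


def basis(k, n, size):
    """Assumed basis: k-th sine function sampled at position n of `size`."""
    return math.sin(math.pi * (k + 1) * (n + 1) / (size + 1))


def encode_phases(phases, num_weights):
    """Project phase values onto the first `num_weights` basis functions."""
    size = len(phases)
    scale = 2.0 / (size + 1)  # normalisation for the orthogonal sine basis
    return [
        scale * sum(p * basis(k, n, size) for n, p in enumerate(phases))
        for k in range(num_weights)
    ]


def decode_phases(weights, size):
    """Reconstruct `size` phase values as a weighted sum of basis functions."""
    return [
        sum(w * basis(k, n, size) for k, w in enumerate(weights))
        for n in range(size)
    ]
```

Sending fewer weights than phase values trades accuracy for bitrate: `decode_phases(encode_phases(phases, 2), len(phases))` gives a smooth approximation, while using as many weights as phase values reproduces the input up to rounding error.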

    Reinforcement learning for jitter buffer control

    Publication No.: US11558275B2

    Publication date: 2023-01-17

    Application No.: US16877257

    Filing date: 2020-05-18

    Abstract: Disclosed in some examples are methods, systems, and machine-readable mediums which determine jitter buffer delay by inputting jitter buffer and currently observed network status information to a machine-learned model that is trained using a reinforcement learning (RL) method. The model maps these inputs to an action to compress, stretch, or hold the jitter buffer delay, which is used by a recipient computing device to optimize the jitter buffer delay. The model may be trained using a simulator that uses network traces of past real streaming sessions (e.g., communication sessions) of users. By training the model through reinforcement learning, the model learns to make better decisions through reinforcement in the form of reward signals that reflect the performance of each decision.
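
To make the compress/stretch/hold loop concrete, here is a toy sketch that swaps in tabular Q-learning for whatever RL method the patent actually uses. The state (current delay, observed jitter), the reward (a fixed penalty when the delay is below the jitter, otherwise a cost equal to the delay), the replayed trace, and every name below are assumptions made for illustration.

```python
import random

ACTIONS = {"compress": -1, "hold": 0, "stretch": 1}
MAX_DELAY = 4  # assumed upper bound on the delay, in arbitrary units


def step(delay, action, jitter):
    """Toy simulator step: apply the action and score the resulting delay."""
    new_delay = min(MAX_DELAY, max(0, delay + ACTIONS[action]))
    # Assumed reward: late packets are costly, extra delay is mildly costly.
    reward = -10.0 if new_delay < jitter else -float(new_delay)
    return new_delay, reward


def train(trace, steps=50000, alpha=0.5, gamma=0.9, seed=0):
    """Q-learning over a cyclically replayed jitter trace."""
    rng = random.Random(seed)
    q = {}
    delay = rng.randint(0, MAX_DELAY)
    for t in range(steps):
        jitter = trace[t % len(trace)]
        next_jitter = trace[(t + 1) % len(trace)]
        state = (delay, jitter)
        # Uniformly random behaviour policy; Q-learning is off-policy,
        # so it still estimates the values of the greedy policy.
        action = rng.choice(list(ACTIONS))
        delay, reward = step(delay, action, jitter)
        next_state = (delay, next_jitter)
        best_next = max(q.get((next_state, a), 0.0) for a in ACTIONS)
        old = q.get((state, action), 0.0)
        q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q


def best_action(q, state):
    """Greedy policy: the action with the highest learned value."""
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
```

Trained on a constant trace such as `train([2])`, the greedy policy stretches when the delay is below the jitter, compresses when it is above, and holds once the two match.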