-
Publication No.: US11049496B2
Publication date: 2021-06-29
Application No.: US16203963
Filing date: 2018-11-29
IPC classification: G10L15/08, G10L15/26, G10L21/0208
Abstract: Disclosed in some examples are methods, systems, and machine-readable mediums for preventing unintended activation of voice command processing of a voice activated device. A first audio signal may be an audio signal that is to be output to a speaker communicatively coupled to the computing device. A second audio signal may be input from a microphone or other audio capture device. Both audio signals are input to a keyword detector to check for the presence of activation keywords. If the activation keyword(s) are detected in the second audio signal but not the first audio signal, the voice command processing of the device is activated, as this is likely a command from the user and not feedback from the loudspeaker.
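The activation rule in this abstract can be sketched as a simple gate. This is a minimal illustration and not the patented implementation: the keyword detector below is a stand-in that works on text transcripts rather than audio, and the names `contains_keyword` and `should_activate` are hypothetical.

```python
def contains_keyword(transcript, keywords):
    """Stand-in keyword detector (assumption: operates on text, not audio)."""
    words = set(transcript.lower().split())
    return not words.isdisjoint(keywords)


def should_activate(speaker_signal, mic_signal, keywords):
    """Activate voice command processing only when the keyword is present in
    the microphone signal but absent from the speaker-bound signal."""
    in_mic = contains_keyword(mic_signal, keywords)
    in_speaker = contains_keyword(speaker_signal, keywords)
    return in_mic and not in_speaker


KEYWORDS = {"computer"}  # hypothetical activation keyword

# The user speaks the keyword while the speaker plays unrelated audio.
user_command = should_activate("now playing jazz", "computer stop the music", KEYWORDS)

# The keyword comes out of the loudspeaker and is picked up by the microphone.
echo = should_activate("ask your computer today", "ask your computer today", KEYWORDS)
```

Running the detector on both signals lets the device treat a keyword that originates from its own output as loudspeaker feedback rather than a user command.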
-
Publication No.: US20210133577A1
Publication date: 2021-05-06
Application No.: US16828889
Filing date: 2020-03-24
Inventors: Sriram Srinivasan, David Yuheng Zhao, Ming-Chieh Lee, Mu Han
Abstract: Apparatus and methods are disclosed for using machine learning models with private and public domains. Operations can be applied to transform input to a machine learning model in a private domain that is kept secret or otherwise made unavailable to third parties. In one example of the disclosed technology, a method includes applying a private transform to produce transformed input, providing the transformed input to a machine learning model that was trained using a training set modified by the private transform, and generating inferences with the machine learning model using the transformed input. Examples of suitable transforms that can be employed include matrix multiplication, time or spatial domain to frequency domains, and partitioning a neural network model such that an input and at least one hidden layer form part of the private domain, while the remaining layers form part of the public domain.
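One of the transforms named in this abstract, matrix multiplication, can be illustrated with a toy example. This is a sketch under stated assumptions: the secret matrix is drawn from a seeded random generator, and a nearest-centroid classifier stands in for the machine learning model. None of the function names come from the patent.

```python
import random


def make_private_matrix(dim, seed):
    """Build a secret square matrix; the seed plays the role of the private key."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(dim)]


def private_transform(matrix, x):
    """Apply the private transform: a matrix-vector product."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]


def train_centroids(samples):
    """Toy 'model': the mean vector per label, computed on transformed data."""
    sums, counts = {}, {}
    for vec, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest to the transformed input."""
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(centroids, key=lambda label: dist2(centroids[label], vec))


secret = make_private_matrix(2, seed=7)
raw = [([0.0, 0.0], "a"), ([0.1, 0.0], "a"), ([5.0, 5.0], "b"), ([5.1, 5.0], "b")]

# The model only ever sees transformed training data and transformed queries.
model = train_centroids([(private_transform(secret, x), y) for x, y in raw])
label = predict(model, private_transform(secret, [0.05, 0.0]))
```

Because training and inference both go through the same secret matrix, the model never needs access to the raw input domain.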
-
Publication No.: US20200244584A1
Publication date: 2020-07-30
Application No.: US16260771
Filing date: 2019-01-29
IPC classification: H04L12/841, H04N21/44, H04L29/06, H04L12/26, H04N21/43
Abstract: Techniques are described for managing synchronized jitter buffers for streaming data (e.g., for real-time audio and/or video communications). A separate jitter buffer can be maintained for each codec. For example, as data is received in network packets, the data is added to the jitter buffer corresponding to the codec that is associated with the received data. When data needs to be read, the same amount of data is read from each of the jitter buffers. In other words, at each instance where data needs to be obtained (e.g., for decoding and playback), the same amount of data is obtained from each of the jitter buffers. In addition, the multiple jitter buffers use the same playout timestamp that is synchronized across the multiple jitter buffers.
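The per-codec buffering described in this abstract might be sketched as follows. The class name, the codec labels, and the choice to pad an underrun with `None` are assumptions made for illustration; the abstract does not specify how an empty buffer is handled.

```python
from collections import deque


class SyncedJitterBuffers:
    """One jitter buffer per codec, all sharing a single playout timestamp."""

    def __init__(self, codecs):
        self.buffers = {codec: deque() for codec in codecs}
        self.playout_ts = 0  # shared across every buffer

    def push(self, codec, samples):
        """Add received data to the buffer that matches the packet's codec."""
        self.buffers[codec].extend(samples)

    def read(self, n):
        """Read the same amount (n items) from every buffer.

        Assumption: a buffer that runs short is padded with None so that all
        buffers still advance by the same amount.
        """
        out = {}
        for codec, buf in self.buffers.items():
            out[codec] = [buf.popleft() if buf else None for _ in range(n)]
        self.playout_ts += n  # one timestamp advances for all buffers
        return out


jb = SyncedJitterBuffers(["codec_a", "codec_b"])  # hypothetical codec labels
jb.push("codec_a", [1, 2, 3])
jb.push("codec_b", [10])
chunk = jb.read(2)
```

Reading the same amount from every buffer keeps them aligned, so a switch between codecs can resume at the shared playout position.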
-
Publication No.: US10701124B1
Publication date: 2020-06-30
Application No.: US16216513
Filing date: 2018-12-11
IPC classification: H04L29/06
Abstract: Techniques are described for determining corrected timestamps for streaming data that is encoded using frames with a variable frame size. The streaming data is encoded into frames and transmitted in network packets in which the network packets or frames are associated with timestamps incremented in fixed steps. When a network packet is received after a lost packet, a corrected timestamp range can be calculated for the received packet based at least in part on the received timestamp value and attributes of the received network packet along with buffering characteristics.
-
Publication No.: US10693748B2
Publication date: 2020-06-23
Application No.: US15590858
Filing date: 2017-05-09
Inventors: Chani A. Doggett, Brian R. Meyers, John E. Gallardo, Abolade Gbadegesin, Michael J. Novak, Yisheng Yao, Bartosz H. Paliswiat, Kiran Tatapudi, Colleen E. Hamilton, Shawn P. Henry, Kenneth M. Tubbs, Sriram Srinivasan, Mahmut Arslan
Abstract: Technology related to an activity feed service is disclosed. In one example of the disclosed technology, a method can include receiving updates to activity streams, where a respective activity stream indicates an engagement of a respective user with applications executing on a respective client device connected to a network. The different activity streams associated with a particular user can be merged to generate a merged activity stream associated with the particular user. The different received activity streams can correspond to different respective client devices. The merged activity stream associated with the particular user can be transmitted over the network.
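The merge step in this abstract can be pictured with a short sketch. The `Activity` record and the decision to order the merged stream by timestamp are assumptions; the abstract only states that per-device streams for one user are merged.

```python
import heapq
from typing import NamedTuple


class Activity(NamedTuple):
    """Hypothetical activity record: when, on which device, in which app."""
    timestamp: float
    device: str
    app: str


def merge_activity_streams(streams):
    """Merge per-device streams for one user into a single stream.

    Assumption: each input stream is already sorted by timestamp, and the
    merged stream is ordered by timestamp as well.
    """
    return list(heapq.merge(*streams, key=lambda a: a.timestamp))


phone = [Activity(1.0, "phone", "mail"), Activity(4.0, "phone", "maps")]
laptop = [Activity(2.0, "laptop", "editor")]
merged = merge_activity_streams([phone, laptop])
```

`heapq.merge` consumes the inputs lazily, which suits streams that arrive already ordered from each client device.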
-
Publication No.: US10147415B2
Publication date: 2018-12-04
Application No.: US15422865
Filing date: 2017-02-02
Inventors: Ross G. Cutler, Sriram Srinivasan, Ramin Mehran, Karlton David Sequeira, Jayant Ajit Gupchup, Senthil K. Velayutham
IPC classification: G10L13/033, G10L13/08, H04L12/26, H04L29/06, H04M7/00, G10L13/047, H04S7/00
Abstract: Content is received at a receiving equipment from a transmitting user terminal over a network in a communication session between a transmitting user and a receiving user. The received content comprises audio data representing speech spoken by a voice of the transmitting user, and further comprises text data generated from speech spoken by the voice of the transmitting user during the communication session. At the receiving equipment, at least a portion of the received text data is converted to artificially-generated audible speech based on a model of the transmitting user's voice stored at the receiving equipment (and in embodiments in dependence on the receive audio quality). The received audio data and the artificially-generated speech are supplied to be played out to the receiving user through one or more speakers.
-
Publication No.: US20180302302A1
Publication date: 2018-10-18
Application No.: US15590858
Filing date: 2017-05-09
Inventors: Chani A. Doggett, Brian R. Meyers, John E. Gallardo, Abolade Gbadegesin, Michael J. Novak, Yisheng Yao, Bartosz H. Paliswiat, Kiran Tatapudi, Colleen E. Hamilton, Shawn P. Henry, Kenneth M. Tubbs, Sriram Srinivasan, Mahmut Arslan
Abstract: Technology related to an activity feed service is disclosed. In one example of the disclosed technology, a method can include receiving updates to activity streams, where a respective activity stream indicates an engagement of a respective user with applications executing on a respective client device connected to a network. The different activity streams associated with a particular user can be merged to generate a merged activity stream associated with the particular user. The different received activity streams can correspond to different respective client devices. The merged activity stream associated with the particular user can be transmitted over the network.
-
Publication No.: US20180218727A1
Publication date: 2018-08-02
Application No.: US15422865
Filing date: 2017-02-02
Inventors: Ross G. Cutler, Sriram Srinivasan, Ramin Mehran, Karlton David Sequeira, Jayant Ajit Gupchup, Senthil K. Velayutham
CPC classification: G10L13/033, G10L13/04, G10L13/047, G10L13/08, G10L19/0018, H04L43/08, H04L65/1069, H04M3/2236, H04M7/0084, H04M2201/40, H04S7/30, H04S2420/01
Abstract: Content is received at a receiving equipment from a transmitting user terminal over a network in a communication session between a transmitting user and a receiving user. The received content comprises audio data representing speech spoken by a voice of the transmitting user, and further comprises text data generated from speech spoken by the voice of the transmitting user during the communication session. At the receiving equipment, at least a portion of the received text data is converted to artificially-generated audible speech based on a model of the transmitting user's voice stored at the receiving equipment (and in embodiments in dependence on the receive audio quality). The received audio data and the artificially-generated speech are supplied to be played out to the receiving user through one or more speakers.
-
Publication No.: US11817107B2
Publication date: 2023-11-14
Application No.: US17875237
Filing date: 2022-07-27
CPC classification: G10L19/0018, G10L19/265, G10L25/12, G10L25/69, G10L25/72
Abstract: Innovations in phase quantization during speech encoding and phase reconstruction during speech decoding are described. For example, to encode a set of phase values, a speech encoder omits higher-frequency phase values and/or represents at least some of the phase values as a weighted sum of basis functions. Or, as another example, to decode a set of phase values, a speech decoder reconstructs at least some of the phase values using a weighted sum of basis functions and/or reconstructs lower-frequency phase values then uses at least some of the lower-frequency phase values to synthesize higher-frequency phase values. In many cases, the innovations improve the performance of a speech codec in low bitrate scenarios, even when encoded data is delivered over a network that suffers from insufficient bandwidth or transmission quality problems.
-
Publication No.: US11558275B2
Publication date: 2023-01-17
Application No.: US16877257
Filing date: 2020-05-18
Inventors: Xiulian Peng, Vinod Prakash, Xiangyu Kong, Sriram Srinivasan, Yan Lu
IPC classification: H04L47/283, H04L43/087, G06K9/62, G06N20/00, H04L41/14
Abstract: Disclosed in some examples are methods, systems, and machine-readable mediums which determine jitter buffer delay by inputting jitter buffer and currently observed network status information to a machine learned model that is trained using a reinforcement learning (RL) method. The model maps these inputs to an action to compress, stretch, or hold the jitter buffer delay, which is used by a recipient computing device to optimize the jitter buffer delay. The model may be trained using a simulator that uses network traces of past real streaming sessions (e.g., communication sessions) of users. By training the model through reinforcement learning, the model learns to make better decisions through reinforcement in the form of reward signals that reflect the performance of each decision.
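The three actions named in this abstract (compress, stretch, hold) can be illustrated with a small sketch of how a chosen action might adjust the delay. The policy below is a hand-written placeholder and not a trained reinforcement learning model; the thresholds, the 10 ms step, and all names are assumptions.

```python
from enum import Enum


class Action(Enum):
    COMPRESS = -1  # shrink the jitter buffer delay
    HOLD = 0       # leave the delay unchanged
    STRETCH = 1    # grow the jitter buffer delay


def placeholder_policy(buffer_delay_ms, observed_jitter_ms):
    """Stand-in for the learned model: maps buffer and network state to an action.

    Assumption: a real system would query a model trained with RL here.
    """
    if observed_jitter_ms > buffer_delay_ms:
        return Action.STRETCH
    if observed_jitter_ms < buffer_delay_ms / 2:
        return Action.COMPRESS
    return Action.HOLD


def apply_action(buffer_delay_ms, action, step_ms=10.0, min_ms=0.0):
    """Apply the chosen action, never letting the delay drop below min_ms."""
    return max(min_ms, buffer_delay_ms + action.value * step_ms)


delay = 40.0
action = placeholder_policy(delay, observed_jitter_ms=60.0)
delay = apply_action(delay, action)
```

In the patented approach the mapping from state to action is learned from reward signals; only the action interface is sketched here.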