-
Publication number: US20240112672A1
Publication date: 2024-04-04
Application number: US17959637
Application date: 2022-10-04
Applicant: GOOGLE LLC
Inventor: Rajiv Mathews , Dragan Zivkovic , Khe Chai Sim
CPC classification number: G10L15/19 , G10L15/063 , G10L15/22 , G10L15/30 , G10L2015/0635
Abstract: On-device processor(s) of a client device may store, in on-device storage and in association with a time to live (TTL) in the on-device storage, a correction directed to ASR processing of audio data. The correction may include a portion of a given speech hypothesis that was modified to an alternate speech hypothesis. Further, the on-device processor(s) may cause an on-device ASR model to be personalized based on the correction. Moreover, and based on additional ASR processing of additional audio data, the on-device processor(s) may store, in the on-device storage and in association with an additional TTL in the on-device storage, a pseudo-correction directed to the additional ASR processing. Accordingly, the on-device processor(s) may cause the on-device ASR model to be personalized based on the pseudo-correction to prevent forgetting by the on-device ASR model.
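The storage scheme in this abstract can be sketched as a small store whose entries expire after a time to live. This is a minimal illustration, not the patented implementation: `Correction`, `CorrectionStore`, and their fields are hypothetical names, the store is an in-memory stand-in for on-device storage, and a pseudo-correction is modelled here simply as a flagged entry, which is an assumption.

```python
import time
from dataclasses import dataclass


@dataclass
class Correction:
    original: str          # portion of the speech hypothesis that was modified
    replacement: str       # alternate speech hypothesis
    expires_at: float      # absolute expiry time in seconds
    pseudo: bool = False   # True for a pseudo-correction (assumed representation)


class CorrectionStore:
    """Hypothetical in-memory stand-in for on-device storage with per-entry TTL."""

    def __init__(self):
        self._items = []

    def add(self, original, replacement, ttl_seconds, pseudo=False, now=None):
        # Record the entry together with the moment its TTL runs out.
        now = time.time() if now is None else now
        self._items.append(
            Correction(original, replacement, now + ttl_seconds, pseudo)
        )

    def live(self, now=None):
        # Drop expired entries and return those still usable for personalization.
        now = time.time() if now is None else now
        self._items = [c for c in self._items if c.expires_at > now]
        return list(self._items)
```

A personalization step would then read only `store.live()`, so entries whose TTL has lapsed no longer influence the on-device model.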
-
Publication number: US11823659B2
Publication date: 2023-11-21
Application number: US16711046
Application date: 2019-12-11
Applicant: Amazon Technologies, Inc.
Inventor: Julia Reinspach , Oleg Rokhlenko , Ramakanthachary Gottumukkala , Giovanni Clemente , Ankit Agrawal , Swayam Bhardwaj , Guy Michaeli , Vaidyanathan Puthucode Krishnamoorthy , Costantino Vlachos , Nalledath P. Vinodkrishnan , Shaun M. Vickers , Sethuraman Ramachandran , Charles C. Moore
IPC: G10L15/06 , G10L15/01 , G10L15/02 , G10L15/187 , G10L15/22
CPC classification number: G10L15/063 , G10L15/01 , G10L15/02 , G10L15/187 , G10L15/22 , G10L2015/025 , G10L2015/0635 , G10L2015/223
Abstract: A request including audio data is received from a voice-enabled device. A string of phonemes present in the utterance is determined through speech recognition. At a later time, a subsequent user input corresponding to the request may be received, in which the user input is associated with one or more text keywords. The subsequent user input may be obtained in response to an active request. Alternatively, feedback may not be actively elicited, but rather collected passively. However it is obtained, the one or more keywords associated with the subsequent user input may be associated with the string of phonemes to indicate that the user is saying or means those words when they produce that string of phonemes. A user-specific speech recognition key for the user account is then updated to associate the string of phonemes with these words. A general speech recognition model can also be trained using the association.
-
Publication number: US11783850B1
Publication date: 2023-10-10
Application number: US17216840
Application date: 2021-03-30
Applicant: Amazon Technologies, Inc.
Inventor: Harshavardhan Sundar , Sheetal Laad , Jialiang Bao , Ming Sun , Chao Wang , Chungnam Chan , Cengiz Erbas , Mathias Jourdain , Nipul Bharani , Aaron David Wirshba
CPC classification number: G10L25/51 , G10L15/063 , G10L15/22 , G10L25/78 , G10L2015/0635
Abstract: Techniques for detecting certain acoustic events from audio data are described. A system may perform event aggregation for certain types of events before sending, to a device, an output representing that the event is detected. The system may bypass the event aggregation process for certain types of events that the system may detect with a high level of confidence. In such cases, the system may send an output to the device when the event is detected. The system may be used to detect acoustic events representing presence of a person or other harmful circumstances (such as fire, smoke, etc.) in a home, an office, a store, or other types of indoor settings.
-
Publication number: US20230298565A1
Publication date: 2023-09-21
Application number: US17660487
Application date: 2022-04-25
Applicant: Google LLC
Inventor: Andrew M. Rosenberg , Gary Wang , Bhuvana Ramabhadran , Fadi Biadsy
IPC: G10L15/06 , G10L15/197 , G10L13/02 , G10L19/038 , G10L15/22
CPC classification number: G10L15/063 , G10L15/197 , G10L13/02 , G10L19/038 , G10L15/22 , G10L2015/0635 , G10L2019/0001
Abstract: A method includes receiving a set of training utterances each including a non-synthetic speech representation of a corresponding utterance, and for each training utterance, generating a corresponding synthetic speech representation by using a voice conversion model. The non-synthetic speech representation and the synthetic speech representation form a corresponding training utterance pair. At each of a plurality of output steps for each training utterance pair, the method also includes generating, for output by a speech recognition model, a first probability distribution over possible non-synthetic speech recognition hypotheses for the non-synthetic speech representation and a second probability distribution over possible synthetic speech recognition hypotheses for the synthetic speech representation. The method also includes determining a consistent loss term for the corresponding training utterance pair based on the first and second probability distributions and updating parameters of the speech recognition model based on the consistent loss term.
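The loss described in this abstract compares, at each output step, the distribution produced for the non-synthetic speech with the one produced for its synthetic counterpart. The sketch below assumes a symmetric KL divergence averaged over output steps; the abstract does not name the distance measure, and `kl_divergence` and `consistency_loss` are hypothetical helper names.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions of equal length.
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def consistency_loss(real_steps, synth_steps):
    """Mean symmetric KL between paired per-step distributions (assumed form)."""
    if not real_steps or len(real_steps) != len(synth_steps):
        raise ValueError("need the same non-zero number of output steps")
    total = sum(
        kl_divergence(p, q) + kl_divergence(q, p)
        for p, q in zip(real_steps, synth_steps)
    )
    return total / len(real_steps)
```

The loss is zero when the two distributions agree at every step and grows as they diverge, which is what pushes the recognizer toward consistent predictions across the training utterance pair.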
-
Publication number: US11676579B2
Publication date: 2023-06-13
Application number: US17073149
Application date: 2020-10-16
Applicant: Deepgram, Inc.
Inventor: Jeff Ward , Adam Sypniewski , Scott Stephenson
IPC: G10L15/16 , G10L15/06 , G06N3/084 , G10L25/18 , G10L25/24 , G06V10/44 , G06F18/214 , G06F18/2413 , G06N3/044 , G06N3/045 , G06N3/048 , G06N3/08 , G10L15/02 , G10L15/22 , G10L15/30 , G10L15/197 , G10L15/08
CPC classification number: G10L15/16 , G06F18/214 , G06F18/24133 , G06N3/044 , G06N3/045 , G06N3/048 , G06N3/08 , G06N3/084 , G06V10/454 , G10L15/02 , G10L15/063 , G10L15/22 , G10L15/30 , G10L25/18 , G10L25/24 , G10L15/197 , G10L2015/0635 , G10L2015/081
Abstract: Systems and methods are disclosed for generating internal state representations of a neural network during processing and using the internal state representations for classification or search. In some embodiments, the internal state representations are generated from the output activation functions of a subset of nodes of the neural network. The internal state representations may be used for classification by training a classification model using internal state representations and corresponding classifications. The internal state representations may be used for search, by producing a search feature from a search input and comparing the search feature with one or more feature representations to find the feature representation with the highest degree of similarity.
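The search step in this abstract amounts to a nearest-neighbour lookup over stored feature representations. The sketch below assumes cosine similarity as the similarity measure, which the abstract does not specify; `cosine` and `most_similar` are hypothetical names, and the vectors stand in for activations taken from a subset of network nodes.

```python
import math


def cosine(a, b):
    # Cosine similarity of two equal-length vectors; 0.0 if either is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(search_feature, index):
    """Return the key in `index` whose vector is most similar to the search feature."""
    return max(index, key=lambda key: cosine(search_feature, index[key]))
```

Here `index` maps an identifier, such as an audio segment ID, to its stored feature representation.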
-
Publication number: US11676575B2
Publication date: 2023-06-13
Application number: US17386078
Application date: 2021-07-27
Applicant: Amazon Technologies, Inc.
Inventor: Ariya Rastrow , Rohit Prasad , Nikko Strom
CPC classification number: G10L15/063 , G10L15/18 , G10L15/30 , G10L2015/0635
Abstract: A speech interface device is configured to receive response data from a remote speech processing system for responding to user speech. This response data may be enhanced with information such as remote NLU data. The response data from the remote speech processing system may be compared to local NLU data to improve a speech processing model on the device. Thus, the device may perform supervised on-device learning based on the remote NLU data. The device may determine differences between the updated speech processing model and an original speech processing model received from the remote system and may send data indicating these differences to the remote system. The remote system may aggregate data received from a plurality of devices and may generate an improved speech processing model.
-
Publication number: US11651765B2
Publication date: 2023-05-16
Application number: US17070283
Application date: 2020-10-14
Applicant: Google Technology Holdings LLC
Inventor: Kristin A. Gray
IPC: G10L15/00 , G06F40/174 , G10L15/187 , G10L15/01 , G10L15/06 , G10L15/22 , G10L15/30
CPC classification number: G10L15/00 , G06F40/174 , G10L15/01 , G10L15/063 , G10L15/187 , G10L15/22 , G10L15/30 , G10L2015/0635
Abstract: Techniques and apparatuses for recognizing accented speech are described. In some embodiments, an accent module recognizes accented speech using an accent library based on device data, uses different speech recognition correction levels based on an application field into which recognized words are set to be provided, or updates an accent library based on corrections made to incorrectly recognized speech.
-
Publication number: US20230144379A1
Publication date: 2023-05-11
Application number: US17520816
Application date: 2021-11-08
Applicant: GENESYS CLOUD SERVICES, INC.
Inventor: LEV HAIKIN , ARNON MAZZA , EYAL ORBACH , AVRAHAM FAIZAKOF
IPC: G10L15/197 , G10L15/06 , G10L15/22 , G10L15/10 , G06N20/00
CPC classification number: G10L15/197 , G10L15/063 , G10L15/22 , G10L15/10 , G06N20/00 , G10L2015/0635
Abstract: A system and method of automatically discovering unigrams in a speech data element may include receiving a language model that includes a plurality of n-grams, where each n-gram includes one or more unigrams; applying an acoustic machine-learning (ML) model on one or more speech data elements to obtain a character distribution function; applying a greedy decoder on the character distribution function, to predict an initial corpus of unigrams; filtering out one or more unigrams of the initial corpus to obtain a corpus of candidate unigrams, where the candidate unigrams are not included in the language model; analyzing the one or more first speech data elements, to extract at least one n-gram that comprises a candidate unigram; and updating the language model to include the extracted at least one n-gram.
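The pipeline in this abstract (greedy decoding, filtering against the language model, n-gram extraction) can be sketched as follows. The decoder here is CTC-style, collapsing repeats and dropping a blank symbol, which is an assumption since the abstract only says "greedy decoder"; all function names and the `blank` symbol are hypothetical.

```python
def greedy_decode(frames, alphabet, blank="_"):
    """Pick the most probable character per frame, collapse repeats, drop blanks."""
    out, prev = [], None
    for probs in frames:
        ch = alphabet[max(range(len(probs)), key=probs.__getitem__)]
        if ch != prev and ch != blank:
            out.append(ch)
        prev = ch
    return "".join(out)


def candidate_unigrams(transcripts, lm_vocab):
    # Keep only decoded words that the language model does not already contain.
    return {w for t in transcripts for w in t.split() if w not in lm_vocab}


def ngrams_with(transcripts, candidates, n=2):
    # Extract every n-gram that contains at least one candidate unigram.
    found = set()
    for t in transcripts:
        words = t.split()
        for i in range(len(words) - n + 1):
            gram = tuple(words[i:i + n])
            if any(w in candidates for w in gram):
                found.add(gram)
    return found
```

The n-grams returned by `ngrams_with` are the ones that would be added to the language model in the final step.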
-
Publication number: US20190244604A1
Publication date: 2019-08-08
Application number: US16333156
Application date: 2017-09-05
Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventor: Hirokazu MASATAKI , Taichi ASAMI , Takashi NAKAMURA , Ryo MASUMURA
CPC classification number: G10L15/16 , G06F17/2715 , G06N3/0454 , G06N3/049 , G06N3/08 , G06N99/00 , G10L15/06 , G10L15/063 , G10L15/065 , G10L2015/0635
Abstract: A model learning device comprises: an initial value setting part that uses a parameter of a learned first model including a neural network to set a parameter of a second model including a neural network having a same network structure as the first model; a first output probability distribution calculating part that calculates a first output probability distribution including a distribution of an output probability of each unit on an output layer, using learning features and the first model; a second output probability distribution calculating part that calculates a second output probability distribution including a distribution of an output probability of each unit on the output layer, using learning features and the second model; and a modified model update part that obtains a weighted sum of a second loss function calculated from correct information and from the second output probability distribution, and a cross entropy between the first output probability distribution and the second output probability distribution, and updates the parameter of the second model so as to reduce the weighted sum.
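The objective in this abstract is a weighted sum of a supervised loss on the second model and a cross entropy between the first and second models' output distributions. The sketch below assumes the supervised loss is itself a cross entropy against a one-hot label and that the two weights sum to one; neither detail is stated in the abstract, and the function names and the `weight` parameter are hypothetical.

```python
import math


def cross_entropy(target, predicted, eps=1e-12):
    # H(target, predicted) for two discrete distributions of equal length.
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


def combined_loss(correct_one_hot, first_dist, second_dist, weight=0.5):
    """(1 - weight) * supervised loss + weight * cross entropy to the first model."""
    supervised = cross_entropy(correct_one_hot, second_dist)
    to_first_model = cross_entropy(first_dist, second_dist)
    return (1.0 - weight) * supervised + weight * to_first_model
```

With `weight=0` the second model is trained on the correct labels alone; raising the weight ties its outputs more closely to those of the already-learned first model.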
-
Publication number: US20180268814A1
Publication date: 2018-09-20
Application number: US15462564
Application date: 2017-03-17
Applicant: Microsoft Technology Licensing, LLC
Inventor: Suma SaganeGowda , Louis Amadio , Artem Zhurid
CPC classification number: G10L15/22 , G10L15/063 , G10L2015/0635 , G10L2015/223 , G10L2015/227 , G10L2015/228
Abstract: Techniques for controlling a voice activated feature of a voice activated device are described. Data from one or more sensors and data indicative of a status of a user are received. Based on analyzing the data, a proximity of the user relative to the location of the voice activated device is determined. One or more voice activated features of the voice activated device are enabled based at least in part on the determined proximity, one or more rules, and one or more user preferences.