-
Publication number: US20240354713A1
Publication date: 2024-10-24
Application number: US18763285
Application date: 2024-07-03
Applicant: Microsoft Technology Licensing, LLC
Inventor: Robert Alexander SIM , Marcello MENDES HASEGAWA , Ryen William WHITE , Mudit JAIN , Tomer HERMELIN , Adi GERZI ROSENTHAL , Sagi HILLELI
IPC: G06Q10/1093 , G06F40/10 , G06N20/00 , G06Q10/107 , G10L15/02 , G10L15/08
CPC classification number: G06Q10/1095 , G06F40/10 , G06N20/00 , G06Q10/107 , G10L15/02 , G10L2015/025 , G10L2015/088
Abstract: A system and method to provide computer support for a meeting of invitees comprises accessing one or more sensory data streams providing digitized sensory data responsive to an activity of one or more of the invitees during the meeting, the one or more sensory data streams including at least one audio stream. The method also comprises subjecting the at least one audio stream to phonetic and situational computer modeling to recognize a sequence of words in the audio stream and to assign each word to an invitee, subjecting the sequence of words to semantic computer modeling to recognize a sequence of directives in the sequence of words, and releasing one or more output data streams based on the sequence of directives, the one or more output data streams including one or more notifications.
-
Publication number: US20240346950A1
Publication date: 2024-10-17
Application number: US18399891
Application date: 2023-12-29
Applicant: VIA TECHNOLOGIES, INC.
Inventor: Jing-Jing GUO , Steve Shu LIU
IPC: G09B19/04 , G10L15/02 , G10L21/0208 , G10L25/51
CPC classification number: G09B19/04 , G10L15/02 , G10L21/0208 , G10L25/51 , G10L2015/025
Abstract: A speaking practice system with redundant pronunciation correction is shown, which provides a goodness of pronunciation (GOP) evaluation system running on a data processing server to detect redundant pronunciation in an audio recording. The audio recording is recorded when the user reads a practice text aloud. According to the detected redundant pronunciations, the user is informed to make corrections.
-
Publication number: US20240304187A1
Publication date: 2024-09-12
Application number: US18662334
Application date: 2024-05-13
Applicant: GOOGLE LLC
Inventor: Christopher Hughes , Yiteng Huang , Turaj Zakizadeh Shabestary , Taylor Applebaum
IPC: G10L15/20 , G10L15/02 , G10L15/08 , G10L15/22 , G10L21/0216 , G10L21/0232 , G10L25/84
CPC classification number: G10L15/20 , G10L15/02 , G10L15/08 , G10L15/22 , G10L21/0232 , G10L25/84 , G10L2015/025 , G10L2015/088 , G10L2015/223 , G10L2021/02166
Abstract: Techniques are described for selectively adapting and/or selectively utilizing a noise reduction technique in detection of one or more features of a stream of audio data frames. For example, various techniques are directed to selectively adapting and/or utilizing a noise reduction technique in detection of an invocation phrase in a stream of audio data frames, detection of voice characteristics in a stream of audio data frames (e.g., for speaker identification), etc. Utilization of described techniques can result in more robust and/or more accurate detections of features of a stream of audio data frames in various situations, such as in environments with strong background noise. In various implementations, described techniques are implemented in combination with an automated assistant, and feature(s) detected utilizing techniques described herein are utilized to adapt the functionality of the automated assistant.
-
Publication number: US20240282307A1
Publication date: 2024-08-22
Application number: US18650923
Application date: 2024-04-30
Applicant: Universal Electronics Inc.
Inventor: Jonathan Lim
IPC: G10L15/22 , G08C17/00 , G10L15/02 , H04N21/41 , H04N21/422 , H04N21/435 , H04N21/436 , H04N21/81
CPC classification number: G10L15/22 , G08C17/00 , H04N21/4222 , H04N21/42222 , H04N21/42226 , H04N21/435 , H04N21/8186 , G08C2201/21 , G08C2201/31 , G10L2015/025 , G10L2015/228 , H04N21/41265 , H04N21/42206 , H04N21/43615
Abstract: A speech recognition engine is provided voice data indicative of at least a brand of a target appliance. The speech recognition engine uses the voice data indicative of at least a brand of the target appliance to identify within a library of codesets at least one codeset that is cross-referenced to the brand of the target appliance. The at least one codeset so identified is then caused to be provisioned to the controlling device for use in commanding functional operations of the target appliance.
-
Publication number: US12039980B2
Publication date: 2024-07-16
Application number: US17992155
Application date: 2022-11-22
Applicant: CERENCE OPERATING COMPANY
Inventor: Xiao-Lin Ren , Jianzhong Teng
CPC classification number: G10L15/22 , G10L15/005 , G10L15/02 , G10L15/08 , G10L15/18 , G10L2015/025 , G10L2015/088 , G10L15/1815 , G10L2015/223 , G10L2015/228
Abstract: A method for a user device, including receiving a first acoustic input of a user speaking a wake-up word in the target language; providing a first acoustic feature derived from the first acoustic input to an acoustic model stored on the user device to obtain a first sequence of speech units corresponding to the wake-up word spoken by the user in the target language, the acoustic model trained on a corpus of training data in a source language different than the target language; receiving a second acoustic input including the wake-up word in the target language; providing a second acoustic feature derived from the second acoustic input to the acoustic model to obtain a second sequence of speech units corresponding to the wake-up word in the target language; and comparing the first and second sequences of speech units to recognize the wake-up word in the target language.
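The comparison step in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that the two sequences of speech units are lists of phoneme-like labels and that they are compared by normalized edit distance against a threshold; the abstract does not specify the comparison method, and the names `edit_distance`, `matches_wake_word`, and `max_ratio` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of speech-unit labels."""
    prev = list(range(len(b) + 1))
    for i, unit_a in enumerate(a, start=1):
        curr = [i]
        for j, unit_b in enumerate(b, start=1):
            cost = 0 if unit_a == unit_b else 1
            curr.append(min(prev[j] + 1,          # delete unit_a
                            curr[j - 1] + 1,      # insert unit_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def matches_wake_word(enrolled, candidate, max_ratio=0.25):
    """Assumed decision rule: accept if the distance is small relative
    to the length of the enrolled (first) sequence."""
    if not enrolled:
        return False
    return edit_distance(enrolled, candidate) / len(enrolled) <= max_ratio


# First sequence: obtained when the user enrolls the wake-up word.
enrolled = ["n", "i", "h", "a", "o"]
# Second sequence: obtained from a later acoustic input.
print(matches_wake_word(enrolled, ["n", "i", "h", "e", "o"]))
print(matches_wake_word(enrolled, ["h", "e", "l", "o"]))
```

Because both sequences come from the same source-language acoustic model, the comparison does not require the model to know the target language; only the consistency of its output matters in this sketch.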
-
Publication number: US12033621B2
Publication date: 2024-07-09
Application number: US17231945
Application date: 2021-04-15
Inventor: Dan Su , Tianxiao Fu , Min Luo , Qi Chen , Yulu Zhang , Lin Luo
IPC: G10L15/187 , G10L15/00 , G10L15/02 , G10L15/06 , G10L15/22
CPC classification number: G10L15/187 , G10L15/005 , G10L15/02 , G10L15/063 , G10L15/22 , G10L2015/025
Abstract: A method for speech recognition based on language adaptivity comprises obtaining voice data of a user. The method also comprises extracting, based on the obtained voice data, a phoneme feature representing pronunciation phoneme information. The phoneme feature is input to a pre-trained language discrimination model that is pre-trained based on a multilingual corpus. A language discrimination result corresponding to the phoneme feature and in accordance with the language discrimination model is obtained. The method also comprises obtaining a speech recognition result of the voice data based on a language acoustic model of a language corresponding to the language discrimination result. The method further comprises determining a speech recognition result of the voice data based on a language acoustic model of a language corresponding to the language discrimination result.
-
Publication number: US20240221750A1
Publication date: 2024-07-04
Application number: US18610233
Application date: 2024-03-19
Applicant: Google LLC
Inventor: Wei Li , Rohit Prakash Prabhavalkar , Kanury Kanishka Rao , Yanzhang He , Ian C. McGraw , Anton Bakhtin
CPC classification number: G10L15/22 , G10L15/02 , G10L15/063 , G10L15/18 , G10L19/00 , G10L2015/025 , G10L2015/088 , G10L15/142 , G10L2015/223
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for detecting utterances of a key phrase in an audio signal. One of the methods includes receiving, by a key phrase spotting system, an audio signal encoding one or more utterances; while continuing to receive the audio signal, generating, by the key phrase spotting system, an attention output using an attention mechanism that is configured to compute the attention output based on a series of encodings generated by an encoder comprising one or more neural network layers; generating, by the key phrase spotting system and using attention output, output that indicates whether the audio signal likely encodes the key phrase; and providing, by the key phrase spotting system, the output that indicates whether the audio signal likely encodes the key phrase.
-
Publication number: US12019676B2
Publication date: 2024-06-25
Application number: US17540271
Application date: 2021-12-02
Applicant: Shaofeng Li
Inventor: Shaofeng Li
IPC: G06F16/68 , G06F16/683 , G10L15/02 , G10L15/08
CPC classification number: G06F16/686 , G06F16/683 , G10L15/02 , G10L15/083 , G10L2015/025 , G10L2015/027
Abstract: A method for presenting a multimedia stream including a first audio stream and a second audio stream, comprising: receiving the first audio stream, wherein the first audio stream comprises a set of first audio slices sequentially located therein, wherein each first audio slice comprises a timestamp and a grade value; receiving the second audio stream, wherein the second audio stream comprises a set of second audio slices sequentially located in the second stream, and aligned in time with one of the first audio slice; presenting the first audio stream according to the timestamp of the first set of first audio slices; receiving a set of control commands including a first threshold value; determining whether the first threshold value is lower than the grade value of the first audio slice; and presenting the second audio slice aligned with the first audio slice.
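The threshold test described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented method: the abstract does not say which outcome of the comparison triggers presentation, so the sketch assumes the aligned second slice is presented when the threshold is lower than the first slice's grade value. The `AudioSlice` type, the `plan_playback` function, and the dictionary lookup by timestamp are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AudioSlice:
    timestamp: float  # position of the slice in the first stream
    grade: int        # grade value carried by the slice
    payload: str      # stand-in for the audio data


def plan_playback(first_stream, second_by_timestamp, threshold):
    """Return the presentation order as (stream, timestamp, payload) tuples."""
    plan = []
    for s in sorted(first_stream, key=lambda s: s.timestamp):
        plan.append(("first", s.timestamp, s.payload))
        aligned = second_by_timestamp.get(s.timestamp)
        # Assumed rule: present the aligned second slice only when the
        # threshold from the control command is lower than the grade.
        if aligned is not None and threshold < s.grade:
            plan.append(("second", s.timestamp, aligned))
    return plan


first = [AudioSlice(1.0, 3, "b"), AudioSlice(0.0, 1, "a")]
second = {0.0: "A", 1.0: "B"}
print(plan_playback(first, second, threshold=2))
```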
-
Publication number: US12014730B2
Publication date: 2024-06-18
Application number: US17322238
Application date: 2021-05-17
Applicant: BEIJING XIAOMI MOBILE SOFTWARE CO., LTD.
Inventor: Xiangyan Xu
CPC classification number: G10L15/20 , G10L15/02 , G10L2015/025
Abstract: A voice processing method includes: collecting a voice signal by a microphone of an electronic device, and signal-processing the collected voice signal to obtain a first voice frame segment; performing voice recognition on the first voice frame segment to obtain a first recognition result; in response to the first recognition result not matching a target content and a plurality of tokens in the first recognition result meeting a preset condition, performing frame compensation on the first voice frame segment to obtain a second voice frame segment; and performing voice recognition on the second voice frame segment to obtain a second recognition result. A matching degree between the second recognition result and the target content is greater than a matching degree between the first recognition result and the target content.
-
Publication number: US12008921B2
Publication date: 2024-06-11
Application number: US18152625
Application date: 2023-01-10
Applicant: 617 Education Inc.
Inventor: Tom Dillon
CPC classification number: G09B7/04 , G06F3/167 , G09B19/04 , G10L15/02 , G10L15/063 , G10L15/22 , G10L25/18 , G10L25/30 , G10L2015/025 , G10L2015/225
Abstract: Systems and methods are described for grapheme-phoneme correspondence learning. In an example, a display of a device is caused to output a grapheme graphical user interface (GUI) that includes a grapheme. Audio data representative of a sound made by the human user is received based on the grapheme shown on the display. A grapheme-phoneme model can determine whether the sound made by the human corresponds to a phoneme for the displayed grapheme based on the audio data. The grapheme-phoneme model is trained based on augmented spectrogram data. A speaker is caused to output a sound representative of the phoneme for the grapheme to provide the human with a correct pronunciation of the grapheme in response to the grapheme-phoneme model determining that the sound made by the human does not correspond to the phoneme for the grapheme.