-
Publication No.: US11152007B2
Publication date: 2021-10-19
Application No.: US16543155
Filing date: 2019-08-16
Inventor(s): Yongshuai Lu
IPC classification: G06F40/10, G10L17/14, G10L17/02, G10L15/187, G06F16/33, G06F40/194, G06K9/62
Abstract: Embodiments of a method and device for matching a speech with a text, and a computer-readable storage medium are provided. The method can include: acquiring a speech identification text by identifying a received speech signal; comparing the speech identification text with multiple candidate texts in a first matching mode to determine a first matching text; and comparing phonetic symbols of the speech identification text with phonetic symbols of the multiple candidate texts in a second matching mode to determine a second matching text, in a case that no first matching text is determined.
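The two-stage matching in the abstract above can be illustrated with a minimal sketch. This is not the patented implementation: the function names `match_text` and `toy_phonetic_key` are hypothetical, the first matching mode is assumed to be case-insensitive equality, and the "phonetic symbols" are approximated by a toy spelling normalization rather than a real transcription such as pinyin or IPA.

```python
from typing import Callable, Optional, Sequence


def toy_phonetic_key(text: str) -> str:
    """Hypothetical stand-in for a phonetic transcription of the text."""
    # Keep letters only, lowercased, then collapse a few similar-sounding spellings.
    key = "".join(ch for ch in text.lower() if ch.isalpha())
    for src, dst in (("ph", "f"), ("ck", "k"), ("c", "k"), ("z", "s")):
        key = key.replace(src, dst)
    return key


def match_text(
    recognized: str,
    candidates: Sequence[str],
    phonetic: Callable[[str], str] = toy_phonetic_key,
) -> Optional[str]:
    """Return a matching candidate, trying text first and phonetics second."""
    # First matching mode (assumed here): case-insensitive text equality.
    norm = recognized.strip().lower()
    for cand in candidates:
        if cand.strip().lower() == norm:
            return cand
    # Second matching mode: compare phonetic keys, used only when
    # the first mode found no match.
    target = phonetic(recognized)
    for cand in candidates:
        if phonetic(cand) == target:
            return cand
    return None
```

For example, under these assumptions `match_text("fone home", ["phone home", "stop"])` finds no text match and falls back to the phonetic comparison, which selects "phone home".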
-
Publication No.: US11152006B2
Publication date: 2021-10-19
Application No.: US16020911
Filing date: 2018-06-27
Inventor(s): Eyal Krupka, Shixiong Zhang, Xiong Xiao
Abstract: Examples are disclosed that relate to voice identification enrollment. One example provides a method of voice identification enrollment comprising, during a meeting in which two or more human speakers speak at different times, determining whether one or more conditions of a protocol for sampling meeting audio used to establish human speaker voiceprints are satisfied, and in response to determining that the one or more conditions are satisfied, selecting a sample of meeting audio according to the protocol, the sample representing an utterance made by one of the human speakers. The method further comprises establishing, based at least on the sample, a voiceprint of the human speaker.
-
Publication No.: US20210289607A1
Publication date: 2021-09-16
Application No.: US17101949
Filing date: 2020-11-23
Applicant: Sonos, Inc.
Inventor(s): Dayn Wilberding
Abstract: Disclosed herein are example techniques to support multiple voice assistant services. An example implementation may involve a playback device capturing audio from the one or more microphones into one or more buffers as a sound data stream, monitoring the sound data stream for a wake word associated with a specific voice assistant service, and monitoring the sound data stream for a wake word associated with the media playback system. The playback device generates a second wake-word event corresponding to a voice input when sound data matching the wake word associated with the media playback system in a portion of the sound data stream is detected. The playback device determines that the voice input includes sound data matching one or more playback commands and sends sound data representing the voice input to a voice assistant associated with the media playback system for processing of the second voice input.
-
Publication No.: US11107478B2
Publication date: 2021-08-31
Application No.: US16752007
Filing date: 2020-01-24
Applicant: Google LLC
Abstract: This document generally describes systems, methods, devices, and other techniques related to speaker verification, including (i) training a neural network for a speaker verification model, (ii) enrolling users at a client device, and (iii) verifying identities of users based on characteristics of the users' voices. Some implementations include a computer-implemented method. The method can include receiving, at a computing device, data that characterizes an utterance of a user of the computing device. A speaker representation can be generated, at the computing device, for the utterance using a neural network on the computing device. The neural network can be trained based on a plurality of training samples that each: (i) include data that characterizes a first utterance and data that characterizes one or more second utterances, and (ii) are labeled as a matching speakers sample or a non-matching speakers sample.
-
Publication No.: US11100943B1
Publication date: 2021-08-24
Application No.: US16276446
Filing date: 2019-02-14
Applicant: Otter.ai, Inc.
Inventor(s): Yun Fu, Simon Lau, Kaisuke Nakajima, Julius Cheng, Gelei Chen, Sam Song Liang, James Mason Altreuter, Kean Kheong Chin, Zhenhao Ge, Hitesh Anand Gupta, Xiaoke Huang, James Francis McAteer, Brian Francis Williams, Tao Xing
Abstract: A system for processing and presenting a conversation includes a sensor, a processor, and a presenter. The sensor is configured to capture an audio-form conversation. The processor is configured to automatically transform the audio-form conversation into a transformed conversation. The transformed conversation includes a synchronized text, wherein the synchronized text is synchronized with the audio-form conversation. The presenter is configured to present the transformed conversation including the synchronized text and the audio-form conversation. The presenter is further configured to present the transformed conversation to be navigable, searchable, assignable, editable, and shareable.
-
Publication No.: US20210225380A1
Publication date: 2021-07-22
Application No.: US16300444
Filing date: 2018-02-27
Inventor(s): Wenyu WANG, Yuan HU
Abstract: The present disclosure provides a voiceprint recognition method and apparatus, comprising: according to an obtained command speech, recognizing, in a voiceprint recognition manner, a user class sending a command speech; according to the user class, using a corresponding speech recognition model to perform speech recognition for the command speech, to obtain a command described by the command speech; providing resources according to the user class and command. The present disclosure can avoid the problems that in a conventional voiceprint recognition method in the prior art, a client needs to participate in voiceprint recognition, and the user's ID needs to be further recognized through a voiceprint training process, and that the user's degree of satisfaction is not high. While the user speaks naturally, it is feasible to perform processing for this very “ordinary” speech, and meanwhile complete the work of voiceprint recognition.
-
Publication No.: US20210193153A1
Publication date: 2021-06-24
Application No.: US17273542
Filing date: 2019-08-09
Inventor(s): Chisang JUNG
Abstract: The present disclosure relates to a speaker model adaptation method and device for enhancing text-independent speaker recognition performance. Specifically, the disclosure relates to a method and a device whereby, for the adaption of a speaker model pre-stored in an electronic device, text-independent speaker recognition performance is improved by considering variations in the amount of speaker characteristics information per phoneme unit.
-
Publication No.: US20210193151A1
Publication date: 2021-06-24
Application No.: US16813564
Filing date: 2020-03-09
Applicant: LG ELECTRONICS INC.
Inventor(s): Jungmin SONG
IPC classification: G10L17/06, G10L17/04, G10L25/21, G10L25/18, G10L17/02, G10L17/00, G10L25/90, G06N3/08
Abstract: A speaker voice authentication method and apparatus according to an embodiment of the present disclosure prevent a third party from attempting speaker authentication using a recorded file by distinguishing an actual voice of a speaker from a recorded file obtained by recording the voice of the speaker. Further, at the time of voice authentication, voice recognition artificial intelligence technology is selectively utilized to allow the speaker to perform voice authentication through only one utterance, and receiving of the voice of the speaker may be performed in an Internet of Things (IoT) environment using a 5G network.
-
Publication No.: US20210183395A1
Publication date: 2021-06-17
Application No.: US17184323
Filing date: 2021-02-24
Applicant: FTR LABS PTY LTD
IPC classification: G10L17/06, G10L17/02, G10L17/04, G10L21/0272, G10L25/78
Abstract: Embodiments of the present invention provide methods and systems for performing automatic diarisation of sound recordings including speech from one or more speakers. The automatic diarisation has a development or training phase and a utilisation or evaluation phase. In the development or training phase background models and hyperparameters are generated from already annotated sound recordings. These models and hyperparameters are applied during the evaluation or utilisation phase to diarise new or not previously diarised or annotated recordings.
-
Publication No.: US20210174818A1
Publication date: 2021-06-10
Application No.: US17125566
Filing date: 2020-12-17
Inventor(s): Ritwik Giri, Karim Helwani, Tao Zhang
IPC classification: G10L21/0216, H04R3/00, G10L17/02, H04R5/04, H04R25/00, G10L21/0208, H04R1/10
Abstract: A system comprises an ear-worn electronic device configured to be worn by a wearer. The ear-worn electronic device comprises a processor and memory coupled to the processor. The memory is configured to store an annoying sound dictionary representative of a plurality of annoying sounds pre-identified by the wearer. A microphone is coupled to the processor and configured to monitor an acoustic environment of the wearer. A speaker or a receiver is coupled to the processor. The processor is configured to identify different background noises present in the acoustic environment, determine which of the background noises correspond to one or more of the plurality of annoying sounds, and attenuate the one or more annoying sounds in an output signal provided to the speaker or receiver.
-