-
51.
Publication No.: US11798563B2
Publication date: 2023-10-24
Application No.: US17617296
Application date: 2020-08-26
Inventors: Yuechao Guo, Yixuan Qiao, Yijun Tang, Jun Wang, Peng Gao, Guotong Xie
Abstract: A method for voiceprint recognition of an original speech is used to reduce information losses and system complexity of a model for data recognition of a speaker's original speech. The method includes: obtaining original speech data, and segmenting the original speech data based on a preset time length to obtain segmented speech data; performing tail-biting convolution processing and discrete Fourier transform on the segmented speech data through a preset convolution filter bank to obtain voiceprint feature data; pooling the voiceprint feature data through a preset deep neural network to obtain a target voiceprint feature; performing embedded vector transformation on the target voiceprint feature to obtain corresponding voiceprint feature vectors; and performing calculation on the voiceprint feature vectors through a preset loss function to obtain target voiceprint data, where the loss function includes a cosine similarity matrix loss function and a minimum mean square error matrix loss function.
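The abstract names a loss that combines a cosine similarity matrix term with a mean square error term but does not give its form. The following is only an illustrative sketch under stated assumptions: the similarity matrix compares each embedding with per-speaker centroids, the cosine term is a softmax cross-entropy over each row (a swapped-in, GE2E-style choice, not the patent's formula), and the MSE term pulls each row toward a one-hot target. All function names and the `mse_weight` parameter are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def combined_loss(embeddings_by_speaker, mse_weight=1.0):
    """Assumed combination: softmax cross-entropy on the cosine
    similarity matrix plus an MSE term against one-hot targets."""
    centroids = [centroid(vs) for vs in embeddings_by_speaker]
    ce_total, mse_total, count = 0.0, 0.0, 0
    for k, vectors in enumerate(embeddings_by_speaker):
        for emb in vectors:
            # One row of the similarity matrix: this embedding vs. every centroid.
            row = [cosine(emb, c) for c in centroids]
            log_z = math.log(sum(math.exp(s) for s in row))
            ce_total += log_z - row[k]
            target = [1.0 if j == k else 0.0 for j in range(len(row))]
            mse_total += sum((s - t) ** 2 for s, t in zip(row, target)) / len(row)
            count += 1
    return ce_total / count + mse_weight * mse_total / count

# Hypothetical 2-D embeddings: two speakers, two utterances each.
well_separated = [[[1.0, 0.0], [1.0, 0.1]], [[0.0, 1.0], [0.1, 1.0]]]
mixed_up = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.1], [0.1, 1.0]]]
```

Under these assumptions, embeddings that cluster by speaker (`well_separated`) yield a lower loss than embeddings grouped under the wrong speakers (`mixed_up`).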
-
52.
Publication No.: US11798561B2
Publication date: 2023-10-24
Application No.: US17566250
Application date: 2021-12-30
Inventors: Cheng-Yu Wang, Po-Cheng Chen, Yu-Te Lee
IPC classification: G10L17/06, G06T17/20, G10L17/02, G10L21/007, G06T7/20
CPC classification: G10L17/06, G06T7/20, G06T17/20, G10L17/02, G10L21/007, G06T2207/30201
Abstract: A method for processing audio generated in a virtual meeting room (VMR) includes setting a quantity of mesh vertexes according to seats in the VMR, obtaining first voiceprint information of a presenter, the first voiceprint information comprising a frequency, an amplitude, and a phase difference of an audio signal, adjusting the frequency or amplitude of the first voiceprint information according to the quantity of the mesh vertexes, and obtaining second voiceprint information; and determining a seat of the presenter in the VMR according to the second voiceprint information. An apparatus and a non-transitory computer readable medium for processing audio as above are also disclosed.
-
Publication No.: US11790921B2
Publication date: 2023-10-17
Application No.: US17169843
Application date: 2021-02-08
Applicant: OTO Systems Inc.
IPC classification: G10L17/06, G10L17/02, G10L17/04, G10L17/18, G06N3/04, G06N3/08, G06N3/049, G10L21/0272, G06N3/045
CPC classification: G10L17/06, G06N3/045, G06N3/049, G06N3/08, G10L17/02, G10L17/04, G10L17/18, G10L21/0272
Abstract: Systems, methods, and non-transitory computer-readable media can obtain a stream of audio waveform data that represents speech involving a plurality of speakers. As the stream of audio waveform data is obtained, a plurality of audio chunks can be determined. An audio chunk can be associated with one or more identity embeddings. The stream of audio waveform data can be segmented into a plurality of segments based on the plurality of audio chunks and respective identity embeddings associated with the plurality of audio chunks. A segment can be associated with a speaker included in the plurality of speakers. Information describing the plurality of segments associated with the stream of audio waveform data can be provided.
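One way to picture segmenting a stream by chunk-level identity embeddings is sketched below. This is a minimal illustration, not the patented method: it assumes each chunk arrives as a hypothetical `(start, end, embedding)` tuple, assigns a chunk to the most similar known speaker when the cosine similarity reaches an assumed `threshold`, otherwise opens a new speaker, and merges consecutive chunks from the same speaker into one segment.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def segment_stream(chunks, threshold=0.8):
    """Greedy online segmentation of (start, end, embedding) chunks.

    Returns a list of (start, end, speaker_id) segments.
    """
    speakers = []   # reference embedding per speaker (first chunk seen)
    segments = []   # mutable [start, end, speaker_id] entries
    for start, end, emb in chunks:
        scores = [cosine(emb, ref) for ref in speakers]
        if scores and max(scores) >= threshold:
            speaker = scores.index(max(scores))
        else:
            speakers.append(emb)          # unseen voice: register a new speaker
            speaker = len(speakers) - 1
        if segments and segments[-1][2] == speaker:
            segments[-1][1] = end         # same speaker continues: extend segment
        else:
            segments.append([start, end, speaker])
    return [tuple(s) for s in segments]

# Hypothetical 2-D embeddings for four one-second chunks.
chunks = [
    (0, 1, [1.0, 0.0]),
    (1, 2, [0.9, 0.1]),
    (2, 3, [0.0, 1.0]),
    (3, 4, [1.0, 0.05]),
]
segments = segment_stream(chunks)
```

Because the loop consumes chunks one at a time, the same function works on a generator that yields chunks as the stream is received.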
-
Publication No.: US20230326464A1
Publication date: 2023-10-12
Application No.: US18331920
Application date: 2023-06-08
CPC classification: G10L17/04, G10L17/02, G10L17/06, H04M3/42221
Abstract: Methods and systems are disclosed herein for improving the quality of audio for use in a biometric. A biometric system may use machine learning to determine whether audio or a portion of the audio should be used as a biometric for a user. A sample of the user’s voice may be used to generate a voice signature of the user. Portions of the audio that do not meet a similarity threshold when compared with the voice signature may be removed from the audio. Additionally or alternatively, interfering noises may be detected and removed from the audio to improve the quality of a voice biometric generated from the audio.
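The thresholding step described in the abstract can be illustrated with a short sketch. It assumes, hypothetically, that the voice signature and each audio portion are already represented as embedding vectors and that similarity is measured by cosine similarity; the abstract does not specify either choice, and the names `filter_portions` and `threshold` are illustrative.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def filter_portions(portions, signature, threshold=0.7):
    """Keep only the portions whose embedding is at least `threshold`
    similar to the enrolled voice signature."""
    return [p for p in portions if cosine(p["embedding"], signature) >= threshold]

# Hypothetical data: the second portion belongs to a different voice.
signature = [1.0, 0.0]
portions = [
    {"id": "a", "embedding": [1.0, 0.0]},
    {"id": "b", "embedding": [0.0, 1.0]},
    {"id": "c", "embedding": [0.9, 0.1]},
]
kept_ids = [p["id"] for p in filter_portions(portions, signature)]
```

Only the retained portions would then be passed on to build the voice biometric.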
-
Publication No.: US20230306971A1
Publication date: 2023-09-28
Application No.: US18325873
Application date: 2023-05-30
IPC classification: G10L17/24, G10L17/06, G10L17/04, A61B5/00, A61B5/107, G06Q20/40, G06F21/32, G06F16/635, A61B5/117, G06V40/70, G06V40/18, G06V40/12
CPC classification: G10L17/24, G10L17/06, G10L17/04, A61B5/6817, A61B5/1076, G06Q20/40145, G06F21/32, G06F16/636, A61B5/117, G06V40/70, G06V40/197, G06V40/1365, G16H50/50
Abstract: Introduced here are approaches to authenticating the identity of speakers based on the “liveness” of the input. To prevent spoofing, an authentication platform may establish the likelihood that a voice sample represents a recording of word(s) uttered by a speaker whose identity is to be authenticated and then, based on the likelihood, determine whether to authenticate the speaker.
-
Publication No.: US20230305633A1
Publication date: 2023-09-28
Application No.: US18109315
Application date: 2023-02-14
Inventors: Guy Wagner, Leeor Langer, Asher Dahan
IPC classification: G06F3/01, G06F3/16, G06F3/0346, G10L17/22, G10L17/06
Abstract: A gesture and voice-controlled interface device comprising one or a plurality of gesture sensors for sensing gestures of a user; one or a plurality of audio sensors for sensing sounds made by the user; and a processor configured to obtain one or a plurality of sensed gestures from said one or a plurality of gesture sensors and to obtain one or a plurality of sensed sounds from said one or a plurality of audio sensors, to analyze the sensed gesture and sensed sounds to identify an input from the user, and to generate an output signal corresponding to the input to a controlled device.
-
Publication No.: US11762965B2
Publication date: 2023-09-19
Application No.: US16808033
Application date: 2020-03-03
Applicant: WALGREEN CO.
IPC classification: G16H20/10, G16H10/60, G06Q10/10, G06F21/32, G16H40/20, G10L17/06, G10L25/51, G06Q50/26, G16H40/63, G10L15/22, G06F3/16, G16H80/00, G06F3/0481, G10L17/22, G10L17/00, G06F21/62
CPC classification: G06F21/32, G06F3/0481, G06F3/167, G06F21/6245, G06Q50/265, G10L15/22, G10L17/00, G10L17/06, G10L17/22, G10L25/51, G16H10/60, G16H20/10, G16H40/20, G16H40/63, G16H80/00, G10L2015/223, G10L2015/228
Abstract: Methods and systems may incorporate voice interaction and other audio interaction to facilitate access to prescription related information and processes. Particularly, voice/audio interactions may be utilized to achieve authentication to access prescription-related information and action capabilities. Additionally, voice/audio interactions may be utilized in performance of processes such as obtaining prescription refills and receiving reminders to consume prescription products.
-
58.
Publication No.: US11756555B2
Publication date: 2023-09-12
Application No.: US17313040
Application date: 2021-05-06
Applicant: NICE LTD.
Inventors: Natan Katz, Tal Haguel
Abstract: A system is provided to categorize voice prints during a voice authentication. The system includes a processor and a computer readable medium operably coupled thereto, to perform voice authentication operations which include receiving an enrollment of a user in the biometric authentication system, requesting a first voice print comprising a sample of a voice of the user, receiving the first voice print of the user during the enrollment, accessing a plurality of categorizations of the voice prints for the voice authentication, wherein each of the plurality of categorizations comprises a portion of the voice prints based on a plurality of similarity scores of distinct voice prints in the portion to a plurality of other voice prints, determining, using a hidden layer of a neural network, one of the plurality of categorizations for the first voice print, and encoding the first voice print with the one of the plurality of categorizations.
-
Publication No.: US11749296B2
Publication date: 2023-09-05
Application No.: US17485644
Application date: 2021-09-27
Inventors: Chung-Shih Chu, Ming-Tang Lee, Chieh-Min Tsai
IPC classification: G10L21/057, G10L17/06, G10L21/0232, H04R1/40, H04R3/00, G06N20/00, G10L21/0216
CPC classification: G10L21/057, G10L17/06, G10L21/0232, H04R1/406, H04R3/005, G06N20/00, G10L2021/02166
Abstract: A voice capturing method includes following operations: storing, by a buffer, voice data from a plurality of microphones; determining, by a processor, whether a target speaker exists and whether a direction of the target speaker changes according to the voice data and target speaker information; inserting a voice segment corresponding to a previous tracking direction into a current position in the voice data to generate fusion voice data when the target speaker exists and the direction of the target speaker changes from the previous tracking direction to a current tracking direction; performing, by the processor, a voice enhancement process on the fusion voice data according to the current tracking direction to generate enhanced voice data; performing, by the processor, a voice shortening process on the enhanced voice data to generate voice output data; and playing, by a playing circuit, the voice output data.
-
Publication No.: US11749267B2
Publication date: 2023-09-05
Application No.: US16953510
Application date: 2020-11-20
Applicant: Google LLC
Inventors: Aleksandar Kracun, Matthew Sharifi
CPC classification: G10L15/22, G10L15/197, G10L15/30, G10L17/06, G10L17/24, G10L2015/088, G10L2015/223
Abstract: A method for adapting hotword recognition includes receiving audio data characterizing a hotword event detected by a first stage hotword detector in streaming audio captured by a user device. The method also includes processing, using a second stage hotword detector, the audio data to determine whether a hotword is detected by the second stage hotword detector in a first segment of the audio data. When the hotword is not detected by the second stage hotword detector, the method includes classifying the first segment of the audio data as containing a negative hotword that caused a false detection of the hotword event in the streaming audio by the first stage hotword detector. Based on the first segment of the audio data classified as containing the negative hotword, the method includes updating the first stage hotword detector to prevent triggering the hotword event in subsequent audio data that contains the negative hotword.
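The two-stage control flow in this abstract can be sketched with toy detectors. Everything here is an assumption for illustration: audio segments are stood in for by text strings, the first stage is a permissive prefix match, the second stage is an exact match, and "updating" the first stage means adding the rejected segment to a blocklist. Real detectors would be acoustic models; the class and function names are hypothetical.

```python
class FirstStageDetector:
    """Cheap, permissive detector (toy: prefix match on a text stand-in)."""

    def __init__(self, hotword, prefix_len=6):
        self.prefix = hotword[:prefix_len]
        self.negative_hotwords = set()

    def detect(self, segment):
        if segment in self.negative_hotwords:
            return False              # learned false trigger: suppress it
        return segment.startswith(self.prefix)

    def update(self, negative_segment):
        """Record a segment that caused a false hotword event."""
        self.negative_hotwords.add(negative_segment)


class SecondStageDetector:
    """More accurate detector (toy: exact match)."""

    def __init__(self, hotword):
        self.hotword = hotword

    def detect(self, segment):
        return segment == self.hotword


def process(segment, first, second):
    """Run the cascade and adapt the first stage on a false detection."""
    if not first.detect(segment):
        return "ignored"
    if second.detect(segment):
        return "hotword"
    first.update(segment)             # classify as negative hotword and adapt
    return "negative"


first = FirstStageDetector("hey google")
second = SecondStageDetector("hey google")
results = [
    process("hey goober", first, second),   # false trigger, then learned
    process("hey goober", first, second),   # no longer triggers stage one
    process("hey google", first, second),   # genuine hotword still passes
]
```

The point of the sketch is the feedback loop: a rejection by the second stage changes the first stage's future behavior, while the genuine hotword is unaffected.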
-