-
Publication No.: US12131740B2
Publication Date: 2024-10-29
Application No.: US18331920
Filing Date: 2023-06-08
CPC Classification: G10L17/04, G10L17/02, G10L17/06, H04M3/42221
Abstract: Methods and systems are disclosed herein for improving the quality of audio for use in a biometric. A biometric system may use machine learning to determine whether audio or a portion of the audio should be used as a biometric for a user. A sample of the user's voice may be used to generate a voice signature of the user. Portions of the audio that do not meet a similarity threshold when compared with the voice signature may be removed from the audio. Additionally or alternatively, interfering noises may be detected and removed from the audio to improve the quality of a voice biometric generated from the audio.
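The thresholding step in this abstract can be illustrated with a minimal sketch. It assumes that each audio portion and the voice signature are already represented as fixed-length embedding vectors and that similarity is measured by cosine similarity; the abstract specifies neither, and the function names and the 0.7 default threshold are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_segments(segment_embeddings, signature, threshold=0.7):
    """Return indices of audio portions whose similarity to the signature
    meets the threshold; the remaining portions would be removed.

    The embedding representation and the threshold value are assumptions
    made for illustration, not details taken from the patent.
    """
    kept = []
    for index, embedding in enumerate(segment_embeddings):
        if cosine_similarity(embedding, signature) >= threshold:
            kept.append(index)
    return kept
```

Raising the threshold keeps fewer portions, trading audio length for a closer match to the enrolled voice.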
-
Publication No.: US20240312487A1
Publication Date: 2024-09-19
Application No.: US18432914
Filing Date: 2024-02-05
IPC Classification: G11B27/036, G06V20/40, G10L17/02, G10L25/57, G10L25/63
CPC Classification: G11B27/036, G06V20/41, G10L17/02, G10L25/57, G10L25/63
Abstract: A multimedia data recording method includes performing real-time analysis on multimedia data that includes simultaneously collected first audio data and image frame data to obtain voice content and a demonstration action of a target object, determining semantic correlation between the demonstration action and the voice content, performing video understanding on an image frame recording the demonstration action to convert the demonstration action to second audio data in response to the semantic correlation indicating that a content indicated by the demonstration action is inconsistent with the voice content, and dynamically inserting the second audio data into the first audio data to update the multimedia data.
-
Publication No.: US20240312467A1
Publication Date: 2024-09-19
Application No.: US18596879
Filing Date: 2024-03-06
Applicant: ROHM CO., LTD.
Inventors: Koji TAMANO, Takahiro NISHIYAMA
Abstract: A voice authentication device for incorporation in an appliance including a voice conversion portion configured to convert voice from outside into a voice signal that is an electrical signal includes a voice registration portion configured to learn a parameter of an AI model based on the voice signal, and a voice verification portion configured to perform voice verification on input data based on the voice signal in accordance with an inference result yielded by the AI model having the learned parameter. The voice authentication is performed based on the voice registration portion and the voice verification portion.
-
Publication No.: US20240303265A1
Publication Date: 2024-09-12
Application No.: US18279592
Filing Date: 2021-03-01
Inventors: Shota ORIHASHI, Masato SAWADA
CPC Classification: G06F16/353, G10L15/26, G10L17/02
Abstract: A label assignment support device according to the present disclosure includes a preliminary label estimation unit that assigns preliminary labels for each of a plurality of elements, a label assignment work screen output unit that generates a label assignment work screen for each of the plurality of elements and an update operation for labels assigned to the plurality of elements by a user, the label assignment work screen indicating each of the plurality of elements and labels assigned to each of the plurality of elements in association with each other, and a label update unit that, when a label assigned to one of the elements is updated by the update operation via the label assignment work screen, assigns the label after update to the one of the elements.
-
Publication No.: US20240296211A1
Publication Date: 2024-09-05
Application No.: US18116776
Filing Date: 2023-03-02
Abstract: The present disclosure provides a multiple factor authentication process using text pass codes. A process performs a first verification of a user using an authentication credential transmitted via a first communication channel. Based on successfully performing the first verification, the process performs a second verification using a textual phrase transmitted to the user via a different communication channel. The words included in the textual phrase can be selected to avoid ambiguous pronunciations and spellings.
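The word-selection idea in this abstract can be sketched as follows. This is only an illustration: the candidate list, the set of ambiguous words, and the function name are hypothetical, and the abstract does not say how ambiguity is detected or how many words a phrase contains.

```python
import secrets

# Hypothetical homophones and easily misspelled words to exclude.
AMBIGUOUS = {"two", "too", "to", "their", "there", "write", "right"}

# Hypothetical candidate vocabulary for the textual phrase.
CANDIDATES = ["maple", "two", "river", "there", "candle", "right", "orbit", "velvet"]


def make_passphrase(word_count=3, candidates=CANDIDATES, ambiguous=AMBIGUOUS):
    """Build a space-separated phrase from candidates that are not ambiguous.

    secrets.choice is used so the selection is suitable for security purposes.
    """
    allowed = [word for word in candidates if word not in ambiguous]
    if not allowed:
        raise ValueError("no unambiguous candidate words available")
    return " ".join(secrets.choice(allowed) for _ in range(word_count))
```

In the process the abstract describes, such a phrase would be sent over the second communication channel only after the first verification succeeds.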
-
Publication No.: US20240265925A1
Publication Date: 2024-08-08
Application No.: US18165817
Filing Date: 2023-02-07
Applicant: Spotify AB
IPC Classification: G10L17/18, G10L17/02, G10L21/028
CPC Classification: G10L17/18, G10L17/02, G10L21/028
Abstract: The various implementations described herein include methods and devices for identifying a language in audio content. In one aspect, a method includes obtaining audio content and generating a speaker embedding from the audio content. The method further includes determining, via a language identification model, a language of the audio content based on the speaker embedding.
-
Publication No.: US12051441B2
Publication Date: 2024-07-30
Application No.: US17944067
Filing Date: 2022-09-13
Inventors: Jimeng Zheng, Lianwu Chen, Weiwei Li, Zhiyi Duan, Meng Yu, Dan Su, Kaiyu Jiang
CPC Classification: G10L25/84, G06T7/20, G10L17/02, G10L17/22, G10L21/028, G10L25/21, G06T2207/30201
Abstract: This application discloses a multi-sound area-based speech detection method and related apparatus, and a storage medium, which is applied to the field of artificial intelligence. The method includes: obtaining sound area information corresponding to N sound areas including multiple users speaking simultaneously; generating a control signal corresponding to each target detection sound area according to user information corresponding to the target detection sound area; processing multi-user speech input signals by using the control signals, to obtain a speech output signal corresponding to each target detection sound area; generating a speech detection result of the target detection sound area according to the speech output signal corresponding to the target detection sound area; and selecting, among the multiple users, a main speaker based on the user information, the speech output signals and speech detection results of multiple users in the N sound areas.
-
Publication No.: US12033614B2
Publication Date: 2024-07-09
Application No.: US17840787
Filing Date: 2022-06-15
IPC Classification: G10L13/08, G06F21/62, G06F40/166, G10L13/033, G10L17/02
CPC Classification: G10L13/08, G06F21/6245, G06F40/166, G10L13/033, G10L17/02
Abstract: A method, computer program product, and computing system for receiving an input speech signal. A transcription of the input speech signal may be received. A speaker embedding may be extracted from the input speech signal. Acoustic properties from the input speech signal may be extracted. An obscured transcription may be generated from the transcription, where the obscured transcription includes obscured representations of sensitive content from the transcription. An obscured speech signal may be generated based upon, at least in part, the extracted speaker embedding and the obscured transcription, where the obscured speech signal includes obscured representations of sensitive content from the input speech signal. The obscured speech signal may be augmented based upon, at least in part, the extracted acoustic properties.
-
Publication No.: US20240194206A1
Publication Date: 2024-06-13
Application No.: US18438225
Filing Date: 2024-02-09
Inventors: Oleksandra SOKOL, Dmytro PROGONOV, Heorhii NAUMENKO, Kostiantyn VOLOBUIEV, Vasyl KUZNETSOV, Viacheslav DERKACH
Abstract: An electronic device includes a microphone, and at least one processor configured to, based on receiving voice data through the microphone, input the voice data into a non-semantic feature extractor model and acquire a non-semantic feature included in the voice data using the non-semantic feature extractor model, input the non-semantic feature into a synthetic voice classifier model and classify the voice data into a synthetic voice or a user voice using the synthetic voice classifier model, and provide a result of the classification, and the synthetic voice classifier model is a model that is transfer-learned based on the non-semantic feature extractor model.
-
Publication No.: US12002473B2
Publication Date: 2024-06-04
Application No.: US17617314
Filing Date: 2020-12-24
Inventors: Yuechao Guo, Yixuan Qiao, Yijun Tang, Jun Wang, Peng Gao, Guotong Xie
Abstract: A voiceprint recognition method includes: obtaining a target speech information set to be recognized that includes speech information corresponding to at least one object; extracting target feature information from the target speech information set by using a preset algorithm, and optimizing the target feature information based on a first loss function to obtain a first voiceprint recognition result; obtaining target speech channel information of a target speech channel, where the target speech channel information includes channel noise information, and the target speech channel is used to transmit the target speech information set; extracting target feature vectors in the channel noise information, and optimizing the target feature vectors based on a second loss function to obtain a second voiceprint recognition result; and fusing the first voiceprint recognition result and the second voiceprint recognition result to determine a final voiceprint recognition result.
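The final fusion step in this abstract can be illustrated with a small sketch. It assumes that each recognition result is a mapping from candidate speaker to a score and that fusion is a weighted sum; the abstract does not state the fusion rule, so the function name, the weight, and the score format are all hypothetical.

```python
def fuse_scores(speech_scores, channel_scores, speech_weight=0.75):
    """Combine two per-speaker score dictionaries by weighted sum.

    speech_scores: scores from the speech-feature branch (first result).
    channel_scores: scores from the channel-noise branch (second result).
    A speaker missing from channel_scores contributes 0.0 for that branch.
    Returns the best-scoring speaker and the fused score dictionary.
    Weighted-sum fusion is an assumption made for illustration only.
    """
    if not speech_scores:
        raise ValueError("speech_scores must not be empty")
    fused = {}
    for speaker, score in speech_scores.items():
        channel_score = channel_scores.get(speaker, 0.0)
        fused[speaker] = speech_weight * score + (1.0 - speech_weight) * channel_score
    best = max(fused, key=fused.get)
    return best, fused
```

With a larger `speech_weight` the speech-feature branch dominates the decision; with a smaller one the channel branch can change which speaker is selected.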
-