-
Publication No.: US12119005B2
Publication Date: 2024-10-15
Application No.: US18323496
Filing Date: 2023-05-25
Inventor(s): Yi Gao
IPC Classes: G10L15/22, G10L17/02, G10L17/06, G10L17/20, G10L17/22, G10L21/0208, G10L21/0232, G10L25/18, H04R1/40, H04R3/00, G10L21/0216
CPC Classes: G10L17/20, G10L15/22, G10L17/02, G10L17/06, G10L17/22, G10L21/0232, G10L25/18, H04R1/406, H04R3/005, G10L2021/02082, G10L2021/02166
Abstract: An audio data processing method is provided. The method includes: obtaining multi-path audio data in an environmental space, obtaining a speech data set based on the multi-path audio data, and separately generating, in a plurality of enhancement directions, enhanced speech information corresponding to the speech data set; matching a speech hidden feature in the enhanced speech information with a target matching word, and determining an enhancement direction corresponding to the enhanced speech information having a highest degree of matching with the target matching word as a target audio direction; obtaining speech spectrum features in the enhanced speech information, and obtaining, from the speech spectrum features, a speech spectrum feature in the target audio direction; and performing speech authentication on the speech hidden feature and the speech spectrum feature that are in the target audio direction based on the target matching word, to obtain a target authentication result.
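The direction-selection step described above can be sketched as follows: enhanced speech is produced for several beam directions, each is scored against the target matching word, and the best-scoring direction becomes the target audio direction. The template-correlation scorer below is an illustrative stand-in for the patent's hidden-feature matcher, not the actual method.

```python
def match_score(enhanced, template):
    """Toy similarity: cosine of the angle between two feature vectors."""
    num = sum(a * b for a, b in zip(enhanced, template))
    den = (sum(a * a for a in enhanced) ** 0.5) * (sum(b * b for b in template) ** 0.5)
    return num / den if den else 0.0

def select_target_direction(enhanced_by_direction, template):
    """Return the direction whose enhanced speech best matches the target word."""
    return max(enhanced_by_direction,
               key=lambda d: match_score(enhanced_by_direction[d], template))

# Example: three enhancement directions; 0 degrees matches the template best.
enhanced = {
    0:   [0.9, 0.1, 0.8],   # strong correlation with the keyword template
    90:  [0.1, 0.9, 0.2],
    180: [0.2, 0.2, 0.1],
}
target_direction = select_target_direction(enhanced, [1.0, 0.0, 1.0])
```

Downstream authentication would then operate only on the features extracted in `target_direction`.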
-
Publication No.: US12067989B2
Publication Date: 2024-08-20
Application No.: US17600231
Filing Date: 2020-03-30
Inventor(s): Joon-Hyuk Chang, Joonyoung Yang
Abstract: Presented are a combined learning method and device using a transformed loss function and feature enhancement based on a deep neural network for speaker recognition that is robust in a noisy environment. A combined learning method using a transformed loss function and feature enhancement based on a deep neural network, according to one embodiment, can comprise the steps of: learning a feature enhancement model based on a deep neural network; learning a speaker feature vector extraction model based on the deep neural network; connecting an output layer of the feature enhancement model with an input layer of the speaker feature vector extraction model; and considering the connected feature enhancement model and speaker feature vector extraction model as one model and performing combined learning for additional learning.
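The combined-learning idea above can be sketched minimally: two separately trained models are chained (enhancement output feeding the extractor's input) and then fine-tuned as a single model against one loss. Scalar one-weight "models" and a finite-difference gradient are assumptions that keep the sketch self-contained; the patent uses deep networks and a transformed loss function.

```python
def enhance(x, w_enh):
    return w_enh * x          # stand-in for the feature enhancement network

def extract(h, w_ext):
    return w_ext * h          # stand-in for the speaker feature vector extractor

def combined(x, w_enh, w_ext):
    # Output layer of the enhancement model feeds the extractor's input layer.
    return extract(enhance(x, w_enh), w_ext)

def joint_step(x, target, w_enh, w_ext, lr=0.01, eps=1e-6):
    """One combined-learning step: both models updated against a single loss."""
    loss = lambda a, b: (combined(x, a, b) - target) ** 2
    g_enh = (loss(w_enh + eps, w_ext) - loss(w_enh, w_ext)) / eps
    g_ext = (loss(w_enh, w_ext + eps) - loss(w_enh, w_ext)) / eps
    return w_enh - lr * g_enh, w_ext - lr * g_ext

# "Additional learning" phase: fine-tune the connected pair jointly.
w_enh, w_ext = 0.5, 0.5
for _ in range(200):
    w_enh, w_ext = joint_step(x=1.0, target=1.0, w_enh=w_enh, w_ext=w_ext)
```

The point of the composition is that gradients flow through both models at once, so the enhancement front end adapts to what the speaker extractor needs.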
-
Publication No.: US20240144936A1
Publication Date: 2024-05-02
Application No.: US17974674
Filing Date: 2022-10-27
Inventor(s): Dushyant Sharma
IPC Classes: G10L17/20, G10L17/02, G10L21/028, H04S3/00
CPC Classes: G10L17/20, G10L17/02, G10L21/028, H04S3/008, H04S2400/01, H04S2400/15
Abstract: A method, computer program product, and computing system for receiving a signal from a single microphone. A plurality of modified signals may be generated from the single microphone signal, where the plurality of modified signals include at least one of: a speaker-specific signal, an acoustic parameter-specific signal, and a speech enhanced signal. Speech processing may be performed on the plurality of modified signals.
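The single-microphone pipeline above expands one captured signal into several modified signals, each of which is then fed to speech processing. The three transforms below are illustrative placeholders for the abstract's three signal types, not the patent's actual enhancement algorithms.

```python
def speaker_specific(signal):
    return [s * 0.8 for s in signal]          # placeholder: target-speaker mask

def acoustic_parameter_specific(signal):
    return [s * 1.1 for s in signal]          # placeholder: e.g. dereverberation

def speech_enhanced(signal):
    return [max(s, 0.0) for s in signal]      # placeholder: denoising step

def generate_modified_signals(signal):
    """Expand one microphone signal into the plurality of modified signals."""
    return {
        "speaker_specific": speaker_specific(signal),
        "acoustic_parameter_specific": acoustic_parameter_specific(signal),
        "speech_enhanced": speech_enhanced(signal),
    }

mic = [0.2, -0.1, 0.5]
modified = generate_modified_signals(mic)
# Speech processing would then run once per modified signal in `modified`.
```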
-
Publication No.: US11961522B2
Publication Date: 2024-04-16
Application No.: US17296806
Filing Date: 2019-11-22
Inventor(s): Chanwoo Kim, Dhananjaya N. Gowda, Sungsoo Kim, Minkyu Shin, Larry Paul Heck, Abhinav Garg, Kwangyoun Kim, Mehul Kumar
Abstract: The disclosure relates to an electronic apparatus for recognizing user voice and a method of recognizing, by the electronic apparatus, the user voice. According to an embodiment, the method of recognizing the user voice includes obtaining an audio signal segmented into a plurality of frame units, determining an energy component for each filter bank by applying a filter bank distributed according to a preset scale to a frequency spectrum of the audio signal segmented into the frame units, smoothing the determined energy component for each filter bank, extracting a feature vector of the audio signal based on the smoothed energy component for each filter bank, and recognizing the user voice in the audio signal by inputting the extracted feature vector to a voice recognition model.
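The front end described above can be sketched per frame: a scale-distributed filter bank is applied to the frame's power spectrum, the per-filter energies are smoothed, and the smoothed energies form the feature vector. Triangular filters and power-law smoothing are assumptions made here for illustration; the patent only specifies a preset scale and a smoothing step.

```python
def triangular_filter(center, width, n_bins):
    """Weights of one triangular filter over the spectrum bins."""
    return [max(0.0, 1.0 - abs(b - center) / width) for b in range(n_bins)]

def filter_bank_energies(power_spectrum, centers, width=2):
    """Energy component for each filter in the bank."""
    n = len(power_spectrum)
    return [sum(w * p for w, p in zip(triangular_filter(c, width, n), power_spectrum))
            for c in centers]

def smooth(energies, exponent=1.0 / 15.0):
    # Power-law smoothing compresses dynamic range, a common alternative
    # to the log compression used in standard mel filter-bank features.
    return [e ** exponent for e in energies]

frame_spectrum = [4.0, 9.0, 1.0, 16.0, 25.0, 4.0]   # toy power spectrum of one frame
feature_vector = smooth(filter_bank_energies(frame_spectrum, centers=[1, 4]))
```

The resulting `feature_vector` is what would be fed to the voice recognition model.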
-
Publication No.: US11837214B1
Publication Date: 2023-12-05
Application No.: US17084513
Filing Date: 2020-10-29
IPC Classes: G06F16/683, G06F40/226, G06F40/268, G10L15/01, G10L15/18, G10L15/26, G10L17/20
CPC Classes: G10L15/01, G06F16/685, G06F40/226, G06F40/268, G10L15/1822, G10L15/26, G10L17/20
Abstract: Various embodiments of the present disclosure evaluate transcription accuracy. In some implementations, the system normalizes a first transcription of an audio file and a baseline transcription of the audio file. The baseline transcription can be used as an accurate transcription of the audio file. The system can further determine an error rate of the first transcription by aligning each portion of the first transcription with the corresponding portion of the baseline transcription, and assigning a label to each portion based on a comparison of the portion of the first transcription with the portion of the baseline transcription.
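The align-label-score flow above can be sketched with a standard word-level edit-distance alignment (assumed here as the alignment method): each hypothesis word is labeled correct, substitution, insertion, or deletion against the baseline, and the error rate is the fraction of non-correct labels over the reference length.

```python
def align_and_label(hyp, ref):
    """Label each aligned portion of hyp against ref and return the error rate."""
    m, n = len(hyp), len(ref)
    # dp[i][j] = edit distance between hyp[:i] and ref[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = dp[i - 1][j - 1] + (hyp[i - 1] != ref[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    # Backtrace, assigning a label to each aligned portion.
    labels, i, j = [], m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (hyp[i - 1] != ref[j - 1]):
            labels.append("correct" if hyp[i - 1] == ref[j - 1] else "substitution")
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            labels.append("insertion")      # extra word in the hypothesis
            i -= 1
        else:
            labels.append("deletion")       # baseline word missing from the hypothesis
            j -= 1
    labels.reverse()
    errors = sum(l != "correct" for l in labels)
    return labels, errors / n

labels, wer = align_and_label("the cat sat".split(), "the black cat sat".split())
```

Here the hypothesis drops one baseline word, so a single deletion is labeled and the error rate is 1/4.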
-
Publication No.: US20230238002A1
Publication Date: 2023-07-27
Application No.: US17999403
Filing Date: 2021-05-28
Inventor(s): Masato Hirano
IPC Classes: G10L17/02, G10L17/18, G10L21/0208, G10L17/20, G10L17/06
CPC Classes: G10L17/02, G10L17/18, G10L21/0208, G10L17/20, G10L17/06
Abstract: The accuracy of voice recognition is improved. A signal processing device includes: a single speech detection unit that detects whether one channel of an input voice signal is the speech of a single speaker; a cluster information updating unit that updates cluster information based on a voice feature quantity when the input voice signal is the speech of a single speaker; a voice segment detection unit that detects a speech segment of a target speaker based on the cluster information; and a voice extraction unit that extracts only the voice signal of the target speaker from a mixed voice signal containing the voice of the target speaker.
-
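The clustering step of the signal processing device above can be sketched as follows: when a frame is judged to be a single speaker's speech, its feature vector updates that speaker's cluster (a running mean here), and later frames are assigned to the target speaker's segment when they fall close to the cluster centroid. The running-mean centroid and the distance threshold are illustrative assumptions.

```python
class SpeakerCluster:
    def __init__(self):
        self.centroid, self.count = None, 0

    def update(self, feature):
        """Fold one single-speaker feature vector into the cluster information."""
        if self.centroid is None:
            self.centroid, self.count = list(feature), 1
        else:
            self.count += 1
            self.centroid = [c + (f - c) / self.count
                             for c, f in zip(self.centroid, feature)]

    def is_target(self, feature, threshold=0.5):
        """Detect whether a frame belongs to the target speaker's speech segment."""
        dist = sum((c - f) ** 2 for c, f in zip(self.centroid, feature)) ** 0.5
        return dist < threshold

cluster = SpeakerCluster()
cluster.update([1.0, 0.0])     # frames detected as single-speaker speech
cluster.update([0.9, 0.1])     # centroid is now [0.95, 0.05]
```

Frames flagged by `is_target` would then be passed to the voice extraction unit.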
Publication No.: US11657823B2
Publication Date: 2023-05-23
Application No.: US17107496
Filing Date: 2020-11-30
Inventor(s): Elie Khoury, Matthew Garland
IPC Classes: G10L17/20, G10L17/02, G10L17/04, G10L17/18, G10L19/028
CPC Classes: G10L17/20, G10L17/02, G10L17/04, G10L17/18, G10L19/028
Abstract: A system for generating channel-compensated features of a speech signal includes a channel noise simulator that degrades the speech signal, a feed forward convolutional neural network (CNN) that generates channel-compensated features of the degraded speech signal, and a loss function that computes a difference between the channel-compensated features and handcrafted features for the same raw speech signal. Each loss result may be used to update connection weights of the CNN until a predetermined threshold loss is satisfied, and the CNN may be used as a front-end for a deep neural network (DNN) for speaker recognition/verification. The DNN may include convolutional layers, a bottleneck features layer, multiple fully-connected layers and an output layer. The bottleneck features may be used to update connection weights of the convolutional layers, and dropout may be applied to the convolutional layers.
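The training objective described above can be sketched minimally: a network maps the degraded signal to features, the loss is the squared difference from handcrafted features of the clean signal, and connection weights are updated until the loss falls below a predetermined threshold. A one-weight linear map stands in for the CNN, and a fixed gain stands in for the channel noise simulator; both are illustrative assumptions.

```python
def degrade(signal, gain=1.1):
    """Stand-in channel noise simulator: a fixed channel gain."""
    return [gain * s for s in signal]

def train_compensator(clean_feats, degraded, w=0.0, lr=0.1, threshold=1e-4):
    """Gradient steps on MSE(w * degraded, clean_feats) until below threshold."""
    while True:
        pred = [w * d for d in degraded]
        loss = sum((p - c) ** 2 for p, c in zip(pred, clean_feats)) / len(pred)
        if loss < threshold:
            return w, loss                # predetermined threshold loss satisfied
        grad = 2 * sum((p - c) * d
                       for p, c, d in zip(pred, clean_feats, degraded)) / len(pred)
        w -= lr * grad                    # update connection weight

handcrafted = [1.0, 2.0, 3.0]             # handcrafted features of the clean signal
w, final_loss = train_compensator(handcrafted, degrade([1.0, 2.0, 3.0]))
```

Once trained, the compensator (here, `w`) would serve as a front end producing channel-compensated features for the speaker recognition DNN.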
-
Publication No.: US20180301142A1
Publication Date: 2018-10-18
Application No.: US15949145
Filing Date: 2018-04-10
CPC Classes: G10L15/063, G06F3/04842, G10L15/20, G10L17/04, G10L17/20, G10L25/84, G10L2015/0638, H04W88/02
Abstract: A method on a mobile device for voice recognition training is described. A voice training mode is entered. A voice training sample for a user of the mobile device is recorded. The voice training mode is interrupted to enter a noise indicator mode based on a sample background noise level for the voice training sample and a sample background noise type for the voice training sample. The device returns from the noise indicator mode to the voice training mode when the user provides a continuation input indicating that a current background noise level meets an indicator threshold value.
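The interrupt/resume flow above can be sketched as a small mode decision: training pauses into a noise indicator mode when the recorded sample's background noise is too high, and resumes only once the user's continuation input confirms the current noise level meets the threshold. The threshold value and state names are assumptions for illustration.

```python
def training_step(sample_noise_level, current_noise_level, threshold=0.3,
                  user_continues=True):
    """Return the mode the device ends up in after one training attempt."""
    if sample_noise_level <= threshold:
        return "voice_training"           # sample was clean; keep training
    # Sample too noisy: interrupt into the noise indicator mode.
    if user_continues and current_noise_level <= threshold:
        return "voice_training"           # user confirmed, noise now acceptable
    return "noise_indicator"              # stay paused until conditions improve

# Noisy sample, but the environment quieted down and the user continued.
mode = training_step(sample_noise_level=0.6, current_noise_level=0.2)
```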
-
Publication No.: US10008209B1
Publication Date: 2018-06-26
Application No.: US15273830
Filing Date: 2016-09-23
Inventor(s): Yao Qian, Jidong Tao, David Suendermann-Oeft, Keelan Evanini, Alexei V. Ivanov, Vikram Ramanarayanan
Abstract: Systems and methods are provided for providing voice authentication of a candidate speaker. Training data sets are accessed, where each training data set comprises data associated with a training speech sample of a speaker and a plurality of speaker metrics, where the plurality of speaker metrics include a native language of the speaker. The training data sets are used to train a neural network, where the data associated with each training speech sample is a training input to the neural network, and each of the plurality of speaker metrics is a training output to the neural network. Data associated with a speech sample is provided to the neural network to generate a vector that contains values for the plurality of speaker metrics, and the values contained in the vector are compared to values contained in a reference vector associated with a known person to determine whether the candidate speaker is the known person.
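The final verification step above compares the network's speaker-metric vector for the candidate against the reference vector of the known person. Cosine similarity with a fixed decision threshold is an assumed comparison method here, not necessarily the patent's.

```python
def cosine_similarity(a, b):
    """Cosine of the angle between two metric vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return num / den if den else 0.0

def is_same_speaker(candidate_metrics, reference_metrics, threshold=0.9):
    """Decide whether the candidate speaker is the known person."""
    return cosine_similarity(candidate_metrics, reference_metrics) >= threshold

# Toy speaker-metric vectors (e.g. embedding values plus a native-language score).
reference = [0.8, 0.1, 0.6]
accepted = is_same_speaker([0.8, 0.1, 0.6], reference)
rejected = is_same_speaker([0.1, 0.9, 0.0], reference)
```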
-
Publication No.: US20180082692A1
Publication Date: 2018-03-22
Application No.: US15709024
Filing Date: 2017-09-19
Inventor(s): Elie Khoury, Matthew Garland
IPC Classes: G10L17/20, G10L17/18, G10L17/04, G10L17/02, G10L19/028
CPC Classes: G10L17/20, G10L17/02, G10L17/04, G10L17/18, G10L19/028
Abstract: A system for generating channel-compensated features of a speech signal includes a channel noise simulator that degrades the speech signal, a feed forward convolutional neural network (CNN) that generates channel-compensated features of the degraded speech signal, and a loss function that computes a difference between the channel-compensated features and handcrafted features for the same raw speech signal. Each loss result may be used to update connection weights of the CNN until a predetermined threshold loss is satisfied, and the CNN may be used as a front-end for a deep neural network (DNN) for speaker recognition/verification. The DNN may include convolutional layers, a bottleneck features layer, multiple fully-connected layers and an output layer. The bottleneck features may be used to update connection weights of the convolutional layers, and dropout may be applied to the convolutional layers.