Abstract:
The invention relates to the field of voice biometrics, in particular to the task of automatically evaluating speakers' voice models from recordings of their telephone conversations, with automatic binding of a speaker's voice model to a telephone number. A method for obtaining a voice model of a target speaker, according to which: at least two phonograms of telephone conversations are segmented by speaker voices to obtain speech segments; speaker voice models are built from the obtained speech segments; the built speaker voice models are clustered using telephone-conversation metadata to obtain clusters; links between the clusters are determined on the basis of the phonograms of the telephone conversations; and the cluster with the greatest number of links is selected as the target speaker's cluster. A device for obtaining a voice model of a target speaker is also proposed.
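The final selection step above (count inter-cluster links across phonograms, pick the cluster with the most) can be sketched in a few lines. All names and toy data below are illustrative, not taken from the patent:

```python
# Hypothetical sketch: given the cluster each voice model fell into and
# the phonogram (call recording) each model came from, count "links"
# between distinct clusters that co-occur on the same call, and return
# the cluster with the most links as the target speaker's cluster.
from collections import defaultdict
from itertools import combinations

def target_cluster(cluster_of_model, phonogram_of_model):
    """Both arguments are dicts keyed by voice-model id."""
    links = defaultdict(int)
    by_phonogram = defaultdict(list)
    for m, p in phonogram_of_model.items():
        by_phonogram[p].append(m)
    # Two different clusters appearing on the same call form one link.
    for models in by_phonogram.values():
        for a, b in combinations(models, 2):
            ca, cb = cluster_of_model[a], cluster_of_model[b]
            if ca != cb:
                links[ca] += 1
                links[cb] += 1
    return max(links, key=links.get)

clusters = {"m1": "A", "m2": "B", "m3": "A", "m4": "C", "m5": "A"}
phonos = {"m1": "call1", "m2": "call1", "m3": "call2", "m4": "call2", "m5": "call3"}
print(target_cluster(clusters, phonos))  # "A" — linked on the most calls
```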
Abstract:
A device includes a sound acquisition manager configured to receive a mixed audio signal including a first plurality of audio signals, an independent component analysis manager configured to determine a set of parameters configured to generate a second plurality of audio signals based on the first plurality of audio signals, and to minimize a correlation between pairs of signals of the converted second plurality of audio signals, and a memory configured to store the second plurality of audio signals as multi-channel audio data.
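The decorrelation objective in the abstract can be illustrated with the whitening step that classical independent component analysis begins with. This is an assumed simplification, not the device's actual algorithm:

```python
# Sketch: derive a set of unmixing parameters for a two-channel mixture
# such that the converted output signals have (near-)zero pairwise
# correlation — the whitening stage of independent component analysis.
import numpy as np

rng = np.random.default_rng(0)
s = rng.uniform(-1, 1, size=(2, 2000))       # two independent sources
A = np.array([[1.0, 0.6], [0.4, 1.0]])       # mixing matrix (unknown in practice)
x = A @ s                                    # first plurality: mixed audio signals

# Eigen-decompose the covariance and rescale so the output channels
# have identity covariance, i.e. zero pairwise correlation.
cov = np.cov(x)
vals, vecs = np.linalg.eigh(cov)
W = np.diag(vals ** -0.5) @ vecs.T           # the "set of parameters"
y = W @ (x - x.mean(axis=1, keepdims=True))  # second plurality of audio signals

print(abs(np.corrcoef(y)[0, 1]) < 1e-8)      # True: correlation is minimized
```

A full ICA would follow this with a rotation that maximizes non-Gaussianity, but the correlation-minimization property the claim names is already achieved here.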
Abstract:
A generative adversarial network (GAN)-based speech bandwidth extender and extension method are presented. According to one embodiment, the GAN-based speech bandwidth extension method may comprise: extracting feature vectors from a narrowband (NB) signal and a wideband (WB) signal of speech; estimating a feature vector of the wideband signal from the feature vector of the narrowband signal; and training a deep neural network classification model that discriminates between the extracted feature vector of the actual wideband signal and the wideband feature vector estimated from the narrowband signal's feature vector.
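The discriminator-training step can be sketched with a toy stand-in. As an assumed simplification, a single logistic unit replaces the deep classification model, and random feature vectors replace real speech features:

```python
# Toy sketch of the adversarial idea: train a discriminator to tell
# real wideband (WB) feature vectors from WB features estimated out of
# narrowband (NB) ones. Data and the one-unit "network" are stand-ins.
import numpy as np

rng = np.random.default_rng(1)
real_wb = rng.normal(0.0, 1.0, size=(200, 8))  # real WB feature vectors
est_wb = rng.normal(0.5, 1.0, size=(200, 8))   # generator's estimates (shifted)

X = np.vstack([real_wb, est_wb])
y = np.array([1.0] * 200 + [0.0] * 200)        # 1 = real, 0 = estimated

w, b = np.zeros(8), 0.0
for _ in range(500):                           # gradient descent on log-loss
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    grad = p - y
    w -= 0.1 * (X.T @ grad) / len(y)
    b -= 0.1 * grad.mean()

acc = (((1.0 / (1.0 + np.exp(-(X @ w + b)))) > 0.5) == y).mean()
print(acc > 0.5)  # True: discriminator separates real from estimated
```

In the full GAN setting, the generator estimating WB features would in turn be trained to fool this discriminator.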
Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating word pronunciations. One of the methods includes determining, by one or more computers, spelling data that indicates the spelling of a word, providing the spelling data as input to a trained recurrent neural network, the trained recurrent neural network being trained to indicate characteristics of word pronunciations based at least on data indicating the spelling of words, receiving output indicating a stress pattern for pronunciation of the word generated by the trained recurrent neural network in response to providing the spelling data as input, using the output of the trained recurrent neural network to generate pronunciation data indicating the stress pattern for a pronunciation of the word, and providing, by the one or more computers, the pronunciation data to a text-to-speech system or an automatic speech recognition system.
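The data flow of the claim — spelling in, per-position stress pattern out — can be sketched with a minimal recurrent pass. The weights below are random stand-ins for a trained model, so only the shapes and plumbing are meaningful:

```python
# Data-flow sketch: one-hot encode a word's spelling, run it through a
# simple Elman-style recurrence, and read out a binary stress indicator
# per character. Weights are untrained placeholders, not a real model.
import numpy as np

def stress_pattern(word, hidden=16):
    rng = np.random.default_rng(42)            # fixed seed: deterministic demo
    X = np.zeros((len(word), 26))              # spelling data over a-z
    for i, ch in enumerate(word):
        X[i, ord(ch) - ord("a")] = 1.0
    Wx = rng.normal(size=(26, hidden)) * 0.3
    Wh = rng.normal(size=(hidden, hidden)) * 0.3
    Wo = rng.normal(size=(hidden, 1))
    h = np.zeros(hidden)
    out = []
    for x in X:                                # recurrence over characters
        h = np.tanh(x @ Wx + h @ Wh)
        out.append(float(h @ Wo))
    # Binarize: 1 = stressed position, 0 = unstressed.
    return [int(o > 0) for o in out]

pattern = stress_pattern("photograph")
print(len(pattern))  # 10 — one stress indicator per letter
```

The resulting pattern is what would be handed to a text-to-speech or speech recognition system as pronunciation data.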
Abstract:
The technology described herein uses a multiple-output layer RNN to process an acoustic signal comprising speech from multiple speakers to trace an individual speaker's speech. The multiple-output layer RNN has multiple output layers, each of which is meant to trace one speaker (or noise) and represent the mask for that speaker (or noise). The output layer for each speaker (or noise) can have the same dimensions and can be normalized for each output unit across all output layers. The rest of the layers in the multiple-output layer RNN are shared across all the output layers. The result from the previous frame is used as input to the output layer or to one of the hidden layers of the RNN to calculate results for the current frame. This pass back of results allows the model to carry information from previous frames to future frames to trace the same speaker.
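The per-unit normalization across output layers described above can be sketched directly. Shapes and data here are assumed for illustration and the recurrence is omitted; only the masking arithmetic is shown:

```python
# Sketch: several output layers emit a score per time-frequency unit;
# normalizing across layers for each unit (softmax over speakers)
# yields per-speaker masks that partition the mixture spectrogram.
import numpy as np

rng = np.random.default_rng(7)
frames, bins, speakers = 4, 5, 2
scores = rng.normal(size=(speakers, frames, bins))  # raw output-layer scores
mixture = np.abs(rng.normal(size=(frames, bins)))   # mixed magnitude spectrogram

# Normalize each output unit across all output layers.
e = np.exp(scores - scores.max(axis=0, keepdims=True))
masks = e / e.sum(axis=0, keepdims=True)

separated = masks * mixture                         # one masked copy per speaker
print(np.allclose(masks.sum(axis=0), 1.0))          # True: masks sum to one
print(np.allclose(separated.sum(axis=0), mixture))  # True: copies re-sum to mix
```

In the described system, `separated` from the previous frame would be fed back into the network so the same output layer keeps tracing the same speaker.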
Abstract:
Examples described herein involve detecting known impairments or other known conditions using a neural network. An example implementation involves receiving data indicating a response of a playback device as captured by a microphone. The implementation also involves determining an input vector by projecting a response vector that represents the response of the playback device onto a principal component matrix representing variance caused by one or more known impairments. The implementation further involves providing the determined input vector to a neural network that includes an output layer comprising neurons that correspond to respective known impairments. The implementation involves detecting that the input vector caused one or more neurons of the neural network to fire, such that the neural network indicates that a particular known impairment is affecting the microphone and/or the playback device, and adjusting operation of the playback device and/or the microphone to offset the particular known impairment.
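The projection-then-detect pipeline can be sketched with made-up numbers. The component matrix, threshold, and "network" below are illustrative stand-ins, not values from the examples:

```python
# Hedged sketch: project a measured response vector onto a principal-
# component basis of known impairments, then flag every impairment
# whose output "neuron" fires (exceeds a threshold). The trained
# network is replaced by an identity layer for illustration.
import numpy as np

# Assumed principal components for two known impairments (orthonormal rows).
components = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]])

def detect(response, threshold=0.5):
    input_vec = components @ response  # projection onto the PC matrix
    fired = input_vec > threshold      # stand-in for network output neurons
    return [i for i, f in enumerate(fired) if f]

response = np.array([0.9, 0.1, 0.4])   # mostly along impairment 0's axis
print(detect(response))  # [0]: only the first impairment's neuron fires
```

The returned indices would then drive the corrective adjustment of the playback device or microphone.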
Abstract:
In a system and method for assessing the condition of a subject, control parameters are derived from a neurophysiological computational model that operates on features extracted from a speech signal. The control parameters are used as biomarkers (indicators) of the subject's condition. Speech-related features are compared with model-predicted speech features, and the error signal is used to update control parameters within the neurophysiological computational model. The updated control parameters are then compared against disorder-associated parameters stored in a library.
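The error-driven update loop can be sketched in the abstract: compare observed speech features with the model's predictions and nudge the control parameters by the error. The toy "model" and gain below are invented for illustration:

```python
# Minimal sketch of the update loop: repeatedly predict features from
# the current control parameters, form the error against observed
# features, and move the parameters along the error.
def update_parameters(params, observed, predict, gain=0.1, steps=50):
    for _ in range(steps):
        predicted = predict(params)
        error = [o - p for o, p in zip(observed, predicted)]
        params = [c + gain * e for c, e in zip(params, error)]
    return params

# Toy stand-in for the neurophysiological model: features = 2 * params.
predict = lambda params: [2.0 * c for c in params]
observed = [1.0, 4.0]                # features extracted from the speech signal
final = update_parameters([0.0, 0.0], observed, predict)
print([round(c, 2) for c in final])  # [0.5, 2.0] — the fixed point
```

The converged parameters play the role of the biomarkers that are then matched against the library.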