Abstract:
A sound source model storage section stores a sound source model that represents an audio signal emitted from a sound source in the form of a probability density function. An observation signal, obtained by collecting the audio signal, is converted into a plurality of frequency-specific observation signals, each corresponding to one of a plurality of frequency bands. A dereverberation filter for each frequency band is then estimated from the frequency-specific observation signal for that band, on the basis of the sound source model and a reverberation model that represents, for each frequency band, the relationship among the audio signal, the observation signal, and the dereverberation filter. A frequency-specific target signal for each band is obtained by applying that band's dereverberation filter to its frequency-specific observation signal, and the resulting frequency-specific target signals are integrated.
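A minimal sketch of how per-band processing of this kind can look in code, assuming the band split is done with an STFT and a simple delayed linear prediction filter fitted by least squares stands in for the filter estimation based on the sound source model and reverberation model. The function name and all parameters are illustrative and not taken from the source.

```python
# Illustrative sketch (not the patented algorithm): split the observation into
# frequency bands with an STFT, estimate a per-band linear dereverberation
# filter from delayed past frames, apply it, and re-integrate the bands with
# an inverse STFT.
import numpy as np
from scipy.signal import stft, istft

def dereverberate(observation, fs=16000, n_fft=512, delay=3, order=10):
    # 1) Convert the observation into frequency-specific observation signals.
    _, _, X = stft(observation, fs=fs, nperseg=n_fft)   # shape: (freq, frames)
    Y = np.empty_like(X)

    for f in range(X.shape[0]):
        x = X[f]                                         # one frequency band
        frames = len(x)
        # Collect delayed past frames; predicting from frames at least `delay`
        # ago approximates the reverberant tail of this band.
        H = np.zeros((frames, order), dtype=complex)
        for k in range(order):
            d = delay + k
            H[d:, k] = x[:frames - d]
        # 2) Estimate the per-band dereverberation filter (plain least squares
        #    here; the patent estimates it using a probabilistic source model).
        g, *_ = np.linalg.lstsq(H, x, rcond=None)
        # 3) Apply the filter: subtract the predicted reverberant component.
        Y[f] = x - H @ g

    # 4) Integrate the frequency-specific target signals back into a waveform.
    _, y = istft(Y, fs=fs, nperseg=n_fft)
    return y
```

Calling this on a reverberant recording returns a time-domain estimate of the target signal; in the described method the per-band filter would instead be derived from the stored sound source model rather than a plain least-squares fit.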
Abstract:
A model application unit calculates the linear prediction coefficients of a multi-step linear prediction model by using discrete acoustic signals. A late reverberation predictor then obtains linear prediction values by substituting the linear prediction coefficients and the discrete acoustic signals into the linear prediction terms of the multi-step linear prediction model, and uses these values as predicted late reverberations. Next, a frequency domain converter converts both the discrete acoustic signals and the predicted late reverberations to the frequency domain. A late reverberation eliminator calculates relative values between the amplitude spectra of the discrete acoustic signals and the amplitude spectra of the predicted late reverberations, both expressed in the frequency domain, and provides the relative values as the predicted amplitude spectra of a dereverberated signal.
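A rough single-channel sketch of this flow, reading the "relative values" as an amplitude-spectrum subtraction with flooring; the step size, prediction order, FFT length, and floor are assumptions, and the least-squares fit is only one way to obtain multi-step linear prediction coefficients.

```python
# Illustrative sketch: multi-step linear prediction -> predicted late
# reverberation -> frequency-domain conversion -> amplitude-spectrum
# comparison.  Not the patented update rules; shapes and constants are assumed.
import numpy as np
from scipy.signal import stft

def multistep_lp_dereverb(x, step=30, order=40, n_fft=512, floor=1e-3):
    n = len(x)
    # Model application: fit coefficients that predict x[t] from samples that
    # lie at least `step` samples in the past (the "multi-step" delay).
    A = np.zeros((n, order))
    for k in range(order):
        d = step + k
        A[d:, k] = x[:n - d]
    coeffs, *_ = np.linalg.lstsq(A, x, rcond=None)

    # Late reverberation predictor: the linear prediction term itself serves
    # as the predicted late reverberation.
    late = A @ coeffs

    # Frequency-domain conversion of both signals.
    _, _, X = stft(x, nperseg=n_fft)
    _, _, L = stft(late, nperseg=n_fft)

    # Late reverberation elimination: relative values between the two
    # amplitude spectra, clipped from below, give the predicted amplitude
    # spectrum of the dereverberated signal.
    amp = np.maximum(np.abs(X) - np.abs(L), floor * np.abs(X))
    return amp
```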
Abstract:
The initial values of the parameter estimates are set, including reverberation parameter estimates, which include a regression coefficient used in a linear convolution operation for calculating an estimate of the reverberation contained in an observed signal; source parameter estimates, which include estimates of the linear prediction coefficients and the prediction residual power that identify the power spectrum of the source signal; and noise parameter estimates, which include noise power spectrum estimates. Maximum likelihood estimation is then used to alternately repeat processing that updates at least one of the reverberation parameter estimates and the noise parameter estimates and processing that updates the source parameter estimates, until a predetermined termination condition is satisfied.
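The abstract specifies only the alternation structure, not the update rules, so the sketch below shows just that structure: the update and likelihood functions are injected as callables, and the convergence test is an assumed example of a "predetermined termination condition".

```python
# Structural sketch of the alternating maximum-likelihood estimation loop.
# The concrete update rules are not given in the abstract and are supplied by
# the caller; parameter names are placeholders.
import numpy as np

def alternating_ml_estimation(observed, init_params,
                              update_reverb_and_noise, update_source,
                              log_likelihood, max_iter=50, tol=1e-4):
    """Alternate parameter updates until the termination condition is met.

    init_params is expected to hold initial estimates of the reverberation
    parameters (regression coefficients), the source parameters (linear
    prediction coefficients and prediction residual power), and the noise
    parameters (noise power spectrum).
    """
    params = dict(init_params)
    prev_ll = -np.inf
    for _ in range(max_iter):
        # Update at least one of the reverberation / noise parameter estimates
        # while the source parameter estimates are held fixed.
        params = update_reverb_and_noise(observed, params)
        # Update the source parameter estimates while the others are held fixed.
        params = update_source(observed, params)
        # Assumed termination condition: likelihood improvement below tol.
        ll = log_likelihood(observed, params)
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return params
```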
Abstract:
Speech dereverberation is achieved by accepting an observed signal for initialization (1000) and performing likelihood maximization (2000), which includes Fourier transforms (4000).
Abstract:
A method for producing a substrate with a through-electrode includes the steps of: forming recesses or through-holes in one of a silicon substrate and a glass substrate; forming protrusions on the other substrate; stacking the silicon substrate and the glass substrate on each other so that the protrusions are inserted into the respective recesses or through-holes; and bonding the silicon substrate and the glass substrate to each other.
Abstract:
The processing efficiency and estimation accuracy of a voice activity detection apparatus are improved. An acoustic signal analyzer receives a digital acoustic signal containing a speech signal and a noise signal, generates, for each frame of the digital acoustic signal, a non-speech GMM and a speech GMM adapted to the noise environment by using a silence GMM and a clean-speech GMM, and calculates the output probabilities of the dominant Gaussian distributions of these GMMs. A speech state probability to non-speech state probability ratio calculator uses the output probabilities to calculate the ratio of the speech state probability to the non-speech state probability, based on a state transition model over a speech state and a non-speech state; a voice activity detection unit judges from this ratio whether the acoustic signal in each frame is in the speech state or the non-speech state and outputs only the acoustic signal in the speech state.
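An illustrative sketch of the decision stage alone, assuming the per-frame output probabilities of the noise-adapted speech and non-speech GMMs are already available (the GMM adaptation and dominant-Gaussian selection are omitted); the transition probabilities and threshold are placeholders.

```python
# Illustrative sketch: per-frame GMM output probabilities + a two-state
# transition model -> speech-to-non-speech state probability ratio -> per-frame
# speech/non-speech decision.  Constants are assumed, not from the source.
import numpy as np

def vad_from_gmm_likelihoods(p_speech, p_nonspeech,
                             a_stay_speech=0.9, a_stay_nonspeech=0.9,
                             threshold=1.0):
    """Return a boolean mask of frames judged to be in the speech state."""
    n = len(p_speech)
    ratio = np.empty(n)
    # Forward-style recursion over the two-state (speech / non-speech) model,
    # tracked directly as the ratio of the two state probabilities.
    r = 1.0
    for t in range(n):
        num = (a_stay_speech * r + (1.0 - a_stay_nonspeech)) * p_speech[t]
        den = ((1.0 - a_stay_speech) * r + a_stay_nonspeech) * p_nonspeech[t]
        r = num / max(den, 1e-30)
        ratio[t] = r
    return ratio > threshold
```

Tracking the ratio directly avoids storing both state probabilities separately and maps naturally onto the per-frame speech/non-speech judgment described above.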