Abstract:
Judgment result deriving means 74 judges, for each unit of time, whether a time series of voice data corresponds to active voice or non-active voice. The numbers of active voice segments and non-active voice segments in this voice data are known in advance as the number of labeled active voice segments and the number of labeled non-active voice segments. The means then shapes the active voice segments and non-active voice segments resulting from the judgment by comparing, with a duration threshold, the length of each segment during which the voice data is consecutively judged to correspond to active voice, or the length of each segment during which it is consecutively judged to correspond to non-active voice. Segments number calculating means 75 calculates the number of active voice segments and the number of non-active voice segments. Duration threshold updating means 76 updates the duration threshold so that the difference between the calculated number of active voice segments and the number of labeled active voice segments decreases, or so that the difference between the calculated number of non-active voice segments and the number of labeled non-active voice segments decreases.
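The shaping, counting, and threshold-update steps described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the method of the invention: the function names (`flip_short_runs`, `shape`, `count_segments`, `update_threshold`), the use of one threshold per class measured in frames, and the fixed-step update rule, which only tends to reduce the count difference because longer thresholds usually merge or delete more segments.

```python
from itertools import groupby


def flip_short_runs(labels, target, min_len):
    """Relabel every run of `target` that is shorter than `min_len` frames."""
    out = []
    for value, group in groupby(labels):
        length = len(list(group))
        if value == target and length < min_len:
            value = not target  # too short: absorb into the other class
        out.extend([value] * length)
    return out


def shape(labels, min_voice, min_nonvoice):
    """Shape frame-wise judgments (True = active voice) with duration thresholds."""
    # Assumed order: fill short non-voice gaps first, then drop short voice bursts.
    filled = flip_short_runs(labels, False, min_nonvoice)
    return flip_short_runs(filled, True, min_voice)


def count_segments(labels):
    """Return (number of active voice segments, number of non-active voice segments)."""
    keys = [value for value, _ in groupby(labels)]
    return keys.count(True), keys.count(False)


def update_threshold(threshold, counted, labeled, step=1):
    """Hypothetical fixed-step rule: too many segments raises the threshold."""
    if counted > labeled:
        return threshold + step
    if counted < labeled:
        return max(1, threshold - step)
    return threshold
```

In use, one would call `shape`, compare `count_segments` of the result with the labeled counts, and call `update_threshold` repeatedly until the counts stop improving.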
Abstract:
An apparatus of this invention is a speech processing apparatus that acquires pseudo speech from a mixture sound including desired speech and noise. The speech processing apparatus includes: a first microphone that inputs a first mixture sound including desired speech and noise and outputs a first mixture signal; a second microphone that is open to the same sound space as the first microphone, is disposed at a focus position of an interface that forms part of a boundary of the sound space and has either a quadratic surface shape or a pseudo surface shape approximating a quadratic surface, inputs a second mixture sound including the desired speech reflected by the interface and the noise reflected by the interface at a ratio different from that of the first mixture sound, and outputs a second mixture signal; and a noise suppression circuit that suppresses an estimated noise signal based on the first mixture signal and the second mixture signal and outputs a pseudo speech signal.
Abstract:
The present invention increases the types of noise that can be handled sufficiently to enable speech recognition with high accuracy. A speech recognition device of the present invention performs processes of: storing, in association with each other, a suppression coefficient representing a noise suppression amount and an adaptation coefficient representing an adaptation amount of a noise model, where the noise model is generated on the basis of a predetermined noise and is to be compounded (synthesized) with a clean acoustic model generated on the basis of a voice including no noise; estimating noise from an input signal; suppressing, from the input signal, a portion of the estimated noise in a suppression amount specified on the basis of the suppression coefficient; generating a noise-adapted acoustic model by compounding (synthesizing) the clean acoustic model with a noise model generated on the basis of the estimated noise, in accordance with an adaptation amount specified on the basis of the adaptation coefficient; and recognizing voice on the basis of the noise-suppressed input signal and the generated adapted acoustic model.
Abstract:
Provided is a voice recognition system capable of correctly estimating the utterance sections to be recognized while suppressing negative influences from sound that is not to be recognized. A voice segmenting means calculates voice feature values and segments voice sections or non-voice sections by comparing the voice feature values with a threshold value. The voice segmenting means then determines, as first voice sections, either those segmented sections or sections obtained by adding a margin to the front and rear of each of those segmented sections. On the basis of voice and non-voice likelihoods, a search means determines, as second voice sections, the sections to which voice recognition is to be applied. A parameter updating means updates the threshold value and the margin. The voice segmenting means determines the first voice sections by using whichever of the threshold value and the margin has been updated by the parameter updating means.
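The first-voice-section step described above (threshold comparison followed by adding a margin) can be sketched in Python as follows. This is an illustrative sketch only: the function name `first_voice_sections`, the frame-index representation with exclusive end indices, and the merging of sections that overlap after the margin is added are all assumptions not stated in the abstract.

```python
def first_voice_sections(features, threshold, margin):
    """Return (start, end) frame ranges, end exclusive, for assumed voice sections."""
    sections = []
    start = None
    for i, value in enumerate(features):
        if value > threshold and start is None:
            start = i  # a voice section begins
        elif value <= threshold and start is not None:
            sections.append((start, i))  # the voice section ends
            start = None
    if start is not None:
        sections.append((start, len(features)))

    widened = []
    for s, e in sections:
        # Add the margin to the front and rear, clipped to the signal length.
        s, e = max(0, s - margin), min(len(features), e + margin)
        if widened and s <= widened[-1][1]:
            # Assumption: sections that touch or overlap after widening are merged.
            widened[-1] = (widened[-1][0], max(widened[-1][1], e))
        else:
            widened.append((s, e))
    return widened
```

A parameter update would then amount to calling the function again with a revised `threshold` or `margin`; how those values are revised is not covered by this sketch.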
Abstract:
A system for voice detection includes a feature value calculation unit that calculates a feature value from an input signal sliced on a per-frame basis, a provisional voice/non-voice decision unit that provisionally decides a voiced interval and a non-voiced interval from the feature value calculated on a per-frame basis, and a voice/non-voice decision unit that determines a voiced interval duration threshold value or a non-voiced interval duration threshold value using the ratio of the per-frame feature value to a threshold value for the feature value, and that re-decides the voiced interval and the non-voiced interval using the determined voiced interval duration threshold value and the determined non-voiced interval duration threshold value. Because the duration threshold values are determined from the per-frame feature value and the threshold value for the feature value, the constraint of the shaping rule can be made weaker when the per-frame feature value can be regarded as reliable and stronger when it cannot, which allows voice detection that does not depend on the noise environment.
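As a rough illustration of deriving a duration threshold from the ratio of the feature value to its threshold, consider the Python sketch below. The mapping is purely hypothetical: the abstract does not give a formula, so the function name `duration_threshold`, the parameters `base_frames` and `sensitivity`, and the use of the absolute log-ratio as a reliability measure are assumptions chosen only to show the idea that a feature far from its threshold loosens the shaping constraint.

```python
import math


def duration_threshold(feature, feature_threshold, base_frames=10, sensitivity=1.0):
    """Return a duration threshold in frames; a shorter value is a weaker constraint."""
    if feature <= 0 or feature_threshold <= 0:
        raise ValueError("feature and feature_threshold must be positive")
    # Assumed reliability measure: how far the ratio is from 1 on a log scale.
    reliability = abs(math.log(feature / feature_threshold))
    # A reliable frame (ratio far from 1) yields a shorter duration threshold.
    return max(1, round(base_frames / (1.0 + sensitivity * reliability)))
```

With this mapping, a feature value equal to its threshold keeps the full `base_frames` constraint, while values far above or below it shorten the required duration.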
Abstract:
Disclosed is a noise suppression system including a unit for calculating a noise mean spectrum from an input signal, a unit for deriving a provisional estimate of the speech from the input signal and the noise mean spectrum, a reference speech pattern, and a unit for correcting the provisional estimate of the speech using the reference speech pattern.
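The three stages named above can be sketched in Python under several assumptions that the abstract does not confirm: spectra are plain lists of magnitudes, the provisional estimate is obtained by simple spectral subtraction, and the correction blends the provisional estimate with the nearest of several reference patterns. The function names and the `weight` and `floor` parameters are hypothetical.

```python
def noise_mean_spectrum(noise_frames):
    """Average the spectra of frames assumed to contain only noise."""
    count = len(noise_frames)
    return [sum(column) / count for column in zip(*noise_frames)]


def provisional_speech(frame, noise_mean, floor=0.0):
    """Assumed spectral subtraction: remove the noise mean, clipped at a floor."""
    return [max(x - n, floor) for x, n in zip(frame, noise_mean)]


def correct_with_reference(provisional, reference_patterns, weight=0.5):
    """Assumed correction: blend the estimate with its nearest reference pattern."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = min(reference_patterns, key=lambda p: squared_distance(provisional, p))
    return [(1 - weight) * x + weight * r for x, r in zip(provisional, nearest)]
```

A caller would chain the three functions per frame: estimate the noise mean once, subtract it from each input frame, then correct each provisional estimate against the stored reference patterns.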
Abstract:
Disclosed is a gain control system in which a speech model composed of a sound pressure and a feature is stored in a speech model storage unit for each of a plurality of phonemes or for each of the clusters into which speech is divided. When an input signal is given, a feature conversion unit calculates a feature and a sound pressure of the input signal. A sound pressure comparison unit determines a sound pressure ratio between the input signal and each of the speech models. A distance calculation unit calculates a distance between the feature of the input signal and the feature of each of the speech models. A gain calculation unit calculates a gain value from the sound pressure ratios and the distance information. A sound pressure compensation unit thereby compensates for the sound pressure of the input signal.
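One way to combine the sound pressure ratios and distances into a single gain is sketched below in Python. The combination rule is an assumption, not taken from the abstract: each speech model contributes its pressure ratio weighted by `exp(-distance)`, so models whose features are close to the input dominate. The function name `gain_value` and the `(feature, pressure)` tuple layout for speech models are hypothetical.

```python
import math


def gain_value(input_feature, input_pressure, speech_models):
    """Distance-weighted average of model-to-input sound pressure ratios.

    `speech_models` is a list of (feature_vector, sound_pressure) pairs,
    one per phoneme or cluster (assumed layout).
    """
    weights = []
    ratios = []
    for model_feature, model_pressure in speech_models:
        distance = math.dist(input_feature, model_feature)  # Euclidean distance
        weights.append(math.exp(-distance))  # assumed weighting: closer counts more
        ratios.append(model_pressure / input_pressure)
    total = sum(weights)
    return sum(w * r for w, r in zip(weights, ratios)) / total
```

The compensated sound pressure would then be `gain_value(...) * input_pressure`, which moves the input toward the pressure of the models it most resembles.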
Abstract:
A voice recognition device that recognizes the voice of an input voice signal comprises: a voice model storage unit that stores in advance a predetermined voice model having a plurality of detail levels, the detail levels being information indicating a feature property of a voice for the voice model; a detail level selection unit that selects, from the detail levels of the voice model stored in the voice model storage unit, the detail level closest to a feature property of the input voice signal; and a parameter setting unit that sets parameters for recognizing the voice of the input voice signal according to the detail level selected by the detail level selection unit.