Abstract:
An object of the invention is to remove, in real time, additive noise caused by the ambient environment or similar influences, so as to improve the precision of real-time speech recognition. To this end, conversion into a noise-superimposed speech model is performed selectively, only for the speech models of phonemes that can become search targets, as determined by the search state and the recognition grammar during the speech search. The likelihood calculation for recognizing the input speech is then executed using the noise-superimposed speech models produced by this conversion.
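The selective conversion described above can be illustrated with a minimal sketch. The abstract does not state how a clean model is converted, so the log-add combination of log-spectral means used here is an assumption, as are the names `LazyNoiseAdapter`, `active_models`, and the dictionary-based grammar. The point of the sketch is only that models are converted on demand, for phonemes the grammar allows from the current search state, and cached afterwards.

```python
import math
from typing import Dict, List, Sequence


class LazyNoiseAdapter:
    """Converts clean phoneme models to noise-superimposed ones on demand.

    Hypothetical helper: each model is reduced to a vector of log-spectral
    means, and noise is combined with the log-add rule log(exp(c) + exp(n)).
    """

    def __init__(self, clean_models: Dict[str, Sequence[float]],
                 noise_logspec: Sequence[float]) -> None:
        self.clean = clean_models
        self.noise = noise_logspec
        self.cache: Dict[str, List[float]] = {}

    def get(self, phoneme: str) -> List[float]:
        # Convert only the first time a phoneme is requested, then reuse.
        if phoneme not in self.cache:
            self.cache[phoneme] = [
                math.log(math.exp(c) + math.exp(n))
                for c, n in zip(self.clean[phoneme], self.noise)
            ]
        return self.cache[phoneme]


def active_models(adapter: LazyNoiseAdapter,
                  grammar: Dict[str, Sequence[str]],
                  state: str) -> Dict[str, List[float]]:
    """Return converted models for the phonemes reachable from `state`."""
    return {p: adapter.get(p) for p in grammar.get(state, ())}
```

Phonemes that the grammar never reaches from the visited states are never converted, which is where the saving in computation would come from.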
Abstract:
The present invention aims to provide a high-speed speech recognition method with a high recognition rate that utilizes speaker models. To this end, the method performs acoustic processing on the input speech, calculates a coarse output probability using an unspecified-speaker model, and then calculates a fine output probability using the unspecified-speaker model and clustered speaker models, but only for those states that the coarse calculation estimates will contribute to the recognition result. Recognition candidates are extracted by a common language search based on the obtained result, and a fine language search is conducted on the extracted candidates to determine the recognition result.
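The coarse-then-fine probability calculation can be sketched as follows. This is an illustration under several assumptions that are not in the abstract: each state has a single one-dimensional Gaussian per model, the "contributing" states are taken to be the `top_k` states by coarse score, and the fine score is the maximum over the unspecified-speaker model and the clustered speaker models. The names `log_gauss` and `two_pass_scores` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Gaussian = Tuple[float, float]  # (mean, variance)


def log_gauss(x: float, mean: float, var: float) -> float:
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def two_pass_scores(x: float,
                    si_models: Dict[str, Gaussian],
                    cluster_models: Dict[str, List[Gaussian]],
                    top_k: int = 2) -> Tuple[Dict[str, float], List[str]]:
    # Coarse pass: score every state with the unspecified-speaker model only.
    coarse = {s: log_gauss(x, m, v) for s, (m, v) in si_models.items()}

    # Assumed selection rule: keep the top_k states by coarse score.
    selected = sorted(coarse, key=coarse.get, reverse=True)[:top_k]

    # Fine pass: only selected states are rescored with the cluster models.
    scores = dict(coarse)
    for s in selected:
        fine = [coarse[s]]
        fine += [log_gauss(x, m, v) for m, v in cluster_models.get(s, [])]
        scores[s] = max(fine)
    return scores, selected
```

States outside the selected set keep their coarse score, so the more expensive cluster-model evaluation is spent only where it is likely to affect the result.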
Abstract:
Speech including a speech portion and a non-speech portion is input. A long-time cepstrum mean of the speech portion is obtained from the speech portion of the input, and a long-time cepstrum mean of the non-speech portion is obtained from the non-speech portion. Each mean is converted from the cepstrum domain to the linear spectrum domain, and the non-speech mean is subtracted from the speech mean in the linear spectrum domain. The difference is converted back to the cepstrum domain, the long-time cepstrum mean of the speech portions of a training speech database is subtracted from it, and the result is added to a speech model expressed in cepstrum. Thus, even when the noise is large, the precision of estimating the line fluctuation is raised and the recognition rate can be improved.
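The sequence of domain conversions above can be followed in a short sketch. It assumes that the cepstrum is an orthonormal DCT of the log spectrum with no truncation, and that the linear-domain difference is floored at a small positive value so that the logarithm stays defined; neither detail is given in the abstract. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def dct(x: Sequence[float]) -> List[float]:
    """Orthonormal DCT-II: log spectrum -> cepstrum (assumed transform)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * sum(
            x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)))
    return out


def idct(c: Sequence[float]) -> List[float]:
    """Inverse of dct(): cepstrum -> log spectrum."""
    n = len(c)
    out = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
            total += scale * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(total)
    return out


def cep_to_lin(cep: Sequence[float]) -> List[float]:
    return [math.exp(v) for v in idct(cep)]


def lin_to_cep(spec: Sequence[float]) -> List[float]:
    return dct([math.log(v) for v in spec])


def mean_vector(frames: Sequence[Sequence[float]]) -> List[float]:
    return [sum(col) / len(frames) for col in zip(*frames)]


def channel_offset(speech_frames: Sequence[Sequence[float]],
                   noise_frames: Sequence[Sequence[float]],
                   train_mean_cep: Sequence[float],
                   floor: float = 1e-6) -> List[float]:
    """Cepstral offset to add to a cepstrum-domain speech model."""
    speech_lin = cep_to_lin(mean_vector(speech_frames))
    noise_lin = cep_to_lin(mean_vector(noise_frames))
    # Subtract in the linear spectrum domain; the floor is an assumption.
    diff = [max(s - n, floor) for s, n in zip(speech_lin, noise_lin)]
    diff_cep = lin_to_cep(diff)
    return [d - t for d, t in zip(diff_cep, train_mean_cep)]


def adapt_model_mean(model_mean_cep: Sequence[float],
                     offset: Sequence[float]) -> List[float]:
    return [m + o for m, o in zip(model_mean_cep, offset)]
```

Performing the subtraction in the linear domain matters because additive noise adds in the linear spectrum, whereas a line (channel) characteristic adds in the cepstrum; subtracting the two cepstrum means directly would mix the two effects.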
Abstract:
The invention aims to successively extract the proper speech zone from input speech in which noise is mixed with the speech to be recognized, and to remove the noise from the detected speech zone. To this end, a noise position is estimated from the input waveform; a speech zone is detected in subsequently input speech by using the power information of the signal at the estimated noise position; and noise is removed from the speech in the detected speech zone by using the spectrum information of the signal at the estimated noise position. Further, the estimated noise zone is updated as appropriate, based on a comparison between the power information of the input speech and that of the estimated noise zone, so that the noise position is always properly estimated.
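A minimal sketch of this detect, subtract, and update loop is given below. It operates on per-frame power spectra and makes several assumptions that the abstract does not state: the first `init_noise_frames` frames are taken as the initial noise zone, a frame counts as speech when its power exceeds `threshold_ratio` times the noise power, noise is removed by plain spectral subtraction floored at zero, and the noise estimate is updated by exponential smoothing with factor `alpha`. All names and default values are hypothetical.

```python
from typing import List, Sequence, Tuple


def detect_and_denoise(frames: Sequence[Sequence[float]],
                       init_noise_frames: int = 3,
                       threshold_ratio: float = 2.0,
                       alpha: float = 0.9
                       ) -> Tuple[List[Tuple[bool, List[float]]], List[float]]:
    """Return ((is_speech, spectrum) per later frame, final noise spectrum)."""
    head = frames[:init_noise_frames]
    # Initial noise spectrum: mean of the assumed leading noise frames.
    noise_spec = [sum(col) / len(head) for col in zip(*head)]
    noise_power = sum(noise_spec)

    results: List[Tuple[bool, List[float]]] = []
    for spec in frames[init_noise_frames:]:
        power = sum(spec)
        if power > threshold_ratio * noise_power:
            # Speech zone: subtract the noise spectrum, flooring at zero.
            cleaned = [max(s - n, 0.0) for s, n in zip(spec, noise_spec)]
            results.append((True, cleaned))
        else:
            # Noise zone: refresh the noise estimate from this frame.
            noise_spec = [alpha * n + (1.0 - alpha) * s
                          for n, s in zip(noise_spec, spec)]
            noise_power = sum(noise_spec)
            results.append((False, list(spec)))
    return results, noise_spec
```

Because the noise estimate is refreshed whenever a frame is judged to be noise, both the detection threshold and the subtracted spectrum follow slow changes in the background.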
Abstract:
An object of the invention is to eliminate the influence of line characteristics in real time, so as to raise the recognition precision for input speech and enable the speech to be recognized in real time. To this end, an estimate of the long-time mean of a speech feature parameter is obtained from the sequentially input speech feature parameters that have already been input, and the speech feature parameter input at that time point is normalized using the obtained estimate. Each time a speech feature parameter is input, the latest estimate is obtained from the parameters input so far, including the newly input one, and the newest parameter is normalized using the updated estimate. Because the reliability of the estimate increases with the number of speech feature parameters used to obtain it, the normalization applies a weight to the estimate in accordance with that reliability.
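The frame-by-frame normalization can be sketched as follows. The abstract says only that the weight follows the reliability of the estimate, so the specific weight `n / (n + tau)`, which grows from near 0 toward 1 as more frames arrive, is an assumption, as are the name `online_mean_normalize` and the parameter `tau`.

```python
from typing import List, Sequence


def online_mean_normalize(frames: Sequence[Sequence[float]],
                          tau: float = 10.0) -> List[List[float]]:
    """Normalize each frame with a running mean of all frames seen so far."""
    if not frames:
        return []
    running_sum = [0.0] * len(frames[0])
    normalized: List[List[float]] = []
    for n, frame in enumerate(frames, start=1):
        # Update the long-time mean estimate, including the current frame.
        running_sum = [s + x for s, x in zip(running_sum, frame)]
        mean = [s / n for s in running_sum]
        # Assumed reliability weight: small for few frames, approaching 1.
        weight = n / (n + tau)
        normalized.append([x - weight * m for x, m in zip(frame, mean)])
    return normalized
```

With `tau = 0` the weight is always 1 and the sketch reduces to subtracting the plain running mean; a larger `tau` makes the early frames, whose mean estimate is based on little data, less strongly affected.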