Abstract:
A system and method enabling acoustic barge-in during a voice prompt in a communication system. An acoustic prompt model is trained to represent the system prompt using the specific speech signal of the prompt. The acoustic prompt model is utilized in a speech recognizer in parallel with the recognizer's active vocabulary words to suppress the echo of the prompt within the recognizer. The speech recognizer may also use a silence model and traditional garbage models such as noise models and out-of-vocabulary word models to reduce the likelihood that noises and out-of-vocabulary words in the user utterance will be mapped erroneously onto active vocabulary words.
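To make the parallel-model idea concrete, here is a minimal Python sketch of the final decision step only. It assumes the recognizer has already produced one total log-likelihood per competing model. The hypothetical labels `<prompt>`, `<silence>`, `<noise>` and `<oov>` stand for the prompt, silence and garbage models. A real recognizer would let these models compete inside the search rather than after it. Barge-in is triggered only when an active vocabulary word outscores all of them.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for the non-vocabulary models named in the abstract:
# the acoustic prompt model, the silence model and two garbage models.
PROMPT = "<prompt>"
SILENCE = "<silence>"
NOISE = "<noise>"
OOV = "<oov>"
NON_VOCAB = {PROMPT, SILENCE, NOISE, OOV}


@dataclass
class Hypothesis:
    label: str             # vocabulary word or one of the labels above
    log_likelihood: float  # total score of this model; higher is better


def decide_barge_in(hypotheses: list[Hypothesis]) -> Optional[str]:
    """Return the recognized vocabulary word, or None when the prompt echo,
    silence, noise or an out-of-vocabulary model explains the audio best."""
    if not hypotheses:
        return None
    best = max(hypotheses, key=lambda h: h.log_likelihood)
    # Only a vocabulary word that beats every non-vocabulary model
    # interrupts the prompt.
    return None if best.label in NON_VOCAB else best.label


if __name__ == "__main__":
    scores = [
        Hypothesis("yes", -120.0),
        Hypothesis("no", -135.0),
        Hypothesis(PROMPT, -110.0),  # the prompt echo scores best here
        Hypothesis(SILENCE, -140.0),
    ]
    print(decide_barge_in(scores))  # prints None: no barge-in
```

Because the prompt model is trained on the exact prompt signal, its echo tends to win against vocabulary words. Without it, the echo could be forced onto the closest vocabulary word.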
Abstract:
Method and device for recognizing words and pauses in a voice signal. Words (Wi) spoken in succession and the pauses (Ti) between them are treated as belonging to one word group, which is closed as soon as one of the pauses (Ti) exceeds a limit value (TG). Stored references (Rj) are allocated to the voice signal of the word group, and the result of the allocation is indicated once the limit value (TG) has been exceeded. To this end, parameters corresponding to the instants of the transitions between voice and non-voice segments are determined from the voice signal, and the limit value (TG) is then adjusted as a function of these parameters.
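The following Python sketch illustrates word grouping with an adaptive limit value TG. It assumes a frame-level voice/non-voice decision is already available and that each frame lasts `frame_ms` milliseconds. The adaptation rule is a hypothetical choice: TG is set to `factor` times the mean pause seen inside groups, clamped to `[tg_min, tg_max]`. The abstract states only that TG depends on parameters derived from the voice/non-voice transitions.

```python
from __future__ import annotations


def group_words(vad: list[bool], frame_ms: int = 10, tg_ms: float = 400.0,
                factor: float = 2.5, tg_min: float = 200.0,
                tg_max: float = 1000.0) -> tuple[list[tuple[int, int]], float]:
    """Split a voice-activity track into word groups.

    vad[k] is True when frame k contains speech. A group is closed as soon
    as a pause exceeds the limit value tg_ms (TG). Each time speech resumes
    after a shorter pause, TG is re-estimated from the pauses seen so far
    (hypothetical rule: factor * mean intra-group pause, clamped).
    Returns the groups as (first_frame, last_frame) pairs and the final TG.
    """
    groups: list[tuple[int, int]] = []
    intra_pauses: list[float] = []
    start = None       # first speech frame of the open group
    last_voice = None  # most recent speech frame
    for k, is_voice in enumerate(vad):
        if is_voice:
            if start is None:
                start = k  # a new group begins
            elif k - last_voice > 1:
                # Voice resumed after a pause that did not exceed TG:
                # use the transition instants to adapt TG.
                intra_pauses.append((k - last_voice - 1) * frame_ms)
                mean = sum(intra_pauses) / len(intra_pauses)
                tg_ms = min(tg_max, max(tg_min, factor * mean))
            last_voice = k
        elif start is not None and (k - last_voice) * frame_ms > tg_ms:
            # Pause exceeded TG: close the group; recognition of the
            # group and output of its result would happen here.
            groups.append((start, last_voice))
            start = None
    if start is not None:
        groups.append((start, last_voice))
    return groups, tg_ms
```

A speaker who leaves long gaps between words raises TG, so word groups are not split prematurely. A fast speaker lowers it, so results are shown sooner.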
Abstract:
The invention relates to an arrangement for communicating with a subscriber. The arrangement includes a spectral analysis unit for producing short-time spectral values (Y(i)) of received signals (E), which at times are the subscriber's speech signals superimposed with echoes of transmission signals (S) sent to the subscriber; an echo cancelling unit for estimating short-time spectral values of the echoes (X_w(i)) and for producing difference values (D(i)) between the short-time spectral values (Y(i)) of the received signals (E) and the estimated short-time spectral values (X_w(i)) of the echoes; and a speech recognition unit for evaluating the difference values (D(i)).
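Here is a minimal Python sketch of the echo-cancelling step. It assumes magnitude spectra per frequency band and models the echo estimate as X_w(i) = w(i)·S(i), using per-band weights w(i). These weights are adapted by a normalized LMS rule only while the far end alone is active. The `far_end_only` flag would come from a double-talk detector, which is not shown. Both the weight model and the update rule are assumptions, not the patented method.

```python
from __future__ import annotations


class SpectralEchoCanceller:
    """Echo cancellation on short-time magnitude spectra (sketch).

    Assumed echo model: X_w(i) = w[i] * S(i), where S(i) is the short-time
    spectrum of the transmitted signal in band i.
    """

    def __init__(self, n_bands: int, step: float = 0.5):
        self.w = [0.0] * n_bands  # per-band echo path weights
        self.step = step          # NLMS step size, 0 < step < 2

    def process(self, y: list[float], s: list[float],
                far_end_only: bool) -> list[float]:
        # Estimated echo spectrum X_w(i).
        x_w = [wi * si for wi, si in zip(self.w, s)]
        # Difference values D(i) = Y(i) - X_w(i), floored at zero because
        # they are magnitudes; these go to the speech recognizer.
        d = [max(0.0, yi - xi) for yi, xi in zip(y, x_w)]
        if far_end_only:
            # Received signal is pure echo: per-band normalized LMS update.
            for i, (yi, xi, si) in enumerate(zip(y, x_w, s)):
                if si > 0.0:
                    self.w[i] += self.step * (yi - xi) / si
        return d
```

Adapting only during far-end-only periods keeps the subscriber's own speech from being learned as echo.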
Abstract:
The invention relates to a method for generating an adapted reference for automatic speech recognition. In a first step, recognition is performed on a spoken utterance, yielding a recognition result that corresponds to a currently valid reference. In a second step, the currently valid reference is adapted to the utterance to create an adapted reference. In a third step, the adapted reference is assessed, and a decision is made whether to use it for further recognition.
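The three steps can be sketched in Python as follows. This assumes references are fixed-length feature vectors and that recognition picks the nearest reference by Euclidean distance. The interpolation weight `alpha` and the acceptance test are hypothetical choices for illustration: the adapted reference must stay at least `min_separation` away from every competing reference. The abstract does not specify how the assessment is done.

```python
from __future__ import annotations

import math


def recognize(x: list[float], refs: dict[str, list[float]]) -> str:
    """Step 1: return the label of the currently valid reference nearest to x."""
    return min(refs, key=lambda label: math.dist(x, refs[label]))


def adapt_reference(x: list[float], refs: dict[str, list[float]],
                    alpha: float = 0.2,
                    min_separation: float = 1.0) -> tuple[str, bool]:
    """Run all three steps; update refs in place if the adaptation is accepted."""
    label = recognize(x, refs)
    # Step 2: move the recognized reference towards the utterance.
    adapted = [(1.0 - alpha) * r + alpha * v for r, v in zip(refs[label], x)]
    # Step 3 (hypothetical criterion): accept only if the adapted reference
    # keeps a minimum distance to every competing reference.
    accepted = all(
        math.dist(adapted, ref) >= min_separation
        for other, ref in refs.items()
        if other != label
    )
    if accepted:
        refs[label] = adapted  # used for further recognition
    return label, accepted
```

Rejecting an adaptation that drifts towards a competing reference limits the damage a misrecognized utterance can do to the reference set.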
Abstract:
Speech recognition produces test signals from the speech signal, which are compared with predetermined reference signals to form scores. Each subsequent test signal is compared with reference values situated within a predetermined neighborhood of the reference value determined to be optimum for the preceding test signal. Depending on this neighborhood, transition values corresponding to the transition probabilities are added to the scores. To improve the results, notably when the current speaker's speaking rate varies, it is proposed to adapt these transition values as a function of the speaking rate. A further improvement can be achieved by also adapting the reference values themselves to the current speaker's pronunciation. This adaptation can also be performed iteratively in several steps.
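The alignment and the rate adaptation can be sketched in Python as below. The sketch assumes feature vectors compared by Euclidean distance. It also assumes a neighborhood in which each test frame may stay on, advance one, or skip one reference vector (moves 0, 1, 2). Transition values are taken to be negative log probabilities. The adaptation rule is a hypothetical stand-in for the speaking-rate adaptation the abstract describes: it re-estimates these probabilities from the moves on the best path, with add-one smoothing. Repeating align-then-adapt gives the iterative variant.

```python
from __future__ import annotations

import math

MOVES = (0, 1, 2)  # stay on, advance one, or skip one reference vector


def align(test: list[list[float]], ref: list[list[float]],
          penalties: list[float]) -> tuple[float, list[int]]:
    """DP alignment of test vectors against reference vectors.

    Score = sum of Euclidean local distances + transition values
    penalties[move]; lower is better. Returns (score, moves on best path),
    or (inf, []) if the last reference vector cannot be reached.
    """
    n, m = len(test), len(ref)
    cost = [[math.inf] * m for _ in range(n)]
    back = [[0] * m for _ in range(n)]
    cost[0][0] = math.dist(test[0], ref[0])
    for t in range(1, n):
        for j in range(m):
            local = math.dist(test[t], ref[j])
            # Predecessors are limited to j, j-1, j-2 (the neighborhood).
            for d in MOVES:
                if j - d >= 0 and cost[t - 1][j - d] < math.inf:
                    c = cost[t - 1][j - d] + penalties[d] + local
                    if c < cost[t][j]:
                        cost[t][j], back[t][j] = c, d
    if cost[n - 1][m - 1] == math.inf:
        return math.inf, []
    moves, j = [], m - 1
    for t in range(n - 1, 0, -1):  # backtrack from the final state
        moves.append(back[t][j])
        j -= back[t][j]
    moves.reverse()
    return cost[n - 1][m - 1], moves


def adapt_penalties(moves: list[int]) -> list[float]:
    """Hypothetical rate adaptation: re-estimate transition probabilities
    from the moves used (add-one smoothing), return -log probabilities."""
    counts = [1] * len(MOVES)
    for d in moves:
        counts[d] += 1
    total = sum(counts)
    return [-math.log(c / total) for c in counts]


if __name__ == "__main__":
    ref = [[float(k)] for k in range(4)]
    slow = [v for v in ref for _ in range(3)]  # each vector held 3 frames
    score, moves = align(slow, ref, [math.log(3)] * 3)
    print(adapt_penalties(moves))  # "stay" is now cheapest, "skip" dearest
```

A slow speaker produces many 0-moves, so after adaptation the "stay" penalty drops below the "skip" penalty. A fast speaker shifts the balance the other way.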