摘要:
A speech synthesis system is disclosed that utilizes a pitch contour resulting in a more natural-sounding speech. The present invention modifies the predicted pitch, b(t), for synthesized speech using a low frequency energy booster. The low frequency energy booster interpolates the discrete pitch values, if necessary, and increase the amount of energy of the pitch contour associated with low frequency values, such as all frequency values below 10 Hertz. The amount of energy of the pitch contour associated with low frequency values can be increased, for example, by adding band-limited noise (a carrier signal) to the pitch contour, b(t), or by filtering the pitch values with an impulse response filter having a pole at the desired low frequency value. The present invention serves to add vibrato to the to the original pitch contour, b(t), and thereby improves the naturalness of the synthetic waveform.
摘要:
A characteristic-specific digitization method and apparatus are disclosed that reduces the error rate in converting input information into a computer-readable format. The input information is analyzed and subsets of the input information are classified according to whether the input information exhibits a specific physical parameter affecting recognition accuracy. If the input information exhibits the specific physical parameter affecting recognition accuracy, the characteristic-specific digitization system recognizes the input information using a characteristic-specific recognizer that demonstrates improved performance for the given physical parameter. If the input information does not exhibit the specific physical parameter affecting recognition accuracy, the characteristic-specific digitization system recognizes the input information using a general recognizer that performs well for typical input information. In one implementation, input speech having very low recognition accuracy as a result of a physical speech characteristic is automatically identified and recognized using a characteristic-specific speech recognizer.
摘要:
A technique for producing speech output in a text-to-speech system is provided. A message is created for communication to a user in a natural language generator of the text-to-speech system. The message is annotated in the natural language generator with a synthetic speech output style. The message is conveyed to the user through a speech synthesis system in communication with the natural language generator, wherein the message is conveyed in accordance with the synthetic speech output style.
摘要:
A technique for producing speech output in an automatic dialog system in accordance with a detected context is provided. Communication is received from a user at the automatic dialog system. A context of the communication from the user is detected in a context detector of the automatic dialog system. A message is created in a natural language generator of the automatic dialog system in communication with the context detector. The message is conveyed to the user through a speech synthesis system of the automatic dialog system, in communication with the natural language generator and the context detector. Responsive to a detected level of ambient noise, the context detector provides at least one command in a markup language to cause the natural language generator to create the message using maximally intelligible words and to cause the speech synthesis system to convey the message with increased volume and decreased speed.
摘要:
A fast vocabulary independent method for spotting words in speech utilizes a preprocessing step and a coarse-to-detailed search strategy for spotting a word/phone sequence in speech. The preprocessing includes a Viterbi-beam phone level decoding using a tree-based phone language model. The coarse search matches phone-ngrams to identify regions of speech as putative word hits, and the detailed search performs an acoustic match at the putative hits with a model of the given word included in the vocabulary of the recognizer.