摘要:
A driving directions system loads into memory a limited subset of prerecorded, spoken utterances of geographic names from a mass media storage. The subset of spoken utterances may be limited, for example, to the geographic names within a predetermined radius (e.g., a few miles) of the driver's present location. The present location of the driver may be manually entered into the driving directions system by the driver, or automatically determined using a global positioning system (“GPS”) receiver. As the vehicle moves from its present location, the driving directions system loads into memory new names from the mass media storage and overwrites, if necessary, those which are now geographically out of range. Based on the current location of the driving, the driving directions system can audibly output geographic names from the run-time memory.
摘要:
Systems and methods for speech synthesis and, in particular, text-to-speech systems and methods for converting a text input to a synthetic waveform by processing prosodic and phonetic content of a spoken example of the text input to accurately mimic the input speech style and pronunciation. Systems and methods provide an interface to a TTS system to allow a user to input a text string and a spoken utterance of the text string, extract prosodic parameters from the spoken input, and process the prosodic parameters to derive corresponding markup for the text input to enable a more natural sounding synthesized speech.
摘要:
Examples of paralinguistic events (e.g., breaths, coughs, sighs, etc.) are recorded. A text-to-speech (“TTS”) engine may insert the examples into a stream of synthetic speech using, for example, markup. The synthetic speech may include a combination of normal text and paralinguistic text.
摘要:
A driving directions system loads into memory a limited subset of prerecorded, spoken utterances of geographic names from a mass media storage. The subset of spoken utterances may be limited, for example, to the geographic names within a predetermined radius (e.g., a few miles) of the driver's present location. The present location of the driver may be manually entered into the driving directions system by the driver, or automatically determined using a global positioning system (“GPS”) receiver. As the vehicle moves from its present location, the driving directions system loads into memory new names from the mass media storage and overwrites, if necessary, those which are now geographically out of range. Based on the current location of the driving, the driving directions system can audibly output geographic names from the run-time memory.
摘要:
A driving directions system loads into memory a limited subset of prerecorded, spoken utterances of geographic names from a mass media storage. The subset of spoken utterances may be limited, for example, to the geographic names within a predetermined radius (e.g., a few miles) of the driver's present location. The present location of the driver may be manually entered into the driving directions system by the driver, or automatically determined using a global positioning system (“GPS”) receiver. As the vehicle moves from its present location, the driving directions system loads into memory new names from the mass media storage and overwrites, if necessary, those which are now geographically out of range. Based on the current location of the driving, the driving directions system can audibly output geographic names from the run-time memory.
摘要:
Converting marked-up text into a synthesized stream includes providing marked-up text to a processor-based system, converting the marked-up text into a text stream including vocabulary items, retrieving audio segments corresponding to the vocabulary items, concatenating the audio segments to form a synthesized stream, and audibly outputting the synthesized stream, wherein the marked-up text includes a normal text and a paralinguistic text; and wherein the normal text is differentiated from the paralinguistic text by using a grammar constraint, and wherein the paralinguistic text is associated with more than one audio segment, wherein the retrieving of the plurality audio segments includes selecting one audio segment associated with the paralinguistic text.
摘要:
A method and computer program product for providing paraphrasing in a text-to-speech (TTS) system is provided. The method includes receiving an input text, parsing the input text, and determining a paraphrase of the input text. The method also includes synthesizing the paraphrase into synthesized speech. The method further includes selecting synthesized speech to output, which includes: assigning a score to each synthesized speech associated with each paraphrase, comparing the score of each synthesized speech associated with each paraphrase, and selecting the top-scoring synthesized speech to output. Furthermore, the method includes outputting the selected synthesized speech.
摘要:
A technique for producing speech output in an automatic dialog system is provided. Communication is received from a user at the automatic dialog system. A context of the communication from the user is detected in a context detector of the automatic dialog system. A message is provided to the user from a text-to-speech system of the automatic dialog system in communication with the context detector, wherein the message is provided in accordance with the detected context of the communication.
摘要:
A method, apparatus and a computer program product to generate an audible speech word that corresponds to text. The method includes providing a text word and, in response to the text word, processing pre-recorded speech segments that are derived from a plurality of speakers to selectively concatenate together speech segments based on at least one cost function to form audio data for generating an audible speech word that corresponds to the text word. A data structure is also provided for use in a concatenative text-to-speech system that includes a plurality of speech segments derived from a plurality of speakers, where each speech segment includes an associated attribute vector each of which is comprised of at least one attribute vector element that identifies the speaker from which the speech segment was derived.
摘要:
A technique for producing speech output in a text-to-speech system is provided. A message is created for communication to a user in a natural language generator of the text-to-speech system. The message is annotated in the natural language generator with a synthetic speech output style. The message is conveyed to the user through a speech synthesis system in communication with the natural language generator, wherein the message is conveyed in accordance with the synthetic speech output style.