摘要:
A technique for producing speech output in a text-to-speech system is provided. A message is created for communication to a user in a natural language generator of the text-to-speech system. The message is annotated in the natural language generator with a synthetic speech output style. The message is conveyed to the user through a speech synthesis system in communication with the natural language generator, wherein the message is conveyed in accordance with the synthetic speech output style.
摘要:
A technique for masking latency in an automatic dialog system is provided. A communication is received from a user at the automatic dialog system. The communication is processed in the automatic dialog system to provide a response. At least one transitional message is provided to the user from the automatic dialog system while processing the communication. A response is provided to the user from the automatic dialog system in accordance with the received communication from the user.
摘要:
A method and apparatus for using a tree structure to constrain a time-synchronous, fast search for candidate words in an acoustic stream is described. A minimum stay of three frames in each graph node visited is imposed by allowing transitions only every third frame. This constraint enables the simplest possible Markov model for each phoneme while enforcing the desired minimum duration. The fast, time-synchronous search for likely words is done for an entire sentence/utterance. The list of hypotheses beginning at each time frame is stored for providing, on-demand, lists of contender/candidate words to the asynchronous, detailed match phase of decoding.
摘要:
A method (and system) of determining confusable list items and resolving this confusion in a spoken dialog system includes receiving user input, processing the user input and determining if a list of items needs to be played back to the user, retrieving the list to be played back to the user, identifying acoustic confusions between items on the list, changing the items on the list as necessary to remove the acoustic confusions, and playing unambiguous list items back to the user.
摘要:
A computer implemented method, system and computer usable program code for synthesizing speech. A computer implemented method for synthesizing speech includes providing a database of speech of a source speaker, and providing a prosody model of speech of a target speaker different from the source speaker. Text input to be synthesized is received, and the prosody model of speech of the target speaker is applied to the text input to select segments of the speech of the source speaker in the database to form synthesized speech of the text input. The synthesized speech of the text input is then output.
摘要:
A technique for masking latency in an automatic dialog system is provided. A communication is received from a user at the automatic dialog system. The communication is processed in the automatic dialog system to provide a response. At least one transitional message is provided to the user from the automatic dialog system while processing the communication. A response is provided to the user from the automatic dialog system in accordance with the received communication from the user.
摘要:
A method for generating a frequency warping function comprising preparing the training speech of a source and a target speaker; performing frame alignment on the training speech of the speakers; selecting aligned frames from the frame-aligned training speech of the speakers; extracting corresponding sets of formant parameters from the selected aligned frames; and generating a frequency warping function based on the corresponding sets of formant parameters. The step of selecting aligned frames preferably selects a pair of aligned frames in the middle of the same or similar frame-aligned phonemes with the same or similar contexts in the speech of the source speaker and target speaker. The step of generating a frequency warping function preferably uses the various pairs of corresponding formant parameters in the corresponding sets of formant parameters as key positions in a piecewise linear frequency warping function to generate the frequency warping function.
摘要:
A driving directions system loads into memory a limited subset of prerecorded, spoken utterances of geographic names from a mass media storage. The subset of spoken utterances may be limited, for example, to the geographic names within a predetermined radius (e.g., a few miles) of the driver's present location. The present location of the driver may be manually entered into the driving directions system by the driver, or automatically determined using a global positioning system (“GPS”) receiver. As the vehicle moves from its present location, the driving directions system loads into memory new names from the mass media storage and overwrites, if necessary, those which are now geographically out of range. Based on the current location of the driving, the driving directions system can audibly output geographic names from the run-time memory.
摘要:
A method (and system) of determining confusable list items and resolving this confusion in a spoken dialog system includes receiving user input, processing the user input and determining if a list of items needs to be played back to the user, retrieving the list to be played back to the user, identifying acoustic confusions between items on the list, changing the items on the list as necessary to remove the acoustic confusions, and playing unambiguous list items back to the user.
摘要:
A technique for producing speech output in a text-to-speech system is provided. A message is created for communication to a user in a natural language generator of the text-to-speech system. The message is annotated in the natural language generator with a synthetic speech output style. The message is conveyed to the user through a speech synthesis system in communication with the natural language generator, wherein the message is conveyed in accordance with the synthetic speech output style.