摘要:
A method and apparatus to use semantic inference with speech recognition systems includes recognizing at least one spoken word, processing the spoken word using a context-free grammar, deriving an output from the context-free grammar, and translating the output to a predetermined command.
摘要:
Methods and systems for providing graphical user interfaces are described. overlaid, Information-bearing windows whose contents remain unchanged for a predetermined period of time become translucent. The translucency can be graduated so that, over time, if the window's contents remain unchanged, the window becomes more translucent. In addition to visual translucency, windows according to the present invention also have a manipulative translucent quality. Upon reaching a certain level of visual translucency, user input in the region of the window is interpreted as an operation on the underlying objects rather than the contents of the overlaying window.
摘要:
Methods and systems for providing graphical user interfaces are described. Overlaid, information-bearing windows whose contents remain unchanged for a predetermined period of time become translucent. The translucency can be graduated so that, over time, if the window's contents remain unchanged, the window becomes more translucent. In addition to visual translucency, windows also have a manipulative translucent quality. Upon reaching a certain level of visual translucency, user input in the region of the window is interpreted as an operation on the underlying objects rather than the contents of the overlaying window.
摘要:
Methods and apparatus are described for providing automated operator services and in particular, a reverse directory assistance service. A calling customer is connected to an automated system that prompts the caller for a listing identifier which is used by the system to retrieve a textual listing corresponding to the listing identifier from a database of textual listings. The textual listing contains a TTS ID which identifies a particular one TTS device from a plurality of TTS devices and the listing is optionally preprocessed and parsed into a plurality of fields which define the listing. The listing text is then sent to the particular one TTS device for text to speech synthesis of the text contained within the listing. The method further includes teaching the system which one TTS device of the plurality of TTS devices, best synthesizes the text contained within the listing and then identifying that one TTS device within the listing so that subsequent synthesis will utilize that TTS device.
摘要:
Methods and systems for providing graphical user interfaces are described. overlaid, Information-bearing windows whose contents remain unchanged for a predetermined period of time become translucent. The translucency can be graduated so that, over time, if the window's contents remain unchanged, the window becomes more translucent. In addition to visual translucency, windows according to the present invention also have a manipulative translucent quality. Upon reaching a certain level of visual translucency, user input in the region of the window is interpreted as an operation on the underlying objects rather than the contents of the overlaying window.
摘要:
A method and an apparatus for improved duration modeling of phonemes in a speech synthesis system are provided. According to one aspect, text is received into a processor of a speech synthesis system. The received text is processed using a sum-of-products phoneme duration model that is used in either the formant method or the concatenative method of speech generation. The phoneme duration model, which is used along with a phoneme pitch model, is produced by developing a non-exponential functional transformation form for use with a generalized additive model. The non-exponential functional transformation form comprises a root sinusoidal transformation that is controlled in response to a minimum phoneme duration and a maximum phoneme duration. The minimum and maximum phoneme durations are observed in training data. The received text is processed by specifying at least one of a number of contextual factors for the generalized additive model. An inverse of the non-exponential functional transformation is applied to duration observations, or training data. Coefficients are generated for use with the generalized additive model. The generalized additive model comprising the coefficients is applied to at least one phoneme of the received text resulting in the generation of at least one phoneme having a duration. An acoustic sequence is generated comprising speech signals that are representative of the received text.
摘要:
A method and an apparatus for improved duration modeling of phonemes in a speech synthesis system are provided. According to one aspect, text is received into a processor of a speech synthesis system. The received text is processed using a sum-of-products phoneme duration model that is used in either the formant method or the concatenative method of speech generation. The phoneme duration model, which is used along with a phoneme pitch model, is produced by developing a non-exponential functional transformation form for use with a generalized additive model. The non-exponential functional transformation form comprises a root sinusoidal transformation that is controlled in response to a minimum phoneme duration and a maximum phoneme duration. The minimum and maximum phoneme durations are observed in training data. The received text is processed by specifying at least one of a number of contextual factors for the generalized additive model. An inverse of the non-exponential functional transformation is applied to duration observations, or training data. Coefficients are generated for use with the generalized additive model. The generalized additive model comprising the coefficients is applied to at least one phoneme of the received text resulting in the generation of at least one phoneme having a duration. An acoustic sequence is generated comprising speech signals that are representative of the received text.
摘要:
A method and apparatus for speech synthesis in a computer-user interface using random paralinguistic variation is described herein. According to one aspect of the present invention, a method for synthesizing speech comprises generating synthesized speech having certain prosodic features. The synthesized speech is further processed by applying a random paralinguistic variation to the acoustic sequence representing the synthesized speech without altering the linguistic prosodic features. According to one aspect of the present invention, the application of the paralinguistic variation is correlated with a previously applied paralinguistic variation to reflect a gradual change in the computer voice, while still maintaining a random quality.
摘要:
Methods and systems for providing graphical user interfaces are described. Overlaid, information-bearing windows whose contents remain unchanged for a predetermined period of time become translucent. The translucency can be graduated so that, over time, if the window's contents remain unchanged, the window becomes more translucent. In addition to visual translucency, windows also have a manipulative translucent quality. Upon reaching a certain level of visual translucency, user input in the region of the window is interpreted as an operation on the underlying objects rather than the contents of the overlaying window.
摘要:
Methods and systems for providing graphical user interfaces are described. Overlaid, information-bearing windows whose contents remain unchanged for a predetermined period of time become translucent. The translucency can be graduated so that, over time, if the window's contents remain unchanged, the window becomes more translucent. In addition to visual translucency, windows also have a manipulative translucent quality. Upon reaching a certain level of visual translucency, user input in the region of the window is interpreted as an operation on the underlying objects rather than the contents of the overlaying window.