Abstract:
A method, an apparatus, and a computer program product generate an audible speech word that corresponds to text. The method includes providing a text word and, in response to the text word, processing pre-recorded speech segments derived from a plurality of speakers to selectively concatenate speech segments, based on at least one cost function, into audio data for generating an audible speech word that corresponds to the text word. A data structure is also provided for use in a concatenative text-to-speech system. The data structure includes a plurality of speech segments derived from a plurality of speakers, where each speech segment has an associated attribute vector comprising at least one attribute vector element that identifies the speaker from which the speech segment was derived.
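The cost-based selection described above can be sketched as a small dynamic program. This is only an illustration under stated assumptions: the `Segment` fields, the pitch-based target cost, and the speaker-switch penalty are hypothetical and are not taken from the abstract, which only says that at least one cost function is used and that each segment carries a speaker-identifying attribute.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    unit: str      # phonetic unit this recording realizes (hypothetical field)
    speaker: str   # attribute vector element identifying the source speaker
    pitch: float   # illustrative acoustic attribute in Hz (assumption)

def join_cost(a, b, speaker_switch_penalty=5.0):
    """Hypothetical concatenation cost: pitch mismatch plus a penalty for changing speaker."""
    cost = abs(a.pitch - b.pitch) / 10.0
    if a.speaker != b.speaker:
        cost += speaker_switch_penalty
    return cost

def select_segments(units, inventory, target_pitch=120.0):
    """Return (total_cost, segments) minimizing target cost plus join cost over the unit sequence."""
    if not units:
        return 0.0, []
    best = None  # maps each candidate segment to (cost so far, path ending in that segment)
    for unit in units:
        candidates = [s for s in inventory if s.unit == unit]
        if not candidates:
            raise ValueError(f"no recorded segment for unit {unit!r}")
        new_best = {}
        for seg in candidates:
            target = abs(seg.pitch - target_pitch) / 10.0
            if best is None:
                new_best[seg] = (target, [seg])
            else:
                # Cheapest way to reach this segment from any previous candidate.
                prev_cost, prev_path = min(
                    ((c + join_cost(p, seg), path) for p, (c, path) in best.items()),
                    key=lambda item: item[0],
                )
                new_best[seg] = (prev_cost + target, prev_path + [seg])
        best = new_best
    return min(best.values(), key=lambda item: item[0])
```

With a mixed-speaker inventory, the penalty term steers the search toward a path that stays with one speaker unless switching is clearly cheaper overall.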
Abstract:
Examples of paralinguistic events (e.g., breaths, coughs, sighs, etc.) are recorded. A text-to-speech (“TTS”) engine may insert the examples into a stream of synthetic speech using, for example, markup. The synthetic speech may include a combination of normal text and paralinguistic text.
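A minimal sketch of how markup might interleave recorded paralinguistic events with synthesized text. The tag syntax (`<sigh/>`), the clip file names, and the `plan_stream` function are hypothetical; the abstract does not specify a markup format.

```python
import re

# Hypothetical mapping from markup tag names to recorded example clips.
PARALINGUISTIC_CLIPS = {
    "breath": "breath_01.wav",
    "cough": "cough_01.wav",
    "sigh": "sigh_01.wav",
}

TAG = re.compile(r"<(\w+)/>")

def plan_stream(marked_up):
    """Split marked-up text into ordered steps: synthesize text or play a recording."""
    plan = []
    pos = 0
    for match in TAG.finditer(marked_up):
        text = marked_up[pos:match.start()].strip()
        if text:
            plan.append(("synthesize", text))
        name = match.group(1)
        # Tags without a recorded example are skipped in this sketch.
        if name in PARALINGUISTIC_CLIPS:
            plan.append(("play_recording", PARALINGUISTIC_CLIPS[name]))
        pos = match.end()
    tail = marked_up[pos:].strip()
    if tail:
        plan.append(("synthesize", tail))
    return plan
```

A real engine would hand the "synthesize" steps to its synthesizer and splice the recorded clips into the same audio stream.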
Abstract:
Systems and methods are provided for expressive text-to-speech which include identifying text to convert to speech, selecting a speech style sheet from a set of available speech style sheets, the speech style sheet defining desired speech characteristics, marking the text to associate the text with the selected speech style sheet, and converting the text to speech having the desired speech characteristics by applying a low level markup associated with the speech style sheet.
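As an illustration of expanding a selected style sheet into low-level markup, the sketch below wraps text in an SSML-style `prosody` element. The style sheet names and their contents are hypothetical, and using SSML as the low-level markup is an assumption; the abstract does not name a markup language.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical speech style sheets: each maps a style name to prosody settings.
STYLE_SHEETS = {
    "good_news": {"rate": "fast", "pitch": "high"},
    "apology": {"rate": "slow", "pitch": "low"},
}

def apply_style(text, style_name):
    """Wrap text in low-level prosody markup derived from the selected style sheet."""
    style = STYLE_SHEETS[style_name]
    # Sort attributes so the generated markup is deterministic.
    attrs = " ".join(f"{key}={quoteattr(value)}" for key, value in sorted(style.items()))
    return f"<prosody {attrs}>{escape(text)}</prosody>"
```

Keeping the style definitions in one table means an author marks text with a style name only, and the low-level attributes can be tuned in a single place.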
Abstract:
Disclosed is a system and method for improving the intelligibility of speech output by a speech synthesizer by determining if uncommon words exist in the text, and if it is determined that an uncommon word exists in the text, pausing the output of the synthesized speech of the uncommon word to offset the uncommon word from its surrounding speech.
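The idea can be sketched as a pass over the text that brackets each uncommon word with pause markers. The word list, the `<pause>` marker, and the rule "not in the common-word set" are assumptions made for illustration; the abstract does not say how uncommon words are detected or how long the pause is.

```python
# Hypothetical vocabulary of common words; a real system might use frequency counts.
COMMON_WORDS = {"take", "the", "to", "nurse", "a", "of", "and", "please"}
PAUSE = "<pause>"

def offset_uncommon_words(text, common=COMMON_WORDS):
    """Return the token sequence with a pause marker before and after each uncommon word."""
    out = []
    for word in text.split():
        key = word.strip(".,;:!?").lower()
        if key and key not in common:
            # Avoid doubling the marker when two uncommon words are adjacent.
            if not out or out[-1] != PAUSE:
                out.append(PAUSE)
            out.extend([word, PAUSE])
        else:
            out.append(word)
    return out
```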
Abstract:
A system for regulating the volume and frequency content of audio producing devices includes: one or more noise making objects (NMOs) configured with individual sound control devices in electrical communication with a noise management server; and one or more audio producing devices configured with individual sound control devices; wherein the sound control devices have electronic logic processing, storage, and communication capabilities; wherein the noise management server utilizes the sound control devices to: determine whether the NMOs are producing noise in the audible range of one or more audio producing devices; determine a noise characteristic of the one or more NMOs; and command the one or more NMOs to send the noise characteristic to the one or more audio producing devices; and wherein the volume and frequency content of audio produced by the one or more audio producing devices is adjusted in response to the received noise characteristic.
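One way an audio producing device might react to a received noise characteristic is sketched below. Everything here is an assumption for illustration: the band names, representing the noise characteristic as per-band levels in dB, the 6 dB margin, and the output cap are not specified in the abstract. The sketch also only raises levels, whereas the abstract says only that they are adjusted.

```python
def adjust_bands(current_db, noise_db, margin_db=6.0, max_db=85.0):
    """Raise each output band so it sits margin_db above the reported noise, up to max_db."""
    adjusted = {}
    for band, level in current_db.items():
        # Bands with no reported noise keep their current level.
        needed = noise_db.get(band, float("-inf")) + margin_db
        adjusted[band] = min(max(level, needed), max_db)
    return adjusted
```

For example, a device playing every band at 60 dB that receives a report of 70 dB low-frequency noise would raise only its low band.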
Abstract:
A method for improving the performance of a noise cancellation device includes determining whether one or more noise making objects (NMOs) are near an audible range of the noise cancellation device and receiving a signal from the one or more NMOs indicative of the kind of noise the one or more NMOs are generating. The method also includes selecting a specific noise cancellation model to reduce an expected noise in response to the received kind of noise.
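The selection step might look like the following lookup. The noise kinds, model names, frequency bands, the distance-based notion of "audible range", and the choice to follow the closest NMO are all hypothetical; the abstract says only that a model is selected in response to the reported kind of noise.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NoiseModel:
    name: str
    band_hz: tuple  # (low, high) frequency range the model targets (assumption)

# Hypothetical catalogue of cancellation models keyed by reported noise kind.
MODELS = {
    "vacuum": NoiseModel("broadband_motor", (200, 4000)),
    "hvac": NoiseModel("low_hum", (50, 300)),
}
DEFAULT_MODEL = NoiseModel("generic_adaptive", (20, 20000))

def select_model(reports, audible_range_m=10.0):
    """Pick a model for the closest NMO within range.

    reports is a list of (noise_kind, distance_m) tuples; returns None if no NMO is in range.
    """
    nearby = [report for report in reports if report[1] <= audible_range_m]
    if not nearby:
        return None
    kind, _distance = min(nearby, key=lambda report: report[1])
    return MODELS.get(kind, DEFAULT_MODEL)
```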
Abstract:
A method (and system) which autonomously generates a cohesive script from a text database for creating a speech corpus for concatenative text-to-speech, and more particularly, which generates cohesive scripts having fluency and natural prosody that can be used to generate compact text-to-speech recordings that cover a plurality of phonetic events.
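The coverage aspect of script generation can be illustrated with a greedy set-cover heuristic, a swapped-in technique that the abstract does not name. The sketch assumes each candidate sentence is already annotated with the set of phonetic events it contains, and it ignores the cohesion, fluency, and prosody criteria the abstract emphasizes.

```python
def greedy_script(sentences):
    """Choose sentences until every phonetic event is covered.

    sentences maps sentence text to the set of phonetic events it contains
    (a hypothetical annotation); returns the chosen sentences in pick order.
    """
    uncovered = set().union(*sentences.values())
    remaining = dict(sentences)
    script = []
    while uncovered:
        # Pick the sentence that covers the most still-uncovered events.
        best = max(remaining, key=lambda s: len(remaining[s] & uncovered))
        uncovered -= remaining.pop(best)
        script.append(best)
    return script
```

Greedy selection keeps the script compact but does not guarantee the smallest possible script.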
Abstract:
A testing arrangement is provided for speech recognition systems in vehicles. It preferably includes a “mobile client” secured in the vehicle and driven around at a desired speed; an audio system and speaker that play back a set of prerecorded utterances stored digitally in a computer arrangement, so that the speech of a human being is simulated; and transmission of the speech signal to a server, followed by speech recognition and signal-to-noise ratio (SNR) computation. The acceptability of the vehicular speech recognition system is preferably determined by comparison with pre-specified standards of recognition accuracy and SNR values.
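The scoring step might be sketched as follows. The SNR formula (ratio of mean signal power to mean noise power, in dB) is standard; the utterance-level accuracy measure, the function names, and the threshold values are assumptions, since the abstract refers only to "pre-specified standards".

```python
import math

def snr_db(signal, noise):
    """Signal-to-noise ratio in dB from sample sequences of speech and of noise alone."""
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(x * x for x in noise) / len(noise)
    return 10.0 * math.log10(p_signal / p_noise)

def recognition_accuracy(expected, recognized):
    """Fraction of prerecorded utterances whose recognized text matches exactly."""
    correct = sum(e == r for e, r in zip(expected, recognized))
    return correct / len(expected)

def is_acceptable(accuracy, snr, min_accuracy=0.9, min_snr_db=15.0):
    """Compare measured values against hypothetical pre-specified standards."""
    return accuracy >= min_accuracy and snr >= min_snr_db
```

Because the utterances are prerecorded, the expected transcripts are known in advance, so accuracy can be computed without a human listener.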