Abstract:
Improved automated synthesis of human-audible speech from text is disclosed. Performance enhancement of the underlying text comprehensibility is obtained through prosodic treatment of the synthesized material, improved speaking rate treatment, and improved methods of spelling words or terms for the system user. Prosodic shaping of text sequences appropriate for the discourse in large groupings of text segments, with prosodic boundaries developed to indicate conceptual units within the text groupings, is implemented in a preferred embodiment.
Abstract:
Various dynamic audio ducking techniques are provided that may be applied where multiple audio streams, such as a primary audio stream and a secondary audio stream, are being played back simultaneously. For example, a secondary audio stream may include a voice announcement of one or more pieces of information pertaining to the primary audio stream, such as the name of the track or the name of the artist. In one embodiment, the primary audio data and the voice feedback data are initially analyzed to determine their respective loudness values. Based on these loudness values, the primary audio stream may be ducked during the period of simultaneous playback such that a relative loudness difference is generally maintained between the primary and secondary audio streams. Accordingly, the amount of ducking applied may be customized for each piece of audio data depending on its loudness characteristics.
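The ducking idea in this abstract can be illustrated with a minimal sketch. It is not the patented method: it assumes RMS level in dBFS as a stand-in for the unspecified loudness value, and the function names and the 10 dB target difference are hypothetical choices made only for illustration.

```python
import math

NEG_INF = float("-inf")


def rms_dbfs(samples):
    """RMS level of float samples in [-1, 1], in dB relative to full scale.

    Assumption: RMS is used here as a simple proxy for loudness.
    """
    if not samples:
        return NEG_INF
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return NEG_INF
    return 10.0 * math.log10(mean_square)


def ducking_gain_db(primary_db, secondary_db, target_diff_db=10.0):
    """Gain (never above 0 dB) to apply to the primary stream.

    The primary is attenuated only as far as needed to sit
    target_diff_db below the secondary; a primary that is already
    quiet enough is left untouched. target_diff_db is a hypothetical
    default, not a value taken from the source.
    """
    if primary_db == NEG_INF or secondary_db == NEG_INF:
        return 0.0  # one stream is silent, so no ducking is needed
    needed = (secondary_db - target_diff_db) - primary_db
    return min(0.0, needed)


def apply_gain(samples, gain_db):
    """Scale samples by a gain expressed in dB."""
    factor = 10.0 ** (gain_db / 20.0)
    return [s * factor for s in samples]


# Usage: a loud primary track and a quieter voice announcement.
primary = [0.5, -0.5] * 100
secondary = [0.25, -0.25] * 100
gain = ducking_gain_db(rms_dbfs(primary), rms_dbfs(secondary))
ducked_primary = apply_gain(primary, gain)
```

Because the gain is derived from both measured levels, a loud track is ducked more than a quiet one, which is the per-item customization the abstract describes.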
Abstract:
Improved techniques for providing supplementary media for media items are disclosed. The media items are typically fixed media items. The supplementary media is one or more of audio, video, image, or text that is provided by a user to supplement (e.g., personalize, customize, annotate, etc.) the fixed media items. In one embodiment, the supplementary media can be provided by user interaction with an on-line media store where media items can be browsed, searched, purchased and/or acquired via a computer network. In another embodiment, the supplementary media can be generated on a playback device.