Abstract:
A method, a system, and an apparatus for identifying and correcting sources of problems in synthesized speech generated using a concatenative text-to-speech (CTTS) technique. The method can include the step of displaying a waveform corresponding to synthesized speech generated from concatenated phonetic units. The synthesized speech can be generated from text input received from a user. The method can further include the steps of displaying parameters corresponding to at least one of the phonetic units and displaying the original recordings containing selected phonetic units. An editing input can be received from the user, and the parameters can be adjusted in accordance with the editing input.
Abstract:
A method of filtering phonetic units to be used within a concatenative text-to-speech (CTTS) voice. Initially, a normality threshold can be established. At least one phonetic unit that has been automatically extracted from a speech corpus in order to construct the CTTS voice can be received. An abnormality index can be calculated for the phonetic unit. Then, the abnormality index can be compared to the established normality threshold. If the abnormality index exceeds the normality threshold, the phonetic unit can be marked as a suspect phonetic unit. If the abnormality index does not exceed the normality threshold, the phonetic unit can be marked as a verified phonetic unit. The concatenative text-to-speech voice can be built using the verified phonetic units.
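The filtering steps above can be illustrated with a minimal Python sketch. The abstract does not say how the abnormality index is computed, so the index used here (the absolute z-score of a unit's duration among units sharing the same label) is purely an assumption for illustration, as are the names `PhoneticUnit`, `abnormality_index`, and `filter_units`.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class PhoneticUnit:
    label: str          # phoneme label, e.g. "AE" (hypothetical field)
    duration_ms: float  # duration of the automatically extracted segment


def abnormality_index(unit, peers):
    """Assumed index: absolute z-score of duration among same-label units."""
    durations = [p.duration_ms for p in peers]
    sigma = pstdev(durations)
    if sigma == 0:
        return 0.0  # all peers identical, nothing stands out
    return abs(unit.duration_ms - mean(durations)) / sigma


def filter_units(units, normality_threshold):
    """Split units into (verified, suspect) by comparing index to threshold."""
    by_label = {}
    for unit in units:
        by_label.setdefault(unit.label, []).append(unit)

    verified, suspect = [], []
    for unit in units:
        index = abnormality_index(unit, by_label[unit.label])
        if index > normality_threshold:
            suspect.append(unit)   # exceeds threshold: marked suspect
        else:
            verified.append(unit)  # does not exceed: marked verified
    return verified, suspect


units = [PhoneticUnit("AE", d) for d in (80, 82, 78, 81, 79, 300)]
verified, suspect = filter_units(units, normality_threshold=2.0)
```

In this sketch only the `verified` list would be passed on to voice building; the unit with the 300 ms duration lands in `suspect`.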
Abstract:
A computerized method (300) and software product (200) are provided for querying and modifying a Multi-Level Data Structure (MLDS) (106) stored in a Text-to-Speech (100) engine of a data processing system having a Central Processing Unit (CPU) (202), a processing system memory (203), and an operating system (201), using an application program written in an interpretive programming language. The method includes the steps of: initializing (302), by means of the CPU implementing a set of commands, a data processing environment for processing the application program; processing (306) the application program, where the processing includes identifying a marked command that encapsulates a DPMS program; and, upon identifying a marked command, operating (318) on the MLDS using a DPMS interpreter to produce a result from the MLDS, the result being available to the application program during its execution.
Abstract:
A device and related methods for word-sense disambiguation during a text-to-speech conversion are provided. The device, for use with a computer-based system capable of converting text data to synthesized speech, includes an identification module for identifying a homograph contained in the text data. The device also includes an assignment module for assigning a pronunciation to the homograph using a statistical test constructed from a recursive partitioning of training samples, each training sample being a word string containing the homograph. The recursive partitioning is based on determining for each training sample an order and a distance of each word indicator relative to the homograph in the training sample. An absence of one of the word indicators in a training sample is treated as equivalent to the absent word indicator being more than a predefined distance from the homograph.
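The order-and-distance features described above, including the treatment of absent word indicators, can be sketched in Python as follows. Everything named here is hypothetical: `MAX_DISTANCE` stands in for the "predefined distance", the indicator words are made up, and the small hand-written tree in `assign_pronunciation` merely stands in for a test that would in practice be derived by recursive partitioning of training samples. The toy tree also uses only the distance part of each feature.

```python
MAX_DISTANCE = 5           # hypothetical "predefined distance"
ABSENT = MAX_DISTANCE + 1  # absent indicators count as beyond MAX_DISTANCE


def indicator_features(tokens, homograph, indicators):
    """Map each indicator word to (order, distance) relative to the homograph.

    order is -1 if the nearest occurrence precedes the homograph, +1 if it
    follows, and 0 if the indicator is absent; distance is the token offset,
    capped at ABSENT.
    """
    pos = tokens.index(homograph)
    features = {}
    for word in indicators:
        offsets = [i - pos for i, tok in enumerate(tokens) if tok == word]
        if not offsets:
            features[word] = (0, ABSENT)  # absent == farther than MAX_DISTANCE
            continue
        nearest = min(offsets, key=abs)
        order = -1 if nearest < 0 else 1
        features[word] = (order, min(abs(nearest), ABSENT))
    return features


def assign_pronunciation(tokens):
    """Toy stand-in for a learned tree: choose a sense label for 'bass'."""
    feats = indicator_features(tokens, "bass", ["fishing", "guitar"])
    _, guitar_dist = feats["guitar"]
    _, fishing_dist = feats["fishing"]
    if guitar_dist <= MAX_DISTANCE:
        return "music"
    if fishing_dist <= MAX_DISTANCE:
        return "fish"
    return "music"  # arbitrary default for this illustration


music_tokens = "he plays bass guitar".split()
fish_tokens = "we went fishing for bass".split()
music_feats = indicator_features(music_tokens, "bass", ["fishing", "guitar"])
```

Because an absent indicator receives the same distance value as one lying outside the window, every split in such a tree can be a plain distance comparison with no special case for missing words.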