Abstract:
A method is disclosed for generating a speech output from a text input written in a first language and containing inclusions in a second language. The speech output is characterized by a consistent, unique speaker identity: words in the first (native) language are pronounced with a native pronunciation, and words in the second (foreign) language are pronounced with a proficient foreign pronunciation. Language-dependent phoneme symbols generated for words of the second language are replaced with language-dependent phoneme symbols of the first language. The replacing includes the steps of assigning to each language-dependent phoneme symbol of the second language a language-independent target phoneme symbol; mapping each language-independent target phoneme symbol to a language-independent substitute phoneme symbol that is assignable to a language-dependent substitute phoneme symbol of the first language; and substituting the language-dependent phoneme symbols of the second language with the language-dependent substitute phoneme symbols of the first language. This results in at least one target unit sequence of phoneme symbols of the first language. A waveform unit sequence approximating the at least one target unit sequence is selected from a waveform unit database of the first language and concatenated to form the speech waveform.
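The three-step symbol replacement can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the table names, the `de_` prefix, the `L1`/`L2` language tags, and every phoneme symbol and table entry below are hypothetical, chosen only to show how a second-language symbol might pass through a language-independent target and a language-independent substitute before becoming a first-language symbol. Waveform unit selection and concatenation are not covered.

```python
from typing import Dict, List, Tuple

# All tables are hypothetical and invented for illustration only.
# Step 1: second-language dependent symbol -> language-independent target symbol.
L2_TO_TARGET: Dict[str, str] = {"TH": "θ", "IH": "ɪ", "NG": "ŋ", "K": "k"}

# Step 2: language-independent target -> language-independent substitute.
# A target missing from the first language (here "θ") maps to a close neighbour.
TARGET_TO_SUBSTITUTE: Dict[str, str] = {"θ": "s", "ɪ": "ɪ", "ŋ": "ŋ", "k": "k"}

# Step 3: language-independent substitute -> first-language dependent symbol.
SUBSTITUTE_TO_L1: Dict[str, str] = {"s": "de_s", "ɪ": "de_I", "ŋ": "de_N", "k": "de_k"}


def replace_foreign_phonemes(l2_symbols: List[str]) -> List[str]:
    """Replace second-language symbols with first-language symbols."""
    l1_symbols = []
    for symbol in l2_symbols:
        target = L2_TO_TARGET[symbol]              # assign target
        substitute = TARGET_TO_SUBSTITUTE[target]  # map to substitute
        l1_symbols.append(SUBSTITUTE_TO_L1[substitute])  # substitute
    return l1_symbols


def build_target_sequence(words: List[Tuple[str, List[str]]]) -> List[str]:
    """Build one target unit sequence made only of first-language symbols.

    Each word is a (language tag, phoneme symbols) pair. Words tagged "L2"
    are replaced; words tagged "L1" pass through unchanged.
    """
    sequence: List[str] = []
    for language, symbols in words:
        if language == "L2":
            sequence.extend(replace_foreign_phonemes(symbols))
        else:
            sequence.extend(symbols)
    return sequence


words = [("L1", ["de_d", "de_a", "de_s"]), ("L2", ["TH", "IH", "NG", "K"])]
print(build_target_sequence(words))
# prints ['de_d', 'de_a', 'de_s', 'de_s', 'de_I', 'de_N', 'de_k']
```

Keeping the three tables separate mirrors the three steps named in the abstract. In this sketch only the middle table encodes which sounds are treated as close to one another, while the outer two tables only translate between language-dependent and language-independent notations.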