Abstract:
According to an exemplary embodiment of a guided speaker adaptive speech synthesis system, a speaker adaptive training module generates adaptation information and a speaker-adapted model based on input recording text and recording speech. A text-to-speech engine receives the recording text and the speaker-adapted model and outputs synthesized speech information. A performance assessment module receives the adaptation information and the synthesized speech information and generates assessment information. An adaptation recommendation module selects at least one subsequent recording text from at least one text source as a recommendation for the next adaptation process, according to the adaptation information and the assessment information.
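
The data flow among the four modules can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the abstract: the function names, the use of lowercase letters as stand-ins for phonetic units, the coverage score, and the rule that recommends two texts when coverage is low. Real adaptation training and synthesis are replaced by toy stubs; only the wiring of inputs and outputs follows the description above.

```python
from typing import Dict, List, Set, Tuple

ALPHABET_SIZE = 26  # assumption: lowercase letters stand in for phonetic units


def units_of(text: str) -> Set[str]:
    """Toy unit extractor: the set of letters a-z that occur in the text."""
    return {ch for ch in text.lower() if "a" <= ch <= "z"}


def speaker_adaptive_training(recording_text: str, recording_speech: bytes,
                              covered: Set[str]) -> Tuple[Set[str], Dict]:
    """Stub trainer: returns (adaptation information, speaker-adapted model).
    The speech input is accepted but ignored in this toy version."""
    adaptation_info = covered | units_of(recording_text)
    adapted_model = {"adapted_units": adaptation_info}
    return adaptation_info, adapted_model


def text_to_speech(recording_text: str, adapted_model: Dict) -> List[str]:
    """Stub TTS engine: 'synthesized speech information' is the list of units
    in the text that the adapted model has seen."""
    adapted = adapted_model["adapted_units"]
    return [u for u in sorted(units_of(recording_text)) if u in adapted]


def assess_performance(adaptation_info: Set[str],
                       synthesized_info: List[str]) -> Dict[str, float]:
    """Toy assessment: unit coverage plus the number of units rendered."""
    return {
        "coverage": len(adaptation_info) / ALPHABET_SIZE,
        "rendered_units": float(len(synthesized_info)),
    }


def recommend_next_texts(adaptation_info: Set[str], assessment: Dict[str, float],
                         text_source: List[str]) -> List[str]:
    """Rank candidate texts by how many not-yet-covered units they add.
    Assumed rule: recommend two texts while coverage is below 50%, else one."""
    how_many = 2 if assessment["coverage"] < 0.5 else 1
    ranked = sorted(text_source,
                    key=lambda t: len(units_of(t) - adaptation_info),
                    reverse=True)
    return ranked[:how_many]


def adaptation_round(recording_text: str, recording_speech: bytes,
                     text_source: List[str],
                     covered: frozenset = frozenset()) -> List[str]:
    """One pass through the four modules, ending with a recommendation."""
    info, model = speaker_adaptive_training(recording_text, recording_speech,
                                            set(covered))
    synthesized = text_to_speech(recording_text, model)
    assessment = assess_performance(info, synthesized)
    return recommend_next_texts(info, assessment, text_source)
```

In this sketch, calling `adaptation_round("aaa bbb", b"", source)` with a list of candidate sentences returns the candidates that would add the most uncovered units, which mirrors the idea of guiding the speaker toward the next recording text.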