Method and apparatus with text-to-speech conversion
Abstract:
A processor-implemented text-to-speech method includes determining, using a sub-encoder, a first feature vector indicating an utterance characteristic of a speaker from feature vectors of a plurality of frames extracted from a partial section of a first speech signal of the speaker, and determining, using an autoregressive decoder, into which the first feature vector is input as an initial value, from context information of the text, a second feature vector of a second speech signal in which a text is uttered according to the utterance characteristic.
Public/Granted literature
Information query
Patent Agency Ranking
0/0