METHOD AND DEVICE FOR SYNTHESIZING TALKING HEAD VIDEO AND COMPUTER-READABLE STORAGE MEDIUM

    公开(公告)号:US20240428493A1

    公开(公告)日:2024-12-26

    申请号:US18736552

    申请日:2024-06-07

    Abstract: A method for synthesizing a talking head video includes: obtaining speech data to be synthesized and observation data, wherein the observation data is data obtained through observation other than the speech data; performing feature extraction on the speech data to obtain speech features corresponding to the speech data, and performing feature extraction on the observation data to obtain non-speech features corresponding to the observation data; performing temporal modeling on the speech features and first non-speech features to obtain low-dimensional representations, wherein the first non-speech features are non-speech features that are sensitive to temporal changes; and performing video synthesis based on the low-dimensional representations and second non-speech features, wherein the second non-speech features are non-speech features insensitive to temporal changes.

    METHOD FOR GENERATING TALKING HEAD VIDEO, DEVICE AND COMPUTER-READABLE STORAGE MEDIUM

    公开(公告)号:US20230386116A1

    公开(公告)日:2023-11-30

    申请号:US18202291

    申请日:2023-05-26

    CPC classification number: G06T13/40 G06T13/205 G06V40/20 G10L13/02

    Abstract: A method for generating a talking head video includes: obtaining a text and an image containing a face of a user; determining a phoneme sequence that corresponds to the text and includes one or more phonemes; determining acoustic features corresponding to the text according to the phoneme sequence, and obtaining synthesized speech corresponding to the text according to the acoustic features; determining a first mouth movement sequence corresponding to the text according to the phoneme sequence, and determining a second mouth movement sequence corresponding to the text according to the acoustic features; creating a facial action video corresponding to the user according to the first mouth movement sequence, the second mouth movement sequence and the image; and processing the synthesized speech and the facial action video synchronously to obtain a talking head video corresponding to the user.

Patent Agency Ranking