VIDEO GENERATION USING FRAME-WISE TOKEN EMBEDDINGS

    Publication number: US20250119624A1

    Publication date: 2025-04-10

    Application number: US18894443

    Application date: 2024-09-24

    Applicant: ADOBE INC.

    Abstract: A method, apparatus, non-transitory computer readable medium, and system for generating synthetic videos includes obtaining an input prompt describing a video scene. The embodiments then generate a plurality of frame-wise token embeddings corresponding, respectively, to a sequence of video frames based on the input prompt. Subsequently, embodiments generate, using a video generation model, a synthesized video depicting the video scene. The synthesized video includes a plurality of images corresponding to the sequence of video frames.
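The abstract above describes deriving one token embedding per video frame from a single input prompt. A minimal sketch of one way such frame-wise embeddings could be formed, assuming (purely for illustration; this is not the patented method) that a shared prompt embedding is combined with a sinusoidal frame-position code:

```python
import numpy as np

def frame_wise_token_embeddings(prompt_embedding, num_frames, dim):
    """Produce one embedding per frame by adding a sinusoidal
    frame-position code to a shared prompt embedding (illustrative scheme)."""
    positions = np.arange(num_frames)[:, None]               # (F, 1)
    freqs = 1.0 / (10000.0 ** (np.arange(0, dim, 2) / dim))  # (dim/2,)
    angles = positions * freqs                               # (F, dim/2)
    pos_code = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return prompt_embedding[None, :] + pos_code              # (F, dim)

# 16 frame-wise embeddings conditioned on one 64-dim prompt embedding
emb = frame_wise_token_embeddings(np.zeros(64), num_frames=16, dim=64)
```

Each frame gets a distinct embedding, so a downstream video generation model can synthesize temporally ordered images from a single prompt.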

    STYLE-AWARE AUDIO-DRIVEN TALKING HEAD ANIMATION FROM A SINGLE IMAGE

    Publication number: US20220392131A1

    Publication date: 2022-12-08

    Application number: US17887685

    Application date: 2022-08-15

    Applicant: Adobe Inc.

    Abstract: Embodiments of the present invention provide systems, methods, and computer storage media for generating an animation of a talking head from an input audio signal of speech and a representation (such as a static image) of a head to animate. Generally, a neural network can learn to predict a set of 3D facial landmarks that can be used to drive the animation. In some embodiments, the neural network can learn to detect different speaking styles in the input speech and account for the different speaking styles when predicting the 3D facial landmarks. Generally, template 3D facial landmarks can be identified or extracted from the input image or other representation of the head, and the template 3D facial landmarks can be used with successive windows of audio from the input speech to predict 3D facial landmarks and generate a corresponding animation with plausible 3D effects.
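As a rough illustration of the windowed pipeline described above (successive audio windows driving displacements of template 3D facial landmarks), here is a numpy sketch; the RMS-energy `predict` stand-in replaces the trained style-aware network and is entirely hypothetical:

```python
import numpy as np

def animate_landmarks(template_landmarks, audio, win=400, hop=160, predict=None):
    """Slide a window over the input audio; for each window, predict a 3D
    displacement applied to the template landmarks (68 x 3), yielding one
    landmark frame per window."""
    if predict is None:
        # stand-in for a trained network: RMS energy drives a small
        # vertical (jaw-like) offset
        predict = lambda w: np.array([0.0, -np.sqrt(np.mean(w ** 2)), 0.0])
    frames = []
    for start in range(0, len(audio) - win + 1, hop):
        frames.append(template_landmarks + predict(audio[start:start + win]))
    return np.stack(frames)
```

In the actual system the predicted landmark sequence would then drive rendering of the animated head.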

    Utilizing generative models for resynthesis of transition frames in clipped digital videos

    Publication number: US12192593B2

    Publication date: 2025-01-07

    Application number: US18164348

    Application date: 2023-02-03

    Applicant: Adobe Inc.

    Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media that utilize machine learning to generate a sequence of transition frames for a gap in a clipped digital video. For example, the disclosed system receives a clipped digital video that includes a pre-cut frame prior to a gap in the clipped digital video and a post-cut frame following the gap in the clipped digital video. Moreover, the disclosed system utilizes a natural motion sequence model to generate a sequence of transition keypoint maps between the pre-cut frame and the post-cut frame. Furthermore, using a generative neural network, the disclosed system generates a sequence of transition frames for the gap in the clipped digital video from the sequence of transition keypoint maps.
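A toy version of the keypoint stage: here the natural motion sequence model is replaced by simple linear interpolation between pre-cut and post-cut keypoints (an assumption for illustration; the disclosed model learns natural motion rather than interpolating), after which a generative network would render each keypoint map into a transition frame:

```python
import numpy as np

def transition_keypoints(pre_kp, post_kp, num_frames):
    """Generate keypoint sets (K x 2) for the gap between the pre-cut and
    post-cut frames; the endpoints themselves are excluded since those
    frames already exist in the clipped video."""
    ts = np.linspace(0.0, 1.0, num_frames + 2)[1:-1]
    return np.stack([(1 - t) * pre_kp + t * post_kp for t in ts])

seq = transition_keypoints(np.zeros((17, 2)), np.ones((17, 2)), num_frames=4)
```

The resulting sequence moves monotonically from the pre-cut pose toward the post-cut pose, which is the structural role the keypoint maps play before rendering.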

    Style-aware audio-driven talking head animation from a single image

    Publication number: US11417041B2

    Publication date: 2022-08-16

    Application number: US16788551

    Application date: 2020-02-12

    Applicant: ADOBE INC.

    Abstract: Embodiments of the present invention provide systems, methods, and computer storage media for generating an animation of a talking head from an input audio signal of speech and a representation (such as a static image) of a head to animate. Generally, a neural network can learn to predict a set of 3D facial landmarks that can be used to drive the animation. In some embodiments, the neural network can learn to detect different speaking styles in the input speech and account for the different speaking styles when predicting the 3D facial landmarks. Generally, template 3D facial landmarks can be identified or extracted from the input image or other representation of the head, and the template 3D facial landmarks can be used with successive windows of audio from the input speech to predict 3D facial landmarks and generate a corresponding animation with plausible 3D effects.

    Style-aware audio-driven talking head animation from a single image

    Publication number: US11776188B2

    Publication date: 2023-10-03

    Application number: US17887685

    Application date: 2022-08-15

    Applicant: Adobe Inc.

    CPC classification number: G06T13/205 G06T13/40 G06T17/20

    Abstract: Embodiments of the present invention provide systems, methods, and computer storage media for generating an animation of a talking head from an input audio signal of speech and a representation (such as a static image) of a head to animate. Generally, a neural network can learn to predict a set of 3D facial landmarks that can be used to drive the animation. In some embodiments, the neural network can learn to detect different speaking styles in the input speech and account for the different speaking styles when predicting the 3D facial landmarks. Generally, template 3D facial landmarks can be identified or extracted from the input image or other representation of the head, and the template 3D facial landmarks can be used with successive windows of audio from the input speech to predict 3D facial landmarks and generate a corresponding animation with plausible 3D effects.

    GENERATING THREE-DIMENSIONAL HUMAN MODELS REPRESENTING TWO-DIMENSIONAL HUMANS IN TWO-DIMENSIONAL IMAGES

    Publication number: US20240144520A1

    Publication date: 2024-05-02

    Application number: US18304144

    Application date: 2023-04-20

    Applicant: Adobe Inc.

    CPC classification number: G06T7/73 G06T2207/20084 G06T2207/30196

    Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media that modify two-dimensional images via scene-based editing using three-dimensional representations of the two-dimensional images. For instance, in one or more embodiments, the disclosed systems utilize three-dimensional representations of two-dimensional images to generate and modify shadows in the two-dimensional images according to various shadow maps. Additionally, the disclosed systems utilize three-dimensional representations of two-dimensional images to modify humans in the two-dimensional images. The disclosed systems also utilize three-dimensional representations of two-dimensional images to provide scene scale estimation via scale fields of the two-dimensional images. In some embodiments, the disclosed systems utilize three-dimensional representations of two-dimensional images to generate and visualize 3D planar surfaces for modifying objects in two-dimensional images. The disclosed systems further use three-dimensional representations of two-dimensional images to customize focal points for the two-dimensional images.
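One concrete piece of the toolkit above is scene scale estimation via a scale field. Under a pinhole camera assumption (a simplification for illustration, not necessarily the disclosed system), the metric size covered by one pixel at depth d, with focal length f in pixels, is d / f:

```python
import numpy as np

def scale_field(depth_map, focal_px):
    """Per-pixel metric scale (meters per pixel) from a depth map,
    using the pinhole relation: scale = depth / focal length."""
    return depth_map / focal_px

# A surface 2 m from a camera with a 1000-px focal length: each pixel
# covers 0.002 m, so a 1.7 m person would span about 850 pixels.
field = scale_field(np.full((4, 4), 2.0), focal_px=1000.0)
```

Such a field lets an editor size inserted objects (e.g. a human model) consistently with the rest of the scene.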
