Audio-video synchronization for non-original audio tracks

    Publication No.: US11610610B1

    Publication Date: 2023-03-21

    Application No.: US17643805

    Application Date: 2021-12-10

    Abstract: Systems and methods are provided for detecting and correcting synchronization errors in multimedia content comprising a video stream and a non-original audio stream. Techniques for directly detecting synchronization of video and audio streams may be inadequate to detect synchronization errors for non-original audio streams, particularly where such non-original audio streams contain audio not reflective of events within the video stream, such as dialog spoken in a different language than that spoken by the speakers in the video stream. To overcome this problem, the present disclosure enables synchronization of a non-original audio stream to another audio stream, such as an original audio stream, that is synchronized to the video stream. By comparison of signatures, the non-original and the other audio streams are aligned to determine an offset that can be used to synchronize the non-original audio stream to the video stream.
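The signature comparison described in the abstract can be sketched as a cross-correlation of coarse per-frame signatures. This is a minimal illustration, not the patented method: `compute_signature` and `estimate_offset` are hypothetical names, and the per-frame RMS-energy signature is an assumption (a production system would more likely use robust spectral fingerprints):

```python
import numpy as np

def compute_signature(audio: np.ndarray, frame_len: int = 1024) -> np.ndarray:
    # Hypothetical signature: per-frame RMS energy of the audio samples.
    n_frames = len(audio) // frame_len
    frames = audio[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.sqrt((frames ** 2).mean(axis=1))

def estimate_offset(sig_a: np.ndarray, sig_b: np.ndarray) -> int:
    # Return d (in frames) such that sig_b is approximately sig_a delayed
    # by d frames; a negative d means sig_b leads sig_a.
    a = sig_a - sig_a.mean()
    b = sig_b - sig_b.mean()
    corr = np.correlate(b, a, mode="full")
    return int(np.argmax(corr)) - (len(a) - 1)
```

The recovered offset (in frames, hence `frame_len` samples) could then be applied to shift the non-original audio stream into alignment with the video.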

    Person replacement utilizing deferred neural rendering

    Publication No.: US11582519B1

    Publication Date: 2023-02-14

    Application No.: US17215475

    Application Date: 2021-03-29

    Abstract: Techniques are disclosed for performing video synthesis of audiovisual content. In an example, a computing system may determine first parameters of a face and body of a source person from a first frame in a video shot. The system also determines second parameters of a face and body of a target person. The system determines that the target person is a replacement for the source person in the first frame. The system generates third parameters of the target person based on merging the first parameters with the second parameters. The system then performs deferred neural rendering of the target person based on a neural texture that corresponds to a texture space of the video shot. The system then outputs a second frame that shows the target person as the replacement for the source person.
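One way to read the parameter-merging step: identity and shape come from the target person, while per-frame motion comes from the source performance. A minimal sketch under an assumed parameter layout (the abstract does not specify the actual parameterization, and `merge_person_parameters` is a hypothetical name):

```python
def merge_person_parameters(source: dict, target: dict) -> dict:
    # Assumed split: "identity"/"shape" describe who the person is;
    # "pose"/"expression" describe what they are doing in this frame.
    merged = dict(target)                 # keep target identity and shape
    for key in ("pose", "expression"):    # take motion from the source frame
        merged[key] = source[key]
    return merged
```

The merged parameters would then condition the deferred neural renderer together with the learned neural texture of the video shot.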

    Customized action based on video item events

    Publication No.: US11093781B2

    Publication Date: 2021-08-17

    Application No.: US16208074

    Application Date: 2018-12-03

    Abstract: A user may indicate an interest relating to events such as objects, persons, or activities, where the events are included in content depicted in a video. The user may also indicate a configurable action associated with the user interest, such as receiving a notification via an electronic device. A video item, for example a live-streaming sporting event, may be broken into frames and analyzed frame-by-frame to determine a region of interest. The region of interest is then analyzed to identify objects, persons, or activities depicted in the frame. In particular, the region of interest is compared to stored images that are known to depict different objects, persons, or activities. When a region of interest is determined to be associated with the user interest, the configurable action is triggered.
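The per-frame matching step can be sketched as comparing a region-of-interest feature vector against a library of reference features, with the configurable action as a callback. This is an illustrative sketch, not the patented pipeline: the cosine-similarity metric, the 0.9 threshold, and all names here are assumptions:

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def process_region(roi_features, reference_library, user_interests, on_match,
                   threshold=0.9):
    # reference_library: label -> feature vector of a known object/person/activity
    #                    (stood in for the stored reference images).
    # user_interests:    labels the user registered an interest in.
    # on_match:          the configurable action (e.g., push a notification).
    for label, ref in reference_library.items():
        if label in user_interests and cosine_similarity(roi_features, ref) >= threshold:
            on_match(label)
```

A live stream would call `process_region` once per analyzed frame, so the action fires as soon as a matching event appears.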

    Audio locale mismatch detection
    Invention Grant

    Publication No.: US10860648B1

    Publication Date: 2020-12-08

    Application No.: US16129567

    Application Date: 2018-09-12

    Abstract: Systems, methods, and computer-readable media are disclosed for detecting a mismatch between the spoken language in an audio file and the audio language that is tagged as the spoken language in the audio file metadata. Example methods may include receiving a media file including spoken language metadata. Certain methods include generating an audio sample from the media file. Certain methods include generating a text translation of the audio sample based on the spoken language metadata. Certain methods include determining that the spoken language metadata does not match a spoken language in the audio sample based on the text translation. Certain methods include sending an indication that the spoken language metadata does not match the spoken language.
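The mismatch decision can be sketched by transcribing the sample with a speech recognizer configured for the tagged language and checking whether the result is plausible. The ASR interface below is injected as a parameter because the abstract does not name one; `locale_mismatch`, the `(text, confidence)` return shape, and the 0.5 threshold are all assumptions:

```python
from typing import Callable, Tuple

def locale_mismatch(audio_sample: bytes, tagged_language: str,
                    transcribe: Callable[[bytes, str], Tuple[str, float]],
                    min_confidence: float = 0.5) -> bool:
    # `transcribe` is an assumed ASR interface: (audio, language code) ->
    # (text, confidence). Decoding speech with the wrong language model
    # typically yields low-confidence, garbled text, which signals that
    # the metadata does not match the spoken language.
    _text, confidence = transcribe(audio_sample, tagged_language)
    return confidence < min_confidence
```

On a mismatch, the system would send the indication described in the abstract (e.g., flag the file for re-tagging).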

    Customized video content summary generation and presentation

    Publication No.: US10455297B1

    Publication Date: 2019-10-22

    Application No.: US16116618

    Application Date: 2018-08-29

    Abstract: Systems, methods, and computer-readable media are disclosed for customized video content summary generation. Example methods may include determining a first segment of digital content including a first set of frames, first textual content, and first audio content. Example methods may include determining a first event that occurs in the first set of frames, determining a first theme of the first event, generating first metadata indicative of the first theme, and determining a meaning of a first sentence that occurs in the first textual content. Some methods may include determining a second theme of the first sentence, generating second metadata indicative of the second theme, determining that user preference data associated with an active user profile includes the first theme and the second theme, generating a video summary that includes a portion of the first segment of digital content, and presenting the video summary.
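The selection step above reduces to intersecting each segment's theme metadata with the active profile's preferences. A minimal sketch with hypothetical names (`Segment`, `build_summary`); the real system derives themes from frames, text, and audio upstream of this step:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float        # seconds into the content
    end: float
    themes: frozenset   # theme metadata derived from frames, text, and audio

def build_summary(segments, user_preferences):
    # Keep segments whose themes intersect the profile's preferred themes;
    # concatenating the kept segments yields the personalized summary.
    return [s for s in segments if s.themes & user_preferences]
```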

    Facial synchronization utilizing deferred neural rendering

    Publication No.: US11581020B1

    Publication Date: 2023-02-14

    Application No.: US17217221

    Application Date: 2021-03-30

    Abstract: Techniques are disclosed for performing video synthesis of audiovisual content. In an example, a computing system may determine first facial parameters of a face of a particular person from a first frame in a video shot, whereby the video shot shows the particular person speaking a message. The system may determine second facial parameters based on an audio file that corresponds to the message being spoken in a different way from the video shot. The system may generate third facial parameters by merging the first and the second facial parameters. The system may identify a region of the face that is associated with a difference between the first and second facial parameters, render the region of the face based on a neural texture of the video shot, and then output a new frame showing the face of the particular person speaking the message in the different way.
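The region-identification step can be sketched as locating where the two facial parameter sets disagree, so that only that part of the face (typically the mouth area for re-voiced speech) is re-rendered from the neural texture. The per-region coefficient layout and the name `changed_regions` are assumptions for illustration:

```python
import numpy as np

def changed_regions(params_a, params_b, region_names, tol=1e-3):
    # params_*: facial parameter vectors, assumed here to hold one
    # coefficient per named face region. Regions whose coefficients
    # differ between the two sets are the ones to re-render.
    diff = np.abs(np.asarray(params_a) - np.asarray(params_b)) > tol
    return [region_names[i] for i in np.flatnonzero(diff)]
```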
