GENERATING DUBBED AUDIO FROM A VIDEO-BASED SOURCE

    Publication number: US20240087557A1

    Publication date: 2024-03-14

    Application number: US17931026

    Application date: 2022-09-09

    Applicant: GOOGLE LLC

    CPC classification number: G10L13/02 G06F40/58 G10L13/086

    Abstract: The present disclosure relates to generating and adjusting translated audio from a video-based source. The method includes receiving video data and corresponding audio data in a first language; generating a translated preliminary transcript in a second language; aligning timing windows of portions of the translated preliminary transcript with corresponding segments of the audio data; determining portions of the translated aligned transcript in the second language that exceed a timing window range of the corresponding segments of the audio data in the first language to generate flagged transcript portions; transmitting the original transcript, the translated aligned transcript, and the first speech dub to a first device, the generated flagged transcript portions included in the original transcript and the translated aligned transcript; receiving, from the first device, a modified original transcript; and generating, based on the modified original transcript, a second speech dub in the second language.
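The flagging step in the abstract (finding translated portions that exceed the timing window of the corresponding source-language segment) can be illustrated with a minimal sketch. This is not the patented implementation: the `Segment` type, the `estimate_duration_s` heuristic, the fixed speaking rate of 2.5 words per second, and the 10% tolerance are all hypothetical choices made for illustration only. A real system would more plausibly measure the duration of synthesized speech than estimate it from a word count.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One aligned portion of the transcript (hypothetical structure)."""
    start_s: float          # start of the source-language audio segment
    end_s: float            # end of the source-language audio segment
    source_text: str        # original transcript text (first language)
    translated_text: str    # translated transcript text (second language)

    @property
    def window_s(self) -> float:
        """Length of the timing window available for the dubbed speech."""
        return self.end_s - self.start_s


def estimate_duration_s(text: str, words_per_second: float = 2.5) -> float:
    """Crude duration estimate: word count divided by an assumed speaking rate."""
    return len(text.split()) / words_per_second


def flag_overlong(segments: List[Segment],
                  tolerance: float = 0.1,
                  words_per_second: float = 2.5) -> List[int]:
    """Return indices of segments whose translated text is estimated to
    run longer than the source timing window plus a tolerance margin."""
    flagged = []
    for i, seg in enumerate(segments):
        estimated = estimate_duration_s(seg.translated_text, words_per_second)
        if estimated > seg.window_s * (1 + tolerance):
            flagged.append(i)
    return flagged


segments = [
    Segment(0.0, 2.0, "Hello there", "Hola a todos"),
    Segment(2.0, 3.0, "Thanks", "Muchas gracias por estar aquí con nosotros hoy"),
]
print(flag_overlong(segments))  # [1]
```

In this sketch the second segment is flagged because eight translated words at the assumed rate need about 3.2 seconds, while the source segment lasts only 1 second. In the workflow the abstract describes, such flagged portions are what a user at the first device would see marked in both transcripts before editing the original transcript and requesting a second speech dub.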

    Generating dubbed audio from a video-based source

    Publication number: US12236935B2

    Publication date: 2025-02-25

    Application number: US17931026

    Application date: 2022-09-09

    Applicant: GOOGLE LLC

    Abstract: The present disclosure relates to generating and adjusting translated audio from a video-based source. The method includes receiving video data and corresponding audio data in a first language; generating a translated preliminary transcript in a second language; aligning timing windows of portions of the translated preliminary transcript with corresponding segments of the audio data; determining portions of the translated aligned transcript in the second language that exceed a timing window range of the corresponding segments of the audio data in the first language to generate flagged transcript portions; transmitting the original transcript, the translated aligned transcript, and the first speech dub to a first device, the generated flagged transcript portions included in the original transcript and the translated aligned transcript; receiving, from the first device, a modified original transcript; and generating, based on the modified original transcript, a second speech dub in the second language.
