-
Publication number: US11625928B1
Publication date: 2023-04-11
Application number: US17009311
Application date: 2020-09-01
Applicant: Amazon Technologies, Inc.
Inventor: Tamojit Chatterjee, Mayank Sharma, Muhammad Raffay Hamid, Sandeep Joshi
Abstract: Systems, methods, and computer-readable media are disclosed for language-agnostic subtitle drift detection and correction. A method may include determining subtitles and/or captions from media content (e.g., videos), the subtitles and/or captions corresponding to dialog in the media content. The subtitles may be broken up into segments which may be analyzed to determine a likelihood of drift (e.g., a likelihood that the subtitles are out of synchronization with the dialog in the media content) for each segment. For segments with a high likelihood of drift, the subtitles may be incrementally adjusted to determine an adjustment that eliminates and/or reduces the amount of drift, and the drift in the segment may be corrected based on the drift amount detected. A linear regression model and/or human blocks determined by human operators may be used to otherwise optimize drift correction.
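Illustrative sketch (not taken from the patent): the incremental adjustment described in the abstract can be pictured as a search over candidate time shifts for one subtitle segment. The scoring rule used here, total overlap between shifted subtitle intervals and detected speech intervals, is an assumption standing in for whatever drift-likelihood measure the patent actually uses. The names `estimate_drift`, `correct_segment`, `max_shift`, and `step` are hypothetical.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the overlap between two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def estimate_drift(subtitles, speech, max_shift=5.0, step=0.1):
    """Try shifts in [-max_shift, +max_shift] in increments of `step` and
    return the shift that maximizes subtitle/speech overlap (assumed score)."""
    best_shift, best_score = 0.0, -1.0
    n_steps = int(round(max_shift / step))
    for k in range(-n_steps, n_steps + 1):
        shift = k * step
        score = sum(
            overlap(s + shift, e + shift, vs, ve)
            for s, e in subtitles
            for vs, ve in speech
        )
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift


def correct_segment(subtitles, shift):
    """Apply the estimated shift to every subtitle interval in the segment."""
    return [(s + shift, e + shift) for s, e in subtitles]


# Subtitles that appear one second later than the speech they belong to.
speech = [(10.0, 12.0), (15.0, 18.0)]
subtitles = [(11.0, 13.0), (16.0, 19.0)]
shift = estimate_drift(subtitles, speech)
corrected = correct_segment(subtitles, shift)
```

A brute-force search is used only because it mirrors the word "incrementally" in the abstract; the linear regression model the abstract mentions is not modeled here.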
-
Publication number: US12205614B1
Publication date: 2025-01-21
Application number: US17661165
Application date: 2022-04-28
Applicant: Amazon Technologies, Inc.
Inventor: Mayank Sharma, Anil Kumar Nelakanti, Palanivelu Balakrishnan, Saravanan Santhamoorthy Theckyam, Honey Gupta
Abstract: Methods and apparatus are described for evaluating dubbing of media content. Emotions are identified based on combinations of attributes determined for segments of a source language audio and a dubbed audio. The emotions may be compared to determine emotional prosody transfer between the source audio and dubbed audio. Based on the comparison, a notification is generated indicating whether an emotion classification associated with the source audio matches an emotion classification associated with the dubbed audio.
-
Publication number: US11910073B1
Publication date: 2024-02-20
Application number: US17819918
Application date: 2022-08-15
Applicant: Amazon Technologies, Inc.
Inventor: Mayank Sharma, Prabhakar Gupta, Honey Gupta, Kumar Keshav
IPC: H04N21/8549, H04N21/466, H04N21/472
CPC classification number: H04N21/8549, H04N21/466, H04N21/47217
Abstract: A respective set of features, including emotion-related features, are extracted from segments of a video for which a preview is to be generated. A subset of the segments is chosen using the features and filtering criteria including at least one emotion-based filtering criterion. Respective weighted preview-suitability scores are assigned to the segments of the subset using at least a metric of similarity between individual segments and a plot summary of the video. The scores are used to select and combine segments to form a preview for the video.
-
Publication number: US11900700B2
Publication date: 2024-02-13
Application number: US18175044
Application date: 2023-02-27
Applicant: Amazon Technologies, Inc.
Inventor: Tamojit Chatterjee, Mayank Sharma, Muhammad Raffay Hamid, Sandeep Joshi
CPC classification number: G06V20/635, G06F40/169, G06N7/01, G06V20/40, G11B27/10, G06V20/44
Abstract: Systems, methods, and computer-readable media are disclosed for language-agnostic subtitle drift detection and correction. A method may include determining subtitles and/or captions from media content (e.g., videos), the subtitles and/or captions corresponding to dialog in the media content. The subtitles may be broken up into segments which may be analyzed to determine a likelihood of drift (e.g., a likelihood that the subtitles are out of synchronization with the dialog in the media content) for each segment. For segments with a high likelihood of drift, the subtitles may be incrementally adjusted to determine an adjustment that eliminates and/or reduces the amount of drift, and the drift in the segment may be corrected based on the drift amount detected. A linear regression model and/or human blocks determined by human operators may be used to otherwise optimize drift correction.
-
Publication number: US20250142139A1
Publication date: 2025-05-01
Application number: US18536053
Application date: 2023-12-11
Applicant: Amazon Technologies, Inc.
Inventor: Akash Amol, Ankit Prem Manocha, Mayank Sharma, Ashutosh Singhal, Jayashree Rajagopalan, Ayotomiwa Ajewole
IPC: H04N21/233, H04N21/81
Abstract: The present disclosure generally relates to systems and methods for generating AD content. In some implementation examples, an AD content system obtains an input audio and an AD narration, and normalizes a loudness of a section of the AD narration using a loudness of the input audio during a scene that the section corresponds to, generating a normalized section. Based on a loudness of the normalized section, the AD content system compresses a first audio channel of the input audio during the scene to generate a first compressed audio channel, and mixes the normalized section with the first compressed audio channel during the scene to generate a first sound channel of the AD content.
-
Publication number: US11538461B1
Publication date: 2022-12-27
Application number: US17249930
Application date: 2021-03-18
Applicant: Amazon Technologies, Inc.
Inventor: Honey Gupta, Mayank Sharma
IPC: G10L15/08, G10L15/16, H04N21/488, G10L25/93
Abstract: Some implementations include methods for detecting missing subtitles associated with a media presentation and may include receiving an audio component and a subtitle component associated with a media presentation, the audio component including an audio sequence, the audio sequence divided into a plurality of audio segments; evaluating the plurality of audio segments using a combination of a recurrent neural network and a convolutional neural network to identify refined speech segments associated with the audio sequence, the recurrent neural network trained based on a plurality of languages, the convolutional neural network trained based on a plurality of categories of sound; determining timestamps associated with the identified refined speech segments; and determining missing subtitles based on the timestamps associated with the identified refined speech segments and timestamps associated with subtitles included in the subtitle component.
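Illustrative sketch (not taken from the patent): the final step of the abstract, comparing speech timestamps with subtitle timestamps, can be pictured as an interval check. The neural networks that produce the refined speech segments are not modeled; the speech intervals are simply assumed as input. The function name `find_missing_subtitles` and the rule "a speech segment with no overlapping subtitle counts as missing" are assumptions.

```python
def find_missing_subtitles(speech_segments, subtitle_segments):
    """Return the speech intervals (start, end) that no subtitle interval overlaps.

    Both arguments are lists of (start_seconds, end_seconds) tuples.
    """
    missing = []
    for sp_start, sp_end in speech_segments:
        # Two intervals overlap when each one starts before the other ends.
        covered = any(
            sub_start < sp_end and sp_start < sub_end
            for sub_start, sub_end in subtitle_segments
        )
        if not covered:
            missing.append((sp_start, sp_end))
    return missing


speech = [(0.0, 2.0), (5.0, 7.0), (10.0, 12.0)]
subs = [(0.1, 1.9), (10.5, 11.0)]
missing = find_missing_subtitles(speech, subs)
```

In this example only the speech between 5.0 and 7.0 seconds has no subtitle, so it is the one interval reported.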
-
Publication number: US11205445B1
Publication date: 2021-12-21
Application number: US16436351
Application date: 2019-06-10
Applicant: Amazon Technologies, Inc.
Inventor: Mayank Sharma, Sandeep Joshi, Muhammad Raffay Hamid
Abstract: Systems, methods, and computer-readable media are disclosed for language-agnostic automated voice activity detection. Example methods may include determining an audio file associated with video content, generating a plurality of audio segments using the audio file, the plurality of audio segments including a first segment and a second segment, where the first segment and the second segment are consecutive segments. Example methods may include determining, using a Gated Recurrent Unit neural network, that the first segment includes first voice activity, determining, using the Gated Recurrent Unit neural network, that the second segment includes second voice activity, and determining that voice activity is present between a first timestamp associated with the first segment and a second timestamp associated with the second segment.
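Illustrative sketch (not taken from the patent): once each segment has a voice/no-voice decision, consecutive voiced segments can be joined into a single interval, which is one reading of "voice activity is present between a first timestamp ... and a second timestamp". The Gated Recurrent Unit classifier is not modeled; its per-segment decisions are assumed as input. The name `merge_voice_segments` and the `tol` parameter are hypothetical.

```python
def merge_voice_segments(segments, tol=1e-6):
    """Merge consecutive voiced segments into continuous voice intervals.

    `segments` is a list of (start, end, has_voice) tuples sorted by start time.
    Two voiced segments are joined when the second starts where the first ends
    (within `tol` seconds).
    """
    merged = []
    for start, end, has_voice in segments:
        if not has_voice:
            continue
        if merged and abs(merged[-1][1] - start) <= tol:
            # Extend the previous voice interval to cover this segment.
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


decisions = [(0, 1, True), (1, 2, True), (2, 3, False), (3, 4, True)]
intervals = merge_voice_segments(decisions)
```

The first two segments are consecutive and both voiced, so they collapse into one interval from 0 to 2; the unvoiced segment breaks the run before the last interval.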
-
Publication number: US20240223872A1
Publication date: 2024-07-04
Application number: US18411720
Application date: 2024-01-12
Applicant: Amazon Technologies, Inc.
Inventor: Mayank Sharma, Prabhakar Gupta, Honey Gupta, Kumar Keshav
IPC: H04N21/8549, H04N21/466, H04N21/472
CPC classification number: H04N21/8549, H04N21/466, H04N21/47217
Abstract: A respective set of features, including emotion-related features, are extracted from segments of a video for which a preview is to be generated. A subset of the segments is chosen using the features and filtering criteria including at least one emotion-based filtering criterion. Respective weighted preview-suitability scores are assigned to the segments of the subset using at least a metric of similarity between individual segments and a plot summary of the video. The scores are used to select and combine segments to form a preview for the video.
-
Publication number: US11869537B1
Publication date: 2024-01-09
Application number: US17523777
Application date: 2021-11-10
Applicant: Amazon Technologies, Inc.
Inventor: Mayank Sharma, Sandeep Joshi, Muhammad Raffay Hamid
CPC classification number: G10L25/84, G10L15/063, G10L15/16, G10L15/22, G10L25/18
Abstract: Systems, methods, and computer-readable media are disclosed for language-agnostic automated voice activity detection. Example methods may include determining an audio file associated with video content, generating audio segments using the audio file, the audio segments including a first segment and a second segment, and determining that the first segment includes first voice activity. Methods may include determining that the second segment comprises second voice activity, determining that voice activity is present between a first timestamp associated with the first segment and a second timestamp associated with the second segment, and generating text data representing the voice activity that is present between the first timestamp and the second timestamp.
-
Publication number: US11617008B1
Publication date: 2023-03-28
Application number: US17218009
Application date: 2021-03-30
Applicant: Amazon Technologies, Inc.
Inventor: Tarun Gupta, Mayank Sharma, Xiang Hao, Muhammad Raffay Hamid, Zhitao Qiu
IPC: H04N21/439, G06N20/00, H04N21/466, H04N21/475
Abstract: Methods, systems, and computer-readable media for media classification using local and global audio features are disclosed. A media classification system determines local features of an audio input using an audio event detector model that is trained to detect a plurality of audio event classes descriptive of objectionable content. The local features are extracted using maximum values of audio event scores for individual audio event classes. The media classification system determines one or more global features of the audio input using the audio event detector model. The global feature(s) are extracted using averaging of clip-level descriptors of a plurality of clips of the audio input. The media classification system determines a content-based rating for media comprising the audio input based (at least in part) on the local features of the audio input and based (at least in part) on the global feature(s) of the audio input.
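Illustrative sketch (not taken from the patent): the two pooling operations named in the abstract, a per-class maximum for local features and an average of clip-level descriptors for global features, can be written in a few lines. The audio event detector model itself is not modeled; its per-clip scores and descriptors are assumed as plain lists of numbers. The function names are hypothetical.

```python
def local_features(event_scores):
    """Per-class maximum over clips.

    `event_scores` is a list with one entry per clip; each entry is a list of
    scores, one per audio event class.
    """
    return [max(class_scores) for class_scores in zip(*event_scores)]


def global_features(clip_descriptors):
    """Element-wise average of clip-level descriptor vectors."""
    n_clips = len(clip_descriptors)
    return [sum(dim_values) / n_clips for dim_values in zip(*clip_descriptors)]


# Three clips scored against two audio event classes.
scores = [[0.1, 0.9], [0.4, 0.2], [0.3, 0.5]]
# Two clips, each with a two-dimensional descriptor.
descriptors = [[1.0, 2.0], [3.0, 4.0]]

local = local_features(scores)
global_ = global_features(descriptors)
```

Max pooling keeps a short but strong event from being averaged away, while mean pooling summarizes the overall character of the audio; how the patent combines the two into a rating is not shown here.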