Patent search ap:("Amazon Technologies Page Inc.") AND inv:"Mayank Sharma"

1.

Granted invention patent
Language agnostic drift correction (Status: Active)

Publication (announcement) number: US11625928B1

Publication (announcement) date: 2023-04-11

Application number: US17009311

Application date: 2020-09-01

Applicant: Amazon Technologies, Inc.

Inventor: Tamojit Chatterjee, Mayank Sharma, Muhammad Raffay Hamid, Sandeep Joshi

IPC: G06F17/00, G06V20/62, G11B27/10, G06N7/00, G06F40/169, G06V20/40

Abstract: Systems, methods, and computer-readable media are disclosed for language-agnostic subtitle drift detection and correction. A method may include determining subtitles and/or captions from media content (e.g., videos), the subtitles and/or captions corresponding to dialog in the media content. The subtitles may be broken up into segments which may be analyzed to determine a likelihood of drift (e.g., a likelihood that the subtitles are out of synchronization with the dialog in the media content) for each segment. For segments with a high likelihood of drift, the subtitles may be incrementally adjusted to determine an adjustment that eliminates and/or reduces the amount of drift, and the drift in the segment may be corrected based on the drift amount detected. A linear regression model and/or human blocks determined by human operators may be used to otherwise optimize drift correction.

2.

Granted invention patent
Multi-task and multi-lingual emotion mismatch detection for automated dubbing (Status: Active)

Publication (announcement) number: US12205614B1

Publication (announcement) date: 2025-01-21

Application number: US17661165

Application date: 2022-04-28

Applicant: Amazon Technologies, Inc.

Inventor: Mayank Sharma, Anil Kumar Nelakanti, Palanivelu Balakrishnan, Saravanan Santhamoorthy Theckyam, Honey Gupta

IPC: G10L25/63, G06F16/23, G06F40/44, G06V20/40, G10L13/033, G10L15/04, G10L25/57

Abstract: Methods and apparatus are described for evaluating dubbing of media content. Emotions are identified based on combinations of attributes determined for segments of a source language audio and a dubbed audio. The emotions may be compared to determine emotional prosody transfer between the source audio and dubbed audio. Based on the comparison, a notification is generated indicating whether an emotion classification associated with the source audio matches an emotion classification associated with the dubbed audio.

3.

Granted invention patent
Automated preview generation for video entertainment content (Status: Active)

Publication (announcement) number: US11910073B1

Publication (announcement) date: 2024-02-20

Application number: US17819918

Application date: 2022-08-15

Applicant: Amazon Technologies, Inc.

Inventor: Mayank Sharma, Prabhakar Gupta, Honey Gupta, Kumar Keshav

IPC: H04N21/8549, H04N21/466, H04N21/472

CPC classification number: H04N21/8549, H04N21/466, H04N21/47217

Abstract: A respective set of features, including emotion-related features, are extracted from segments of a video for which a preview is to be generated. A subset of the segments is chosen using the features and filtering criteria including at least one emotion-based filtering criterion. Respective weighted preview-suitability scores are assigned to the segments of the subset using at least a metric of similarity between individual segments and a plot summary of the video. The scores are used to select and combine segments to form a preview for the video.

4.

Granted invention patent
Language agnostic drift correction (Status: Active)

Publication (announcement) number: US11900700B2

Publication (announcement) date: 2024-02-13

Application number: US18175044

Application date: 2023-02-27

Applicant: Amazon Technologies, Inc.

Inventor: Tamojit Chatterjee, Mayank Sharma, Muhammad Raffay Hamid, Sandeep Joshi

IPC: G06F17/00, G06V20/62, G11B27/10, G06F40/169, G06V20/40, G06N7/01

CPC classification number: G06V20/635, G06F40/169, G06N7/01, G06V20/40, G11B27/10, G06V20/44

Abstract: Systems, methods, and computer-readable media are disclosed for language-agnostic subtitle drift detection and correction. A method may include determining subtitles and/or captions from media content (e.g., videos), the subtitles and/or captions corresponding to dialog in the media content. The subtitles may be broken up into segments which may be analyzed to determine a likelihood of drift (e.g., a likelihood that the subtitles are out of synchronization with the dialog in the media content) for each segment. For segments with a high likelihood of drift, the subtitles may be incrementally adjusted to determine an adjustment that eliminates and/or reduces the amount of drift, and the drift in the segment may be corrected based on the drift amount detected. A linear regression model and/or human blocks determined by human operators may be used to otherwise optimize drift correction.

5.

Invention application
SCENE BASED AUDIO MIXING FOR GENERATING AUDIO DESCRIPTION CONTENT (Status: Active)

Publication (announcement) number: US20250142139A1

Publication (announcement) date: 2025-05-01

Application number: US18536053

Application date: 2023-12-11

Applicant: Amazon Technologies, Inc.

Inventor: Akash Amol, Ankit Prem Manocha, Mayank Sharma, Ashutosh Singhal, Jayashree Rajagopalan, Ayotomiwa Ajewole

IPC: H04N21/233, H04N21/81

Abstract: The present disclosure generally relates to systems and methods for generating audio description (AD) content. In some implementation examples, an AD content system obtains an input audio and an AD narration, and normalizes a loudness of a section of the AD narration using a loudness of the input audio during a scene that the section corresponds to, generating a normalized section. Based on a loudness of the normalized section, the AD content system compresses a first audio channel of the input audio during the scene to generate a first compressed audio channel, and mixes the normalized section with the first compressed audio channel during the scene to generate a first sound channel of the AD content.

6.

Granted invention patent
Language agnostic missing subtitle detection (Status: Active)

Publication (announcement) number: US11538461B1

Publication (announcement) date: 2022-12-27

Application number: US17249930

Application date: 2021-03-18

Applicant: Amazon Technologies, Inc.

Inventor: Honey Gupta, Mayank Sharma

IPC: G10L15/08, G10L15/16, H04N21/488, G10L25/93

Abstract: Some implementations include methods for detecting missing subtitles associated with a media presentation and may include receiving an audio component and a subtitle component associated with a media presentation, the audio component including an audio sequence, the audio sequence divided into a plurality of audio segments; evaluating the plurality of audio segments using a combination of a recurrent neural network and a convolutional neural network to identify refined speech segments associated with the audio sequence, the recurrent neural network trained based on a plurality of languages, the convolutional neural network trained based on a plurality of categories of sound; determining timestamps associated with the identified refined speech segments; and determining missing subtitles based on the timestamps associated with the identified refined speech segments and timestamps associated with subtitles included in the subtitle component.

7.

Granted invention patent
Language agnostic automated voice activity detection (Status: Active)

Publication (announcement) number: US11205445B1

Publication (announcement) date: 2021-12-21

Application number: US16436351

Application date: 2019-06-10

Applicant: Amazon Technologies, Inc.

Inventor: Mayank Sharma, Sandeep Joshi, Muhammad Raffay Hamid

IPC: G10L25/84, G10L15/22, G10L25/18, G10L15/06, G10L15/16

Abstract: Systems, methods, and computer-readable media are disclosed for language agnostic automated voice activity detection. Example methods may include determining an audio file associated with video content, generating a plurality of audio segments using the audio file, the plurality of audio segments including a first segment and a second segment, where the first segment and the second segment are consecutive segments. Example methods may include determining, using a Gated Recurrent Unit neural network, that the first segment includes first voice activity, determining, using the Gated Recurrent Unit neural network, that the second segment includes second voice activity, and determining that voice activity is present between a first timestamp associated with the first segment and a second timestamp associated with the second segment.

8.

Invention publication
AUTOMATED PREVIEW GENERATION FOR VIDEO ENTERTAINMENT CONTENT (Status: Under examination, published)

Publication (announcement) number: US20240223872A1

Publication (announcement) date: 2024-07-04

Application number: US18411720

Application date: 2024-01-12

Applicant: Amazon Technologies, Inc.

Inventor: Mayank Sharma, Prabhakar Gupta, Honey Gupta, Kumar Keshav

IPC: H04N21/8549, H04N21/466, H04N21/472

CPC classification number: H04N21/8549, H04N21/466, H04N21/47217

Abstract: A respective set of features, including emotion-related features, are extracted from segments of a video for which a preview is to be generated. A subset of the segments is chosen using the features and filtering criteria including at least one emotion-based filtering criterion. Respective weighted preview-suitability scores are assigned to the segments of the subset using at least a metric of similarity between individual segments and a plot summary of the video. The scores are used to select and combine segments to form a preview for the video.

9.

Granted invention patent
Language agnostic automated voice activity detection (Status: Active)

Publication (announcement) number: US11869537B1

Publication (announcement) date: 2024-01-09

Application number: US17523777

Application date: 2021-11-10

Applicant: Amazon Technologies, Inc.

Inventor: Mayank Sharma, Sandeep Joshi, Muhammad Raffay Hamid

IPC: G10L25/84, G10L15/06, G10L15/22, G10L15/16, G10L25/18

CPC classification number: G10L25/84, G10L15/063, G10L15/16, G10L15/22, G10L25/18

Abstract: Systems, methods, and computer-readable media are disclosed for language agnostic automated voice activity detection. Example methods may include determining an audio file associated with video content, generating audio segments using the audio file, the audio segments including a first segment and a second segment, and determining that the first segment includes first voice activity. Methods may include determining that the second segment comprises second voice activity, determining that voice activity is present between a first timestamp associated with the first segment and a second timestamp associated with the second segment, and generating text data representing the voice activity that is present between the first timestamp and the second timestamp.

10.

Granted invention patent
Media classification using local and global audio features (Status: Active)

Publication (announcement) number: US11617008B1

Publication (announcement) date: 2023-03-28

Application number: US17218009

Application date: 2021-03-30

Applicant: Amazon Technologies, Inc.

Inventor: Tarun Gupta, Mayank Sharma, Xiang Hao, Muhammad Raffay Hamid, Zhitao Qiu

IPC: H04N21/439, G06N20/00, H04N21/466, H04N21/475

Abstract: Methods, systems, and computer-readable media for media classification using local and global audio features are disclosed. A media classification system determines local features of an audio input using an audio event detector model that is trained to detect a plurality of audio event classes descriptive of objectionable content. The local features are extracted using maximum values of audio event scores for individual audio event classes. The media classification system determines one or more global features of the audio input using the audio event detector model. The global feature(s) are extracted using averaging of clip-level descriptors of a plurality of clips of the audio input. The media classification system determines a content-based rating for media comprising the audio input based (at least in part) on the local features of the audio input and based (at least in part) on the global feature(s) of the audio input.
