-
Publication No.: US11589116B1
Publication Date: 2023-02-21
Application No.: US17306671
Filing Date: 2021-05-03
Applicant: Amazon Technologies, Inc.
Inventor: Mohamed Kamal Omar , Xiaohang Sun , Ivan Ryndin , Tai-Ching Li , Alexander Ratnikov , Muhammad Raffay Hamid , Ahmed Aly Saad Ahmed , Travis Silvers , Hanxiao Deng
IPC: H04N21/4545 , H04N21/454 , H04N21/45
Abstract: Techniques are disclosed for detecting a type of prurient activity shown by a portion of video content. In an example, a machine learning model of a computer system may receive a second portion of video content, the machine learning model including a neural network that is trained to analyze a temporal dimension of the second portion. The machine learning model determines a score indicating a likelihood that the video content shows the type of prurient activity based in part on applying a three-dimensional filter to the second portion of the video content. The computer system then generates a video clip that includes at least the portion of the video content showing the type of prurient activity based on the score, and provides the video clip for display.
-
Publication No.: US11532111B1
Publication Date: 2022-12-20
Application No.: US17344690
Filing Date: 2021-06-10
Applicant: Amazon Technologies, Inc.
Inventor: Dongqing Zhang , Muhammad Raffay Hamid , Xiaohan Nie , Shixing Chen
IPC: G06F17/00 , G06T11/60 , G11B27/031 , G10L15/26 , G06F40/134 , G06V20/40 , G06V40/16
Abstract: Techniques for a comic book feature are described herein. A visual data stream of a video may be parsed into a plurality of frames. Scene boundaries may be determined to generate a scene using the plurality of frames where a scene includes a subset of frames. A key frame may be determined for the scene using the subset of frames. An audio portion of an audio data stream of the video may be identified that maps to the subset of frames based on time information. The key frame may be converted to a comic image based on an algorithm. First dimensions and placement for a data object may be determined for the comic image. The data object may include the audio portion for the comic image. A comic panel may be generated for the comic image that incorporates the data object using the determined first dimensions and the placement.
-
Publication No.: US10945041B1
Publication Date: 2021-03-09
Application No.: US16890940
Filing Date: 2020-06-02
Applicant: Amazon Technologies, Inc.
Inventor: Tamojit Chatterjee , Mayank Sharma , Muhammad Raffay Hamid , Sandeep Joshi
IPC: H04N21/47 , G10L15/00 , G10L15/26 , H04N21/488
Abstract: Devices, systems, and methods are provided for language-agnostic subtitle drift detection and localization. A method may include extracting audio from video, dividing the audio into overlapping blocks, and determining the probabilities of overlapping portions of the blocks, the probabilities indicating a presence of voice data represented by the audio in the blocks. The method may generate machine blocks using overlapping portions of blocks where voice data is present, and may map the machine blocks to corresponding blocks indicating that subtitles are available for the video. For mapped blocks, the method may include determining features such as when subtitles are available without voice audio, when voice audio is available without subtitles, and when voice audio and subtitles both are available. Using the features, the method may include determining the probability that the video includes subtitle drift, and the method may include analyzing the video to localize where the subtitle drift occurs.
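The blocking and feature steps in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the block length and hop parameters, and the representation of blocks as `(start, end)` times in seconds are all hypothetical and are not taken from the patent.

```python
def overlapping_blocks(duration, block_len, hop):
    """Split [0, duration] into overlapping (start, end) blocks.

    block_len and hop are hypothetical parameters; blocks overlap
    whenever hop < block_len.
    """
    blocks = []
    start = 0.0
    while start < duration:
        blocks.append((start, min(start + block_len, duration)))
        start += hop
    return blocks


def overlap(a, b):
    """Length of the intersection of two (start, end) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def block_features(voice_block, subtitle_block):
    """Durations where voice and subtitles coincide or appear alone.

    Assumed feature set, mirroring the three cases named in the abstract.
    """
    both = overlap(voice_block, subtitle_block)
    return {
        "both": both,
        "voice_only": (voice_block[1] - voice_block[0]) - both,
        "subtitle_only": (subtitle_block[1] - subtitle_block[0]) - both,
    }
```

For example, `block_features((1.0, 3.0), (2.0, 4.0))` reports one second in each of the three categories. A real system would feed such features to a classifier to estimate the probability of drift.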
-
Publication No.: US20240346686A1
Publication Date: 2024-10-17
Application No.: US18749025
Filing Date: 2024-06-20
Applicant: Amazon Technologies, Inc.
Inventor: Xiaohan Nie , Michael Thomas Pecchia , Leo Chan , Ahmed Aly Saad Ahmed , Muhammad Raffay Hamid , Sheng Liu
Abstract: Systems, devices, and methods are provided for depth-guided structure from motion. A system may obtain a plurality of image frames from a digital content item that corresponds to a scene and determine, based at least in part on a correspondence search, a set of 2-D keypoints for the plurality of image frames. A depth estimator may be used to determine a plurality of dense depth maps for the plurality of image frames. The set of 2-D keypoints and the plurality of dense depth maps may be used to determine a corresponding set of depth priors. Initialization and/or depth-regularized optimization may be performed using the keypoints and depth priors.
-
Publication No.: US12056949B1
Publication Date: 2024-08-06
Application No.: US17215816
Filing Date: 2021-03-29
Applicant: Amazon Technologies, Inc.
Inventor: Xiaohang Sun , Mohamed Kamal Omar , Alexander Ratnikov , Ahmed Aly Saad Ahmed , Tai-Ching Li , Travis Silvers , Hanxiao Deng , Muhammad Raffay Hamid , Ivan Ryndin
CPC classification number: G06V40/10 , G06F18/2178 , G06N3/08 , G06V20/46
Abstract: Techniques are disclosed for detecting an uncovered portion of a body of a person in a frame of video content. In an example, a first machine learning model of a computing system may output a first score for the frame based on a map that identifies a region of the frame associated with an uncovered body part type. Depending on a value of the first score, a second machine learning model that includes a neural network architecture may further analyze the frame to output a second score. The first score and second score may be merged to produce a third score for the frame. A plurality of scores may be determined, respectively, for frames of the video content, and a maximum score may be selected. The video content may be selected for presentation on a display for further evaluation based on the maximum score.
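The two-stage scoring described in this abstract can be illustrated with a small Python sketch. The abstract does not say how the first score gates the second model or how the scores are merged, so the thresholds `low` and `high`, the weighted average, and the function names below are assumptions made purely for illustration.

```python
def frame_score(frame, first_model, second_model, low=0.2, high=0.8, weight=0.5):
    """Score one frame with a two-stage cascade.

    Assumption: the second model runs only when the first score is
    ambiguous (between low and high), and the merged score is a
    weighted average. The patent abstract does not specify either.
    """
    s1 = first_model(frame)
    if s1 < low or s1 > high:
        return s1  # first model is treated as decisive
    s2 = second_model(frame)
    return weight * s1 + (1.0 - weight) * s2


def video_score(frames, first_model, second_model):
    """Return the maximum per-frame score over the video."""
    return max(frame_score(f, first_model, second_model) for f in frames)
```

With stand-in models such as `first_model = lambda f: f` and `second_model = lambda f: 1.0`, only the ambiguous middle frame of `[0.1, 0.5, 0.9]` reaches the second model, and the video-level score is the largest of the three per-frame results.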
-
Publication No.: US11928880B1
Publication Date: 2024-03-12
Application No.: US17215816
Filing Date: 2021-03-29
Applicant: Amazon Technologies, Inc.
Inventor: Xiaohang Sun , Mohamed Kamal Omar , Alexander Ratnikov , Ahmed Aly Saad Ahmed , Tai-Ching Li , Travis Silvers , Hanxiao Deng , Muhammad Raffay Hamid , Ivan Ryndin
CPC classification number: G06V40/10 , G06F18/2178 , G06N3/08 , G06V20/46
Abstract: Techniques are disclosed for detecting an uncovered portion of a body of a person in a frame of video content. In an example, a first machine learning model of a computing system may output a first score for the frame based on a map that identifies a region of the frame associated with an uncovered body part type. Depending on a value of the first score, a second machine learning model that includes a neural network architecture may further analyze the frame to output a second score. The first score and second score may be merged to produce a third score for the frame. A plurality of scores may be determined, respectively, for frames of the video content, and a maximum score may be selected. The video content may be selected for presentation on a display for further evaluation based on the maximum score.
-
Publication No.: US11900700B2
Publication Date: 2024-02-13
Application No.: US18175044
Filing Date: 2023-02-27
Applicant: Amazon Technologies, Inc.
Inventor: Tamojit Chatterjee , Mayank Sharma , Muhammad Raffay Hamid , Sandeep Joshi
CPC classification number: G06V20/635 , G06F40/169 , G06N7/01 , G06V20/40 , G11B27/10 , G06V20/44
Abstract: Systems, methods, and computer-readable media are disclosed for language-agnostic subtitle drift detection and correction. A method may include determining subtitles and/or captions from media content (e.g., videos), the subtitles and/or captions corresponding to dialog in the media content. The subtitles may be broken up into segments which may be analyzed to determine a likelihood of drift (e.g., a likelihood that the subtitles are out of synchronization with the dialog in the media content) for each segment. For segments with a high likelihood of drift, the subtitles may be incrementally adjusted to determine an adjustment that eliminates and/or reduces the amount of drift, and the drift in the segment may be corrected based on the drift amount detected. A linear regression model and/or human blocks determined by human operators may be used to otherwise optimize drift correction.
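The incremental adjustment step can be sketched as a search over candidate time shifts. This is a minimal illustration, not the patented method: it assumes subtitles and detected speech are given as `(start, end)` intervals in seconds, and it uses total overlap as a stand-in objective. The names `total_overlap` and `best_shift` and the `max_shift` and `step` parameters are hypothetical.

```python
def total_overlap(subs, speech, shift):
    """Total time the shifted subtitle intervals overlap speech intervals."""
    total = 0.0
    for s0, s1 in subs:
        for v0, v1 in speech:
            total += max(0.0, min(s1 + shift, v1) - max(s0 + shift, v0))
    return total


def best_shift(subs, speech, max_shift=2.0, step=0.25):
    """Try shifts in [-max_shift, max_shift] and keep the best one.

    Ties are broken in favour of the smallest absolute shift.
    """
    n = int(round(max_shift / step))
    candidates = [k * step for k in range(-n, n + 1)]
    return max(candidates, key=lambda sh: (total_overlap(subs, speech, sh), -abs(sh)))
```

If every subtitle in a segment trails its dialog by the same amount, the search recovers that offset as long as it lies on the candidate grid. The regression model and operator-supplied blocks mentioned in the abstract are outside the scope of this sketch.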
-
Publication No.: US20240029278A1
Publication Date: 2024-01-25
Application No.: US18481179
Filing Date: 2023-10-04
Applicant: Amazon Technologies, Inc.
Inventor: Xiaohan Nie , Muhammad Raffay Hamid
CPC classification number: G06T7/337 , G06T7/73 , G06V20/46 , G06V10/751 , H04N5/272 , G06F18/41 , G06F18/214 , G06F18/2163 , G06F18/2178 , G06T2207/10016 , G06T2207/20021 , G06T2207/20081 , G06T2207/20092 , G06T2207/30204 , G06T2207/30228 , H04N21/2187
Abstract: Methods and systems are described for registering a sports field to a video. Video of a live event may feature participants at a venue. A template of the venue, including virtual markings that represent real markings on the venue, may be obtained. A homographic transformation between an image plane and a ground plane may be determined by matching virtual markings to corresponding real markings captured in at least one frame of the video. The determined homographic transformation may be used in the automated analysis of sports statistics and in improving inserted annotations and visualizations.
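Once a homographic transformation is known, mapping a point between the image plane and the ground plane is a single projective multiplication. The sketch below shows only that application step, assuming the homography is supplied as a 3x3 nested list; estimating it from matched markings is not shown, and the function name is hypothetical.

```python
def apply_homography(H, point):
    """Map a 2-D point through a 3x3 homography H (nested lists).

    The point is lifted to homogeneous coordinates (x, y, 1),
    multiplied by H, and divided by the resulting scale w.
    """
    x, y = point
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    return (xh / w, yh / w)
```

A pixel location of a player's feet could be mapped this way to field coordinates, which is the kind of step that downstream statistics or annotation placement would rely on.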
-
Publication No.: US11763564B1
Publication Date: 2023-09-19
Application No.: US17216147
Filing Date: 2021-03-29
Applicant: Amazon Technologies, Inc.
Inventor: Najmeh Sadoughi Nourabadi , Kewen Chen , Tu Anh Ho , Christina Botkins , Dongqing Zhang , Muhammad Raffay Hamid
IPC: G06V20/40 , G06F16/71 , G06F16/75 , G06F16/735 , G06N20/00
Abstract: Systems and methods are provided herein for generating optimized video segments. A derivative video segment (e.g., a scene) can be identified from derivative video content (e.g., a movie trailer). The segment may be used as a query to search video content (e.g., the movie) for the segment. Once found, an optimized video segment may be generated from the video content. The optimized video segment may have a different start time and/or end time than those corresponding to the original segment. Once optimized, the video segment may be presented to a user or stored for subsequent content recommendations.
-
Publication No.: US11734930B1
Publication Date: 2023-08-22
Application No.: US17662608
Filing Date: 2022-05-09
Applicant: Amazon Technologies, Inc.
Inventor: Kewen Chen , Tu Anh Ho , Muhammad Raffay Hamid , Shixing Chen
CPC classification number: G06V20/46 , G06V20/49 , G06V40/165 , G06V40/168
Abstract: Methods and apparatus are described for generating compelling preview clips of media presentations. Compelling clips are identified based on the extent to which human faces are shown and/or the loudness of the audio associated with the clips. One or more of these compelling clips are then provided to a client device for playback.