-
Publication number: US20200336802A1
Publication date: 2020-10-22
Application number: US16386031
Application date: 2019-04-16
Applicant: Adobe Inc.
Inventor: Bryan Russell, Ruppesh Nalwaya, Markus Woodson, Joon-Young Lee, Hailin Jin
IPC: H04N21/81, G06K9/00, G06N3/08, H04N21/845
Abstract: Systems, methods, and non-transitory computer-readable media are disclosed for automatic tagging of videos. In particular, in one or more embodiments, the disclosed systems generate a set of tagged feature vectors (e.g., tagged feature vectors based on action-rich digital videos) to utilize in generating tags for an input digital video. For instance, the disclosed systems can extract a set of frames for the input digital video and generate feature vectors from the set of frames. In some embodiments, the disclosed systems generate aggregated feature vectors from the feature vectors. Furthermore, the disclosed systems can utilize the feature vectors (or aggregated feature vectors) to identify similar tagged feature vectors from the set of tagged feature vectors. Additionally, the disclosed systems can generate a set of tags for the input digital video by aggregating one or more tags corresponding to identified similar tagged feature vectors.
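The retrieval-and-aggregation step the abstract describes can be sketched in a few lines. This is a minimal illustration, not the patented implementation: `tags_for_video` and the cosine-similarity ranking are hypothetical stand-ins for however the system actually measures similarity between an input video's feature vectors and the tagged index.

```python
from math import sqrt

def cosine(u, v):
    # cosine similarity between two feature vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def tags_for_video(query_vecs, tagged_index, k=2):
    """For each feature vector of the input video, find the k most
    similar tagged feature vectors, then aggregate their tags by
    vote count (most-supported tag first)."""
    votes = {}
    for q in query_vecs:
        ranked = sorted(tagged_index, key=lambda e: cosine(q, e[0]), reverse=True)
        for _vec, tags in ranked[:k]:
            for t in tags:
                votes[t] = votes.get(t, 0) + 1
    return [t for t, _ in sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))]
```

With a toy index of three tagged vectors, a query near the "run" examples collects "run" from both nearest neighbors and "jump" from one.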
-
Publication number: US10810435B2
Publication date: 2020-10-20
Application number: US16183560
Application date: 2018-11-07
Applicant: Adobe Inc.
Inventor: Joon-Young Lee, Seoungwug Oh, Ning Xu
Abstract: In implementations of segmenting objects in video sequences, user annotations designate an object in any image frame of a video sequence, without requiring user annotations for all image frames. An interaction network generates a mask for an object in an image frame annotated by a user, and is coupled both internally and externally to a propagation network that propagates the mask to other image frames of the video sequence. Feature maps are aggregated for each round of user annotations and couple the interaction network and the propagation network internally. The interaction network and the propagation network are trained jointly using synthetic annotations in a multi-round training scenario, in which weights of the interaction network and the propagation network are adjusted after multiple synthetic annotations are processed, resulting in a trained object segmentation system that can reliably generate realistic object masks.
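The multi-round training scenario can be sketched as a loop skeleton. This is a hedged illustration only: `interact`, `propagate`, and `update` are hypothetical callables standing in for the interaction network, the propagation network, and the optimizer step; the key point shown is that several annotation rounds are processed, feature maps are carried between rounds, and the weights are adjusted only once afterwards.

```python
def multi_round_step(interact, propagate, update, frames, annotations):
    """One training step of the multi-round scenario: each round feeds a
    synthetic annotation to the interaction network, propagates the
    resulting mask across the sequence, and accumulates a loss; the
    single weight update happens after all rounds."""
    losses = []
    agg_features = None          # feature maps aggregated across rounds
    masks = [None] * len(frames)
    for ann in annotations:      # each round annotates one frame
        mask, agg_features = interact(frames[ann["frame"]], ann, agg_features)
        masks, loss = propagate(frames, ann["frame"], mask, agg_features)
        losses.append(loss)
    update(sum(losses) / len(losses))  # one adjustment per multi-round step
    return masks
```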
-
Publication number: US20250088650A1
Publication date: 2025-03-13
Application number: US18367377
Application date: 2023-09-12
Applicant: Adobe Inc.
Inventor: Nikhil Kalra, Seoung Wug Oh, Nico Alexander Becherer, Joon-Young Lee, Jimei Yang
IPC: H04N19/436, G11B27/031, H04N19/119, H04N19/136, H04N19/172, H04N19/85
Abstract: In one aspect, a processor determines a first set of video frames of a video based on a target video frame. The first set of video frames includes the target video frame, one or more frames of the video preceding the target video frame, and one or more frames of the video subsequent to the target video frame. The first set of video frames includes a sequence of video frames of the video. An encoder neural network executing on the processor encodes the first set of video frames of a video to generate a respective feature vector for each video frame in the first set. A decoder neural network executing on the processor decodes the feature vectors to generate a mask for the target video frame.
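The first step, choosing the set of frames around the target, is straightforward to sketch. The window sizes and the clamping behavior at the ends of the video are assumptions for illustration, not details taken from the patent:

```python
def frame_window(num_frames, target, before=2, after=2):
    """Indices of the first set of video frames: the target frame plus
    up to `before` preceding and `after` subsequent frames, clamped so
    the window never runs past either end of the video."""
    start = max(0, target - before)
    stop = min(num_frames, target + after + 1)
    return list(range(start, stop))
```

The resulting index list is what an encoder would consume frame by frame before the decoder produces the target frame's mask.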
-
Publication number: US20230196817A1
Publication date: 2023-06-22
Application number: US17552857
Application date: 2021-12-16
Applicant: Adobe Inc.
Inventor: Seoung Wug Oh, Miran Heo, Joon-Young Lee
CPC classification number: G06V40/103, G06T7/10, G06T7/70, G06V10/40, G06T2207/20084, G06T2207/30196
Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media that generate joint-based segmentation masks for digital objects portrayed in digital videos. In particular, in one or more embodiments, the disclosed systems utilize a video masking model having a pose tracking neural network and a segmentation neural network to generate the joint-based segmentation masks. To illustrate, in some embodiments, the disclosed systems utilize the pose tracking neural network to identify a set of joints of the digital object across the frames of the digital video. The disclosed systems further utilize the segmentation neural network to generate joint-based segmentation masks for the video frames that portray the object using the identified joints. In some cases, the segmentation neural network includes a multi-layer perceptron mixer layer for mixing visual features propagated via convolutional layers.
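The token-mixing idea behind an MLP-mixer layer can be shown in miniature. This is a generic sketch of the technique, not the patent's layer: a feature table (tokens x channels) is transposed so a shared linear map mixes information across token positions for every channel, then transposed back.

```python
def mix_tokens(x, w):
    """One token-mixing step in MLP-mixer style.
    x: tokens x channels feature table; w: tokens x tokens mixing weights.
    The same w is applied to each channel independently."""
    xt = [list(col) for col in zip(*x)]            # channels x tokens
    mixed = [[sum(wij * row[j] for j, wij in enumerate(wrow))
              for wrow in w]                        # mix across tokens
             for row in xt]
    return [list(col) for col in zip(*mixed)]       # back to tokens x channels
```

A permutation matrix as `w` simply reorders tokens, which makes the mixing direction easy to check; an identity matrix leaves the table unchanged.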
-
Publication number: US11640714B2
Publication date: 2023-05-02
Application number: US16852647
Application date: 2020-04-20
Applicant: Adobe Inc.
Inventor: Joon-Young Lee, Sanghyun Woo, Dahun Kim
Abstract: Systems and methods for panoptic video segmentation are described. A method may include identifying a target frame and a reference frame from a video, generating target features for the target frame and reference features for the reference frame, combining the target features and the reference features to produce fused features for the target frame, generating a feature matrix comprising a correspondence between objects from the reference features and objects from the fused features; and generating panoptic segmentation information for the target frame based on the feature matrix.
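The correspondence step can be sketched with per-object descriptors. Everything here is an illustrative assumption: dot product stands in for whatever similarity the method actually learns, and the greedy argmax matching is a simplification of associating reference objects with objects in the fused features.

```python
def correspondence_matrix(ref_objs, fused_objs):
    """Similarity matrix between object descriptors from the reference
    features and from the fused features; entry [i][j] scores how well
    reference object i matches fused object j (dot product as a
    stand-in similarity)."""
    return [[sum(a * b for a, b in zip(r, f)) for f in fused_objs]
            for r in ref_objs]

def match_objects(matrix):
    """Greedy association: each reference object takes its
    highest-scoring fused object."""
    return [max(range(len(row)), key=row.__getitem__) for row in matrix]
```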
-
Publication number: US10991085B2
Publication date: 2021-04-27
Application number: US16372202
Application date: 2019-04-01
Applicant: Adobe Inc.
Inventor: Qi Sun, Li-Yi Wei, Joon-Young Lee, Jonathan Eisenmann, Jinwoong Jung, Byungmoon Kim
Abstract: Embodiments herein describe a framework for classifying images. In some embodiments, it is determined whether an image includes synthetic image content. If it does, characteristics of the image are analyzed to determine if the image includes characteristics particular to panoramic images (e.g., possess a threshold equivalency of pixel values among the top and/or bottom boundaries of the image, or a difference between summed pixel values of the pixels comprising the right vertical boundary of the image and summed pixel values of the pixels comprising the left vertical boundary of the image being less than or equal to a threshold value). If the image includes characteristics particular to panoramic images, the image is classified as a synthetic panoramic image. If the image is determined to not include synthetic image content, a neural network is applied to the image and the image is classified as one of non-synthetic panoramic or non-synthetic non-panoramic.
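One of the boundary checks the abstract names, comparing the summed pixel values of the left and right vertical boundaries, is easy to sketch. The threshold value and the scalar-intensity image representation are assumptions for illustration; the intuition is that an equirectangular panorama wraps around, so its left and right edge columns should nearly agree.

```python
def looks_panoramic(image, threshold=10):
    """Check the left/right boundary characteristic described in the
    abstract: sum the pixels of the left and right vertical boundaries
    and test whether their difference is within a threshold.
    `image` is a rows x cols grid of pixel intensities."""
    left = sum(row[0] for row in image)
    right = sum(row[-1] for row in image)
    return abs(left - right) <= threshold
```

A full classifier would combine this with the top/bottom boundary equivalence test before labeling an image a synthetic panorama.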
-
Publication number: US20200311901A1
Publication date: 2020-10-01
Application number: US16372202
Application date: 2019-04-01
Applicant: Adobe Inc.
Inventor: Qi Sun, Li-Yi Wei, Joon-Young Lee, Jonathan Eisenmann, Jinwoong Jung, Byungmoon Kim
Abstract: Embodiments herein describe a framework for classifying images. In some embodiments, it is determined whether an image includes synthetic image content. If it does, characteristics of the image are analyzed to determine if the image includes characteristics particular to panoramic images (e.g., possess a threshold equivalency of pixel values among the top and/or bottom boundaries of the image, or a difference between summed pixel values of the pixels comprising the right vertical boundary of the image and summed pixel values of the pixels comprising the left vertical boundary of the image being less than or equal to a threshold value). If the image includes characteristics particular to panoramic images, the image is classified as a synthetic panoramic image. If the image is determined to not include synthetic image content, a neural network is applied to the image and the image is classified as one of non-synthetic panoramic or non-synthetic non-panoramic.
-
Publication number: US20200241574A1
Publication date: 2020-07-30
Application number: US16262448
Application date: 2019-01-30
Applicant: Adobe Inc.
Inventor: Zhe Lin, Xin Ye, Joon-Young Lee, Jianming Zhang
Abstract: Systems and techniques are described that provide for generalizable approach policy learning and implementation for robotic object approaching. Described techniques provide fast and accurate approaching of a specified object, or type of object, in many different environments. The described techniques enable a robot to receive an identification of an object or type of object from a user, and then navigate to the desired object, without further control from the user. Moreover, the approach of the robot to the desired object is performed efficiently, e.g., with a minimum number of movements. Further, the approach techniques may be used even when the robot is placed in a new environment, such as when the same type of object must be approached in multiple settings.
-
Publication number: US10726313B2
Publication date: 2020-07-28
Application number: US15957419
Application date: 2018-04-19
Applicant: Adobe Inc.
Inventor: Joon-Young Lee, Hailin Jin, Fabian David Caba Heilbron
Abstract: Various embodiments describe active learning methods for training temporal action localization models used to localize actions in untrimmed videos. A trainable active learning selection function is used to select unlabeled samples that can improve the temporal action localization model the most. The selected unlabeled samples are then annotated and used to retrain the temporal action localization model. In some embodiments, the trainable active learning selection function includes a trainable performance prediction model that maps a video sample and a temporal action localization model to a predicted performance improvement for the temporal action localization model.
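The selection function reduces to a ranking over predicted improvements. In this sketch, `predict_improvement` is a hypothetical stand-in for the trainable performance prediction model, and `budget` for the annotation budget; neither name comes from the patent.

```python
def select_samples(unlabeled, predict_improvement, budget=1):
    """Score every unlabeled video with the performance-prediction
    model and pick the `budget` samples with the largest predicted
    improvement for the localization model."""
    ranked = sorted(unlabeled, key=predict_improvement, reverse=True)
    return ranked[:budget]
```

The chosen samples would then be sent for annotation and folded into the next retraining round.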
-
Publication number: US10671855B2
Publication date: 2020-06-02
Application number: US15949935
Application date: 2018-04-10
Applicant: Adobe Inc.
Inventor: Joon-Young Lee, Seoungwug Oh, Kalyan Krishna Sunkavalli
Abstract: Various embodiments describe video object segmentation using a neural network and the training of the neural network. The neural network both detects a target object in the current frame based on a reference frame and a reference mask that define the target object and propagates the segmentation mask of the target object for a previous frame to the current frame to generate a segmentation mask for the current frame. In some embodiments, the neural network is pre-trained using synthetically generated static training images and is then fine-tuned using training videos.
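The dual detect-and-propagate structure can be sketched as a per-frame loop. `detect`, `propagate`, and `blend` are hypothetical callables standing in for the network's two branches and however their outputs are combined; only the control flow reflects the abstract.

```python
def segment_video(frames, ref_frame, ref_mask, detect, propagate, blend):
    """Per-frame loop from the abstract: for each frame, detect the
    target object against the reference frame/mask AND propagate the
    previous frame's mask, then blend the two estimates into the
    current frame's segmentation mask."""
    masks = []
    prev_mask = ref_mask
    for frame in frames:
        detected = detect(frame, ref_frame, ref_mask)
        propagated = propagate(frame, prev_mask)
        prev_mask = blend(detected, propagated)
        masks.append(prev_mask)
    return masks
```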