-
Publication No.: US20200372710A1
Publication Date: 2020-11-26
Application No.: US16985402
Filing Date: 2020-08-05
Applicant: Adobe, Inc.
Inventor: Oliver Wang , Vladimir Kim , Matthew Fisher , Elya Shechtman , Chen-Hsuan Lin , Bryan Russell
Abstract: Techniques are disclosed for 3D object reconstruction using photometric mesh representations. A decoder is pretrained to transform points sampled from 2D patches of representative objects into 3D polygonal meshes. An image frame of the object is fed into an encoder to obtain an initial latent code vector. For each frame-and-camera pair in the sequence, a polygonal mesh is rendered at the given viewpoint. The mesh is optimized by creating a virtual viewpoint and rasterizing the mesh against it to obtain a depth map. The 3D mesh projections are aligned by projecting the coordinates corresponding to the polygonal face vertices of the rasterized mesh into both selected viewpoints. The photometric error is determined from RGB pixel intensities sampled from both frames. Gradients from the photometric error are backpropagated into the vertices of the assigned polygonal faces, via the barycentric coordinates of each image sample, to update the latent code vector.
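The gradient-routing step in the abstract hinges on barycentric coordinates: each sampled pixel's photometric error is distributed to the three vertices of the rasterized face it falls inside. A minimal plain-Python sketch of that one step (helper names are illustrative; the patent publishes no code):

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates (u, v, w) of 2D point p in triangle (a, b, c)."""
    v0 = (b[0] - a[0], b[1] - a[1])
    v1 = (c[0] - a[0], c[1] - a[1])
    v2 = (p[0] - a[0], p[1] - a[1])
    d00 = v0[0] * v0[0] + v0[1] * v0[1]
    d01 = v0[0] * v1[0] + v0[1] * v1[1]
    d11 = v1[0] * v1[0] + v1[1] * v1[1]
    d20 = v2[0] * v0[0] + v2[1] * v0[1]
    d21 = v2[0] * v1[0] + v2[1] * v1[1]
    denom = d00 * d11 - d01 * d01
    v = (d11 * d20 - d01 * d21) / denom
    w = (d00 * d21 - d01 * d20) / denom
    return (1.0 - v - w, v, w)

def distribute_error(pixel_grad, bary):
    """Split one pixel's photometric gradient across the face's three vertices."""
    return [pixel_grad * coeff for coeff in bary]
```

The coordinates sum to one, so the per-pixel gradient mass is conserved across the three vertex updates.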
-
Publication No.: US20240419726A1
Publication Date: 2024-12-19
Application No.: US18210535
Filing Date: 2023-06-15
Applicant: Adobe Inc.
Inventor: Simon Jenni , Fabian David Caba Heilbron , Chun-Hsiao Yeh , Bryan Russell , Josef Sivic
IPC: G06F16/58 , G06F16/535 , G06F16/538
Abstract: Techniques for learning to personalize vision-language models through meta-personalization are described. In one embodiment, one or more processing devices lock a pre-trained vision-language model (VLM) during a training phase. The processing devices train the pre-trained VLM to augment a text encoder of the pre-trained VLM with a set of general named video instances to form a meta-personalized VLM, the meta-personalized VLM to include global category features. The processing devices test the meta-personalized VLM to adapt the text encoder with a set of personal named video instances to form a personal VLM, the personal VLM comprising the global category features personalized with a set of personal instance weights to form a personal instance token associated with the user. Other embodiments are described and claimed.
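The personal instance token described above can be read as a learned mixture over frozen global category features. A toy sketch under that reading (all names illustrative; this is not Adobe's implementation):

```python
def personal_instance_token(global_category_features, personal_instance_weights):
    """Blend frozen global category features into one personal instance token.

    global_category_features: list of equal-length feature vectors (kept frozen,
    as the locked VLM's text-encoder augmentation is).
    personal_instance_weights: one learned scalar weight per category feature.
    """
    if len(global_category_features) != len(personal_instance_weights):
        raise ValueError("need one weight per category feature")
    dim = len(global_category_features[0])
    token = [0.0] * dim
    for feat, weight in zip(global_category_features, personal_instance_weights):
        for i in range(dim):
            token[i] += weight * feat[i]
    return token
```

Only the weights would be updated during personalization; the category features, like the rest of the locked VLM, stay fixed.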
-
Publication No.: US11721056B2
Publication Date: 2023-08-08
Application No.: US17573890
Filing Date: 2022-01-12
Applicant: Adobe Inc.
Inventor: Jimei Yang , Davis Rempe , Bryan Russell , Aaron Hertzmann
Abstract: In some embodiments, a model training system obtains a set of animation models. For each of the animation models, the model training system renders the animation model to generate a sequence of video frames containing a character using a set of rendering parameters and extracts joint points of the character from each frame of the sequence of video frames. The model training system further determines, for each frame of the sequence of video frames, whether a subset of the joint points are in contact with a ground plane in a three-dimensional space and generates contact labels for the subset of the joint points. The model training system trains a contact estimation model using training data containing the joint points extracted from the sequences of video frames and the generated contact labels. The contact estimation model can be used to refine a motion model for a character.
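The contact-label generation step above can be sketched as a height threshold against the ground plane. A simplification with an assumed threshold value (the patent does not specify one):

```python
def contact_labels(joint_positions, ground_y=0.0, eps=0.05):
    """Label each joint 1 (in contact) when its height is within eps of the ground.

    joint_positions: list of (x, y, z) tuples for one rendered frame, where y is
    height above the ground plane in the 3D rendering space.
    """
    return [1 if abs(y - ground_y) <= eps else 0 for (_, y, _) in joint_positions]
```

Because the joints come from rendered animation models, the ground plane and joint heights are known exactly, which is what makes the labels cheap to generate at scale.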
-
Publication No.: US20210409836A1
Publication Date: 2021-12-30
Application No.: US17470441
Filing Date: 2021-09-09
Applicant: Adobe Inc.
Inventor: Bryan Russell , Ruppesh Nalwaya , Markus Woodson , Joon-Young Lee , Hailin Jin
IPC: H04N21/81 , H04N21/845 , G06N3/08 , G06K9/00
Abstract: Systems, methods, and non-transitory computer-readable media are disclosed for automatic tagging of videos. In particular, in one or more embodiments, the disclosed systems generate a set of tagged feature vectors (e.g., tagged feature vectors based on action-rich digital videos) to utilize to generate tags for an input digital video. For instance, the disclosed systems can extract a set of frames for the input digital video and generate feature vectors from the set of frames. In some embodiments, the disclosed systems generate aggregated feature vectors from the feature vectors. Furthermore, the disclosed systems can utilize the feature vectors (or aggregated feature vectors) to identify similar tagged feature vectors from the set of tagged feature vectors. Additionally, the disclosed systems can generate a set of tags for the input digital video by aggregating one or more tags corresponding to the identified similar tagged feature vectors.
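The retrieval-and-aggregation step described above amounts to nearest-neighbor search over the tagged feature vectors followed by tag voting. A compact sketch using cosine similarity (function names and the voting rule are assumptions, not the patented method's specifics):

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def suggest_tags(query_vec, tagged_vectors, k=2, min_votes=1):
    """Aggregate tags from the k most similar tagged feature vectors.

    tagged_vectors: list of (feature_vector, [tags]) pairs.
    """
    ranked = sorted(tagged_vectors,
                    key=lambda tv: cosine(query_vec, tv[0]), reverse=True)
    votes = Counter(tag for _, tags in ranked[:k] for tag in tags)
    return [tag for tag, n in votes.most_common() if n >= min_votes]
```

In practice the query vector would be the (aggregated) feature vector extracted from the input video's frames, and `min_votes` trades precision against recall.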
-
Publication No.: US20210335028A1
Publication Date: 2021-10-28
Application No.: US16860411
Filing Date: 2020-04-28
Applicant: Adobe Inc.
Inventor: Jimei Yang , Davis Rempe , Bryan Russell , Aaron Hertzmann
Abstract: In some embodiments, a motion model refinement system receives an input video depicting a human character and an initial motion model describing motions of individual joint points of the human character in a three-dimensional space. The motion model refinement system identifies foot joint points of the human character that are in contact with a ground plane using a trained contact estimation model. The motion model refinement system determines the ground plane based on the foot joint points and the initial motion model and constructs an optimization problem for refining the initial motion model. The optimization problem minimizes the difference between the refined motion model and the initial motion model under a set of plausibility constraints including constraints on the contact foot joint points and a time-dependent inertia tensor-based constraint. The motion model refinement system obtains the refined motion model by solving the optimization problem.
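A minimal stand-in for the optimization above is a quadratic stay-close objective under hard contact constraints: with only those two terms, the minimizer keeps every free coordinate and snaps contact-joint heights to the plane. This sketch shows only that projection and omits the time-dependent inertia-tensor constraint and the other plausibility terms:

```python
def refine_motion(initial_motion, contact_mask, ground_y=0.0):
    """Project an initial motion model onto the contact constraints.

    initial_motion: per-frame lists of (x, y, z) joint positions.
    contact_mask: same shape, 1 where a foot joint was estimated in contact.
    """
    refined = []
    for frame, mask in zip(initial_motion, contact_mask):
        new_frame = []
        for (x, y, z), in_contact in zip(frame, mask):
            new_frame.append((x, ground_y if in_contact else y, z))
        refined.append(new_frame)
    return refined
```

The full problem in the abstract is solved jointly over all frames, since the inertia-tensor constraint couples frames in time; this per-frame projection is only the contact part.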
-
Publication No.: US10769848B1
Publication Date: 2020-09-08
Application No.: US16421729
Filing Date: 2019-05-24
Applicant: Adobe, Inc.
Inventor: Oliver Wang , Vladimir Kim , Matthew Fisher , Elya Shechtman , Chen-Hsuan Lin , Bryan Russell
Abstract: Techniques are disclosed for 3D object reconstruction using photometric mesh representations. A decoder is pretrained to transform points sampled from 2D patches of representative objects into 3D polygonal meshes. An image frame of the object is fed into an encoder to obtain an initial latent code vector. For each frame-and-camera pair in the sequence, a polygonal mesh is rendered at the given viewpoint. The mesh is optimized by creating a virtual viewpoint and rasterizing the mesh against it to obtain a depth map. The 3D mesh projections are aligned by projecting the coordinates corresponding to the polygonal face vertices of the rasterized mesh into both selected viewpoints. The photometric error is determined from RGB pixel intensities sampled from both frames. Gradients from the photometric error are backpropagated into the vertices of the assigned polygonal faces, via the barycentric coordinates of each image sample, to update the latent code vector.
-
Publication No.: US20200175759A1
Publication Date: 2020-06-04
Application No.: US16205132
Filing Date: 2018-11-29
Applicant: Adobe Inc.
Inventor: Bryan Russell , Daniel Kaufman , Carlo Innamorati , Niloy Mitra
Abstract: This application relates generally to augmenting images and videos with dynamic object compositing, and more specifically, to generating synthetic training data to train a machine learning model to automatically augment an image or video with a dynamic object. The synthetic training data may contain multiple data points from thousands of simulated dynamic object movements within a virtual environment. Based on the synthetic training data, the machine learning model may determine the movement of a new dynamic object within a new virtual environment.
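Synthetic training data of the kind described can be produced by forward-simulating many simple dynamic-object trajectories and pairing each initial state with the motion it produces. A ballistic toy sketch (constants, names, and the choice of physics are illustrative assumptions):

```python
def simulate_trajectory(p0, v0, steps=10, dt=0.1, g=9.81):
    """Forward-simulate a dynamic object under gravity; returns (x, y) per step."""
    x, y = p0
    vx, vy = v0
    points = [(x, y)]
    for _ in range(steps):
        vy -= g * dt          # gravity acts on vertical velocity
        x += vx * dt
        y += vy * dt
        points.append((x, y))
    return points

def make_dataset(launches):
    """One (initial_state, trajectory) training pair per simulated launch."""
    return [((p0, v0), simulate_trajectory(p0, v0)) for p0, v0 in launches]
```

Sweeping `launches` over thousands of initial positions and velocities yields the "thousands of simulated dynamic object movements" the abstract mentions, each a supervised pair for the motion model.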
-
Publication No.: US10290112B2
Publication Date: 2019-05-14
Application No.: US15996833
Filing Date: 2018-06-04
Applicant: ADOBE INC.
Inventor: Xiaohui Shen , Scott Cohen , Peng Wang , Bryan Russell , Brian Price , Jonathan Eisenmann
Abstract: Techniques for planar region-guided estimates of 3D geometry of objects depicted in a single 2D image. The techniques estimate regions of an image that are part of planar regions (i.e., flat surfaces) and use those planar region estimates to estimate the 3D geometry of the objects in the image. The planar regions and resulting 3D geometry are estimated using only a single 2D image of the objects. Training data from images of other objects is used to train a CNN with a model that is then used to make planar region estimates using a single 2D image. The planar region estimates, in one example, are based on estimates of planarity (surface plane information) and estimates of edges (depth discontinuities and edges between surface planes) that are estimated using models trained using images of other scenes.
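Once a pixel is assigned to an estimated plane, its depth follows from intersecting the camera ray through that pixel with the plane — the geometric step that turns planar region estimates into 3D geometry. A sketch of just that step (API and conventions are assumptions):

```python
def depth_from_plane(ray_dir, plane_normal, plane_offset):
    """Depth t along ray_dir (from the camera origin) where the ray meets the
    plane n . p + d = 0, i.e. n . (t * ray_dir) + d = 0."""
    denom = sum(n * r for n, r in zip(plane_normal, ray_dir))
    if abs(denom) < 1e-9:
        raise ValueError("ray is parallel to the plane")
    return -plane_offset / denom
```

The CNN described in the abstract supplies the planarity and edge estimates; recovering each plane's `(plane_normal, plane_offset)` from those estimates is the harder part, and this intersection then yields per-pixel depth.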
-