USER REPRESENTATION USING DEPTHS RELATIVE TO MULTIPLE SURFACE POINTS

    Publication number: US20240005537A1

    Publication date: 2024-01-04

    Application number: US18214604

    Application date: 2023-06-27

    Applicant: Apple Inc.

    CPC classification number: G06T7/521 G06V40/176 G06T13/40

    Abstract: Various implementations disclosed herein include devices, systems, and methods that generate values for a representation of a face of a user. For example, an example process may include obtaining sensor data (e.g., live data) of a user, wherein the sensor data is associated with a point in time, generating a set of values representing the user based on the sensor data, and providing the set of values, where a depiction of the user at the point in time is displayed based on the set of values. In some implementations, the set of values includes depth values that define three-dimensional (3D) positions of portions of the user relative to multiple 3D positions of points of a projected surface, and appearance values (e.g., color, texture, opacity, etc.) that define appearances of those portions of the user.
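A minimal sketch of the kind of representation the abstract describes: each point of the user is stored as a signed depth relative to the nearest point of a projected surface, alongside an appearance value. The function names, array shapes, and nearest-point heuristic are illustrative assumptions, not the patented method.

```python
import numpy as np

def encode_user_points(user_points, surface_points, surface_normals, colors):
    """For each user point, store a signed depth relative to the nearest
    projected-surface point, plus its appearance (e.g., RGB) value."""
    # Pairwise offsets between user points and surface points: (N, M, 3)
    diffs = user_points[:, None, :] - surface_points[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)          # (N, M)
    nearest = np.argmin(dists, axis=1)              # index of nearest surface point
    # Signed depth along the surface normal at the nearest point
    offsets = diffs[np.arange(len(user_points)), nearest]
    depth = np.einsum('ij,ij->i', offsets, surface_normals[nearest])
    return {"surface_index": nearest, "depth": depth, "appearance": colors}

def decode_user_points(values, surface_points, surface_normals):
    """Reconstruct 3D positions from the stored depths."""
    idx = values["surface_index"]
    return surface_points[idx] + values["depth"][:, None] * surface_normals[idx]
```

Points lying along a surface normal round-trip exactly; appearance values travel with the depths so a depiction can be rendered from the set of values alone.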

    Neural face video compression using multiple views

    Publication number: US11856203B1

    Publication date: 2023-12-26

    Application number: US17701498

    Application date: 2022-03-22

    Applicant: Apple Inc.

    CPC classification number: H04N19/139 H04N19/132 H04N19/42

    Abstract: Advances in deep generative models (DGM) have led to the development of neural face video compression codecs that are capable of using an order of magnitude less data than “traditional” engineered codecs. These “neural” codecs can reconstruct a target image by warping a source image to approximate the content of the target image and using a DGM to compensate for imperfections in the warped source image. The determined warping operation may be encoded and transmitted using less data (e.g., transmitting a small number of keypoints, rather than a dense flow field), leading to bandwidth savings compared to traditional codecs. However, by relying on only a single source image, these methods can produce inaccurate reconstructions. The techniques presented herein improve image reconstruction quality while maintaining bandwidth savings, via a combination of using multiple source images (i.e., containing multiple views of the human subject) and novel feature aggregation techniques.
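A back-of-envelope illustration of the bandwidth argument in the abstract: a dense flow field stores a 2D displacement per pixel, while a keypoint-based encoding transmits only a handful of 2D positions per frame. The frame size, keypoint count, and component width below are illustrative assumptions.

```python
def flow_field_bytes(width, height, bytes_per_component=4):
    # A dense flow field stores a 2D (dx, dy) displacement for every pixel
    return width * height * 2 * bytes_per_component

def keypoint_bytes(num_keypoints, bytes_per_component=4):
    # Each keypoint is a 2D position (some codecs add a small local Jacobian)
    return num_keypoints * 2 * bytes_per_component

dense = flow_field_bytes(256, 256)   # 524288 bytes per frame
sparse = keypoint_bytes(10)          # 80 bytes per frame
print(f"dense flow: {dense} B, keypoints: {sparse} B, ratio {dense // sparse}x")
```

Even before entropy coding, the sparse representation is several orders of magnitude smaller, which is what makes transmitting multiple source views affordable.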

    Generating animated three-dimensional models from captured images

    Publication number: US11068698B2

    Publication date: 2021-07-20

    Application number: US16586758

    Application date: 2019-09-27

    Applicant: Apple Inc.

    Abstract: A three-dimensional model (e.g., motion capture model) of a user is generated from captured images or captured video of the user. A machine learning network may track poses and expressions of the user to generate and refine the three-dimensional model. Refinement of the three-dimensional model may provide more accurate tracking of the user's face. Refining of the three-dimensional model may include refining the determinations of poses and expressions at defined locations (e.g., eye corners and/or nose) in the three-dimensional model. The refining may occur in an iterative process. Tracking of the three-dimensional model over time (e.g., during video capture) may be used to generate an animated three-dimensional model (e.g., an animated puppet) of the user that simulates the user's poses and expressions.
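The iterative refinement at defined landmark locations (e.g., eye corners, nose) described in the abstract can be sketched as a loop that repeatedly nudges model parameters to reduce landmark error. The coordinate-descent scheme, parameter layout, and projection function below are illustrative assumptions, not the patented pipeline.

```python
import numpy as np

def landmark_error(params, landmarks_observed, project):
    """Squared error between projected model landmarks and observed ones."""
    return np.sum((project(params) - landmarks_observed) ** 2)

def refine_parameters(params, landmarks_observed, project, lr=0.1, iters=50):
    """Iteratively nudge each parameter, keeping changes that reduce
    the error at the defined landmark locations."""
    params = params.copy()
    for _ in range(iters):
        for i in range(len(params)):
            base = landmark_error(params, landmarks_observed, project)
            for delta in (lr, -lr):
                trial = params.copy()
                trial[i] += delta
                if landmark_error(trial, landmarks_observed, project) < base:
                    params = trial
                    base = landmark_error(params, landmarks_observed, project)
    return params
```

In practice such refinement would use gradients through the full model; the point here is only the iterative accept-if-better structure the abstract alludes to.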

    Deformable object tracking
    Invention Grant

    Publication number: US11379996B2

    Publication date: 2022-07-05

    Application number: US16761582

    Application date: 2018-11-13

    Applicant: Apple Inc.

    Abstract: Various implementations disclosed herein include devices, systems, and methods that use event camera data to track deformable objects such as faces, hands, and other body parts. One exemplary implementation involves receiving a stream of pixel events output by an event camera. The device tracks the deformable object using this data. Various implementations do so by generating a dynamic representation of the object and modifying the dynamic representation of the object in response to obtaining additional pixel events output by the event camera. In some implementations, generating the dynamic representation of the object involves identifying features disposed on the deformable surface of the object using the stream of pixel events. The features are determined by identifying patterns of pixel events. As new event stream data is received, the patterns of pixel events are recognized in the new data and used to modify the dynamic representation of the object.
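A minimal sketch of consuming an event-camera stream and updating a dynamic representation, in the spirit of the abstract. The (x, y, timestamp, polarity) event tuple is the standard event-camera format; the activity map and centroid tracking stand in for the feature-pattern logic and are illustrative assumptions.

```python
import numpy as np

class DynamicRepresentation:
    """Accumulates pixel events into a per-pixel activity map and tracks
    the object's centroid as a stand-in for feature-based tracking."""

    def __init__(self, width, height):
        self.activity = np.zeros((height, width))

    def update(self, events):
        # events: iterable of (x, y, timestamp, polarity) tuples, as emitted
        # asynchronously by an event camera when a pixel's brightness changes
        for x, y, _t, polarity in events:
            self.activity[y, x] += 1 if polarity > 0 else -1

    def centroid(self):
        # Activity-weighted center; returns None before any events arrive
        weights = np.abs(self.activity)
        total = weights.sum()
        if total == 0:
            return None
        ys, xs = np.indices(self.activity.shape)
        return (float((xs * weights).sum() / total),
                float((ys * weights).sum() / total))
```

Because events arrive as a stream rather than as frames, the representation is modified incrementally with each batch, which mirrors the abstract's "modify the dynamic representation in response to additional pixel events."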

    Generating animated three-dimensional models from captured images

    Publication number: US10430642B2

    Publication date: 2019-10-01

    Application number: US15934521

    Application date: 2018-03-23

    Applicant: Apple Inc.

    Abstract: A three-dimensional model (e.g., motion capture model) of a user is generated from captured images or captured video of the user. A machine learning network may track poses and expressions of the user to generate and refine the three-dimensional model. Refinement of the three-dimensional model may provide more accurate tracking of the user's face. Refining of the three-dimensional model may include refining the determinations of poses and expressions at defined locations (e.g., eye corners and/or nose) in the three-dimensional model. The refining may occur in an iterative process. Tracking of the three-dimensional model over time (e.g., during video capture) may be used to generate an animated three-dimensional model (e.g., an animated puppet) of the user that simulates the user's poses and expressions.
