CONVOLUTIONAL STRUCTURED STATE SPACE MODEL
    Invention Publication

    Publication No.: US20240127041A1

    Publication Date: 2024-04-18

    Application No.: US18452714

    Filing Date: 2023-08-21

    CPC classification number: G06N3/0464 G06F17/16 G06N3/049

    Abstract: Systems and methods are disclosed related to a convolutional structured state space model (ConvSSM), which has a tensor-structured state but a continuous-time parameterization and linear state updates. The linearity may be exploited to use parallel scans for subquadratic parallelization across the spatiotemporal sequence. The ConvSSM effectively models long-range dependencies and, when followed by a nonlinear operation, forms a spatiotemporal layer (ConvS5) that does not require compressing frames into tokens, can be efficiently parallelized across the sequence, provides an unbounded context, and enables fast autoregressive generation.
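
    The linear update lends itself to a short sketch. Below is a minimal, hypothetical PyTorch version of the recurrence (module name, kernel size, and channel counts are illustrative assumptions, not the patent's specification); the loop is a sequential reference, while the linearity noted in the abstract is what would permit a parallel scan over time:

        import torch
        import torch.nn as nn

        class ConvSSMCell(nn.Module):
            # Linear state update x_t = A(x_{t-1}) + B(u_t) with convolutional
            # operators A and B, and a convolutional readout C. The update is
            # linear in the state, so it is associative and could be computed
            # with a parallel scan instead of this loop.
            def __init__(self, state_ch, in_ch, out_ch, k=3):
                super().__init__()
                self.conv_a = nn.Conv2d(state_ch, state_ch, k, padding=k // 2, bias=False)
                self.conv_b = nn.Conv2d(in_ch, state_ch, k, padding=k // 2, bias=False)
                self.conv_c = nn.Conv2d(state_ch, out_ch, k, padding=k // 2, bias=False)

            def forward(self, u_seq):  # u_seq: (T, B, in_ch, H, W)
                b, h, w = u_seq.shape[1], u_seq.shape[3], u_seq.shape[4]
                x = u_seq.new_zeros(b, self.conv_a.in_channels, h, w)
                ys = []
                for u_t in u_seq:
                    x = self.conv_a(x) + self.conv_b(u_t)  # linear state update
                    ys.append(self.conv_c(x))              # readout; ConvS5 would apply a nonlinearity here
                return torch.stack(ys)                     # (T, B, out_ch, H, W)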

    ESTIMATING FACIAL EXPRESSIONS USING FACIAL LANDMARKS

    Publication No.: US20230144458A1

    Publication Date: 2023-05-11

    Application No.: US18051209

    Filing Date: 2022-10-31

    CPC classification number: G06V40/174 G06V40/171 G06V40/165 G06V10/82 G06T13/40

    Abstract: In examples, locations of facial landmarks may be applied to one or more machine learning models (MLMs) to generate output data indicating profiles corresponding to facial expressions, such as facial action coding system (FACS) values. The output data may be used to determine geometry of a model. For example, video frames depicting one or more faces may be analyzed to determine the locations. The facial landmarks may be normalized, then applied to the MLM(s) to infer the profile(s), which may then be used to animate the model for expression retargeting from the video. The MLM(s) may include sub-networks that each analyze a set of input data corresponding to a region of the face to determine profiles that correspond to the region. The profiles from the sub-networks, along with global locations of facial landmarks, may be used by a subsequent network to infer the profiles for the overall face.
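
    The region-wise design above can be sketched as a set of small sub-networks fused by a final head. Region groupings, layer widths, and the FACS output count below are illustrative assumptions, not values from the patent:

        import torch
        import torch.nn as nn

        class RegionFACSNet(nn.Module):
            # One sub-network per face region over normalized landmark
            # coordinates; a final head fuses region features with global
            # landmark locations to predict FACS-style profile values.
            def __init__(self, region_dims, global_dim, n_facs=52, hidden=128):
                super().__init__()
                self.subnets = nn.ModuleList(
                    nn.Sequential(nn.Linear(d, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
                    for d in region_dims
                )
                fused = hidden * len(region_dims) + global_dim
                self.head = nn.Sequential(
                    nn.Linear(fused, hidden), nn.ReLU(), nn.Linear(hidden, n_facs)
                )

            def forward(self, regions, global_landmarks):
                # regions: list of (B, d_i) normalized landmarks per region
                # global_landmarks: (B, global_dim) flattened global locations
                per_region = [net(r) for net, r in zip(self.subnets, regions)]
                return self.head(torch.cat(per_region + [global_landmarks], dim=-1))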

    SINGLE-IMAGE INVERSE RENDERING
    Invention Application

    Publication No.: US20230081641A1

    Publication Date: 2023-03-16

    Application No.: US17551046

    Filing Date: 2021-12-14

    Abstract: A single two-dimensional (2D) image can be used as input to obtain a three-dimensional (3D) representation of the 2D image. This is done by extracting features from the 2D image with an encoder and determining a 3D representation of the 2D image utilizing a trained 2D convolutional neural network (CNN). Volumetric rendering is then run on the 3D representation to combine features along one or more viewing directions, and the combined features are provided as input to a multilayer perceptron (MLP) that predicts and outputs color (or multi-dimensional neural features) and density values for each point within the 3D representation. As a result, single-image inverse rendering may be performed using only a single 2D image as input to create a corresponding 3D representation of the scene in the single 2D image.
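
    The volumetric rendering step can be illustrated with standard emission-absorption compositing along a ray; this generic sketch shows how per-sample densities and features combine into a rendered value (a textbook formulation, not necessarily the patent's exact one):

        import torch

        def composite_along_ray(density, features, deltas):
            # density:  (R, S)    non-negative density per ray sample
            # features: (R, S, C) color or neural features per ray sample
            # deltas:   (R, S)    spacing between consecutive samples
            alpha = 1.0 - torch.exp(-density * deltas)          # per-sample opacity
            trans = torch.cumprod(
                torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], dim=-1),
                dim=-1,
            )[:, :-1]                                           # transmittance up to each sample
            weights = alpha * trans                             # contribution of each sample
            return (weights.unsqueeze(-1) * features).sum(dim=-2)  # (R, C) rendered output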

    Learning affinity via a spatial propagation neural network

    Publication No.: US10762425B2

    Publication Date: 2020-09-01

    Application No.: US16134716

    Filing Date: 2018-09-18

    Abstract: A spatial linear propagation network (SLPN) system learns the affinity matrix for vision tasks. An affinity matrix is a generic matrix that defines the similarity of two points in space. The SLPN system is trained for a particular computer vision task and refines an input map (i.e., affinity matrix) that indicates pixels that share a particular property (e.g., color, object, texture, shape, etc.). Inputs to the SLPN system are input data (e.g., pixel values for an image) and the input map corresponding to the input data to be propagated. The input data is processed to produce task-specific affinity values (guidance data). The task-specific affinity values are applied to values in the input map, with at least two weighted values from each column contributing to a value in the refined map data for the adjacent column.
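
    One directional pass of the propagation can be sketched directly from the description: each column of the refined map mixes the input column with weighted values from the previously refined column. The tensor shapes and the three-neighbor layout below are illustrative assumptions:

        import torch

        def propagate_left_to_right(x, w):
            # x: (B, C, H, W) input map to refine
            # w: (B, 3, H, W) affinity weights toward the three vertically
            #    adjacent pixels (above, same row, below) of the left column
            cols = [x[..., 0]]
            for t in range(1, x.shape[-1]):
                prev = cols[-1]                                        # (B, C, H)
                top = torch.cat([prev[:, :, :1], prev[:, :, :-1]], 2)  # neighbor above
                bot = torch.cat([prev[:, :, 1:], prev[:, :, -1:]], 2)  # neighbor below
                wt = w[..., t]                                         # (B, 3, H)
                lam = wt.sum(dim=1, keepdim=True)                      # total propagated weight
                cols.append(
                    (1 - lam) * x[..., t]
                    + wt[:, 0:1] * top + wt[:, 1:2] * prev + wt[:, 2:3] * bot
                )
            return torch.stack(cols, dim=-1)                           # refined (B, C, H, W)

    A full system would presumably run such scans in multiple directions and combine the results; the single left-to-right pass above shows how weighted values from one column contribute to the adjacent column.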

    NEURAL HEAD AVATAR CONSTRUCTION FROM AN IMAGE

    Publication No.: US20240404174A1

    Publication Date: 2024-12-05

    Application No.: US18653723

    Filing Date: 2024-05-02

    Abstract: Systems and methods are disclosed that animate a source portrait image with motion (i.e., pose and expression) from a target image. In contrast to conventional systems, given an unseen single-view portrait image, an implicit three-dimensional (3D) head avatar is constructed that not only captures photo-realistic details within and beyond the face region, but also is readily available for animation without requiring further optimization during inference. In an embodiment, three processing branches of a system produce three tri-planes representing coarse 3D geometry for the head avatar, detailed appearance of a source image, as well as the expression of a target image. By applying volumetric rendering to a combination of the three tri-planes, an image of the desired identity, expression and pose is generated.
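
    A generic tri-plane lookup illustrates the representation the three branches produce. Summing the three planar features per point, and summing the three branch tri-planes before rendering, are common conventions assumed here for illustration rather than details confirmed by the abstract:

        import torch
        import torch.nn.functional as F

        def sample_triplane(planes, pts):
            # planes: (3, C, R, R) feature planes for xy, xz, and yz projections
            # pts:    (N, 3)       query points in [-1, 1]^3
            coords = torch.stack(
                [pts[:, [0, 1]], pts[:, [0, 2]], pts[:, [1, 2]]]
            )                                                            # (3, N, 2)
            feats = F.grid_sample(
                planes, coords.unsqueeze(2), align_corners=True
            )                                                            # (3, C, N, 1)
            return feats.squeeze(-1).sum(dim=0).transpose(0, 1)          # (N, C)

        # Combining the branches could be as simple as summing their planes:
        # feats = sample_triplane(geometry_planes + appearance_planes + expression_planes, pts)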

    SYNTHETIC DATASET GENERATOR
    Invention Publication

    Publication No.: US20240127075A1

    Publication Date: 2024-04-18

    Application No.: US18212629

    Filing Date: 2023-06-21

    CPC classification number: G06N3/0985

    Abstract: Machine learning is a process that learns a model from a given dataset, where the model can then be used to make a prediction about new data. In order to reduce the costs associated with collecting and labeling real-world datasets for use in training the model, computer processes can synthetically generate datasets which simulate real-world data. The present disclosure improves the effectiveness of such synthetic datasets for training machine learning models used in real-world applications, in particular by generating a synthetic dataset that is specifically targeted to a specified downstream task (e.g., a particular computer vision task, a particular natural language processing task, etc.).
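
    One concrete way to target a downstream task is to search over the generator's settings and score each candidate dataset by downstream performance on a small real validation set. The sketch below is a plain random search; generate and train_and_eval are hypothetical placeholders for the synthetic generator and the downstream training/evaluation pipeline, not interfaces from the patent:

        import random

        def tune_generator(param_space, generate, train_and_eval, n_trials=20):
            # param_space:    dict of parameter name -> list of candidate values
            # generate:       params -> synthetic dataset
            # train_and_eval: dataset -> validation score on the real downstream task
            best_params, best_score = None, float("-inf")
            for _ in range(n_trials):
                params = {k: random.choice(v) for k, v in param_space.items()}
                score = train_and_eval(generate(params))  # train on synthetic, evaluate on real
                if score > best_score:
                    best_params, best_score = params, score
            return best_params, best_score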
