SYSTEMS AND METHODS FOR SUBJECT-DRIVEN IMAGE GENERATION

    Publication number: US20240161369A1

    Publication date: 2024-05-16

    Application number: US18498768

    Filing date: 2023-10-31

    CPC classification number: G06T11/60 G06T9/00 G06V10/761 G06V10/82

    Abstract: Embodiments described herein provide systems and methods of subject-driven image generation. In at least one embodiment, a system receives, via a data interface, an image containing a subject, a text description of the subject in the image, and a text prompt relating to a different rendition of the subject. The system encodes, via an image encoder, the image into an image feature vector. The system encodes, via a text encoder, the text description into a text feature vector. The system generates, by a multimodal encoder, a vector representation of the subject based on the image feature vector and the text feature vector. The system generates, by a neural network based image generation model, an output image based on an input combining the text prompt and the vector representation.
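    The pipeline in this abstract can be sketched as follows. This is a minimal toy illustration of the data flow only: the encoder and generator functions below are hypothetical hash-based stand-ins, not the patented models.

```python
# Toy sketch of the subject-driven generation pipeline: image encoder,
# text encoder, multimodal fusion, then generation from prompt + subject
# vector. All function bodies are illustrative placeholders.
import hashlib

DIM = 8  # toy embedding dimension

def _embed(data: bytes) -> list[float]:
    """Deterministic stand-in embedding: hash bytes into a fixed-size vector."""
    digest = hashlib.sha256(data).digest()
    return [b / 255.0 for b in digest[:DIM]]

def encode_image(image_bytes: bytes) -> list[float]:
    # "encodes, via an image encoder, the image into an image feature vector"
    return _embed(image_bytes)

def encode_text(text: str) -> list[float]:
    # "encodes, via a text encoder, the text description into a text feature vector"
    return _embed(text.encode())

def multimodal_encode(img_vec: list[float], txt_vec: list[float]) -> list[float]:
    # Fuse the two feature vectors into one subject representation
    # (toy fusion: element-wise average).
    return [(a + b) / 2.0 for a, b in zip(img_vec, txt_vec)]

def generate(prompt: str, subject_vec: list[float]) -> dict:
    # Stand-in for the image generation model: returns the combined input
    # that would condition generation.
    return {"prompt": prompt, "subject": subject_vec}

img_vec = encode_image(b"<subject photo bytes>")
txt_vec = encode_text("a corgi")
subject = multimodal_encode(img_vec, txt_vec)
output = generate("the corgi wearing a superhero cape", subject)
```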

    SYSTEMS AND METHODS FOR A DISTRIBUTED TRAINING FRAMEWORK USING UNIFORM CLASS PROTOTYPES

    Publication number: US20240054350A1

    Publication date: 2024-02-15

    Application number: US18064122

    Filing date: 2022-12-09

    CPC classification number: G06N3/098

    Abstract: Embodiments described herein provide systems and methods for federated learning. A central system may store a neural network model which has a body comprising a number of layers, and a classification layer of class prototypes that classifies the latent representations output by the body of the model. The central system may initialize the class prototypes so that they are uniformly distributed in the representation space. The model and class prototypes may be broadcast to a number of client systems, which update the body of the model locally while keeping the class prototypes fixed. The clients may return information to the central system, including updated local model parameters and a local representation of the classes based on the latent representations of items in the local training data. Based on the information from the clients, the neural network model may be updated. This process may be repeated iteratively.
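    The federated round described above can be sketched as follows. This is a hypothetical minimal example: prototypes are placed uniformly on the unit circle (a 2D stand-in for uniform placement in the representation space), and the client/server updates are toy scalar arithmetic rather than real training.

```python
# Toy sketch of one federated round: uniformly initialized class prototypes
# are broadcast and held fixed while clients update only the model body;
# the server then averages the returned body parameters.
import math

def uniform_prototypes(num_classes: int) -> list[tuple[float, float]]:
    """Place class prototypes uniformly on the unit circle (2D toy case)."""
    return [
        (math.cos(2 * math.pi * k / num_classes),
         math.sin(2 * math.pi * k / num_classes))
        for k in range(num_classes)
    ]

def client_update(body_params, prototypes, local_data):
    # Each client updates only the model body; the broadcast prototypes
    # stay fixed during local training (toy update: step toward data mean).
    local_mean = sum(local_data) / len(local_data)
    updated = [p + 0.1 * local_mean for p in body_params]
    return updated, local_mean  # updated body params + local class representation

def server_aggregate(client_params: list[list[float]]) -> list[float]:
    # Federated averaging of the body parameters returned by the clients.
    n = len(client_params)
    return [sum(ps) / n for ps in zip(*client_params)]

prototypes = uniform_prototypes(4)          # broadcast to all clients, kept fixed
global_body = [0.0, 0.0]
client_datasets = [[1.0, 2.0], [3.0, 5.0]]
results = [client_update(global_body, prototypes, d) for d in client_datasets]
global_body = server_aggregate([params for params, _ in results])
```

    Uniform placement keeps the prototypes maximally separated, so clients with different local class distributions still train against a shared, non-drifting classification geometry.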

    SYSTEMS AND METHODS FOR MULTI-MODAL LANGUAGE MODELS

    Publication number: US20240370718A1

    Publication date: 2024-11-07

    Application number: US18400477

    Filing date: 2023-12-29

    Abstract: Embodiments described herein provide a method of generating a multi-modal task output in response to a text instruction relating to inputs of multiple different modalities (e.g., text, audio, video, 3D). The method comprises receiving, via a data interface, a first input of a first modality, a second input of a second modality, and the text instruction relating to the first and the second inputs; encoding, by a first multimodal encoder adapted for the first modality, the first input of the first modality into a first encoded representation conditioned on the text instruction; encoding, by a second multimodal encoder adapted for the second modality, the second input of the second modality into a second encoded representation conditioned on the text instruction; and generating, by a neural network based language model, the multi-modal task output based on an input combining the first encoded representation, the second encoded representation, and the text instruction.
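    The method above can be sketched as follows. The key point the sketch illustrates is that each modality encoder is conditioned on the instruction, so the same input yields different representations for different instructions; the encoder and language-model functions are hypothetical toy stand-ins.

```python
# Toy sketch of the multi-modal method: two instruction-conditioned
# modality encoders feed a language model together with the instruction.
def encode_modality(modality: str, data: str, instruction: str) -> list[float]:
    # Instruction-conditioned encoder stand-in: the representation depends
    # on the input AND the instruction text.
    seed = f"{modality}|{data}|{instruction}"
    return [ord(c) % 7 / 7.0 for c in seed[:4]]

def language_model(combined: list[float], instruction: str) -> str:
    # Stand-in for the neural network based language model.
    return f"answer({instruction}, {len(combined)} features)"

def multimodal_task(mod_a: str, data_a: str,
                    mod_b: str, data_b: str,
                    instruction: str) -> str:
    rep_a = encode_modality(mod_a, data_a, instruction)   # first encoded representation
    rep_b = encode_modality(mod_b, data_b, instruction)   # second encoded representation
    # Combine both representations with the instruction for the LM.
    return language_model(rep_a + rep_b, instruction)

result = multimodal_task("audio", "clip.wav",
                         "video", "clip.mp4",
                         "describe the scene")
```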

    SYSTEMS AND METHODS FOR MULTIMODAL PRETRAINING FOR THREE-DIMENSIONAL UNDERSTANDING MODELS

    Publication number: US20240312128A1

    Publication date: 2024-09-19

    Application number: US18493035

    Filing date: 2023-10-24

    CPC classification number: G06T17/00 G06F40/40

    Abstract: A method of training a neural network based three-dimensional (3D) encoder is provided. A first plurality of samples of a training dataset are generated using a first 3D model. An image generator with multi-view rendering is used to generate a plurality of two-dimensional (2D) images having different viewpoints of the first 3D model. A first language model is used to generate a plurality of texts corresponding to the plurality of 2D images respectively. A first text for a first image is generated using one or more text descriptions generated by the first language model. A point cloud is generated by randomly sampling points from the first 3D model. The first plurality of samples are generated using the plurality of 2D images, the corresponding plurality of texts, and the point cloud. The neural network based 3D encoder is trained using the training dataset including the first plurality of samples.
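    The dataset-generation steps above can be sketched as follows. This is a hypothetical illustration of assembling (image, text, point cloud) triplets; the renderer, captioner, and point sampler are toy placeholders, not the patent's actual components.

```python
# Toy sketch of building pretraining samples for a 3D encoder:
# multi-view 2D renders, per-image captions from a language model,
# and a point cloud sampled from the 3D model.
import random

def render_views(model_id: str, num_views: int) -> list[str]:
    # Stand-in for the multi-view image generator.
    return [f"{model_id}_view{k}.png" for k in range(num_views)]

def caption(image: str) -> str:
    # Stand-in for the first language model's text description.
    return f"a rendering shown in {image}"

def sample_point_cloud(num_points: int, seed: int = 0) -> list[tuple]:
    # Randomly sample points (toy: uniform points in a unit cube rather
    # than points on the actual 3D model's surface).
    rng = random.Random(seed)
    return [(rng.random(), rng.random(), rng.random()) for _ in range(num_points)]

def build_samples(model_id: str, num_views: int = 4, num_points: int = 1024) -> list[dict]:
    images = render_views(model_id, num_views)          # 2D images, different viewpoints
    texts = [caption(img) for img in images]            # corresponding texts
    cloud = sample_point_cloud(num_points)              # shared point cloud for the model
    return [{"image": img, "text": txt, "points": cloud}
            for img, txt in zip(images, texts)]

samples = build_samples("chair_model")
```

    Each sample pairs one rendered view and its caption with the model's point cloud, giving the 3D encoder aligned image/text/point-cloud supervision.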
