SYSTEMS AND METHODS FOR GESTURE GENERATION

    Publication Number: US20250166274A1

    Publication Date: 2025-05-22

    Application Number: US18953359

    Filing Date: 2024-11-20

    Abstract: Embodiments described herein include a hybrid gesture generation model using a sentence encoder. By using a pre-trained model as the sentence encoder, the framework may output an embedding for a sentence containing any word, even if that word is not in the training set. Further, embodiments described herein include a hybrid gesture model that combines trained co-speech gesture generation with retrieval of pre-defined special gestures. The generation part uses text input to a model trained on co-speech gesture data. The retrieval part uses pre-defined gestures, prepared in advance, for six different situations. Using embodiments described herein, an AI avatar can perform special gestures such as greeting or shaking hands in predefined specific situations, and co-speech gestures in other conversational situations.
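The hybrid dispatch this abstract describes — retrieve a pre-defined gesture in special situations, otherwise fall back to a generation model — could be sketched as follows. The six situation keys, the keyword matcher, and all function names are illustrative assumptions, not the patented implementation:

```python
# Hypothetical sketch of the hybrid retrieval/generation gesture dispatch.
SPECIAL_GESTURES = {
    "greeting": "wave_clip",
    "farewell": "bow_clip",
    "handshake": "handshake_clip",
    "agreement": "nod_clip",
    "apology": "bow_deep_clip",
    "pointing": "point_clip",
}

def detect_situation(text: str):
    """Toy keyword matcher standing in for a trained situation classifier."""
    lowered = text.lower()
    for situation in SPECIAL_GESTURES:
        if situation in lowered:
            return situation
    return None

def generate_co_speech_gesture(text: str) -> str:
    """Placeholder for the trained co-speech gesture generation model."""
    return f"generated_gesture_for<{text}>"

def select_gesture(text: str) -> str:
    situation = detect_situation(text)
    if situation is not None:
        return SPECIAL_GESTURES[situation]   # retrieval path
    return generate_co_speech_gesture(text)  # generation path
```

In a real system the keyword matcher would be replaced by the sentence-encoder embedding described in the abstract, matched against embeddings of the six predefined situations.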

    SYSTEMS AND METHODS FOR TEXT-TO-SPEECH SYNTHESIS

    Publication Number: US20240339104A1

    Publication Date: 2024-10-10

    Application Number: US18598996

    Filing Date: 2024-03-07

    CPC classification number: G10L13/047 G10L15/16

    Abstract: Embodiments described herein provide systems and methods for text to speech synthesis. A system receives, via a data interface, an input text, a reference spectrogram, and at least one of an emotion ID or speaker ID. The system generates, via a first encoder, a vector representation of the input text. The system generates, via a second encoder, a vector representation of the reference spectrogram. The system generates, via a variance adaptor, a modified vector representation based on a combined representation including a combination of the vector representation of the input text, the vector representation of the reference spectrogram, and at least one of an embedding of the emotion ID or an embedding of the speaker ID. The system generates, via a decoder, an audio waveform based on the modified vector representation. The generated audio waveform may be played via a speaker.
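One plausible reading of the "combined representation" fed to the variance adaptor is an element-wise sum of the text encoding, the reference-spectrogram encoding, and whichever of the emotion or speaker embeddings is present. This is a minimal sketch under that assumption; the function name and the sum-based fusion are illustrative, not taken from the patent:

```python
def combine_representations(text_vec, ref_vec, emotion_emb=None, speaker_emb=None):
    """Fuse the vector representations by element-wise sum.

    At least one of emotion_emb or speaker_emb must be supplied,
    mirroring 'at least one of an emotion ID or speaker ID'.
    """
    assert emotion_emb is not None or speaker_emb is not None
    combined = [t + r for t, r in zip(text_vec, ref_vec)]
    for emb in (emotion_emb, speaker_emb):
        if emb is not None:
            combined = [c + e for c, e in zip(combined, emb)]
    return combined  # input to the variance adaptor
```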

    SYSTEMS AND METHODS FOR AI ENABLED DELIVERY OF USER SPECIFIC SERVICES

    Publication Number: US20240354382A1

    Publication Date: 2024-10-24

    Application Number: US18436497

    Filing Date: 2024-02-08

    CPC classification number: G06F21/31 G06F21/6254

    Abstract: Embodiments described herein provide systems and methods for an AI-native operating system wrapper. Methods may include receiving, by a computing device via a user interface, a user input associated with an application; receiving, by the computing device via a data interface, stored information associated with the user; determining, via an artificial intelligence (AI) model based on the user input and the stored information, one or more actions; performing the one or more actions on the application; and transmitting output from the application to the user interface.
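The wrapper loop in this abstract reduces to: model maps (user input, stored user information) to actions, actions run against the application, outputs return to the interface. A minimal sketch, with all names hypothetical:

```python
def os_wrapper(user_input, stored_info, ai_model, application):
    """AI-native wrapper loop: the AI model maps the user input plus
    stored user information to one or more actions, which are then
    performed on the application; outputs go back to the UI."""
    actions = ai_model(user_input, stored_info)       # determine actions
    outputs = [application(action) for action in actions]  # perform them
    return outputs  # transmitted to the user interface
```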

    SYSTEMS AND METHODS FOR AVATAR-BASED INTERACTIONS

    Publication Number: US20240339211A1

    Publication Date: 2024-10-10

    Application Number: US18582404

    Filing Date: 2024-02-20

    Inventor: Patrick Nunally

    Abstract: Embodiments described herein provide systems and methods for avatar-based interactions. A system receives, via a user interface device, a first user input including one or more of an audio input, a text input, or a video input. The system generates, based on a trained model, a first response to the first user input. The system renders a virtual avatar model based on the first response. The system receives a second user input via the user interface device. The system determines, based on the second user input, to provide an advanced level of care, including: controlling a communication link between the computing device and a credentialed service device, receiving a second response to the second user input from the credentialed service device via the communication link, and rendering the virtual avatar model based on the second response.
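The escalation logic — local trained model for ordinary turns, a credentialed service device when an advanced level of care is needed — can be sketched as a simple router. The trigger condition and all names here are toy assumptions:

```python
def needs_advanced_care(user_input: str) -> bool:
    """Toy trigger standing in for the system's escalation decision."""
    return "emergency" in user_input.lower()

def render_avatar(response: str) -> str:
    """Placeholder for rendering the virtual avatar model from a response."""
    return f"avatar_renders<{response}>"

def handle_user_input(user_input, local_model, credentialed_service=None):
    """Route to the local trained model normally; escalate to the
    credentialed service device (via a communication link) when an
    advanced level of care is determined."""
    if needs_advanced_care(user_input) and credentialed_service is not None:
        return render_avatar(credentialed_service(user_input))
    return render_avatar(local_model(user_input))
```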

    SYSTEMS AND METHODS FOR DYNAMIC FACIAL EXPRESSION RECOGNITION

    Publication Number: US20240338974A1

    Publication Date: 2024-10-10

    Application Number: US18604104

    Filing Date: 2024-03-13

    CPC classification number: G06V40/176 G06V10/764 G06V10/82

    Abstract: Embodiments described herein provide systems and methods for facial expression recognition (FER). Embodiments herein combine features of different semantic levels and classify both sentiment and specific emotion categories with emotion grouping. Embodiments herein include a model with a bottom-up branch that learns facial expression representations at different semantic levels and outputs pseudo labels of facial expressions for each frame using a 2D FER model, and a top-down branch that learns discriminative representations by combining feature vectors of each semantic level to recognize facial expressions in the corresponding emotion group.
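Two pieces of this abstract lend themselves to a sketch: fusing feature vectors across semantic levels, and emotion grouping (each specific emotion rolls up to a sentiment). The concatenation fusion and the particular grouping below are assumptions for illustration:

```python
def combine_semantic_levels(level_features):
    """Top-down branch sketch: concatenate the feature vectors from
    each semantic level into one discriminative representation."""
    combined = []
    for feats in level_features:
        combined.extend(feats)
    return combined

# Illustrative emotion grouping: specific emotion -> sentiment group.
EMOTION_GROUPS = {
    "happy": "positive", "surprised": "positive",
    "sad": "negative", "angry": "negative",
    "neutral": "neutral",
}

def classify_with_grouping(emotion_scores):
    """Pick the highest-scoring specific emotion, then report both the
    sentiment group and the emotion, as in the two-level output."""
    emotion = max(emotion_scores, key=emotion_scores.get)
    return EMOTION_GROUPS[emotion], emotion
```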

    SYSTEMS AND METHODS FOR 3D-AWARE IMAGE GENERATION

    Publication Number: US20240338878A1

    Publication Date: 2024-10-10

    Application Number: US18625491

    Filing Date: 2024-04-03

    CPC classification number: G06T15/00 G06T11/001

    Abstract: Embodiments described herein provide systems and methods for 3D-aware image generation. A system receives, via a data interface, a plurality of control parameters and a view direction. The system generates a plurality of predicted densities based on a plurality of positions and the plurality of control parameters. The densities may be predicted by applying a series of modulation blocks, wherein each block modulates a vector representation based on control parameters that are used to generate frequency values and phase shift values for the modulation. The system generates an image based on the plurality of predicted densities and the view direction.
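A single modulation block as described — control parameters generate frequency and phase-shift values that modulate a vector representation — could look roughly like the following, in the spirit of FiLM/SIREN-style sinusoidal conditioning. The linear mappings from control parameters to frequencies and phases are assumptions:

```python
import math

def modulation_block(vector, control_params):
    """One modulation step: derive per-element frequency and phase-shift
    values from the control parameters (hypothetical linear mappings),
    then apply a sinusoidal modulation to the representation."""
    freqs = [2.0 * c for c in control_params]   # assumed frequency mapping
    phases = [0.5 * c for c in control_params]  # assumed phase mapping
    return [math.sin(f * x + p) for x, f, p in zip(vector, freqs, phases)]
```

Stacking several such blocks and reading out a density per 3D position, then compositing densities along the view direction, would give the image-generation pipeline the abstract outlines.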

    SYSTEMS AND METHODS FOR MULTI-PARTY TRANSACTIONS

    Publication Number: US20240338723A1

    Publication Date: 2024-10-10

    Application Number: US18627298

    Filing Date: 2024-04-04

    Inventor: Patrick Nunally

    CPC classification number: G06Q30/0207 G06Q20/24 G06Q20/38215

    Abstract: Embodiments described herein provide systems and methods for multi-party transactions. A user device may receive offers for conditional credits from an offering device, based on user information. The user device may accept the offer, which causes the offering device to add the conditional credit to a digital ledger. The conditional credit may then be used in a transaction in which the issuing bank fulfills a portion of the transaction and the remaining portion is fulfilled by the conditional credit (when the criteria of the conditional credit are met).
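The settlement split is simple arithmetic: the conditional credit covers part of the total only when its criteria are met, and the issuing bank covers the remainder. A toy sketch (names and signature are illustrative):

```python
def settle_transaction(total, conditional_credit, criteria_met):
    """Split settlement between the conditional credit and the issuing
    bank; the credit contributes nothing if its criteria are unmet."""
    credit_used = min(conditional_credit, total) if criteria_met else 0.0
    bank_portion = total - credit_used
    return bank_portion, credit_used
```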

    SYSTEMS AND METHODS FOR GESTURE GENERATION FROM TEXT AND NON-SPEECH

    Publication Number: US20240338560A1

    Publication Date: 2024-10-10

    Application Number: US18626959

    Filing Date: 2024-04-04

    CPC classification number: G06N3/08 G06N3/0455

    Abstract: Embodiments described herein provide systems and methods for gesture generation from multimodal input. A method includes receiving a multimodal input. The method may further include masking a subset of the multimodal input; generating, via an embedder, a multimodal embedding based on the masked multimodal input; generating, via an encoder, multimodal features based on the multimodal embedding, wherein the encoder includes one or more attention layers connecting different modalities; generating, via a generator, multimodal output based on the multimodal features; computing a loss based on the multimodal input and the multimodal output. The method may further include updating parameters of the encoder based on the loss.
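The training recipe in this abstract is a masked-reconstruction loop: mask a subset of the multimodal input, run it through embedder/encoder/generator, and compare the output against the unmasked input. The masking and loss steps can be sketched minimally (mask ratio, mask token, and MSE loss are assumed choices):

```python
import random

def mask_inputs(inputs, mask_ratio=0.3, mask_token=0.0, rng=None):
    """Randomly replace a subset of input values with a mask token."""
    rng = rng or random.Random(0)
    masked = list(inputs)
    n_mask = max(1, int(len(inputs) * mask_ratio))
    for i in rng.sample(range(len(inputs)), n_mask):
        masked[i] = mask_token
    return masked

def reconstruction_loss(original, reconstructed):
    """Mean squared error between the original (unmasked) input and the
    generator's output; used to update the encoder parameters."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)
```

In the described model the masked values would be embeddings across modalities, with cross-modal attention layers in the encoder rather than this flat list.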

    SYSTEMS AND METHODS FOR SPEECH GENERATION BY EMOTIONAL VOICE CONVERSION

    Publication Number: US20250166602A1

    Publication Date: 2025-05-22

    Application Number: US18953970

    Filing Date: 2024-11-20

    Abstract: Embodiments described herein include voice conversion (VC) based emotion data generation. Embodiments described herein may generate a multi-speaker multi-emotion dataset by changing the gender style of the input speech while retaining its emotion style and linguistic content. For example, a single-speaker multi-emotion dataset may be used as the input speech, and a multi-speaker single-emotion dataset may be the target speech. The generated data may be used as training data for a text-to-speech (TTS) model so that it can generate speech with diverse styles of emotions and speakers. To generate a multi-speaker multi-emotion dataset, embodiments herein add an emotion encoder to a VC model and use acoustic properties to preserve the emotional speech style of the input speech while changing only the gender style to the target gender style.
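At the dataset level, the scheme crosses every emotion-styled source utterance with every target speaker while carrying over emotion and linguistic content. The data-flow can be sketched with dictionaries standing in for audio (all names hypothetical; the real conversion operates on acoustic features, not labels):

```python
def convert_voice(source_utterance, target_speaker):
    """Hypothetical VC step: change the speaker/gender style while the
    emotion style and linguistic content are carried over unchanged."""
    return {
        "text": source_utterance["text"],        # linguistic content kept
        "emotion": source_utterance["emotion"],  # emotion style kept
        "speaker": target_speaker,               # speaker/gender style changed
    }

def build_multi_speaker_multi_emotion(emotion_set, speaker_set):
    """Cross every single-speaker emotion utterance with every target
    speaker to obtain a multi-speaker multi-emotion training set."""
    return [convert_voice(u, s) for u in emotion_set for s in speaker_set]
```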
