-
Publication Number: US20250166274A1
Publication Date: 2025-05-22
Application Number: US18953359
Filing Date: 2024-11-20
Applicant: Datum Point Labs Inc.
Inventor: Seonghyeok Noh , Hanseok Ko , Bonhwa Ku
IPC: G06T13/40
Abstract: Embodiments described herein include a hybrid gesture generation model using a sentence encoder. By using a pre-trained model as the sentence encoder, the framework may output an embedding for a sentence containing any word, even if it is not in the training set. Further, embodiments described herein include a hybrid gesture model that combines trained co-speech gesture generation with retrieval of pre-defined special gestures. The generation part uses text input to a model trained with co-speech gesture data. The retrieval part uses pre-defined gestures for six different situations that have been prepared in advance. Using embodiments described herein, an AI avatar can perform special gestures such as greeting or shaking hands in predefined specific situations, and co-speech gestures in other conversational situations.
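The retrieval-versus-generation routing described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the situation detector, the six situation names, and the clip identifiers are all hypothetical stand-ins.

```python
# Predefined gestures for six special situations (placeholder clip IDs).
SPECIAL_GESTURES = {
    "greeting": "clip_greeting",
    "handshake": "clip_handshake",
    "farewell": "clip_farewell",
    "apology": "clip_apology",
    "thanks": "clip_thanks",
    "applause": "clip_applause",
}

def detect_situation(text: str):
    """Toy situation detector: keyword match against the six special cases."""
    lowered = text.lower()
    for situation in SPECIAL_GESTURES:
        if situation in lowered:
            return situation
    return None

def generate_co_speech_gesture(text: str) -> str:
    """Stand-in for the trained co-speech gesture generation model."""
    return f"generated_gesture_for:{text}"

def hybrid_gesture(text: str) -> str:
    """Retrieve a predefined gesture when a special situation is detected,
    otherwise fall back to model-based co-speech generation."""
    situation = detect_situation(text)
    if situation is not None:
        return SPECIAL_GESTURES[situation]
    return generate_co_speech_gesture(text)
```

The key design point is that retrieval takes priority: only when no special situation fires does the trained generator produce a co-speech gesture.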
-
Publication Number: US20240339104A1
Publication Date: 2024-10-10
Application Number: US18598996
Filing Date: 2024-03-07
Applicant: Datum Point Labs Inc.
Inventor: Jeongki Min , Bonhwa Ku , Hanseok Ko
IPC: G10L13/047 , G10L15/16
CPC classification number: G10L13/047 , G10L15/16
Abstract: Embodiments described herein provide systems and methods for text to speech synthesis. A system receives, via a data interface, an input text, a reference spectrogram, and at least one of an emotion ID or speaker ID. The system generates, via a first encoder, a vector representation of the input text. The system generates, via a second encoder, a vector representation of the reference spectrogram. The system generates, via a variance adaptor, a modified vector representation based on a combined representation including a combination of the vector representation of the input text, the vector representation of the reference spectrogram, and at least one of an embedding of the emotion ID or an embedding of the speaker ID. The system generates, via a decoder, an audio waveform based on the modified vector representation. The generated audio waveform may be played via a speaker.
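The combination step feeding the variance adaptor can be sketched numerically. This is an illustrative shape-level sketch only: the toy encoders, embedding tables, vector dimension, and summation rule are assumptions, not the patented modules.

```python
import numpy as np

DIM = 8  # assumed shared representation dimension

def encode_text(text: str) -> np.ndarray:
    """Toy first encoder: deterministic vector derived from character codes."""
    rng = np.random.default_rng(sum(map(ord, text)))
    return rng.standard_normal(DIM)

def encode_spectrogram(spec: np.ndarray) -> np.ndarray:
    """Toy second encoder: mean-pool the reference spectrogram frames."""
    return spec.mean(axis=0)

# Hypothetical lookup tables for emotion and speaker ID embeddings.
EMOTION_TABLE = np.random.default_rng(0).standard_normal((4, DIM))
SPEAKER_TABLE = np.random.default_rng(1).standard_normal((10, DIM))

def combined_representation(text, spec, emotion_id=None, speaker_id=None):
    """Combine the text vector, reference vector, and any provided ID
    embeddings into the representation fed to the variance adaptor."""
    h = encode_text(text) + encode_spectrogram(spec)
    if emotion_id is not None:
        h = h + EMOTION_TABLE[emotion_id]
    if speaker_id is not None:
        h = h + SPEAKER_TABLE[speaker_id]
    return h
```

In the described system this combined vector would then pass through the variance adaptor and decoder to produce the audio waveform.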
-
Publication Number: US20240364967A1
Publication Date: 2024-10-31
Application Number: US18623834
Filing Date: 2024-04-01
Applicant: Datum Point Labs Inc.
Inventor: Patrick Nunally
IPC: H04N21/45 , H04N21/458 , H04N21/488
CPC classification number: H04N21/4532 , H04N21/458 , H04N21/4882
Abstract: Embodiments described herein provide systems and methods for delivering user specific messages. A system receives a media feed including at least one of an audio feed or a video feed. The system determines, based on the media feed, a categorization of the media feed. The system determines, based on a set of information, a replacement media. The system overrides the media feed with the replacement media based on the categorization.
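The categorize-then-override flow can be sketched as below. The categorizer, the `topic` field, and the replacement-table shape are illustrative assumptions, not the claimed implementation.

```python
def categorize_feed(feed: dict) -> str:
    """Toy categorizer: reads a 'topic' tag from the feed metadata."""
    return feed.get("topic", "general")

def deliver(feed: dict, replacements: dict) -> dict:
    """Override the media feed with user-specific replacement media when
    its category matches an entry in the replacement table; otherwise
    pass the original feed through unchanged."""
    category = categorize_feed(feed)
    if category in replacements:
        return replacements[category]
    return feed
```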
-
Publication Number: US20240354382A1
Publication Date: 2024-10-24
Application Number: US18436497
Filing Date: 2024-02-08
Applicant: Datum Point Labs Inc.
Inventor: Patrick Nunally , Michael T. Lucas , Tyler J. Luck
CPC classification number: G06F21/31 , G06F21/6254
Abstract: Embodiments described herein provide systems and methods for an AI native operating system wrapper. Methods may include receiving, by a computing device via a user interface, a user input associated with an application; receiving, by the computing device via a data interface, stored information associated with the user; determining, via an artificial intelligence (AI) model based on the user input and the stored information, one or more actions, performing the one or more actions on the application; and transmitting output from the application to the user interface.
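The wrapper loop in the method steps can be sketched as follows. The stand-in model, the action tuples, and the mock application are all hypothetical; the abstract does not specify how the AI model maps inputs to actions.

```python
def toy_model(user_input: str, stored_info: dict) -> list:
    """Stand-in for the AI model: map the user input plus stored user
    information to a list of (verb, argument) actions."""
    actions = [("open", user_input)]
    if stored_info.get("dark_mode"):
        actions.append(("set_theme", "dark"))
    return actions

def run_wrapper(user_input: str, stored_info: dict) -> list:
    """Determine actions via the model, perform them on a mock application,
    and collect the application output for the user interface."""
    actions = toy_model(user_input, stored_info)
    return [f"app:{verb}:{arg}" for verb, arg in actions]
```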
-
Publication Number: US20240339211A1
Publication Date: 2024-10-10
Application Number: US18582404
Filing Date: 2024-02-20
Applicant: Datum Point Labs Inc.
Inventor: Patrick Nunally
IPC: G16H40/67 , G06F3/01 , G06F3/04815 , G06T13/00
CPC classification number: G16H40/67 , G06F3/017 , G06F3/04815 , G06T13/00 , G06T2200/24 , G06T2210/41
Abstract: Embodiments described herein provide systems and methods for avatar-based interactions. A system receives, via a user interface device, a first user input including one or more of an audio input, a text input, or a video input. The system generates, based on a trained model, a first response to the first user input. The system renders a virtual avatar model based on the first response. The system receives a second user input via the user interface device. The system determines, based on the second user input, to provide an advanced level of care including: control a communication link between the computing device and a credentialed service device, receive a second response to the second user input from the credentialed service device via the communication link, and render the virtual avatar model based on the second response.
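The two-tier routing, where routine turns stay with the trained model and flagged turns escalate to a credentialed service device, can be sketched as below. The keyword triage rule and reply strings are illustrative stand-ins.

```python
def needs_advanced_care(user_input: str) -> bool:
    """Toy triage rule: escalate on a small set of keywords (hypothetical)."""
    keywords = ("chest pain", "emergency", "severe")
    return any(k in user_input.lower() for k in keywords)

def respond(user_input: str) -> dict:
    """Route a turn either to the local trained model or, over a
    communication link, to the credentialed service device; the returned
    response drives the virtual avatar rendering in both cases."""
    if needs_advanced_care(user_input):
        return {
            "response": f"credentialed_service_reply_to:{user_input}",
            "source": "credentialed_service",
        }
    return {
        "response": f"model_reply_to:{user_input}",
        "source": "local_model",
    }
```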
-
Publication Number: US20240338974A1
Publication Date: 2024-10-10
Application Number: US18604104
Filing Date: 2024-03-13
Applicant: Datum Point Labs Inc.
Inventor: Bokyeung Lee , Bonhwa Ku , Hanseok Ko
IPC: G06V40/16 , G06V10/764 , G06V10/82
CPC classification number: G06V40/176 , G06V10/764 , G06V10/82
Abstract: Embodiments described herein provide systems and methods for facial expression recognition (FER). Embodiments herein combine features of different semantic levels and classify both sentiment and specific emotion categories using emotion grouping. Embodiments herein include a model with a bottom-up branch that learns facial expression representations at different semantic levels and outputs pseudo labels of facial expressions for each frame using a 2D FER model, and a top-down branch that learns discriminative representations by combining feature vectors of each semantic level to recognize facial expressions in the corresponding emotion group.
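The two-branch structure can be sketched at the shape level. This is only an illustration of the data flow: the number of semantic levels, the pooling, the random projections, and concatenation as the fusion rule are all assumptions.

```python
import numpy as np

LEVELS = 3  # assumed number of semantic levels
DIM = 4     # assumed per-level feature dimension

def bottom_up(frames: np.ndarray) -> list:
    """Toy bottom-up branch: produce one pooled feature vector per
    semantic level from a (frames x input_dim) clip."""
    rng = np.random.default_rng(0)
    projections = [
        rng.standard_normal((frames.shape[1], DIM)) for _ in range(LEVELS)
    ]
    pooled = frames.mean(axis=0)
    return [pooled @ w for w in projections]

def top_down(level_features: list) -> np.ndarray:
    """Toy top-down branch: combine the per-level feature vectors into one
    discriminative representation for emotion-group classification."""
    return np.concatenate(level_features)
```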
-
Publication Number: US20240338878A1
Publication Date: 2024-10-10
Application Number: US18625491
Filing Date: 2024-04-03
Applicant: Datum Point Labs Inc.
Inventor: Jeong-gi Kwak , Hanseok Ko
CPC classification number: G06T15/00 , G06T11/001
Abstract: Embodiments described herein provide systems and methods for 3D-aware image generation. A system receives, via a data interface, a plurality of control parameters and a view direction. The system generates a plurality of predicted densities based on a plurality of positions and the plurality of control parameters. The densities may be predicted by applying a series of modulation blocks, wherein each block modulates a vector representation based on control parameters that are used to generate frequency values and phase shift values for the modulation. The system generates an image based on the plurality of predicted densities and the view direction.
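The frequency/phase modulation of a hidden vector by control parameters can be sketched as below, in the spirit of FiLM/SIREN-style conditioning. All shapes, the sine nonlinearity placement, and the final sum-to-density mapping are assumptions for illustration.

```python
import numpy as np

DIM = 8         # assumed hidden dimension
NUM_BLOCKS = 3  # assumed number of modulation blocks

def modulation_block(h, control, w_freq, w_phase):
    """One block: derive frequency and phase-shift values from the control
    parameters, then modulate the hidden vector with a sine."""
    freq = control @ w_freq      # frequency values from control parameters
    phase = control @ w_phase    # phase-shift values from control parameters
    return np.sin(freq * h + phase)

def predict_density(position, control):
    """Run one position through a stack of modulation blocks and map the
    final hidden vector to a scalar density."""
    rng = np.random.default_rng(42)
    h = position @ rng.standard_normal((position.shape[-1], DIM))
    for _ in range(NUM_BLOCKS):
        w_freq = rng.standard_normal((control.shape[-1], DIM))
        w_phase = rng.standard_normal((control.shape[-1], DIM))
        h = modulation_block(h, control, w_freq, w_phase)
    return float(h.sum())
```

In the described system, many such predicted densities, together with the view direction, would be composited into the final image.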
-
Publication Number: US20240338723A1
Publication Date: 2024-10-10
Application Number: US18627298
Filing Date: 2024-04-04
Applicant: Datum Point Labs Inc.
Inventor: Patrick Nunally
IPC: G06Q30/0207 , G06Q20/24 , G06Q20/38
CPC classification number: G06Q30/0207 , G06Q20/24 , G06Q20/38215
Abstract: Embodiments described herein provide systems and methods for multi-party transactions. A user device may receive offers for conditional credits from an offering device, based on user information. The user device may accept an offer, which causes the offering device to add the conditional credit to a digital ledger. The conditional credit may then be used in a transaction in which the issuing bank fulfills a portion of the transaction and the remaining portion is fulfilled by the conditional credit (when the criteria of the conditional credit are met).
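The settlement arithmetic can be sketched as a split between the conditional credit and the issuing bank. The field names and the criteria check are illustrative; the abstract does not specify how criteria are encoded.

```python
def settle(amount: float, credit: dict, context: dict) -> dict:
    """Split a transaction: when the credit's criteria are met, the
    conditional credit covers up to its value and the issuing bank
    fulfills the remainder; otherwise the bank covers the full amount."""
    criteria_met = all(
        context.get(key) == value for key, value in credit["criteria"].items()
    )
    credit_portion = min(credit["value"], amount) if criteria_met else 0.0
    return {"from_credit": credit_portion, "from_bank": amount - credit_portion}
```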
-
Publication Number: US20240338560A1
Publication Date: 2024-10-10
Application Number: US18626959
Filing Date: 2024-04-04
Applicant: Datum Point Labs Inc.
Inventor: Gwantae Kim , Hanseok Ko
IPC: G06N3/08 , G06N3/0455
CPC classification number: G06N3/08 , G06N3/0455
Abstract: Embodiments described herein provide systems and methods for gesture generation from multimodal input. A method includes receiving a multimodal input. The method may further include masking a subset of the multimodal input; generating, via an embedder, a multimodal embedding based on the masked multimodal input; generating, via an encoder, multimodal features based on the multimodal embedding, wherein the encoder includes one or more attention layers connecting different modalities; generating, via a generator, multimodal output based on the multimodal features; computing a loss based on the multimodal input and the multimodal output. The method may further include updating parameters of the encoder based on the loss.
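The mask-embed-encode-generate-loss pipeline can be sketched as below. The modules here are toy linear maps, so this shows only the data flow and loss computation; the real encoder uses attention layers connecting modalities, and the encoder parameters would then be updated by gradient descent on this loss.

```python
import numpy as np

def mask_input(x: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Zero out the masked positions of the multimodal input."""
    return np.where(mask, 0.0, x)

def forward_loss(x, mask, w_embed, w_enc, w_gen):
    """Embedder -> encoder -> generator, then mean-squared reconstruction
    loss of the generated output against the original (unmasked) input."""
    embedding = mask_input(x, mask) @ w_embed    # multimodal embedding
    features = embedding @ w_enc                 # multimodal features
    output = features @ w_gen                    # generated multimodal output
    loss = float(((output - x) ** 2).mean())     # reconstruction loss
    return output, loss
```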
-
Publication Number: US20250166602A1
Publication Date: 2025-05-22
Application Number: US18953970
Filing Date: 2024-11-20
Applicant: Datum Point Labs Inc.
Inventor: Kyungseok Oh , Hanseok Ko , Bonhwa Ku
IPC: G10L13/027 , G10L13/033
Abstract: Embodiments described herein include voice conversion (VC)-based emotion data generation. Embodiments described herein may generate a multi-speaker multi-emotion dataset by changing the gender style of the input speech while retaining its emotion style and linguistic content. For example, a single-speaker multi-emotion dataset may be used as the input speech and a multi-speaker single-emotion dataset may be the target speech. The generated data may be used as training data for a text-to-speech (TTS) model so that it can generate speech with diverse emotion and speaker styles. To generate a multi-speaker multi-emotion dataset, embodiments herein add an emotion encoder to a VC model and use acoustic properties to preserve the emotion style of the input speech while changing only the gender style to the target gender style.
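The dataset-generation idea, crossing every emotional input utterance with every target speaker style while preserving emotion and content, can be sketched as follows. The `convert` function is a stand-in for the VC model with the added emotion encoder; the record fields are assumptions.

```python
def convert(utterance: dict, target: dict) -> dict:
    """Stand-in VC step: keep the emotion style and linguistic content of
    the input utterance, take the speaker style from the target."""
    return {
        "text": utterance["text"],
        "emotion": utterance["emotion"],  # preserved via the emotion encoder
        "speaker": target["speaker"],     # converted to the target style
    }

def generate_dataset(single_speaker_multi_emotion, multi_speaker_single_emotion):
    """Cross every emotional utterance with every target speaker to obtain
    a multi-speaker multi-emotion training set for a TTS model."""
    return [
        convert(u, t)
        for u in single_speaker_multi_emotion
        for t in multi_speaker_single_emotion
    ]
```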