Invention Publication
- Patent Title: ENHANCED USER EXPERIENCE THROUGH BI-DIRECTIONAL AUDIO AND VISUAL SIGNAL GENERATION
- Application No.: US18383956
- Application Date: 2023-10-26
- Publication No.: US20240054683A1
- Publication Date: 2024-02-15
- Inventor: Sunando SENGUPTA, Alexandros NEOFYTOU, Eric Chris Wolfgang SOMMERLADE, Yang LIU
- Applicant: Microsoft Technology Licensing, LLC
- Applicant Address: US WA Redmond
- Assignee: Microsoft Technology Licensing, LLC
- Current Assignee: Microsoft Technology Licensing, LLC
- Current Assignee Address: US WA Redmond
- Main IPC: G06T9/00
- IPC: G06T9/00 ; G06T3/60 ; G10L19/012 ; G10L25/51 ; G06F18/21

Abstract:
In various embodiments, a computer-implemented method of training a neural network to create an output signal of a different modality from that of an input signal is described. In embodiments, the input signal may be a sound signal or a visual image, and the output signal is then a visual image or a sound signal, respectively. In embodiments, a model is trained using a first pair of visual and audio networks to train a set of codebooks using known visual signals and audio signals, and using a second pair of visual and audio networks to further train the set of codebooks using augmented visual signals and augmented audio signals. Further, the first and second visual networks are equally weighted, and the first and second audio networks are equally weighted.
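
The abstract does not give the network architecture or the codebook update rule, so the following is only a toy sketch of the general idea: two pairs of encoders, one fed the known signals and one fed the augmented signals, update a single shared codebook. Everything here is an assumption made for illustration, including the linear `encode` function, the nearest-entry update in `update_codebook`, the `rate` parameter, and the reading of "equally weighted" as the second pair reusing the same weights as the first.

```python
from typing import List, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_code(codebook: List[Vector], feature: Sequence[float]) -> int:
    """Return the index of the codebook entry closest to the feature."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(codebook[i], feature))


def update_codebook(codebook: List[Vector], feature: Sequence[float],
                    rate: float = 0.5) -> int:
    """Move the nearest entry toward the feature (hypothetical update rule)."""
    idx = nearest_code(codebook, feature)
    codebook[idx] = [c + rate * (f - c) for c, f in zip(codebook[idx], feature)]
    return idx


def encode(weights: List[Vector], signal: Sequence[float]) -> Vector:
    """Hypothetical linear encoder: one weight row per output feature."""
    return [sum(w * s for w, s in zip(row, signal)) for row in weights]


def train_step(codebook: List[Vector],
               visual_weights: List[Vector], audio_weights: List[Vector],
               image: Vector, audio: Vector,
               image_aug: Vector, audio_aug: Vector) -> List[int]:
    """Update the shared codebook with known signals, then augmented ones.

    The second pair of encoders reuses the same weights as the first pair,
    which is this sketch's assumed reading of "equally weighted".
    """
    passes = (
        (visual_weights, image),      # first visual network, known image
        (audio_weights, audio),       # first audio network, known audio
        (visual_weights, image_aug),  # second visual network, augmented image
        (audio_weights, audio_aug),   # second audio network, augmented audio
    )
    return [update_codebook(codebook, encode(w, s)) for w, s in passes]
```

In this sketch both modalities write into the same codebook, so an image feature and an audio feature that land on the same entry end up sharing a discrete code. That is one simple way a cross-modal mapping could be built, and the patent itself may do this differently.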
Public/Granted literature
- US12288366B2 Enhanced user experience through bi-directional audio and visual signal generation Public/Granted day: 2025-04-29