ENHANCED USER EXPERIENCE THROUGH BI-DIRECTIONAL AUDIO AND VISUAL SIGNAL GENERATION

Invention Publication

US20240054683A1 ENHANCED USER EXPERIENCE THROUGH BI-DIRECTIONAL AUDIO AND VISUAL SIGNAL GENERATION Pending (Published)

Patent Title: ENHANCED USER EXPERIENCE THROUGH BI-DIRECTIONAL AUDIO AND VISUAL SIGNAL GENERATION
Application No.: US18383956

Application Date: 2023-10-26
Publication No.: US20240054683A1

Publication Date: 2024-02-15
Inventor: Sunando SENGUPTA, Alexandros NEOFYTOU, Eric Chris Wolfgang SOMMERLADE, Yang LIU
Applicant: Microsoft Technology Licensing, LLC
Applicant Address: US WA Redmond
Assignee: Microsoft Technology Licensing, LLC
Current Assignee: Microsoft Technology Licensing, LLC
Current Assignee Address: US WA Redmond
Main IPC: G06T9/00
IPC: G06T9/00 ; G06T3/60 ; G10L19/012 ; G10L25/51 ; G06F18/21

Abstract:

In various embodiments, a computer-implemented method is described for training a neural network to create an output signal of a different modality from that of an input signal. In embodiments, the input signal may be a sound signal or a visual image, in which case the output signal is a visual image or a sound signal, respectively. In embodiments, a model is trained by using a first pair of visual and audio networks to train a set of codebooks on known visual signals and audio signals, and by using a second pair of visual and audio networks to further train the set of codebooks on augmented visual signals and augmented audio signals. Further, the first and second visual networks are equally weighted, and the first and second audio networks are equally weighted.
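
The training arrangement in the abstract can be illustrated with a toy sketch. This is not the patented implementation: the linear "networks", the single shared codebook, the nearest-neighbour update rule, the noise-based augmentation, and every name below (LinearEncoder, nearest_code, update_codebook, augment, train_step) are hypothetical stand-ins chosen for illustration. The sketch also assumes that "equally weighted" means the second network of each pair reuses the weights of the first, which the abstract does not state explicitly.

```python
import random


def nearest_code(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared Euclidean)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def update_codebook(codebook, vec, lr=0.1):
    """Move the nearest code a step toward vec and return its index.

    Assumption: a simple online k-means style update, used here only as a
    placeholder for whatever codebook training the patent actually claims.
    """
    idx = nearest_code(codebook, vec)
    codebook[idx] = [c + lr * (v - c) for c, v in zip(codebook[idx], vec)]
    return idx


class LinearEncoder:
    """Toy stand-in for a visual or audio network: one random weight matrix."""

    def __init__(self, in_dim, out_dim, rng):
        self.weights = [
            [rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)
        ]

    def __call__(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weights]


def augment(x, rng, noise=0.05):
    """Hypothetical augmentation: add small uniform noise to each component."""
    return [xi + rng.uniform(-noise, noise) for xi in x]


def train_step(codebook, visual_net, audio_net, visual_twin, audio_twin,
               image, sound, rng):
    """One step: first pair sees the known signals, second pair the augmented ones."""
    idx_v = update_codebook(codebook, visual_net(image))
    idx_a = update_codebook(codebook, audio_net(sound))
    idx_va = update_codebook(codebook, visual_twin(augment(image, rng)))
    idx_aa = update_codebook(codebook, audio_twin(augment(sound, rng)))
    return idx_v, idx_a, idx_va, idx_aa


rng = random.Random(0)
visual_net = LinearEncoder(in_dim=4, out_dim=2, rng=rng)
audio_net = LinearEncoder(in_dim=3, out_dim=2, rng=rng)
# Assumed reading of "equally weighted": the second pair shares the first pair's weights.
visual_twin, audio_twin = visual_net, audio_net
codebook = [[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(4)]
indices = train_step(codebook, visual_net, audio_net, visual_twin, audio_twin,
                     image=[0.1, 0.2, 0.3, 0.4], sound=[0.5, 0.6, 0.7], rng=rng)
```

Reusing the same encoder objects for the second pair is the simplest way to keep the weights identical in a sketch like this; a real system would more likely tie parameters inside a deep-learning framework.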

Public/Granted literature

US12288366B2 Enhanced user experience through bi-directional audio and visual signal generation Publication/Grant date: 2025-04-29

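IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06T	Image data processing or generation, in general
G06T9/00	Image coding (bandwidth or redundancy reduction for static pictures H04N 1/41; coding or decoding of static colour picture signals H04N 1/64; methods or arrangements for coding, decoding, compressing or decompressing digital video signals H04N 19/00)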