Self-supervised AI-assisted sound effect generation for silent video using multimodal clustering
Abstract:
An automated method, system, and computer-readable medium for generating sound effect recommendations for visual input by training machine learning models that learn audio-visual correlations from a reference image or video, a positive audio signal, and a negative audio signal. A machine learning algorithm takes the reference visual input together with a positive or negative audio signal and trains a multimodal clustering neural network to output representations of the visual input and the audio input, along with correlation scores between the audio and visual representations. The trained multimodal clustering neural network learns these representations such that the visual representation and the positive audio representation receive a higher correlation score than the visual representation paired with the negative audio representation or an unrelated audio representation.
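The training objective described above resembles a triplet-style contrastive loss: the network is pushed to score a visual embedding against a matching ("positive") audio embedding higher than against a non-matching ("negative") one. The following is a minimal sketch of that idea, not the patent's implementation: the encoders are stand-in random linear projections, cosine similarity plays the role of the correlation score, and all names (`embed`, `triplet_loss`, the toy feature dimensions) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy encoders: random linear projections into a shared
# embedding space. These stand in for the visual and audio branches of
# the multimodal network; a real system would learn these weights.
W_visual = rng.normal(size=(16, 8))
W_audio = rng.normal(size=(32, 8))

def embed(x, W):
    """Project a feature vector into the shared space and L2-normalize it."""
    z = x @ W
    return z / np.linalg.norm(z)

def correlation(z_a, z_b):
    """Cosine similarity of unit vectors serves as the correlation score."""
    return float(z_a @ z_b)

def triplet_loss(visual, pos_audio, neg_audio, margin=0.2):
    """Hinge loss encouraging corr(visual, positive) to exceed
    corr(visual, negative) by at least `margin`."""
    z_v = embed(visual, W_visual)
    z_p = embed(pos_audio, W_audio)
    z_n = embed(neg_audio, W_audio)
    return max(0.0, margin - correlation(z_v, z_p) + correlation(z_v, z_n))

# Toy inputs standing in for one video-frame feature vector and the
# features of a matching and a non-matching audio clip.
frame = rng.normal(size=16)
pos_clip = rng.normal(size=32)
neg_clip = rng.normal(size=32)

loss = triplet_loss(frame, pos_clip, neg_clip)
print(loss)
```

Minimizing such a loss over many (visual, positive, negative) triplets is what drives matched audio-visual pairs toward higher correlation scores than mismatched pairs.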