Adaptive visual speech recognition

Invention Grant

US12211488B2 Adaptive visual speech recognition (in force)

Patent Title: Adaptive visual speech recognition
Application No.: US18571553

Application Date: 2022-06-15
Publication No.: US12211488B2

Publication Date: 2025-01-28
Inventor: Ioannis Alexandros Assael, Brendan Shillingford, Joao Ferdinando Gomes de Freitas
Applicant: DeepMind Technologies Limited
Applicant Address: GB London
Assignee: DeepMind Technologies Limited
Current Assignee: DeepMind Technologies Limited
Current Assignee Address: GB London
Agency: Fish & Richardson P.C.
Priority: GR20210100402 20210618
International Application: PCT/EP2022/066419 WO 20220615
International Announcement: WO2022/263570 WO 20221222
Main IPC: G10L25/30
IPC: G10L25/30; G10L15/06

Abstract:

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing video data using an adaptive visual speech recognition model. One of the methods includes receiving a video that includes a plurality of video frames that depict a first speaker; obtaining a first embedding characterizing the first speaker; and processing a first input comprising (i) the video and (ii) the first embedding using a visual speech recognition neural network having a plurality of parameters, wherein the visual speech recognition neural network is configured to process the video and the first embedding in accordance with trained values of the parameters to generate a speech recognition output that defines a sequence of one or more words being spoken by the first speaker in the video.
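
The data flow in the abstract (video frames plus a speaker embedding in, a word sequence out) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the patent: the class `ToyVisualSpeechRecognizer`, the vocabulary `VOCAB`, the concatenation of the embedding onto each frame, the fixed random weights standing in for trained parameter values, and the greedy blank-collapsing decoder.

```python
import random
from typing import List, Sequence

# Hypothetical toy vocabulary; index 0 is a "blank" (no word) symbol.
VOCAB = ["<blank>", "hello", "world"]


class ToyVisualSpeechRecognizer:
    """Illustrative stand-in for a visual speech recognition network."""

    def __init__(self, frame_dim: int, embed_dim: int,
                 vocab_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Fixed random weights stand in for "trained values of the parameters".
        self.weights = [
            [rng.uniform(-1.0, 1.0) for _ in range(frame_dim + embed_dim)]
            for _ in range(vocab_size)
        ]

    def __call__(self, video: Sequence[Sequence[float]],
                 speaker_embedding: Sequence[float]) -> List[List[float]]:
        """Return one score per vocabulary entry for every video frame."""
        scores = []
        for frame in video:
            # Assumed conditioning scheme: append the embedding to each frame.
            x = list(frame) + list(speaker_embedding)
            scores.append(
                [sum(w * v for w, v in zip(row, x)) for row in self.weights]
            )
        return scores


def greedy_decode(scores: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> List[str]:
    """Pick the best entry per frame, merge repeats, and drop blanks."""
    words: List[str] = []
    previous = None
    for frame_scores in scores:
        best = max(range(len(frame_scores)), key=frame_scores.__getitem__)
        if best != previous and best != 0:
            words.append(vocab[best])
        previous = best
    return words


# Three frames with two made-up features each, and a made-up speaker embedding.
video = [[0.1, 0.4], [0.2, 0.3], [0.9, 0.5]]
speaker_embedding = [0.5, -0.2, 0.1]

model = ToyVisualSpeechRecognizer(frame_dim=2, embed_dim=3, vocab_size=len(VOCAB))
scores = model(video, speaker_embedding)
words = greedy_decode(scores, VOCAB)
print(words)
```

Passing a different embedding for the same video changes the per-frame scores, which is the sense in which the recognizer in this sketch adapts to the speaker without changing its parameters.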

Public/Granted literature

US20240265911A1 ADAPTIVE VISUAL SPEECH RECOGNITION Publication/grant date: 2024-08-08

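IPC classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L25/00	Speech or voice analysis techniques not restricted to groups G10L 15/00-G10L 21/00 (semiconductor-based muting amplifiers in which a speech detector senses particular signal characteristics, e.g. sensing the absence of a signal, are classified in H03G3/34)
G10L25/27	.characterised by the analysis technique
G10L25/30	..using neural networks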