SYSTEMS AND METHODS FOR PHONEME AND VISEME RECOGNITION

Invention Application

WO2021257316A1 SYSTEMS AND METHODS FOR PHONEME AND VISEME RECOGNITION (Under examination - Published)

Patent Title: SYSTEMS AND METHODS FOR PHONEME AND VISEME RECOGNITION
Application No.: PCT/US2021/036268

Application Date: 2021-06-07
Publication No.: WO2021257316A1

Publication Date: 2021-12-23
Inventor: WANG, Yadong ; RAO, Shilpa Jois ; PARTHASARATHI, Murthy
Applicant: NETFLIX, INC.
Applicant Address: 100 Winchester Circle
Assignee: NETFLIX, INC.
Current Assignee: NETFLIX, INC.
Current Assignee Address: 100 Winchester Circle
Agency: HANKS, Bryan K.
Priority: US16/903,373 2020-06-16
Main IPC: G10L21/10
IPC: G10L21/10 ; G06N20/00 ; G10L15/02 ; G10L15/04 ; G10L15/08 ; G10L15/24 ; G10L2015/025 ; G10L2021/105 ; G10L21/0232

Abstract:

The disclosed computer-implemented method may include training a machine-learning algorithm to use look-ahead to improve effectiveness of identifying visemes corresponding to audio signals by, for one or more audio segments in a set of training audio signals, evaluating an audio segment, where the audio segment includes at least a portion of a phoneme, and a subsequent segment that includes contextual audio that comes after the audio segment and potentially contains context about a viseme that maps to the phoneme. The method may also include using the trained machine-learning algorithm to identify one or more probable visemes corresponding to speech in a target audio signal. Additionally, the method may include recording, as metadata of the target audio signal, where a probable viseme occurs within the target audio signal. Various other methods, systems, and computer-readable media are also disclosed.
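
The two data-handling ideas in the abstract, pairing each audio segment with the audio that follows it and recording where each probable viseme occurs, can be illustrated with a minimal sketch. This is not the method claimed in the application: the names `with_lookahead`, `to_events`, and `VisemeEvent`, the fixed frame duration, and the per-frame label input are all assumptions made for illustration, and no model training is shown.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Frame = Sequence[float]  # hypothetical: one feature vector per audio segment


@dataclass
class VisemeEvent:
    """Hypothetical metadata record: one viseme and its time span."""
    viseme: str
    start_s: float
    end_s: float


def with_lookahead(frames: Sequence[Frame], lookahead: int) -> List[Tuple[Frame, List[Frame]]]:
    """Pair each segment with up to `lookahead` subsequent segments.

    The subsequent segments stand in for the "contextual audio that
    comes after the audio segment"; segments near the end get fewer.
    """
    return [
        (frames[i], list(frames[i + 1 : i + 1 + lookahead]))
        for i in range(len(frames))
    ]


def to_events(labels: Sequence[str], frame_s: float) -> List[VisemeEvent]:
    """Merge runs of identical per-segment viseme labels into timed events."""
    events: List[VisemeEvent] = []
    for i, label in enumerate(labels):
        if events and events[-1].viseme == label:
            # Same viseme as the previous segment: extend the current event.
            events[-1].end_s = (i + 1) * frame_s
        else:
            events.append(VisemeEvent(label, i * frame_s, (i + 1) * frame_s))
    return events
```

In this sketch, the pairs returned by `with_lookahead` would be the inputs to whatever model is trained, and `to_events` would turn that model's per-segment predictions into the kind of timing metadata the abstract describes.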

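IPC Classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L21/00	Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility (G10L19/00 takes precedence)
G10L21/06	.Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids (G10L15/26 takes precedence)
G10L21/10	..Transforming into visible information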