System and method for audio/video speaker detection

Invention Grant

US07343289B2 System and method for audio/video speaker detection (in force)

Patent Title: System and method for audio/video speaker detection
Application No.: US10606061

Application Date: 2003-06-25
Publication No.: US07343289B2

Publication Date: 2008-03-11
Inventor: Ross Cutler, Ashish Kapoor
Applicant: Ross Cutler, Ashish Kapoor
Applicant Address: US WA Redmond
Assignee: Microsoft Corp.
Current Assignee: Microsoft Corp.
Current Assignee Address: US WA Redmond
Agency: Lyon & Harr, LLP
Agent: Katrina A. Lyon
Main IPC: G10L13/00
IPC: G10L13/00

Abstract:

A system and method for detecting speech using audio and video inputs. In one aspect, the invention collects audio data generated by a microphone device. In another aspect, the invention collects video data and processes it to determine a mouth location for a given speaker. The audio and video are input into a time-delay neural network that processes the data to determine which target is speaking. The neural network processing is based on the correlation between mouth movement detected in the video data and sounds detected by the microphone.
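
The data flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not state the features, layer sizes, or weights, so everything below is an assumption made for illustration. The per-frame audio feature and mouth-motion feature are taken to be single numbers, the network has two time-delay layers with random (untrained) weights, and the names init_params, time_delay_layer, speaking_score, and pick_speaker are hypothetical.

```python
import numpy as np


def init_params(rng, n_features=2, hidden=4, delay1=5, delay2=3):
    """Random, untrained weights. All sizes are illustrative assumptions."""
    return {
        "w1": rng.normal(scale=0.5, size=(delay1, n_features, hidden)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(scale=0.5, size=(delay2, hidden, 1)),
        "b2": np.zeros(1),
    }


def time_delay_layer(x, weights, bias):
    """One time-delay layer: each output frame sees `delay` consecutive input frames.

    x: (T, F) sequence, weights: (delay, F, H), bias: (H,).
    Returns pre-activations of shape (T - delay + 1, H).
    """
    delay = weights.shape[0]
    n_out = x.shape[0] - delay + 1
    return np.stack([
        np.tensordot(x[t:t + delay], weights, axes=([0, 1], [0, 1])) + bias
        for t in range(n_out)
    ])


def speaking_score(audio_feat, video_feat, params):
    """Score in (0, 1) per output frame that one tracked person is speaking."""
    # Joint input: one audio value and one mouth-motion value per frame.
    x = np.column_stack([audio_feat, video_feat])             # (T, 2)
    h = np.tanh(time_delay_layer(x, params["w1"], params["b1"]))
    z = time_delay_layer(h, params["w2"], params["b2"])       # (T', 1)
    return 1.0 / (1.0 + np.exp(-z[:, 0]))                     # sigmoid


def pick_speaker(audio_feat, video_feats, params):
    """Index of the candidate whose mouth motion best matches the shared audio."""
    means = [speaking_score(audio_feat, v, params).mean() for v in video_feats]
    return int(np.argmax(means))
```

With trained weights, pick_speaker would be called with one shared audio sequence and one mouth-motion sequence per tracked person. With the random weights above, the scores carry no meaning and only the shapes and data flow are demonstrated.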

Published/granted documents

US20040267521A1 System and method for audio/video speaker detection. Publication date: 2004-12-30

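IPC classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L13/00	Speech synthesis; text-to-speech systems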