SPEECH SIGNAL ENHANCEMENT USING VISUAL INFORMATION

Patent application publication

EP2766901A1 SPEECH SIGNAL ENHANCEMENT USING VISUAL INFORMATION (status: in force)

Title translation (German): SPRACHSIGNALVERSTÄRKUNG MIT VISUELLEN INFORMATIONEN

Patent title: SPEECH SIGNAL ENHANCEMENT USING VISUAL INFORMATION
Application number: EP11782489.6
Filing date: 2011-10-17
Publication number: EP2766901A1
Publication date: 2014-08-20
Inventors: HERBIG, Tobias; WOLFF, Tobias; BUCK, Markus
Applicant: Nuance Communications, Inc.
Applicant address: 1 Wayside Road Suite 100 Burlington, MA 01803-4613 US
Proprietor: Nuance Communications, Inc.
Current proprietor: Nuance Communications, Inc.
Current proprietor address: 1 Wayside Road Suite 100 Burlington, MA 01803-4613 US
Representative: South, Nicholas Geoffrey
International publication: WO2013058728 (2013-04-25)
Main classification: G10L21/02
IPC classifications: G10L21/02; H04N7/15; H04N1/40; H04M3/56; H04R3/00; H04R3/04

Abstract:

Visual information is used to alter or set an operating parameter of an audio signal processor, other than a beamformer. A digital camera captures visual information about a scene that includes a human speaker and/or a listener. The visual information is analyzed to ascertain information about the acoustics of a room. A distance between the speaker and a microphone may be estimated, and this distance estimate may be used to adjust the overall gain of the system. Distances among, and locations of, the speaker, the listener, the microphone, a loudspeaker and/or a sound-reflecting surface may be estimated. These estimates may be used to estimate reverberation within the room and to adjust the aggressiveness of an anti-reverberation filter, based on the estimated ratio of direct to indirect (reverberated) sound energy expected to reach the microphone. In addition, the orientation of the speaker or the listener relative to the microphone or the loudspeaker can be estimated, and this estimate may be used to adjust frequency-dependent filter weights to compensate for the uneven, frequency-dependent propagation of acoustic signals around a human head, whether from the mouth or to the ear.
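The abstract does not say how a distance estimate becomes a gain, or how a direct-to-reverberant ratio (DRR) becomes a filter "aggressiveness". The sketch below shows one plausible mapping and is not the patent's method. It assumes free-field inverse-distance attenuation (about 6 dB per doubling of distance) for the gain. For the DRR it assumes the classic Sabine-based diffuse-field approximation of the critical distance, r_c ≈ 0.057·sqrt(Q·V/T60). The reference distance, clamp limit, DRR thresholds and all function names are hypothetical choices made for illustration.

```python
import math


def distance_gain_db(distance_m: float, ref_distance_m: float = 0.5,
                     max_gain_db: float = 20.0) -> float:
    """Gain (dB) offsetting free-field inverse-distance attenuation.

    Direct sound pressure falls roughly as 1/r, so a talker at distance r
    is compensated by 20*log10(r / r_ref) dB, clamped to +/- max_gain_db.
    ref_distance_m and max_gain_db are hypothetical tuning values.
    """
    if distance_m <= 0 or ref_distance_m <= 0:
        raise ValueError("distances must be positive")
    gain = 20.0 * math.log10(distance_m / ref_distance_m)
    return max(-max_gain_db, min(max_gain_db, gain))


def critical_distance_m(volume_m3: float, t60_s: float,
                        directivity: float = 1.0) -> float:
    """Diffuse-field critical distance r_c ~ 0.057 * sqrt(Q * V / T60).

    V in cubic metres, T60 in seconds, Q = source directivity factor.
    """
    if volume_m3 <= 0 or t60_s <= 0 or directivity <= 0:
        raise ValueError("volume, T60 and directivity must be positive")
    return 0.057 * math.sqrt(directivity * volume_m3 / t60_s)


def direct_to_reverberant_db(distance_m: float, volume_m3: float,
                             t60_s: float, directivity: float = 1.0) -> float:
    """DRR in dB: direct energy ~ 1/r^2 while diffuse reverberant energy is
    roughly constant, so DRR = (r_c / r)^2, i.e. 0 dB at r = r_c."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    rc = critical_distance_m(volume_m3, t60_s, directivity)
    return 20.0 * math.log10(rc / distance_m)


def dereverb_aggressiveness(drr_db: float, drr_clean_db: float = 10.0,
                            drr_reverberant_db: float = -10.0) -> float:
    """Hypothetical linear map from DRR to an aggressiveness in [0, 1]:
    0 when direct sound dominates, 1 when reverberation dominates."""
    if drr_clean_db <= drr_reverberant_db:
        raise ValueError("drr_clean_db must exceed drr_reverberant_db")
    if drr_db >= drr_clean_db:
        return 0.0
    if drr_db <= drr_reverberant_db:
        return 1.0
    return (drr_clean_db - drr_db) / (drr_clean_db - drr_reverberant_db)


if __name__ == "__main__":
    # Example: talker 2 m from the microphone in a 60 m^3 room, T60 = 0.6 s.
    d = 2.0
    drr = direct_to_reverberant_db(d, volume_m3=60.0, t60_s=0.6)
    print(f"gain {distance_gain_db(d):.1f} dB, DRR {drr:.1f} dB, "
          f"aggressiveness {dereverb_aggressiveness(drr):.2f}")
```

For the example room, the approximation gives r_c ≈ 0.57 m. A talker 2 m away therefore receives about 12.0 dB of extra gain, sees a DRR of roughly -10.9 dB, and drives the hypothetical dereverberation control to its maximum. A linear map is used only because it is easy to inspect. A smoother curve, or per-band DRR estimates, would fit the same structure.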

Publication/grant documents

EP2766901B1 SPEECH SIGNAL ENHANCEMENT USING VISUAL INFORMATION, publication/grant date: 2016-09-21

IPC classification:

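G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L21/00	Processing of speech or voice signals to produce another audible or non-audible signal, e.g. a visual or tactile signal, in order to modify their quality or intelligibility (G10L19/00 takes precedence)
G10L21/02	. Speech enhancement, e.g. noise reduction or echo cancellation (reducing echo effects in line transmission systems H04B3/20; echo suppression in hands-free telephones H04M9/08)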