Audio-visual selection process for the synthesis of photo-realistic talking-head animations
    1.
    Granted Patent
    Audio-visual selection process for the synthesis of photo-realistic talking-head animations (Expired)

    Publication No.: US06654018B1

    Publication Date: 2003-11-25

    Application No.: US09820396

    Filing Date: 2001-03-29

    IPC Class: G06T13/00

    CPC Class: G10L13/08 G10L2021/105

    Abstract: A system and method for generating photo-realistic talking-head animation from a text input utilizes an audio-visual unit selection process. The lip-synchronization is obtained by optimally selecting and concatenating variable-length video units of the mouth area. The unit selection process utilizes the acoustic data to determine the target costs for the candidate images and utilizes the visual data to determine the concatenation costs. The image database is prepared in a hierarchical fashion, including high-level features (such as a full 3D modeling of the head, geometric size and position of elements) and pixel-based, low-level features (such as a PCA-based metric for labeling the various feature bitmaps).
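    The unit selection described in this abstract — target costs from acoustic data, concatenation costs from visual data, minimized over a sequence — can be sketched as a dynamic program. This is an illustrative sketch only; the cost functions below are simple placeholders, not the patent's actual acoustic and visual metrics.

    ```python
    # Dynamic-programming (Viterbi-style) unit selection: pick one candidate
    # unit per frame so that the sum of target costs (how well a unit fits
    # the frame) and concatenation costs (how smoothly adjacent units join)
    # is minimized. Cost functions are caller-supplied placeholders.

    def select_units(candidates, target_cost, concat_cost):
        """candidates: per-frame lists of unit ids.
        Returns the minimum-total-cost sequence of unit ids."""
        # best[i][j] = (cumulative cost, backpointer) for candidates[i][j]
        best = [[(target_cost(0, u), None) for u in candidates[0]]]
        for i in range(1, len(candidates)):
            row = []
            for u in candidates[i]:
                prev = min(range(len(candidates[i - 1])),
                           key=lambda k: best[i - 1][k][0]
                           + concat_cost(candidates[i - 1][k], u))
                cost = (best[i - 1][prev][0]
                        + concat_cost(candidates[i - 1][prev], u)
                        + target_cost(i, u))
                row.append((cost, prev))
            best.append(row)
        # Trace back the cheapest path.
        j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
        path = [candidates[-1][j]]
        for i in range(len(candidates) - 1, 0, -1):
            j = best[i][j][1]
            path.append(candidates[i - 1][j])
        return path[::-1]
    ```

    With a toy target cost of distance-to-desired-value and a concatenation cost of distance between consecutive units, the cheapest smooth path is selected.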


    Robust multi-modal method for recognizing objects
    2.
    Granted Patent
    Robust multi-modal method for recognizing objects (Expired)

    Publication No.: US6118887A

    Publication Date: 2000-09-12

    Application No.: US948750

    Filing Date: 1997-10-10

    IPC Class: G06K9/00 G06T7/20

    CPC Class: G06K9/00228 G06T7/2033

    Abstract: A method for tracking heads and faces is disclosed wherein a variety of different representation models can be used to define individual heads and facial features in a multi-channel capable tracking algorithm. The representation models generated by the channels during a sequence of frames are ultimately combined into a representation comprising a highly robust and accurate tracked output. In a preferred embodiment, the method conducts an initial overview procedure to establish the optimal tracking strategy to be used in light of the particular characteristics of the tracking application.
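    The combination of per-channel results into one robust tracked output could take many forms; one minimal sketch, assuming each channel reports a face bounding box with a confidence score, is a confidence-weighted average. The patent's representation models and combination rule are more elaborate than this.

    ```python
    # Fuse bounding boxes from multiple tracking channels by weighting
    # each channel's (x, y, w, h) estimate with its confidence score.

    def fuse_boxes(channel_outputs):
        """channel_outputs: list of ((x, y, w, h), confidence) tuples."""
        total = sum(c for _, c in channel_outputs)
        if total == 0:
            raise ValueError("no confident channel output")
        fused = [sum(box[i] * c for box, c in channel_outputs) / total
                 for i in range(4)]
        return tuple(fused)
    ```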


    Audio-visual selection process for the synthesis of photo-realistic talking-head animations
    3.
    Granted Patent
    Audio-visual selection process for the synthesis of photo-realistic talking-head animations (Active)

    Publication No.: US07990384B2

    Publication Date: 2011-08-02

    Application No.: US10662550

    Filing Date: 2003-09-15

    IPC Class: G06T13/00

    Abstract: A system and method for generating photo-realistic talking-head animation from a text input utilizes an audio-visual unit selection process. The lip-synchronization is obtained by optimally selecting and concatenating variable-length video units of the mouth area. The unit selection process utilizes the acoustic data to determine the target costs for the candidate images and utilizes the visual data to determine the concatenation costs. The image database is prepared in a hierarchical fashion, including high-level features (such as a full 3D modeling of the head, geometric size and position of elements) and pixel-based, low-level features (such as a PCA-based metric for labeling the various feature bitmaps).


    Audio-visual selection process for the synthesis of photo-realistic talking-head animations
    4.
    Patent Application
    Audio-visual selection process for the synthesis of photo-realistic talking-head animations (Active)

    Publication No.: US20050057570A1

    Publication Date: 2005-03-17

    Application No.: US10662550

    Filing Date: 2003-09-15

    IPC Class: G06T15/70 G10L15/02 G10L21/06

    Abstract: A system and method for generating photo-realistic talking-head animation from a text input utilizes an audio-visual unit selection process. The lip-synchronization is obtained by optimally selecting and concatenating variable-length video units of the mouth area. The unit selection process utilizes the acoustic data to determine the target costs for the candidate images and utilizes the visual data to determine the concatenation costs. The image database is prepared in a hierarchical fashion, including high-level features (such as a full 3D modeling of the head, geometric size and position of elements) and pixel-based, low-level features (such as a PCA-based metric for labeling the various feature bitmaps).


    Method for likelihood computation in multi-stream HMM based speech recognition
    5.
    Granted Patent
    Method for likelihood computation in multi-stream HMM based speech recognition (Active)

    Publication No.: US07480617B2

    Publication Date: 2009-01-20

    Application No.: US10946381

    Filing Date: 2004-09-21

    IPC Class: G10L15/14

    CPC Class: G10L15/144

    Abstract: A method for speech recognition includes determining active Gaussians related to a first feature stream and a second feature stream by labeling at least one of the first and second streams, and determining active Gaussians co-occurring in the first stream and the second stream based upon joint probability. A number of Gaussians computed is reduced based upon Gaussians already computed for the first stream and a number of Gaussians co-occurring in the second stream. Speech is decoded based on the Gaussians computed for the first and second streams.
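    The pruning idea in this abstract — evaluate the second stream's Gaussians only where they co-occur with Gaussians already active in the first stream — can be sketched as a lookup against a joint-probability table. The table and threshold below are illustrative stand-ins, not the patent's trained statistics.

    ```python
    # Reduce the set of Gaussians to evaluate for stream 2 by keeping only
    # those whose joint (co-occurrence) probability with some Gaussian
    # active in stream 1 exceeds a threshold.

    def active_second_stream(active_first, cooccur, threshold=0.1):
        """active_first: set of Gaussian ids active for stream 1.
        cooccur: dict mapping (g1, g2) -> joint probability.
        Returns the reduced set of stream-2 Gaussians to evaluate."""
        return {g2 for (g1, g2), p in cooccur.items()
                if g1 in active_first and p >= threshold}
    ```

    Only the surviving set is then scored for the second stream, which is where the claimed reduction in Gaussian computations comes from.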


    Audio-only backoff in audio-visual speech recognition system
    7.
    Granted Patent
    Audio-only backoff in audio-visual speech recognition system (Active)

    Publication No.: US07251603B2

    Publication Date: 2007-07-31

    Application No.: US10601350

    Filing Date: 2003-06-23

    IPC Class: G10L21/00

    CPC Class: G10L15/25

    Abstract: Techniques for performing audio-visual speech recognition, with improved recognition performance, in a degraded visual environment. For example, in one aspect of the invention, a technique for use in accordance with an audio-visual speech recognition system for improving a recognition performance thereof includes the steps/operations of: (i) selecting between an acoustic-only data model and an acoustic-visual data model based on a condition associated with a visual environment; and (ii) decoding at least a portion of an input spoken utterance using the selected data model. Advantageously, during periods of degraded visual conditions, the audio-visual speech recognition system is able to decode (recognize) input speech data using audio-only data, thus avoiding recognition inaccuracies that may result from performing speech recognition based on acoustic-visual data models and degraded visual data.
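    Step (i) of the abstract is a model-selection decision conditioned on the visual environment. A minimal sketch, assuming the "condition" is a scalar visual-quality score in [0, 1] and a fixed threshold (both illustrative assumptions, not the patent's actual criteria):

    ```python
    # Back off from the audio-visual model to the audio-only model when
    # the measured visual quality drops below a threshold.

    def choose_model(visual_quality, threshold=0.5):
        """Return which data model to decode with, given a
        visual-quality score in [0, 1]."""
        return "audio_visual" if visual_quality >= threshold else "audio_only"
    ```

    Step (ii) then decodes the utterance with whichever model this decision returns.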


    System and method for likelihood computation in multi-stream HMM based speech recognition
    8.
    Granted Patent
    System and method for likelihood computation in multi-stream HMM based speech recognition (Active)

    Publication No.: US08121840B2

    Publication Date: 2012-02-21

    Application No.: US12131190

    Filing Date: 2008-06-02

    IPC Class: G10L15/14

    CPC Class: G10L15/144

    Abstract: A system and method for speech recognition includes determining active Gaussians related to a first feature stream and a second feature stream by labeling at least one of the first and second streams, and determining active Gaussians co-occurring in the first stream and the second stream based upon joint probability. A number of Gaussians computed is reduced based upon Gaussians already computed for the first stream and a number of Gaussians co-occurring in the second stream. Speech is decoded based on the Gaussians computed for the first and second streams.


    METHOD AND APPARATUS FOR PERVASIVE AUTHENTICATION DOMAINS

    Publication No.: US20080141357A1

    Publication Date: 2008-06-12

    Application No.: US11932918

    Filing Date: 2007-10-31

    IPC Class: H04L9/32

    Abstract: Methods and apparatus for enabling a Pervasive Authentication Domain. A Pervasive Authentication Domain allows many registered Pervasive Devices to obtain authentication credentials from a single Personal Authentication Gateway and to use these credentials on behalf of users to enable additional capabilities for the devices. It provides an arrangement for a user to store credentials in one device (the Personal Authentication Gateway), and then make use of those credentials from many authorized Pervasive Devices without re-entering the credentials. It provides a convenient way for a user to share credentials among many devices, particularly when it is not convenient to enter credentials as in a smart wristwatch environment. It further provides an arrangement for disabling access to credentials to devices that appear to be far from the Personal Authentication Gateway as measured by metrics such as communications signal strengths.
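    The last sentence of the abstract describes gating credential access on measured proximity. A hedged sketch of that check, assuming signal strength in dBm and an illustrative cutoff (the actual metrics and thresholds are not specified in the abstract):

    ```python
    # Allow a Pervasive Device to use gateway-held credentials only while
    # its measured signal strength to the Personal Authentication Gateway
    # stays above a cutoff (higher dBm means a stronger, closer signal).

    def credentials_allowed(signal_dbm, min_dbm=-70):
        """Gate credential use on measured signal strength."""
        return signal_dbm >= min_dbm
    ```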

    System and method for microphone activation using visual speech cues
    10.
    Granted Patent
    System and method for microphone activation using visual speech cues (Expired)

    Publication No.: US06754373B1

    Publication Date: 2004-06-22

    Application No.: US09616229

    Filing Date: 2000-07-14

    IPC Class: G06K9/00

    Abstract: A system for activating a microphone based on visual speech cues, in accordance with the invention, includes a feature tracker coupled to an image acquisition device. The feature tracker tracks features in an image of a user. A region of interest extractor is coupled to the feature tracker. The region of interest extractor extracts a region of interest from the image of the user. A visual speech activity detector is coupled to the region of interest extractor and measures changes in the region of interest to determine if a visual speech cue has been generated by the user. A microphone is turned on by the visual speech activity detector when a visual speech cue has been determined by the visual speech activity detector. Methods for activating a microphone based on visual speech cues are also included.
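    The visual speech activity detector in this abstract "measures changes in the region of interest." One minimal sketch of such a change measure, using a mean absolute pixel difference and a threshold (both illustrative assumptions, not the patented metric):

    ```python
    # Decide whether to switch the microphone on by measuring frame-to-frame
    # change in the mouth region of interest: a large average pixel change
    # is taken as a visual speech cue.

    def microphone_on(prev_roi, curr_roi, threshold=10.0):
        """ROIs are equal-length lists of pixel intensities; returns True
        when the mean absolute change suggests visual speech activity."""
        diff = sum(abs(a - b) for a, b in zip(prev_roi, curr_roi)) / len(curr_roi)
        return diff > threshold
    ```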
