Method for likelihood computation in multi-stream HMM based speech recognition
    1.
    Invention Grant
    Method for likelihood computation in multi-stream HMM based speech recognition (In force)

    Publication No.: US07480617B2

    Publication Date: 2009-01-20

    Application No.: US10946381

    Filing Date: 2004-09-21

    IPC Class: G10L15/14

    CPC Class: G10L15/144

    Abstract: A method for speech recognition includes determining active Gaussians related to a first feature stream and a second feature stream by labeling at least one of the first and second streams, and determining active Gaussians co-occurring in the first stream and the second stream based upon joint probability. The number of Gaussians computed is reduced based upon Gaussians already computed for the first stream and the number of Gaussians co-occurring in the second stream. Speech is decoded based on the Gaussians computed for the first and second streams.

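    A minimal sketch of the pruning idea in the abstract above, assuming diagonal-covariance Gaussians, a top-N labeling rule, and a hypothetical co_occurrence table mapping first-stream Gaussians to the second-stream Gaussians they co-occur with; it illustrates the general technique, not the patented method itself.

        import numpy as np

        def log_gauss(x, mean, var):
            """Log-density of a diagonal-covariance Gaussian (illustrative helper)."""
            return -0.5 * float(np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var))

        def pruned_two_stream_loglik(x_a, x_v, gauss_a, gauss_v, co_occurrence, top_n=4):
            """Evaluate every first-stream Gaussian, label the top_n as active, and
            compute only the second-stream Gaussians that co-occur with them."""
            # Score every Gaussian of the first (e.g., audio) feature stream.
            scores_a = {i: log_gauss(x_a, m, v) for i, (m, v) in gauss_a.items()}
            active_a = sorted(scores_a, key=scores_a.get, reverse=True)[:top_n]
            # Restrict the second (e.g., visual) stream to co-occurring Gaussians,
            # so far fewer densities have to be computed.
            active_v = {j for i in active_a for j in co_occurrence.get(i, ())}
            scores_v = {j: log_gauss(x_v, *gauss_v[j]) for j in active_v}
            best_v = max(scores_v.values()) if scores_v else 0.0
            return max(scores_a.values()) + best_v

    Here gauss_a and gauss_v would hold the per-stream mixture components of an HMM state, and co_occurrence would come from the joint-probability statistics mentioned in the abstract.
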
    Audio-only backoff in audio-visual speech recognition system
    3.
    Invention Grant
    Audio-only backoff in audio-visual speech recognition system (In force)

    Publication No.: US07251603B2

    Publication Date: 2007-07-31

    Application No.: US10601350

    Filing Date: 2003-06-23

    IPC Class: G10L21/00

    CPC Class: G10L15/25

    Abstract: Techniques are provided for performing audio-visual speech recognition, with improved recognition performance, in a degraded visual environment. For example, in one aspect of the invention, a technique for use in accordance with an audio-visual speech recognition system for improving a recognition performance thereof includes the steps/operations of: (i) selecting between an acoustic-only data model and an acoustic-visual data model based on a condition associated with a visual environment; and (ii) decoding at least a portion of an input spoken utterance using the selected data model. Advantageously, during periods of degraded visual conditions, the audio-visual speech recognition system is able to decode (recognize) input speech data using audio-only data, thus avoiding recognition inaccuracies that may result from performing speech recognition based on acoustic-visual data models and degraded visual data.

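    The selection step above can be pictured as a simple switch on an estimated visual-quality score; the score, threshold, and the two decoder callables in the sketch below are assumptions for illustration, not the interfaces of the patented system.

        def decode_with_backoff(audio, video, visual_quality,
                                decode_audio_only, decode_audio_visual,
                                quality_threshold=0.5):
            """Back off to the acoustic-only data model when the visual environment
            looks degraded; otherwise decode with the acoustic-visual model."""
            if visual_quality < quality_threshold:
                # Degraded visual conditions: ignore the video stream entirely.
                return decode_audio_only(audio)
            return decode_audio_visual(audio, video)
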
    Automated decision making using time-varying stream reliability prediction
    4.
    Invention Grant
    Automated decision making using time-varying stream reliability prediction (Expired)

    Publication No.: US07228279B2

    Publication Date: 2007-06-05

    Application No.: US10397762

    Filing Date: 2003-03-26

    IPC Class: G10L17/00

    CPC Class: G10L17/06 G10L17/20

    Abstract: Automated decision making techniques are provided. For example, a technique for generating a decision associated with an individual or an entity includes the following steps. First, two or more data streams associated with the individual or the entity are captured. Then, at least one time-varying measure is computed in accordance with the two or more data streams. Lastly, a decision is computed based on the at least one time-varying measure. One form of the time-varying measure may include a measure of the coverage of a model associated with previously-obtained training data by at least a portion of the captured data. Another form of the time-varying measure may include a measure of the stability of at least a portion of the captured data. While either measure may be employed alone to compute a decision, preferably both the coverage and stability measures are employed. The technique may be used to authenticate a speaker.

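    One way to picture the two measures is sketched below: coverage as the mean log-likelihood of recent frames under a model trained on previously-obtained data, and stability as an inverse-variance score of recent frame scores. These particular formulas and thresholds are assumptions made for illustration, not the patented definitions.

        import numpy as np

        def coverage_measure(frames, model_loglik):
            """Mean per-frame log-likelihood of the captured frames under the model
            built from previous training data (illustrative definition)."""
            return float(np.mean([model_loglik(f) for f in frames]))

        def stability_measure(scores):
            """Low variance of recent scores maps to high stability (illustrative)."""
            return 1.0 / (1.0 + float(np.var(scores)))

        def accept_speaker(frames, model_loglik, recent_scores,
                           coverage_floor=-50.0, stability_floor=0.2):
            """Combine both time-varying measures into an accept/reject decision;
            the thresholds are arbitrary placeholders."""
            return (coverage_measure(frames, model_loglik) > coverage_floor
                    and stability_measure(recent_scores) > stability_floor)
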
    System and method for likelihood computation in multi-stream HMM based speech recognition
    5.
    Invention Application
    System and method for likelihood computation in multi-stream HMM based speech recognition (In force)

    Publication No.: US20060074654A1

    Publication Date: 2006-04-06

    Application No.: US10946381

    Filing Date: 2004-09-21

    IPC Class: G10L15/08

    CPC Class: G10L15/144

    Abstract: A system and method for speech recognition includes determining active Gaussians related to a first feature stream and a second feature stream by labeling at least one of the first and second streams, and determining active Gaussians co-occurring in the first stream and the second stream based upon joint probability. The number of Gaussians computed is reduced based upon Gaussians already computed for the first stream and the number of Gaussians co-occurring in the second stream. Speech is decoded based on the Gaussians computed for the first and second streams.

    Audio-visual selection process for the synthesis of photo-realistic talking-head animations
    6.
    Invention Application
    Audio-visual selection process for the synthesis of photo-realistic talking-head animations (In force)

    Publication No.: US20050057570A1

    Publication Date: 2005-03-17

    Application No.: US10662550

    Filing Date: 2003-09-15

    IPC Class: G06T15/70 G10L15/02 G10L21/06

    Abstract: A system and method for generating photo-realistic talking-head animation from a text input utilizes an audio-visual unit selection process. The lip-synchronization is obtained by optimally selecting and concatenating variable-length video units of the mouth area. The unit selection process utilizes the acoustic data to determine the target costs for the candidate images and utilizes the visual data to determine the concatenation costs. The image database is prepared in a hierarchical fashion, including high-level features (such as a full 3D modeling of the head, geometric size and position of elements) and pixel-based, low-level features (such as a PCA-based metric for labeling the various feature bitmaps).

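    The unit selection described above follows the usual target-cost / concatenation-cost pattern, which the generic dynamic-programming sketch below illustrates; the cost functions (an acoustic target cost and a visual concatenation cost) are assumed inputs, and the search is not the patented implementation.

        def select_units(candidates, target_cost, concat_cost):
            """Pick one video unit per target frame by dynamic programming.
            candidates[t]: hashable candidate units for target frame t;
            target_cost(t, u): acoustic match of unit u to frame t;
            concat_cost(p, u): visual smoothness of placing u after p."""
            best = [{u: (target_cost(0, u), None) for u in candidates[0]}]
            for t in range(1, len(candidates)):
                layer = {}
                for u in candidates[t]:
                    prev, cost = min(((p, best[t - 1][p][0] + concat_cost(p, u))
                                      for p in candidates[t - 1]), key=lambda pc: pc[1])
                    layer[u] = (cost + target_cost(t, u), prev)
                best.append(layer)
            # Trace back the cheapest sequence of units.
            u = min(best[-1], key=lambda k: best[-1][k][0])
            path = [u]
            for t in range(len(candidates) - 1, 0, -1):
                u = best[t][u][1]
                path.append(u)
            return list(reversed(path))

    In this reading, candidates[t] would index mouth-area images in the database, the target cost would be scored from acoustic features, and the concatenation cost from the pixel/PCA-based visual features mentioned in the abstract.
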
    Audio-visual selection process for the synthesis of photo-realistic talking-head animations
    7.
    Invention Grant
    Audio-visual selection process for the synthesis of photo-realistic talking-head animations (Expired)

    Publication No.: US06654018B1

    Publication Date: 2003-11-25

    Application No.: US09820396

    Filing Date: 2001-03-29

    IPC Class: G06T13/00

    CPC Class: G10L13/08 G10L2021/105

    Abstract: A system and method for generating photo-realistic talking-head animation from a text input utilizes an audio-visual unit selection process. The lip-synchronization is obtained by optimally selecting and concatenating variable-length video units of the mouth area. The unit selection process utilizes the acoustic data to determine the target costs for the candidate images and utilizes the visual data to determine the concatenation costs. The image database is prepared in a hierarchical fashion, including high-level features (such as a full 3D modeling of the head, geometric size and position of elements) and pixel-based, low-level features (such as a PCA-based metric for labeling the various feature bitmaps).

    Robust multi-modal method for recognizing objects
    8.
    Invention Grant
    Robust multi-modal method for recognizing objects (Expired)

    Publication No.: US6118887A

    Publication Date: 2000-09-12

    Application No.: US948750

    Filing Date: 1997-10-10

    IPC Class: G06K9/00 G06T7/20

    CPC Class: G06K9/00228 G06T7/2033

    Abstract: A method for tracking heads and faces is disclosed wherein a variety of different representation models can be used to define individual heads and facial features in a multi-channel capable tracking algorithm. The representation models generated by the channels during a sequence of frames are ultimately combined into a representation comprising a highly robust and accurate tracked output. In a preferred embodiment, the method conducts an initial overview procedure to establish the optimal tracking strategy to be used in light of the particular characteristics of the tracking application.

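    The combination of per-channel results can be pictured as a confidence-weighted fusion of the head estimates produced by the different representation models; the channel confidences and the simple weighted average below are assumptions for illustration, not the patented combination scheme.

        def fuse_tracks(channel_boxes):
            """Combine per-channel head estimates into one track (illustrative sketch).
            channel_boxes: list of (confidence, (x, y, w, h)) tuples, one per
            representation model/channel; returns a confidence-weighted average box."""
            total = sum(conf for conf, _ in channel_boxes)
            if total == 0:
                raise ValueError("no confident channel output to fuse")
            return tuple(sum(conf * box[i] for conf, box in channel_boxes) / total
                         for i in range(4))

        # Example: color, motion, and shape channels vote on the head location.
        print(fuse_tracks([(0.6, (100, 80, 50, 60)),
                           (0.3, (104, 78, 52, 58)),
                           (0.1, (90, 90, 48, 61))]))
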
    System and method for likelihood computation in multi-stream HMM based speech recognition
    9.
    Invention Grant
    System and method for likelihood computation in multi-stream HMM based speech recognition (In force)

    Publication No.: US08121840B2

    Publication Date: 2012-02-21

    Application No.: US12131190

    Filing Date: 2008-06-02

    IPC Class: G10L15/14

    CPC Class: G10L15/144

    Abstract: A system and method for speech recognition includes determining active Gaussians related to a first feature stream and a second feature stream by labeling at least one of the first and second streams, and determining active Gaussians co-occurring in the first stream and the second stream based upon joint probability. The number of Gaussians computed is reduced based upon Gaussians already computed for the first stream and the number of Gaussians co-occurring in the second stream. Speech is decoded based on the Gaussians computed for the first and second streams.

    METHOD AND APPARATUS FOR PERVASIVE AUTHENTICATION DOMAINS

    Publication No.: US20080141357A1

    Publication Date: 2008-06-12

    Application No.: US11932918

    Filing Date: 2007-10-31

    IPC Class: H04L9/32

    Abstract: Methods and apparatus are provided for enabling a Pervasive Authentication Domain. A Pervasive Authentication Domain allows many registered Pervasive Devices to obtain authentication credentials from a single Personal Authentication Gateway and to use these credentials on behalf of users to enable additional capabilities for the devices. It provides an arrangement for a user to store credentials in one device (the Personal Authentication Gateway), and then make use of those credentials from many authorized Pervasive Devices without re-entering the credentials. It provides a convenient way for a user to share credentials among many devices, particularly when it is not convenient to enter credentials as in a smart wristwatch environment. It further provides an arrangement for disabling access to credentials for devices that appear to be far from the Personal Authentication Gateway as measured by metrics such as communications signal strengths.
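
    The proximity gating described above can be sketched as a gateway object that releases its stored credential only to devices that are registered in the domain and whose signal strength suggests they are nearby; the class, method names, and RSSI threshold are hypothetical, chosen only to illustrate the idea.

        class PersonalAuthenticationGateway:
            """Illustrative sketch of a gateway for a Pervasive Authentication Domain.
            Registered pervasive devices may fetch the stored credential only while
            they appear nearby, as judged by a signal-strength threshold."""

            def __init__(self, credential, rssi_floor_dbm=-70):
                self._credential = credential
                self._rssi_floor = rssi_floor_dbm
                self._registered = set()

            def register(self, device_id):
                # Add a pervasive device (e.g., a smart wristwatch) to the domain.
                self._registered.add(device_id)

            def request_credential(self, device_id, rssi_dbm):
                # Refuse devices outside the domain or apparently far from the gateway.
                if device_id not in self._registered:
                    raise PermissionError("device not registered in this domain")
                if rssi_dbm < self._rssi_floor:
                    raise PermissionError("device appears too far from the gateway")
                return self._credential

        # Usage sketch with made-up values.
        gateway = PersonalAuthenticationGateway(credential="opaque-token")
        gateway.register("watch-01")
        print(gateway.request_credential("watch-01", rssi_dbm=-55))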