Audio-visual selection process for the synthesis of photo-realistic talking-head animations
    1.
    Granted Patent
    Audio-visual selection process for the synthesis of photo-realistic talking-head animations (Expired)

    Publication No.: US06654018B1

    Publication Date: 2003-11-25

    Application No.: US09820396

    Filing Date: 2001-03-29

    IPC Class: G06T13/00

    CPC Class: G10L13/08 G10L2021/105

    Abstract: A system and method for generating photo-realistic talking-head animation from a text input utilizes an audio-visual unit selection process. The lip-synchronization is obtained by optimally selecting and concatenating variable-length video units of the mouth area. The unit selection process utilizes the acoustic data to determine the target costs for the candidate images and utilizes the visual data to determine the concatenation costs. The image database is prepared in a hierarchical fashion, including high-level features (such as a full 3D modeling of the head, geometric size and position of elements) and pixel-based, low-level features (such as a PCA-based metric for labeling the various feature bitmaps).
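    The unit selection described in this abstract — target costs from acoustic data, concatenation costs from visual data, minimized over a sequence — can be sketched as a dynamic program. This is an illustrative sketch only; the cost functions below are simple placeholders, not the patent's actual acoustic and visual metrics.

    ```python
    # Dynamic-programming (Viterbi-style) unit selection: pick one candidate
    # unit per frame so that the sum of target costs (how well a unit fits
    # the frame) and concatenation costs (how smoothly adjacent units join)
    # is minimized. Cost functions are caller-supplied placeholders.

    def select_units(candidates, target_cost, concat_cost):
        """candidates: per-frame lists of unit ids.
        Returns the minimum-total-cost sequence of unit ids."""
        # best[i][j] = (cumulative cost, backpointer) for candidates[i][j]
        best = [[(target_cost(0, u), None) for u in candidates[0]]]
        for i in range(1, len(candidates)):
            row = []
            for u in candidates[i]:
                prev = min(range(len(candidates[i - 1])),
                           key=lambda k: best[i - 1][k][0]
                           + concat_cost(candidates[i - 1][k], u))
                cost = (best[i - 1][prev][0]
                        + concat_cost(candidates[i - 1][prev], u)
                        + target_cost(i, u))
                row.append((cost, prev))
            best.append(row)
        # Trace back the cheapest path.
        j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
        path = [candidates[-1][j]]
        for i in range(len(candidates) - 1, 0, -1):
            j = best[i][j][1]
            path.append(candidates[i - 1][j])
        return path[::-1]
    ```

    With a toy target cost of distance-to-desired-value and a concatenation cost of distance between consecutive units, the cheapest smooth path is selected.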


    Robust multi-modal method for recognizing objects
    2.
    Granted Patent
    Robust multi-modal method for recognizing objects (Expired)

    Publication No.: US6118887A

    Publication Date: 2000-09-12

    Application No.: US948750

    Filing Date: 1997-10-10

    IPC Class: G06K9/00 G06T7/20

    CPC Class: G06K9/00228 G06T7/2033

    Abstract: A method for tracking heads and faces is disclosed wherein a variety of different representation models can be used to define individual heads and facial features in a multi-channel capable tracking algorithm. The representation models generated by the channels during a sequence of frames are ultimately combined into a representation comprising a highly robust and accurate tracked output. In a preferred embodiment, the method conducts an initial overview procedure to establish the optimal tracking strategy to be used in light of the particular characteristics of the tracking application.
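    The combination of per-channel results into one robust tracked output could take many forms; one minimal sketch, assuming each channel reports a face bounding box with a confidence score, is a confidence-weighted average. The patent's representation models and combination rule are more elaborate than this.

    ```python
    # Fuse bounding boxes from multiple tracking channels by weighting
    # each channel's (x, y, w, h) estimate with its confidence score.

    def fuse_boxes(channel_outputs):
        """channel_outputs: list of ((x, y, w, h), confidence) tuples."""
        total = sum(c for _, c in channel_outputs)
        if total == 0:
            raise ValueError("no confident channel output")
        fused = [sum(box[i] * c for box, c in channel_outputs) / total
                 for i in range(4)]
        return tuple(fused)
    ```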


    Audio-visual selection process for the synthesis of photo-realistic talking-head animations
    3.
    Granted Patent
    Audio-visual selection process for the synthesis of photo-realistic talking-head animations (Active)

    Publication No.: US07990384B2

    Publication Date: 2011-08-02

    Application No.: US10662550

    Filing Date: 2003-09-15

    IPC Class: G06T13/00

    Abstract: A system and method for generating photo-realistic talking-head animation from a text input utilizes an audio-visual unit selection process. The lip-synchronization is obtained by optimally selecting and concatenating variable-length video units of the mouth area. The unit selection process utilizes the acoustic data to determine the target costs for the candidate images and utilizes the visual data to determine the concatenation costs. The image database is prepared in a hierarchical fashion, including high-level features (such as a full 3D modeling of the head, geometric size and position of elements) and pixel-based, low-level features (such as a PCA-based metric for labeling the various feature bitmaps).


    Audio-visual selection process for the synthesis of photo-realistic talking-head animations
    4.
    Patent Application
    Audio-visual selection process for the synthesis of photo-realistic talking-head animations (Active)

    Publication No.: US20050057570A1

    Publication Date: 2005-03-17

    Application No.: US10662550

    Filing Date: 2003-09-15

    IPC Class: G06T15/70 G10L15/02 G10L21/06

    Abstract: A system and method for generating photo-realistic talking-head animation from a text input utilizes an audio-visual unit selection process. The lip-synchronization is obtained by optimally selecting and concatenating variable-length video units of the mouth area. The unit selection process utilizes the acoustic data to determine the target costs for the candidate images and utilizes the visual data to determine the concatenation costs. The image database is prepared in a hierarchical fashion, including high-level features (such as a full 3D modeling of the head, geometric size and position of elements) and pixel-based, low-level features (such as a PCA-based metric for labeling the various feature bitmaps).


    Method for likelihood computation in multi-stream HMM based speech recognition
    5.
    Granted Patent
    Method for likelihood computation in multi-stream HMM based speech recognition (Active)

    Publication No.: US07480617B2

    Publication Date: 2009-01-20

    Application No.: US10946381

    Filing Date: 2004-09-21

    IPC Class: G10L15/14

    CPC Class: G10L15/144

    Abstract: A method for speech recognition includes determining active Gaussians related to a first feature stream and a second feature stream by labeling at least one of the first and second streams, and determining active Gaussians co-occurring in the first stream and the second stream based upon joint probability. A number of Gaussians computed is reduced based upon Gaussians already computed for the first stream and a number of Gaussians co-occurring in the second stream. Speech is decoded based on the Gaussians computed for the first and second streams.
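    The pruning idea in this abstract — evaluate the second stream's Gaussians only where they co-occur with Gaussians already active in the first stream — can be sketched as a lookup against a joint-probability table. The table and threshold below are illustrative stand-ins, not the patent's trained statistics.

    ```python
    # Reduce the set of Gaussians to evaluate for stream 2 by keeping only
    # those whose joint (co-occurrence) probability with some Gaussian
    # active in stream 1 exceeds a threshold.

    def active_second_stream(active_first, cooccur, threshold=0.1):
        """active_first: set of Gaussian ids active for stream 1.
        cooccur: dict mapping (g1, g2) -> joint probability.
        Returns the reduced set of stream-2 Gaussians to evaluate."""
        return {g2 for (g1, g2), p in cooccur.items()
                if g1 in active_first and p >= threshold}
    ```

    Only the surviving set is then scored for the second stream, which is where the claimed reduction in Gaussian computations comes from.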


    Audio-only backoff in audio-visual speech recognition system
    7.
    Granted Patent
    Audio-only backoff in audio-visual speech recognition system (Active)

    Publication No.: US07251603B2

    Publication Date: 2007-07-31

    Application No.: US10601350

    Filing Date: 2003-06-23

    IPC Class: G10L21/00

    CPC Class: G10L15/25

    Abstract: Techniques for performing audio-visual speech recognition, with improved recognition performance, in a degraded visual environment. For example, in one aspect of the invention, a technique for use in accordance with an audio-visual speech recognition system for improving a recognition performance thereof includes the steps/operations of: (i) selecting between an acoustic-only data model and an acoustic-visual data model based on a condition associated with a visual environment; and (ii) decoding at least a portion of an input spoken utterance using the selected data model. Advantageously, during periods of degraded visual conditions, the audio-visual speech recognition system is able to decode (recognize) input speech data using audio-only data, thus avoiding recognition inaccuracies that may result from performing speech recognition based on acoustic-visual data models and degraded visual data.
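    Step (i) of the abstract is a model-selection decision conditioned on the visual environment. A minimal sketch, assuming the "condition" is a scalar visual-quality score in [0, 1] and a fixed threshold (both illustrative assumptions, not the patent's actual criteria):

    ```python
    # Back off from the audio-visual model to the audio-only model when
    # the measured visual quality drops below a threshold.

    def choose_model(visual_quality, threshold=0.5):
        """Return which data model to decode with, given a
        visual-quality score in [0, 1]."""
        return "audio_visual" if visual_quality >= threshold else "audio_only"
    ```

    Step (ii) then decodes the utterance with whichever model this decision returns.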


    System and method for likelihood computation in multi-stream HMM based speech recognition
    8.
    Granted Patent
    System and method for likelihood computation in multi-stream HMM based speech recognition (Active)

    Publication No.: US08121840B2

    Publication Date: 2012-02-21

    Application No.: US12131190

    Filing Date: 2008-06-02

    IPC Class: G10L15/14

    CPC Class: G10L15/144

    Abstract: A system and method for speech recognition includes determining active Gaussians related to a first feature stream and a second feature stream by labeling at least one of the first and second streams, and determining active Gaussians co-occurring in the first stream and the second stream based upon joint probability. A number of Gaussians computed is reduced based upon Gaussians already computed for the first stream and a number of Gaussians co-occurring in the second stream. Speech is decoded based on the Gaussians computed for the first and second streams.


    METHOD AND APPARATUS FOR PERVASIVE AUTHENTICATION DOMAINS

    Publication No.: US20080141357A1

    Publication Date: 2008-06-12

    Application No.: US11932918

    Filing Date: 2007-10-31

    IPC Class: H04L9/32

    Abstract: Methods and apparatus for enabling a Pervasive Authentication Domain. A Pervasive Authentication Domain allows many registered Pervasive Devices to obtain authentication credentials from a single Personal Authentication Gateway and to use these credentials on behalf of users to enable additional capabilities for the devices. It provides an arrangement for a user to store credentials in one device (the Personal Authentication Gateway), and then make use of those credentials from many authorized Pervasive Devices without re-entering the credentials. It provides a convenient way for a user to share credentials among many devices, particularly when it is not convenient to enter credentials as in a smart wristwatch environment. It further provides an arrangement for disabling access to credentials to devices that appear to be far from the Personal Authentication Gateway as measured by metrics such as communications signal strengths.
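    The last sentence of the abstract describes gating credential access on measured proximity. A hedged sketch of that check, assuming signal strength in dBm and an illustrative cutoff (the actual metrics and thresholds are not specified in the abstract):

    ```python
    # Allow a Pervasive Device to use gateway-held credentials only while
    # its measured signal strength to the Personal Authentication Gateway
    # stays above a cutoff (higher dBm means a stronger, closer signal).

    def credentials_allowed(signal_dbm, min_dbm=-70):
        """Gate credential use on measured signal strength."""
        return signal_dbm >= min_dbm
    ```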

    System and method for microphone activation using visual speech cues
    10.
    Granted Patent
    System and method for microphone activation using visual speech cues (Expired)

    Publication No.: US06754373B1

    Publication Date: 2004-06-22

    Application No.: US09616229

    Filing Date: 2000-07-14

    IPC Class: G06K9/00

    Abstract: A system for activating a microphone based on visual speech cues, in accordance with the invention, includes a feature tracker coupled to an image acquisition device. The feature tracker tracks features in an image of a user. A region of interest extractor is coupled to the feature tracker. The region of interest extractor extracts a region of interest from the image of the user. A visual speech activity detector is coupled to the region of interest extractor and measures changes in the region of interest to determine if a visual speech cue has been generated by the user. A microphone is turned on by the visual speech activity detector when a visual speech cue has been determined by the visual speech activity detector. Methods for activating a microphone based on visual speech cues are also included.
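    The visual speech activity detector in this abstract "measures changes in the region of interest." One minimal sketch of such a change measure, using a mean absolute pixel difference and a threshold (both illustrative assumptions, not the patented metric):

    ```python
    # Decide whether to switch the microphone on by measuring frame-to-frame
    # change in the mouth region of interest: a large average pixel change
    # is taken as a visual speech cue.

    def microphone_on(prev_roi, curr_roi, threshold=10.0):
        """ROIs are equal-length lists of pixel intensities; returns True
        when the mean absolute change suggests visual speech activity."""
        diff = sum(abs(a - b) for a, b in zip(prev_roi, curr_roi)) / len(curr_roi)
        return diff > threshold
    ```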
