PHOTO-REALISTIC SYNTHESIS OF THREE DIMENSIONAL ANIMATION WITH FACIAL FEATURES SYNCHRONIZED WITH SPEECH
    Invention application (status: in force)

    Publication number: US20120280974A1

    Publication date: 2012-11-08

    Application number: US13099387

    Filing date: 2011-05-03

    IPC classification: G06T13/40 G06T15/00

    Abstract: Dynamic texture mapping is used to create a photorealistic three-dimensional animation of an individual with facial features synchronized with desired speech. Audiovisual data of an individual reading a known script is obtained and stored in an audio library and an image library. The audiovisual data is processed to extract feature vectors used to train a statistical model. An input audio feature vector corresponding to desired speech with which the animation will be synchronized is provided. The statistical model is used to generate a trajectory of visual feature vectors that corresponds to the input audio feature vector. These visual feature vectors are used to identify a matching image sequence from the image library. The resulting sequence of images, concatenated from the image library, provides a photorealistic image sequence with facial features, such as lip movements, synchronized with the desired speech. This image sequence is applied to the three-dimensional model.
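
    The step that turns a predicted trajectory of visual feature vectors into a sequence of library images can be illustrated with a minimal sketch. It assumes each library image is summarized by a visual feature vector and picks one image per frame with a Viterbi search that balances a target cost (distance to the predicted vector) against a concatenation cost (distance between consecutive chosen images). The function name `select_frames`, the Euclidean costs, and the weight `w_concat` are illustrative assumptions, not the patented method.

```python
import math
from typing import List, Sequence


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_frames(trajectory: List[Sequence[float]],
                  library: List[Sequence[float]],
                  w_concat: float = 1.0) -> List[int]:
    """Return one library index per trajectory frame (hypothetical helper).

    Minimizes the sum over frames of |trajectory[t] - library[i_t]| plus
    w_concat * |library[i_{t-1}] - library[i_t]| by dynamic programming.
    """
    n = len(library)
    # cost[i]: best accumulated cost of any path ending at library image i
    cost = [_dist(trajectory[0], library[i]) for i in range(n)]
    back: List[List[int]] = []  # back[t-1][j]: best predecessor of j at frame t
    for t in range(1, len(trajectory)):
        new_cost, pointers = [], []
        for j in range(n):
            target = _dist(trajectory[t], library[j])
            best_i = min(range(n),
                         key=lambda i: cost[i] + w_concat * _dist(library[i], library[j]))
            new_cost.append(cost[best_i]
                            + w_concat * _dist(library[best_i], library[j])
                            + target)
            pointers.append(best_i)
        cost = new_cost
        back.append(pointers)
    # Trace back from the cheapest final state
    j = min(range(n), key=lambda i: cost[i])
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    return path[::-1]
```

    With `w_concat=0` this reduces to nearest-neighbour lookup per frame; raising it trades accuracy for smoother transitions between concatenated images. The search costs O(T·n²) for T frames and n library images, so a practical system would likely prune candidates first.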

    Photo-realistic synthesis of three dimensional animation with facial features synchronized with speech

    Publication number: US09613450B2

    Publication date: 2017-04-04

    Application number: US13099387

    Filing date: 2011-05-03

    IPC classification: G06T13/40 G10L21/10

    Abstract: Identical to the abstract of US20120280974A1 above (both publications stem from application US13099387).

    Template constrained posterior probability
    Invention application (status: under examination, published)

    Publication number: US20090099847A1

    Publication date: 2009-04-16

    Application number: US11973735

    Filing date: 2007-10-10

    IPC classification: G10L11/00

    CPC classification: G10L2015/228

    Abstract: Described herein is a technology that, among other things, reduces errors introduced in recording and transcription data. In one approach, a method of detecting audio transcription errors is used. The method includes selecting a focus unit and selecting a context template corresponding to the focus unit. A hypothesis set is then determined with reference to the context template and the focus unit, and a probability corresponding to the focus unit is calculated across the hypothesis set.
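
    The probability computation described above can be sketched over an N-best list. This minimal example assumes each hypothesis is a word sequence with a log score, treats the context template as the words immediately to the left and right of the focus unit (with `None` as a wildcard), and normalizes over the whole hypothesis set. The names `Hypothesis` and `template_posterior` and the template-matching rule are hypothetical; the patent's own template definition may differ.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Hypothesis:
    words: List[str]
    log_score: float  # assumed: combined acoustic + language model log score


def _matches(words: List[str], left: Optional[str], focus: str,
             right: Optional[str]) -> bool:
    """True if `words` contains `focus` with the required left/right context.

    A None context slot acts as a wildcard.
    """
    for k, w in enumerate(words):
        if w != focus:
            continue
        ok_left = left is None or (k > 0 and words[k - 1] == left)
        ok_right = right is None or (k + 1 < len(words) and words[k + 1] == right)
        if ok_left and ok_right:
            return True
    return False


def template_posterior(hyps: List[Hypothesis], focus: str,
                       template: Tuple[Optional[str], Optional[str]]) -> float:
    """Share of hypothesis-set probability mass that matches the template."""
    left, right = template
    # Subtract the max log score before exponentiating, for numerical stability
    m = max(h.log_score for h in hyps)
    total = sum(math.exp(h.log_score - m) for h in hyps)
    hit = sum(math.exp(h.log_score - m) for h in hyps
              if _matches(h.words, left, focus, right))
    return hit / total
```

    In a scheme like this, a low value would suggest that the focus unit is a likely transcription error.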

    Photo-realistic synthesis of image sequences with lip movements synchronized with speech

    Publication number: US09728203B2

    Publication date: 2017-08-08

    Application number: US13098488

    Filing date: 2011-05-02

    IPC classification: G10L21/10

    CPC classification: G10L21/10 G10L2021/105

    Abstract: Audiovisual data of an individual reading a known script is obtained and stored in an audio library and an image library. The audiovisual data is processed to extract feature vectors used to train a statistical model. An input audio feature vector corresponding to desired speech with which a synthesized image sequence will be synchronized is provided. The statistical model is used to generate a trajectory of visual feature vectors that corresponds to the input audio feature vector. These visual feature vectors are used to identify a matching image sequence from the image library. The resulting sequence of images, concatenated from the image library, provides a photorealistic image sequence with lip movements synchronized with the desired speech.
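
    The abstract does not say how the statistical model produces a smooth trajectory of visual feature vectors. One widely used option in HMM-based synthesis is maximum-likelihood parameter generation, where per-frame means and variances for static and delta features are combined by solving (W^T P W) c = W^T P mu, with W the window matrix and P the diagonal precision matrix. The sketch below does this for a single feature dimension, with the delta defined as 0.5 * (c[t+1] - c[t-1]) and clamped at the edges. This is an assumption about one possible model, not the patent's stated method, and the function name `mlpg_1d` is hypothetical.

```python
from typing import List


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def mlpg_1d(mu_static: List[float], var_static: List[float],
            mu_delta: List[float], var_delta: List[float]) -> List[float]:
    """Maximum-likelihood trajectory c for one feature dimension.

    Frame t has a Gaussian over the static value c[t] and over the delta
    0.5 * (c[t+1] - c[t-1]), with indices clamped at the sequence edges.
    """
    T = len(mu_static)
    # W is 2T x T: rows 0..T-1 pick static values, rows T..2T-1 compute deltas
    W = [[0.0] * T for _ in range(2 * T)]
    for t in range(T):
        W[t][t] = 1.0
        W[T + t][min(t + 1, T - 1)] += 0.5
        W[T + t][max(t - 1, 0)] -= 0.5
    mu = list(mu_static) + list(mu_delta)
    prec = [1.0 / v for v in var_static] + [1.0 / v for v in var_delta]
    # Normal equations: (W^T P W) c = W^T P mu with P = diag(prec)
    A = [[sum(W[r][i] * prec[r] * W[r][j] for r in range(2 * T)) for j in range(T)]
         for i in range(T)]
    b = [sum(W[r][i] * prec[r] * mu[r] for r in range(2 * T)) for i in range(T)]
    return _solve(A, b)
```

    With very large delta variances the result collapses to the static means; small delta variances pull the trajectory toward smooth motion, which is the property that keeps the selected lip images from jittering.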

    PHOTO-REALISTIC SYNTHESIS OF IMAGE SEQUENCES WITH LIP MOVEMENTS SYNCHRONIZED WITH SPEECH
    Invention application (status: in force)

    Publication number: US20120284029A1

    Publication date: 2012-11-08

    Application number: US13098488

    Filing date: 2011-05-02

    IPC classification: G10L21/00

    CPC classification: G10L21/10 G10L2021/105

    Abstract: Identical to the abstract of US09728203B2 above (both publications stem from application US13098488).

    Radical-Based HMM Modeling for Handwritten East Asian Characters
    Invention application (status: in force)

    Publication number: US20080219556A1

    Publication date: 2008-09-11

    Application number: US11682722

    Filing date: 2007-03-06

    IPC classification: G06K9/18

    CPC classification: G06K9/00879

    Abstract: Exemplary methods, systems, and computer-readable media for developing, training, and/or using models for online handwriting recognition of characters are described. An exemplary method for building a trainable radical-based HMM for character recognition includes defining radical nodes, where a radical node represents a structural element of a character, and defining connection nodes, where a connection node represents a spatial relationship between two or more radicals. Such a method may include determining the number of paths in the radical-based HMM using subsequence direction histogram vector (SDHV) clustering and determining the number of states in the radical-based HMM using curvature scale space (CSS)-based corner detection.
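
    The abstract names subsequence direction histogram vectors but does not define them. As a minimal sketch, assuming a stroke subsequence is a list of (x, y) pen points, a direction histogram can be built by quantizing each segment's angle into eight sectors, weighting by segment length, and normalizing. Vectors like this could then be clustered (for example with k-means) to decide how many paths the radical-based HMM needs. The eight-bin layout, the length weighting, and the function name `direction_histogram` are assumptions.

```python
import math
from typing import List, Tuple


def direction_histogram(points: List[Tuple[float, float]],
                        bins: int = 8) -> List[float]:
    """Length-weighted histogram of pen-movement directions, summing to 1.

    Bin k is centred on k * 360/bins degrees, measured counter-clockwise
    from the +x axis in standard math coordinates.
    """
    hist = [0.0] * bins
    width = 2 * math.pi / bins
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        if length == 0:
            continue  # skip repeated samples of the same point
        angle = math.atan2(dy, dx) % (2 * math.pi)
        # Shift by half a bin so each bin is centred on its direction
        k = int(((angle + width / 2) % (2 * math.pi)) // width)
        hist[min(k, bins - 1)] += length
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```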

    Radical Set Determination For HMM Based East Asian Character Recognition
    Invention application (status: lapsed)

    Publication number: US20080205761A1

    Publication date: 2008-08-28

    Application number: US11680566

    Filing date: 2007-02-28

    IPC classification: G06K9/18

    Abstract: Exemplary techniques are described for selecting radical sets for use in probabilistic East Asian character recognition algorithms. One exemplary technique applies a decomposition rule to each character in a set of East Asian characters to generate a progressive splitting graph whose nodes are radicals, formulates an optimization problem, based on maximum likelihood and minimum description length, for finding an optimal set of radicals to represent the character set, and solves that optimization problem. Another exemplary technique selects an optimal set of radicals using a general function that characterizes a radical with respect to other East Asian characters and a complex function that characterizes the complexity of a radical.
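
    The minimum description length side of that criterion can be illustrated with a toy comparison. The sketch assumes each candidate radical set comes with a decomposition of every character into radicals from that set, charges a fixed number of bits per distinct radical to describe the set itself, and charges log2(set size) bits per radical occurrence to encode the characters; the candidate with the smaller total wins. The cost model, the `bits_per_radical` constant, and the function names are illustrative assumptions, and the sketch omits the maximum-likelihood term the abstract also mentions.

```python
import math
from typing import Dict, List


def description_length(decomposition: Dict[str, List[str]],
                       bits_per_radical: float = 16.0) -> float:
    """Model cost (describe each distinct radical) + data cost (encode characters)."""
    radicals = {r for parts in decomposition.values() for r in parts}
    model_bits = bits_per_radical * len(radicals)
    # Each radical occurrence is coded as an index into the radical set
    symbol_bits = math.log2(len(radicals)) if len(radicals) > 1 else 1.0
    data_bits = sum(len(parts) * symbol_bits for parts in decomposition.values())
    return model_bits + data_bits


def pick_radical_set(candidates: Dict[str, Dict[str, List[str]]],
                     bits_per_radical: float = 16.0) -> str:
    """Name of the candidate decomposition with the smallest description length."""
    return min(candidates,
               key=lambda name: description_length(candidates[name], bits_per_radical))
```

    For six characters built from pairs of three shared radicals, the shared set wins (48 + 12·log2 3 ≈ 67 bits) over treating every character as its own radical (96 + 6·log2 6 ≈ 112 bits); with few characters or little sharing, the balance can tip the other way.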

    Common word graph based multimodal input
    Invention application (status: in force)

    Publication number: US20070239432A1

    Publication date: 2007-10-11

    Application number: US11394809

    Filing date: 2006-03-30

    IPC classification: G06F17/27

    CPC classification: G06F17/27

    Abstract: Multiple input modalities are selectively used by a user or process to prune a word graph. Pruning initiates rescoring in order to generate a new word graph with a revised best path.
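
    A small sketch makes the prune-then-rescore loop concrete. It assumes the word graph is a directed acyclic graph whose edges carry a word and a log score, that nodes are numbered in topological order, and that the second modality (for example, a typed or handwritten first letter) is reduced to a predicate that keeps or removes edges. The best path is then recomputed by dynamic programming over the surviving edges. The edge format and the helper names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Edge = Tuple[int, int, str, float]  # (from_node, to_node, word, log_score)


def best_path(edges: List[Edge], start: int, end: int) -> Optional[List[str]]:
    """Highest-scoring word sequence from start to end.

    Requires nodes numbered in topological order, so processing edges by
    source node guarantees every node's best score is final before use.
    """
    best: Dict[int, Tuple[float, List[str]]] = {start: (0.0, [])}
    for u, v, word, score in sorted(edges, key=lambda e: e[0]):
        if u not in best:
            continue  # source unreachable (e.g. its incoming edges were pruned)
        cand = (best[u][0] + score, best[u][1] + [word])
        if v not in best or cand[0] > best[v][0]:
            best[v] = cand
    return best[end][1] if end in best else None


def prune_and_rescore(edges: List[Edge], start: int, end: int,
                      keep: Callable[[Edge], bool]):
    """Drop edges rejected by the second modality, then recompute the best path."""
    pruned = [e for e in edges if keep(e)]
    return pruned, best_path(pruned, start, end)
```

    For example, if speech alone ranks "wreck a nice beach" above "recognize speech", a keep-predicate requiring the first word to start with "r" removes the "wreck" edge, and rescoring returns ["recognize", "speech"].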

    SYNTHESIZED SINGING VOICE WAVEFORM GENERATOR
    Invention application (status: under examination, published)

    Publication number: US20110231193A1

    Publication date: 2011-09-22

    Application number: US13151660

    Filing date: 2011-06-02

    Applicant: Yao Qian, Frank Soong

    Inventors: Yao Qian, Frank Soong

    IPC classification: G10L13/08

    Abstract: Described herein are various technologies for generating a synthesized singing voice waveform. In one implementation, a computer program may receive a request from a user to create a synthesized singing voice, taking as inputs the lyrics of a song and a digital file containing its melody. The computer program may then dissect the lyrics into their corresponding sub-phonemic units and the melody file into a musical score. The musical score may be further dissected into a sequence of musical notes and a duration for each note. The computer program may then determine the fundamental frequency (F0), or pitch, of each musical note.
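
    The last step, assigning a fundamental frequency and a duration to each note, follows from standard equal-temperament tuning if the melody file is MIDI-like. The sketch assumes notes arrive as (MIDI note number, length in beats) pairs at a fixed tempo in beats per minute, and uses f0 = 440 * 2^((n - 69) / 12) Hz, where note 69 is A4. The tuple format, the fixed tempo, and the function name `note_targets` are assumptions; the abstract does not specify the melody file format.

```python
from typing import List, Tuple


def midi_to_f0(note: int, a4_hz: float = 440.0) -> float:
    """Equal-temperament fundamental frequency of a MIDI note number (A4 = 69)."""
    return a4_hz * 2.0 ** ((note - 69) / 12.0)


def note_targets(notes: List[Tuple[int, float]],
                 tempo_bpm: float) -> List[Tuple[float, float]]:
    """Convert (midi_note, beats) pairs into (f0_hz, duration_s) synthesis targets."""
    seconds_per_beat = 60.0 / tempo_bpm
    return [(midi_to_f0(n), beats * seconds_per_beat) for n, beats in notes]
```

    For example, `note_targets([(60, 1.0), (62, 1.0)], tempo_bpm=120)` gives middle C (about 261.6 Hz) and D (about 293.7 Hz), each lasting 0.5 s; a synthesizer would then stretch the sub-phonemic units of each syllable to fit these durations.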
