Method and system for aligning natural and synthetic video to speech synthesis
    1.
    Invention Publication
    Method and system for aligning natural and synthetic video to speech synthesis (In force)

    Publication No.: EP0896322A2

    Publication Date: 1999-02-10

    Application No.: EP98306215.9

    Filing Date: 1998-08-04

    Applicant: AT&T Corp.

    IPC Class: G10L9/20

    Abstract: According to MPEG-4's TTS architecture, facial animation can be driven by two streams simultaneously: text and Facial Animation Parameters. In this architecture, text input is sent to a Text-To-Speech converter at the decoder that drives the mouth shapes of the face, while Facial Animation Parameters are sent from the encoder to the face over the communication channel. The present invention embeds codes (known as bookmarks) in the text string transmitted to the Text-to-Speech converter; these bookmarks are placed both between and inside words. According to the present invention, each bookmark carries an encoder time stamp. Owing to the nature of text-to-speech conversion, the encoder time stamp does not relate to real-world time and should be interpreted as a counter. In addition, the Facial Animation Parameter stream carries the same encoder time stamps found in the bookmarks of the text. The system of the present invention reads each bookmark and provides both its encoder time stamp and a real-time time stamp to the facial animation system. Finally, the facial animation system associates the correct Facial Animation Parameter with the real-time time stamp, using the encoder time stamp of the bookmark as a reference.
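The alignment scheme above can be sketched in a few lines. The bookmark syntax `<bm N>`, the function names, and the data shapes are all illustrative assumptions; the MPEG-4 TTS interface defines its own escape sequences.

```python
import re

# Hypothetical bookmark syntax "<bm N>"; MPEG-4 defines its own escape codes.
BOOKMARK = re.compile(r"<bm (\d+)>")

def strip_bookmarks(text):
    """Split TTS input into plain text plus (ets, offset) bookmarks.
    Offsets index into the plain text with the bookmark codes removed."""
    marks, parts, last, removed = [], [], 0, 0
    for m in BOOKMARK.finditer(text):
        parts.append(text[last:m.start()])
        marks.append((int(m.group(1)), m.start() - removed))
        removed += m.end() - m.start()
        last = m.end()
    parts.append(text[last:])
    return "".join(parts), marks

def align_faps(fap_frames, ets_to_realtime):
    """Attach a real-time stamp to each FAP frame via the encoder time
    stamp it shares with a bookmark, using the ETS -> wall-clock map
    recorded as the TTS reaches each bookmark while speaking."""
    return [(ets_to_realtime[ets], params) for ets, params in fap_frames]

plain, marks = strip_bookmarks("Hello <bm 1>wor<bm 2>ld")
# plain == "Hello world"; marks == [(1, 6), (2, 9)]
```

Because the ETS is only a counter, the decoder must record the wall-clock time at which the synthesizer reaches each bookmark; `align_faps` then transfers those real-time stamps onto the FAP frames carrying the same counter values.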


    Speech recognition training using bio-signals
    2.
    Invention Publication
    Speech recognition training using bio-signals (Lapsed)

    Publication No.: EP0660304A3

    Publication Date: 1997-07-02

    Application No.: EP94309009.2

    Filing Date: 1994-12-05

    Applicant: AT&T Corp.

    Inventor: DeSimone, Joseph

    IPC Class: G10L9/20

    CPC Class: G10L15/24 G10L2015/0638

    Abstract: A bio-signal related to the impedance between two points 34, 36 on a speaker's skin 32 is monitored while a speech recognition system is trained to recognize a word or utterance. An utterance is identified for retraining when the bio-signal is above an upper threshold or below a lower threshold while the recognition system is being trained to recognize the utterance. The recognition system is retrained to recognize the utterance when the bio-signal is between the upper and lower thresholds.
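A minimal sketch of the retraining rule, assuming the bio-signal arrives as a list of numeric samples captured during the training of one utterance; the function name, thresholds, and sample values are illustrative:

```python
def needs_retraining(bio_samples, lower, upper):
    """An utterance is flagged for retraining if the skin-impedance
    signal left the [lower, upper] band while it was being trained."""
    return any(s < lower or s > upper for s in bio_samples)

# Training pass for one utterance: accept it only once the bio-signal
# stayed in band for the whole utterance (values here are made up).
accepted = not needs_retraining([0.4, 0.5, 0.45], lower=0.2, upper=0.8)
```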

    Audio visual speech recognition
    3.
    Invention Publication
    Audio visual speech recognition (Lapsed)

    Publication No.: EP0336032A1

    Publication Date: 1989-10-11

    Application No.: EP88303125.4

    Filing Date: 1988-04-07

    IPC Class: G10L5/06 G10L9/20

    CPC Class: G10L15/24

    Abstract: At least some of a sequence of spoken phonemes are identified by analysing the detected sounds to determine the group of phonemes to which each phoneme belongs, optically detecting the lip shape of the speaker, and correlating the respective signals by computer.
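The correlation step can be sketched as a set intersection: acoustics narrows the phoneme to a confusion group, and the lip shape disambiguates within it. The group tables below are simplified illustrations, not data from the patent.

```python
# Hypothetical confusion groups: acoustics alone narrows the phoneme to a
# group; the optically detected lip shape then disambiguates within it.
ACOUSTIC_GROUPS = {"nasal": {"m", "n"}, "fricative": {"f", "s"}}
LIPSHAPE_CANDIDATES = {"lips-closed": {"m", "b", "p"}, "lips-spread": {"n", "s"}}

def identify_phoneme(acoustic_group, lipshape):
    """Correlate the two channels: the phoneme must lie both in the
    acoustically determined group and among the lip-shape candidates."""
    common = ACOUSTIC_GROUPS[acoustic_group] & LIPSHAPE_CANDIDATES[lipshape]
    return common.pop() if len(common) == 1 else None
```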


    Speech recognition method and apparatus
    5.
    Invention Publication
    Speech recognition method and apparatus (Lapsed)

    Publication No.: EP0702355A2

    Publication Date: 1996-03-20

    Application No.: EP95306401.1

    Filing Date: 1995-09-13

    IPC Class: G10L9/20

    Abstract: A viewpoint of a user is detected in a viewpoint-detecting process, and how long the detected viewpoint has stayed in an area is determined. The obtained viewpoint and its trace are displayed on a display unit. In a recognition-information controlling process, the relationship between the viewpoint (in an area) and/or its movement and the recognition information (words, sentences, grammar, etc.) is obtained as a weight P(). When the user pronounces a word (or sentence), the speech is input and A/D-converted via a speech input unit. Next, in a speech recognition process, a speech recognition probability PS() is obtained. Finally, speech recognition is performed on the basis of the product of the weight P() and the speech recognition probability PS(). Accordingly, the classes of recognition information are controlled in accordance with the movement of the user's viewpoint, improving both the speech recognition probability and the speed of recognition.
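The final scoring step above is a per-word product, which can be sketched directly; the dictionaries, scores, and the default weight of 1.0 for words unrelated to the gazed area are illustrative assumptions:

```python
def recognize(ps, gaze_weight):
    """Choose the word maximizing weight P() * recognition probability PS().
    `ps` maps candidate words to their acoustic scores; `gaze_weight`
    maps words to weights derived from where the user is looking
    (unlisted words default to a neutral weight of 1.0)."""
    return max(ps, key=lambda w: gaze_weight.get(w, 1.0) * ps[w])

# Looking at an "open" button boosts the acoustically weaker candidate.
word = recognize({"open": 0.4, "close": 0.5}, {"open": 2.0})
# word == "open"  (0.4 * 2.0 = 0.8 beats 0.5 * 1.0)
```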


    Method and system for aligning natural and synthetic video to speech synthesis
    6.
    Invention Publication
    Method and system for aligning natural and synthetic video to speech synthesis (In force)

    Publication No.: EP0896322A3

    Publication Date: 1999-10-06

    Application No.: EP98306215.9

    Filing Date: 1998-08-04

    Applicant: AT&T Corp.

    IPC Class: G10L9/20

    Abstract: According to MPEG-4's TTS architecture, facial animation can be driven by two streams simultaneously: text and Facial Animation Parameters. In this architecture, text input is sent to a Text-To-Speech converter at the decoder that drives the mouth shapes of the face, while Facial Animation Parameters are sent from the encoder to the face over the communication channel. The present invention embeds codes (known as bookmarks) in the text string transmitted to the Text-to-Speech converter; these bookmarks are placed both between and inside words. According to the present invention, each bookmark carries an encoder time stamp. Owing to the nature of text-to-speech conversion, the encoder time stamp does not relate to real-world time and should be interpreted as a counter. In addition, the Facial Animation Parameter stream carries the same encoder time stamps found in the bookmarks of the text. The system of the present invention reads each bookmark and provides both its encoder time stamp and a real-time time stamp to the facial animation system. Finally, the facial animation system associates the correct Facial Animation Parameter with the real-time time stamp, using the encoder time stamp of the bookmark as a reference.

    Automated speech alignment for image synthesis
    7.
    Invention Publication
    Automated speech alignment for image synthesis (Lapsed)

    Publication No.: EP0860811A2

    Publication Date: 1998-08-26

    Application No.: EP98103191.7

    Filing Date: 1998-02-24

    IPC Class: G10L9/20

    Abstract: In a computerized method, speech signals are analyzed using statistical trajectory modeling to produce time-aligned acoustic-phonetic units. There is one acoustic-phonetic unit for each portion of the speech signal determined to be phonetically distinct. The acoustic-phonetic units are translated to corresponding time-aligned image units representative of them. An image including the time-aligned image units is displayed, and the display of those image units is synchronized to a replay of the digitized natural speech signal.
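The translation step carries each unit's time span over to the image domain unchanged, which is what keeps the display synchronizable with the speech replay. The phoneme-to-viseme table below is a made-up illustration, not the patent's mapping.

```python
# Hypothetical phoneme -> viseme (mouth image) table.
VISEME = {"M": "closed", "AA": "open", "F": "teeth-on-lip", "SIL": "rest"}

def to_image_units(phonetic_units):
    """Translate time-aligned acoustic-phonetic units (label, start, end)
    into image units that carry the same time span, so that displaying
    them can be synchronized with replay of the digitized speech."""
    return [(VISEME.get(label, "rest"), start, end)
            for label, start, end in phonetic_units]
```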


    Voice operated game apparatus
    8.
    Invention Publication
    Voice operated game apparatus (Lapsed)

    Publication No.: EP0683481A3

    Publication Date: 1998-03-04

    Application No.: EP95107008.5

    Filing Date: 1995-05-09

    IPC Class: G10L3/00 G10L9/20

    CPC Class: G10L15/25 G10L25/87

    Abstract: A game apparatus of the invention includes: a voice input section for inputting at least one voice set including voice uttered by an operator, converting the voice set into a first electric signal, and outputting the first electric signal; a voice recognition section for recognizing the voice set on the basis of the first electric signal output from the voice input section; an image input section for optically detecting a movement of the operator's lips, converting the detected movement into a second electric signal, and outputting the second electric signal; a speech period detection section for receiving the second electric signal and obtaining the period in which the voice is uttered by the operator on the basis of that signal; an overall judgment section for extracting the voice uttered by the operator from the input voice set, on the basis of the voice set recognized by the voice recognition section and the period obtained by the speech period detection section; and a control section for controlling an object on the basis of the voice extracted by the overall judgment section.
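The speech-period detection and overall-judgment steps can be sketched as follows. The frame-indexed representation, the motion threshold, and the function names are illustrative assumptions about how the two signals might be combined.

```python
def speech_period(lip_motion, threshold=0.1):
    """Derive the operator's speech period from per-frame lip-motion
    magnitudes: the span from the first to the last active frame."""
    active = [i for i, m in enumerate(lip_motion) if m > threshold]
    return (active[0], active[-1] + 1) if active else None

def extract_operator_voice(recognized, period):
    """Overall judgment: keep only recognized words whose frame index
    falls inside the lip-derived period, dropping background voices."""
    if period is None:
        return []
    start, end = period
    return [word for word, frame in recognized if start <= frame < end]

# Background speech at frame 0 is rejected: the lips only moved in frames 2-3.
period = speech_period([0.0, 0.0, 0.5, 0.6, 0.0])
commands = extract_operator_voice([("jump", 3), ("noise", 0)], period)
```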

    Automated speech alignment for image synthesis
    9.
    Invention Publication
    Automated speech alignment for image synthesis (Lapsed)

    Publication No.: EP0860811A3

    Publication Date: 1999-02-10

    Application No.: EP98103191.7

    Filing Date: 1998-02-24

    IPC Class: G10L9/20

    Abstract: In a computerized method, speech signals are analyzed using statistical trajectory modeling to produce time-aligned acoustic-phonetic units. There is one acoustic-phonetic unit for each portion of the speech signal determined to be phonetically distinct. The acoustic-phonetic units are translated to corresponding time-aligned image units representative of them. An image including the time-aligned image units is displayed, and the display of those image units is synchronized to a replay of the digitized natural speech signal.

    Animal's intention translational method
    10.
    Invention Publication
    Animal's intention translational method (Lapsed)

    Publication No.: EP0813186A3

    Publication Date: 1998-10-07

    Application No.: EP96305907

    Filing Date: 1996-08-12

    Applicant: YAMAMOTO MASAOMI

    Inventor: YAMAMOTO MASAOMI

    Abstract: This invention is a method for translating an animal's intention. First, the method receives one or both of two informational signals: the voice uttered by an animal such as a baby, pet, or domestic animal, and the animal's actions. The received signal is then compared with data analysed beforehand by animal behavioural science, and the matching data are selected. In addition, what the animal is conveying is indicated in words or letters that people can understand. With the above invention, people are able to communicate correctly with the animal.
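The compare-and-select step amounts to matching the received signal against a pre-analysed reference table. The feature vectors, distance metric, and meanings below are purely illustrative; the patent does not specify a representation.

```python
# Hypothetical reference table from prior behavioural analysis:
# (voice-pitch, activity) feature vector -> meaning in human words.
REFERENCE = {
    (1.0, 0.0): "hungry",
    (0.0, 1.0): "wants to play",
}

def translate_intention(signal):
    """Select the pre-analysed entry closest to the received voice/action
    feature vector and return its human-readable wording."""
    def sq_dist(ref):
        return sum((a - b) ** 2 for a, b in zip(ref, signal))
    return REFERENCE[min(REFERENCE, key=sq_dist)]
```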