Patent search cpc:"G10L2021/105" Page 3

21.

Invention application
SYSTEM AND METHOD FOR SYNCHRONIZING SOUND AND MANUALLY TRANSCRIBED TEXT (Status: in force)

Publication number: US20140095165A1

Publication date: 2014-04-03

Application number: US14038912

Application date: 2013-09-27

Applicant: Nuance Communications, Inc.

Inventor: Andreas Neubacher, Miklos Papai

IPC: G10L13/00

CPC classification number: G10L13/00, G06F17/241, G10L15/26, G10L2021/105

Abstract: A method for synchronizing sound data and text data, said text data being obtained by manual transcription of said sound data during playback of the latter. The proposed method comprises the steps of repeatedly querying said sound data and said text data to obtain a current time position corresponding to a currently played sound datum and a currently transcribed text datum, respectively, correcting said current time position by applying a time correction value in accordance with a transcription delay, and generating at least one association datum indicative of a synchronization association between said corrected time position and said currently transcribed text datum. Thus, the proposed method achieves cost-effective synchronization of sound and text in connection with the manual transcription of sound data.
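
The correction step described in this abstract can be sketched in a few lines. This is a minimal illustration, not the patented implementation: it assumes a single fixed transcription delay (the hypothetical `delay_s` parameter) and a list of polled samples, each pairing the playback position at query time with the index of the word just typed. All names here are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    """Links one transcribed word to a corrected audio time in seconds."""
    word_index: int
    time_s: float


def synchronize(samples, delay_s):
    """Build associations from polled (playback_time_s, word_index) pairs.

    Each playback time is shifted back by delay_s, the assumed lag between
    hearing a word and typing it, and clamped at zero.
    """
    associations = []
    seen = set()
    for playback_time_s, word_index in samples:
        if word_index in seen:
            continue  # keep only the first observation of each word
        seen.add(word_index)
        corrected = max(0.0, playback_time_s - delay_s)
        associations.append(Association(word_index, corrected))
    return associations
```

Calling `synchronize([(2.0, 1), (3.5, 2)], delay_s=1.0)` would associate word 1 with 1.0 s and word 2 with 2.5 s. A real system would presumably estimate the delay rather than hard-code it.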

22.

Invention application
METHOD OF FACIAL IMAGE REPRODUCTION AND RELATED DEVICE (Status: in force)

Publication number: US20130236102A1

Publication date: 2013-09-12

Application number: US13860539

Application date: 2013-04-11

Applicant: CYBERLINK CORP.

Inventor: Hao-Ping Hung, Wei-Hsin Tseng

IPC: G06K9/00

CPC classification number: G06K9/00268, G10L2021/105

Abstract: To modify a facial feature region in a video bitstream, the video bitstream is received and a feature region is extracted from the video bitstream. An audio characteristic, such as frequency, rhythm, or tempo, is retrieved from an audio bitstream, and the feature region is modified according to the audio characteristic to generate a modified image. The modified image is outputted.

23.

Invention application
PHOTO-REALISTIC SYNTHESIS OF THREE DIMENSIONAL ANIMATION WITH FACIAL FEATURES SYNCHRONIZED WITH SPEECH (Status: in force)

Publication number: US20120280974A1

Publication date: 2012-11-08

Application number: US13099387

Application date: 2011-05-03

Applicant: Lijuan Wang, Frank Soong, Qiang Huo, Zhengyou Zhang

Inventor: Lijuan Wang, Frank Soong, Qiang Huo, Zhengyou Zhang

IPC: G06T13/40, G06T15/00

CPC classification number: G06T13/40, G10L21/10, G10L2021/105

Abstract: Dynamic texture mapping is used to create a photorealistic three dimensional animation of an individual with facial features synchronized with desired speech. Audiovisual data of an individual reading a known script is obtained and stored in an audio library and an image library. The audiovisual data is processed to extract feature vectors used to train a statistical model. An input audio feature vector corresponding to desired speech with which the animation will be synchronized is provided. The statistical model is used to generate a trajectory of visual feature vectors that corresponds to the input audio feature vector. These visual feature vectors are used to identify a matching image sequence from the image library. The resulting sequence of images, concatenated from the image library, provides a photorealistic image sequence with facial features, such as lip movements, synchronized with the desired speech. This image sequence is applied to the three-dimensional model.
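
The step of identifying a matching image sequence from the image library can be illustrated with a simple lookup. The abstract does not say how the match is computed, so this sketch assumes a plain nearest-neighbour search under Euclidean distance. The function name, the dictionary layout of the library, and the two-dimensional feature vectors are all hypothetical.

```python
import math


def match_image_sequence(trajectory, library):
    """Return, for each target vector, the id of the closest library image.

    trajectory: list of visual feature vectors (lists of floats).
    library: dict mapping image id -> feature vector of the same length.
    """
    sequence = []
    for target in trajectory:
        # Assumed matching rule: smallest Euclidean distance wins.
        best_id = min(library, key=lambda image_id: math.dist(library[image_id], target))
        sequence.append(best_id)
    return sequence
```

Given a library with entries for a closed, half-open, and open mouth, a trajectory that moves from near the first vector to near the last would select those three images in order. A production system would likely also penalise abrupt jumps between consecutive frames, which this sketch ignores.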

24.

Granted invention patent
System and method of providing conversational visual prosody for talking heads (Status: in force)

Publication number: US08131551B1

Publication date: 2012-03-06

Application number: US11458282

Application date: 2006-07-18

Applicant: Eric Cosatto, Hans Peter Graf, Volker Franz Strom

Inventor: Eric Cosatto, Hans Peter Graf, Volker Franz Strom

IPC: G10L11/00

CPC classification number: G10L15/1807, G10L2021/105

Abstract: A system and method of controlling the movement of a virtual agent while the agent is speaking to a human user during a conversation is disclosed. The method comprises receiving speech data to be spoken by the virtual agent, performing a prosodic analysis of the speech data, selecting matching prosody patterns from a speaking database and controlling the virtual agent movement according to the selected prosody patterns.

25.

Granted invention patent
Coarticulation method for audio-visual text-to-speech synthesis (Status: in force)

Publication number: US08078466B2

Publication date: 2011-12-13

Application number: US12627373

Application date: 2009-11-30

Applicant: Eric Cosatto, Hans Peter Graf, Juergen Schroeter

Inventor: Eric Cosatto, Hans Peter Graf, Juergen Schroeter

IPC: G10L13/00

CPC classification number: G10L13/00, G10L2021/105

Abstract: A method for generating animated sequences of talking heads in text-to-speech applications wherein a processor samples a plurality of frames comprising image samples. The processor reads first data comprising one or more parameters associated with noise-producing orifice images of sequences of at least three concatenated phonemes which correspond to an input stimulus. The processor reads, based on the first data, second data comprising images of a noise-producing entity. The processor generates an animated sequence of the noise-producing entity.

26.

Granted invention patent
Synchronizing visual and speech events in a multimodal application (Status: in force)

Publication number: US08055504B2

Publication date: 2011-11-08

Application number: US12061750

Application date: 2008-04-03

Applicant: Charles W. Cross, Michael C. Hollinger, Igor R. Jablokov, David B. Lewis, Hilary A. Pike, Daniel M. Smith, David W. Wintermute, Michael A. Zaitzeff

Inventor: Charles W. Cross, Michael C. Hollinger, Igor R. Jablokov, David B. Lewis, Hilary A. Pike, Daniel M. Smith, David W. Wintermute, Michael A. Zaitzeff

IPC: G10L11/00

CPC classification number: G10L15/1815, G10L2021/105

Abstract: Exemplary methods, systems, and products are disclosed for synchronizing visual and speech events in a multimodal application, including receiving speech from a user; determining a semantic interpretation of the speech; calling a global application update handler; identifying, by the global application update handler, an additional processing function in dependence upon the semantic interpretation; and executing the additional function. Typical embodiments may include updating a visual element after executing the additional function. Typical embodiments may include updating a voice form after executing the additional function. Typical embodiments also may include updating a state table after updating the voice form. Typical embodiments also may include restarting the voice form after executing the additional function.
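
The handler flow in this abstract (interpret the speech, call a global update handler, let it pick and run an additional function) resembles a dispatch table. The sketch below is one hedged reading of that flow, not the patented design. It assumes the semantic interpretation is already available as a string, and the names `make_update_handler`, `functions`, and `state` are hypothetical.

```python
def make_update_handler(functions):
    """Return a global update handler that dispatches on the interpretation.

    functions: dict mapping a semantic interpretation (str) to a callable
    that takes the application state dict and updates it in place.
    """
    def handler(interpretation, state):
        extra = functions.get(interpretation)
        if extra is None:
            return False  # nothing registered for this interpretation
        extra(state)  # run the additional processing function
        return True
    return handler
```

A caller would register one function per interpretation, for example one that sets `state["page"]` to `"cart"` for the interpretation `"show_cart"`, and then update visual elements from `state` once the handler returns `True`.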

27.

Reissue patent
Text-to speech conversion system for synchronizing between synthesized speech and a moving picture in a multimedia environment and a method of the same (Status: in force)

Publication number: USRE42647E1

Publication date: 2011-08-23

Application number: US10193594

Application date: 2002-09-30

Applicant: Jung Chul Lee, Min Soo Hahn, Hang Seop Lee, Jae Woo Yang, Youngjik Lee

Inventor: Jung Chul Lee, Min Soo Hahn, Hang Seop Lee, Jae Woo Yang, Youngjik Lee

IPC: G10L13/08

CPC classification number: G10L13/00, G10L2021/105

Abstract: The present invention provides a text-to-speech conversion system (TTS) for synchronizing with multimedia and a method for organizing input data of the TTS which can enhance the naturalness of synthesized speech and accomplish the synchronization of multimedia with TTS by defining additional prosody information, the information required to synchronize TTS with multimedia, and an interface between this information and TTS for use in the production of the synthesized speech.

28.

Granted invention patent
Synchronizing method and system (Status: lapsed)

Publication number: US07913155B2

Publication date: 2011-03-22

Application number: US11355124

Application date: 2006-02-15

Applicant: Sara H. Basson, Sarah Louise Conrod, Alexander Faisman, Wilfredo Ferre, Julien Ghez, Dimitri Kanevsky, Ronald Francis MacNeil

Inventor: Sara H. Basson, Sarah Louise Conrod, Alexander Faisman, Wilfredo Ferre, Julien Ghez, Dimitri Kanevsky, Ronald Francis MacNeil

IPC: G06F17/00

CPC classification number: H04N21/242, G10L2021/105, H04N21/233, H04N21/23614, H04N21/4307, H04N21/4348, H04N21/4884, H04N21/8133

Abstract: A synchronization system and method. Text data is received by a computing device. The text data is associated with audio/video data. The audio/video data is generated during a related performance. The audio/video data and the text data are discrete data. The text data is synchronized to correspond with the audio/video data during the performance. The synchronized text data is displayed by the computing device during the performance.

29.

Invention application
INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND PROGRAM (Status: in force)

Publication number: US20100211200A1

Publication date: 2010-08-19

Application number: US12631681

Application date: 2009-12-04

Applicant: Yoshiyuki KOBAYASHI

Inventor: Yoshiyuki KOBAYASHI

IPC: G06F17/00

CPC classification number: G06F3/165, A63F13/00, A63F2300/8047, G06T13/205, G06T13/40, G10H1/0008, G10H1/368, G10H2210/051, G10H2210/066, G10H2210/076, G10H2220/005, G10H2220/135, G10H2220/141, G10L2021/105

Abstract: An information processing apparatus is provided which includes a metadata extraction unit for analyzing an audio signal in which a plurality of instrument sounds are present in a mixed manner and for extracting, as a feature quantity of the audio signal, metadata changing along with passing of a playing time, and a player parameter determination unit for determining, based on the metadata extracted by the metadata extraction unit, a player parameter for controlling a movement of a player object corresponding to each instrument sound.

30.

Invention application
Coarticulation Method for Audio-Visual Text-to-Speech Synthesis (Status: in force)

Publication number: US20100076762A1

Publication date: 2010-03-25

Application number: US12627373

Application date: 2009-11-30

Applicant: Eric Cosatto, Hans Peter Graf, Juergen Schroeter

Inventor: Eric Cosatto, Hans Peter Graf, Juergen Schroeter

IPC: G10L15/26, G10L13/00, G10L13/08

CPC classification number: G10L13/00, G10L2021/105

Abstract: A method for generating animated sequences of talking heads in text-to-speech applications wherein a processor samples a plurality of frames comprising image samples. The processor reads first data comprising one or more parameters associated with noise-producing orifice images of sequences of at least three concatenated phonemes which correspond to an input stimulus. The processor reads, based on the first data, second data comprising images of a noise-producing entity. The processor generates an animated sequence of the noise-producing entity.
