SYSTEM AND METHOD FOR SYNCHRONIZING SOUND AND MANUALLY TRANSCRIBED TEXT
    21.
    Invention Application (In Force)

    Publication No.: US20140095165A1

    Publication Date: 2014-04-03

    Application No.: US14038912

    Filing Date: 2013-09-27

    CPC classification number: G10L13/00 G06F17/241 G10L15/26 G10L2021/105

    Abstract: A method for synchronizing sound data and text data, said text data being obtained by manual transcription of said sound data during playback of the latter. The proposed method comprises the steps of repeatedly querying said sound data and said text data to obtain a current time position corresponding to a currently played sound datum and a currently transcribed text datum, respectively, correcting said current time position by applying a time correction value in accordance with a transcription delay, and generating at least one association datum indicative of a synchronization association between said corrected time position and said currently transcribed text datum. Thus, the proposed method achieves cost-effective synchronization of sound and text in connection with the manual transcription of sound data.

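The synchronization loop the abstract describes can be sketched as follows. This is an illustrative reading, not the patented implementation: the function name, the fixed delay value, and the sampled (time, word) pairs are all hypothetical.

```python
def synchronize(samples, transcription_delay):
    """Associate transcribed words with corrected sound-time positions.

    samples: list of (playback_time, transcribed_word) pairs captured by
    repeatedly querying the player position and the transcription buffer.
    transcription_delay: assumed lag (seconds) between hearing and typing.
    """
    associations = []
    for playback_time, word in samples:
        # The typist hears a word, then types it; subtracting the assumed
        # delay aligns the word with the sound that produced it.
        corrected_time = max(0.0, playback_time - transcription_delay)
        associations.append((corrected_time, word))
    return associations

samples = [(2.0, "hello"), (3.5, "world")]
print(synchronize(samples, transcription_delay=1.5))
```

A real system would estimate the delay per typist (or adaptively) rather than use one constant.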

    METHOD OF FACIAL IMAGE REPRODUCTION AND RELATED DEVICE
    22.

    Publication No.: US20130236102A1

    Publication Date: 2013-09-12

    Application No.: US13860539

    Filing Date: 2013-04-11

    CPC classification number: G06K9/00268 G10L2021/105

    Abstract: To modify a facial feature region in a video bitstream, the video bitstream is received and a feature region is extracted from the video bitstream. An audio characteristic, such as frequency, rhythm, or tempo is retrieved from an audio bitstream, and the feature region is modified according to the audio characteristic to generate a modified image. The modified image is outputted.
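One minimal way to picture "modifying the feature region according to an audio characteristic" is to scale the extracted region with the audio tempo. This mapping is an assumption for illustration only; the patent does not specify it, and all names are hypothetical.

```python
def modify_feature_region(region, tempo_bpm, base_bpm=120.0):
    """Scale a feature region in proportion to tempo deviation.

    region: (x, y, width, height) rectangle extracted from the video frame.
    tempo_bpm: tempo retrieved from the audio bitstream.
    base_bpm: assumed neutral tempo at which the region is unchanged.
    """
    x, y, w, h = region
    scale = tempo_bpm / base_bpm
    # Keep the anchor point fixed and grow/shrink the region's extent.
    return (x, y, w * scale, h * scale)
```

Frequency or rhythm features could drive the same scaling, or a warp of the region's pixels, in place of tempo.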

    PHOTO-REALISTIC SYNTHESIS OF THREE DIMENSIONAL ANIMATION WITH FACIAL FEATURES SYNCHRONIZED WITH SPEECH
    23.
    Invention Application (In Force)

    Publication No.: US20120280974A1

    Publication Date: 2012-11-08

    Application No.: US13099387

    Filing Date: 2011-05-03

    CPC classification number: G06T13/40 G10L21/10 G10L2021/105

    Abstract: Dynamic texture mapping is used to create a photorealistic three dimensional animation of an individual with facial features synchronized with desired speech. Audiovisual data of an individual reading a known script is obtained and stored in an audio library and an image library. The audiovisual data is processed to extract feature vectors used to train a statistical model. An input audio feature vector corresponding to desired speech with which the animation will be synchronized is provided. The statistical model is used to generate a trajectory of visual feature vectors that corresponds to the input audio feature vector. These visual feature vectors are used to identify a matching image sequence from the image library. The resulting sequence of images, concatenated from the image library, provides a photorealistic image sequence with facial features, such as lip movements, synchronized with the desired speech. This image sequence is applied to the three-dimensional model.

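The matching step above, where predicted visual feature vectors select frames from the image library, can be sketched as a nearest-neighbor lookup. This is a simplified reading under stated assumptions (Euclidean distance, per-frame independent matching); the actual system matches whole sequences, and all names here are illustrative.

```python
import math

def match_image_sequence(trajectory, image_library):
    """Pick the closest library image for each predicted feature vector.

    trajectory: list of visual feature vectors generated by the statistical
    model from the input audio feature vector.
    image_library: list of (feature_vector, image_id) pairs.
    """
    sequence = []
    for target in trajectory:
        # Nearest neighbor by Euclidean distance in visual feature space.
        best = min(image_library, key=lambda entry: math.dist(entry[0], target))
        sequence.append(best[1])
    return sequence
```

A production system would also penalize discontinuity between consecutive frames (e.g. via Viterbi search) so the concatenated sequence plays smoothly.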

    System and method of providing conversational visual prosody for talking heads
    24.
    Granted Patent (In Force)

    Publication No.: US08131551B1

    Publication Date: 2012-03-06

    Application No.: US11458282

    Filing Date: 2006-07-18

    CPC classification number: G10L15/1807 G10L2021/105

    Abstract: A system and method of controlling the movement of a virtual agent while the agent is speaking to a human user during a conversation is disclosed. The method comprises receiving speech data to be spoken by the virtual agent, performing a prosodic analysis of the speech data, selecting matching prosody patterns from a speaking database and controlling the virtual agent movement according to the selected prosody patterns.

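The "select matching prosody patterns from a speaking database" step can be pictured as a keyed lookup from prosodic analysis results to movement patterns. The feature set (pitch contour class, stress) and every pattern name below are hypothetical placeholders, not the patent's actual database schema.

```python
# Assumed toy speaking database: prosodic context -> head-movement pattern.
SPEAKING_DATABASE = {
    ("rising", True): "head_tilt_up",
    ("falling", False): "head_nod",
    ("flat", False): "idle_sway",
}

def select_movement(pitch_contour, stressed):
    """Return a movement pattern for the virtual agent, given the
    prosodic analysis of the speech it is about to utter."""
    # Fall back to an idle motion when no stored pattern matches.
    return SPEAKING_DATABASE.get((pitch_contour, stressed), "idle_sway")
```

The selected pattern would then drive the agent's head and face animation in time with the synthesized speech.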

    Coarticulation method for audio-visual text-to-speech synthesis
    25.
    Granted Patent (In Force)

    Publication No.: US08078466B2

    Publication Date: 2011-12-13

    Application No.: US12627373

    Filing Date: 2009-11-30

    CPC classification number: G10L13/00 G10L2021/105

    Abstract: A method for generating animated sequences of talking heads in text-to-speech applications wherein a processor samples a plurality of frames comprising image samples. The processor reads first data comprising one or more parameters associated with noise-producing orifice images of sequences of at least three concatenated phonemes which correspond to an input stimulus. The processor reads, based on the first data, second data comprising images of a noise-producing entity. The processor generates an animated sequence of the noise-producing entity.

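The key idea, indexing mouth-image parameters by "sequences of at least three concatenated phonemes" so each frame reflects its phonetic context (coarticulation), can be sketched as a triphone windowing step. This is a sketch of the general triphone technique, not the patent's specific method; the function name is illustrative.

```python
def triphones(phonemes):
    """Return overlapping windows of three concatenated phonemes.

    Each window keys into a table of mouth-image parameters, so the
    image chosen for a phoneme depends on its neighbors.
    """
    return [tuple(phonemes[i:i + 3]) for i in range(len(phonemes) - 2)]

print(triphones(["h", "eh", "l", "ow"]))
```

With such keys, the same phoneme ("l" here) can map to different orifice images depending on what precedes and follows it, which is exactly what makes the animation look coarticulated rather than frame-by-frame.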

    INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND PROGRAM
    29.
    Invention Application (In Force)

    Publication No.: US20100211200A1

    Publication Date: 2010-08-19

    Application No.: US12631681

    Filing Date: 2009-12-04

    Abstract: An information processing apparatus is provided which includes a metadata extraction unit for analyzing an audio signal in which a plurality of instrument sounds are present in a mixed manner and for extracting, as a feature quantity of the audio signal, metadata changing along with passing of a playing time, and a player parameter determination unit for determining, based on the metadata extracted by the metadata extraction unit, a player parameter for controlling a movement of a player object corresponding to each instrument sound.

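The player-parameter determination step can be pictured as a mapping from extracted per-instrument metadata (here, loudness at the current playing time) to a movement parameter for each player object. The loudness feature, the amplitude range, and all names are assumptions for illustration; the patent does not specify this mapping.

```python
def player_parameters(loudness_by_instrument):
    """Map extracted metadata to player-object movement parameters.

    loudness_by_instrument: dict of instrument name -> loudness in [0, 1],
    as extracted from the mixed audio signal at the current playing time.
    Returns an arm-swing amplitude (degrees) per player object.
    """
    max_amplitude = 90.0  # assumed range of the animation rig
    return {
        instrument: round(loudness * max_amplitude, 1)
        for instrument, loudness in loudness_by_instrument.items()
    }
```

Because the metadata changes as the playing time advances, recomputing this mapping per frame makes each player object move in step with its own instrument's part.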

    Coarticulation Method for Audio-Visual Text-to-Speech Synthesis
    30.
    Invention Application (In Force)

    Publication No.: US20100076762A1

    Publication Date: 2010-03-25

    Application No.: US12627373

    Filing Date: 2009-11-30

    CPC classification number: G10L13/00 G10L2021/105

    Abstract: A method for generating animated sequences of talking heads in text-to-speech applications wherein a processor samples a plurality of frames comprising image samples. The processor reads first data comprising one or more parameters associated with noise-producing orifice images of sequences of at least three concatenated phonemes which correspond to an input stimulus. The processor reads, based on the first data, second data comprising images of a noise-producing entity. The processor generates an animated sequence of the noise-producing entity.

