System for low-latency animation of talking heads
    61.
    Granted patent
    Status: in force

    Publication number: US07627478B2

    Publication date: 2009-12-01

    Application number: US11778228

    Filing date: 2007-07-16

    CPC classification number: G06F17/30905

    Abstract: Methods and apparatus for rendering a talking head on a client device are disclosed. The client device has a client cache capable of storing audio/visual data associated with rendering the talking head. The method comprises storing sentences in a client cache of a client device that relate to bridging delays in a dialog, storing sentence templates to be used in dialogs, generating a talking head response to a user inquiry from the client device, and determining whether sentences or stored templates stored in the client cache relate to the talking head response. If the stored sentences or stored templates relate to the talking head response, the method comprises instructing the client device to use the appropriate stored sentence or template from the client cache to render at least a part of the talking head response and transmitting a portion of the talking head response not stored in the client cache, if any, to the client device to render a complete talking head response. If the client cache has no stored data associated with the talking head response, the method comprises transmitting the talking head response to be rendered on the client device.
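A minimal sketch of the cache-first decision this abstract describes. It assumes the server knows which bridging sentences and templates the client already holds, and that a template has a single variable slot written as `{slot}`. `ClientCache`, `plan_response`, and the slot syntax are hypothetical names for illustration; the actual audio/visual rendering is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class ClientCache:
    """Hypothetical record of what the client has cached (text keys only, no media)."""
    sentences: set[str] = field(default_factory=set)   # e.g. delay-bridging phrases
    templates: set[str] = field(default_factory=set)   # e.g. "Your flight departs at {slot}."


def plan_response(response: str, cache: ClientCache) -> list[tuple[str, str]]:
    """Return (action, payload) steps for rendering `response` on the client.

    "use_cached" means render from the client cache; "transmit" means the
    server must send that part of the talking-head data.
    """
    # Whole sentence already cached: nothing has to be transmitted.
    if response in cache.sentences:
        return [("use_cached", response)]

    # A cached template matches if its fixed text surrounds the response.
    for template in cache.templates:
        prefix, sep, suffix = template.partition("{slot}")
        if not sep:
            continue
        if (response.startswith(prefix) and response.endswith(suffix)
                and len(response) >= len(prefix) + len(suffix)):
            variable = response[len(prefix):len(response) - len(suffix)]
            # Render the template from cache; transmit only the variable part.
            return [("use_cached", template), ("transmit", variable)]

    # Nothing relevant is cached: transmit the complete response.
    return [("transmit", response)]
```

For example, with the template `"Your flight departs at {slot}."` cached, the response `"Your flight departs at 9:40."` yields one cached step plus a transmitted `"9:40"`, which is where the latency saving comes from.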

    COARTICULATION METHOD FOR AUDIO-VISUAL TEXT-TO-SPEECH SYNTHESIS
    62.
    Patent application
    Status: in force

    Publication number: US20080221904A1

    Publication date: 2008-09-11

    Application number: US12123154

    Filing date: 2008-05-19

    CPC classification number: G10L13/00 G10L2021/105

    Abstract: A method for generating animated sequences of talking heads in text-to-speech applications wherein a processor samples a plurality of frames comprising image samples. The processor reads first data comprising one or more parameters associated with noise-producing orifice images of sequences of at least three concatenated phonemes which correspond to an input stimulus. The processor reads, based on the first data, second data comprising images of a noise-producing entity. The processor generates an animated sequence of the noise-producing entity.
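The core idea here, choosing mouth images from parameters indexed by a phoneme together with its neighbours (at least three concatenated phonemes), can be sketched as a triphone-keyed lookup with a context-independent fallback. Everything concrete below is an illustrative assumption, not the patented method: the two-value parameterization (lip width, lip opening), the tables, the image file names, and the nearest-neighbour image choice.

```python
import math

# Hypothetical first data: mouth parameters (lip width, lip opening) per
# triphone context, plus a context-independent fallback per phoneme.
TRIPHONE_PARAMS = {
    ("sil", "m", "a"): (0.40, 0.00),
    ("m", "a", "sil"): (0.55, 0.80),
    ("m", "a", "m"): (0.47, 0.45),   # "a" squeezed between two closures opens less
}
MONOPHONE_PARAMS = {"sil": (0.45, 0.05), "m": (0.42, 0.00), "a": (0.50, 0.70)}

# Hypothetical second data: stored mouth images and the parameters they depict.
MOUTH_IMAGES = {"closed.png": (0.42, 0.0), "half.png": (0.48, 0.4), "open.png": (0.55, 0.8)}


def mouth_params(phonemes, i):
    """Parameters for phoneme i, using its left and right neighbours as context."""
    left = phonemes[i - 1] if i > 0 else "sil"
    right = phonemes[i + 1] if i + 1 < len(phonemes) else "sil"
    return TRIPHONE_PARAMS.get((left, phonemes[i], right), MONOPHONE_PARAMS[phonemes[i]])


def animate(phonemes):
    """Return one mouth image per phoneme."""
    frames = []
    for i in range(len(phonemes)):
        target = mouth_params(phonemes, i)
        # Pick the stored image whose parameters lie closest to the target.
        best = min(MOUTH_IMAGES, key=lambda img: math.dist(MOUTH_IMAGES[img], target))
        frames.append(best)
    return frames
```

With these tables, the same phoneme "a" maps to `open.png` in the context `sil m a sil` but to `half.png` in `m a m`, which is the coarticulation effect a context-independent lookup would miss.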

    Coarticulation method for audio-visual text-to-speech synthesis
    63.
    Granted patent
    Status: in force

    Publication number: US07392190B1

    Publication date: 2008-06-24

    Application number: US11466806

    Filing date: 2006-08-24

    CPC classification number: G10L2021/105

    Abstract: A method for generating animated sequences of talking heads in text-to-speech applications wherein a processor samples a plurality of frames comprising image samples. The processor reads first data comprising one or more parameters associated with noise-producing orifice images of sequences of at least three concatenated phonemes which correspond to an input stimulus. The processor reads, based on the first data, second data comprising images of a noise-producing entity. The processor generates an animated sequence of the noise-producing entity.

    System and method for triphone-based unit selection for visual speech synthesis
    64.
    Granted patent
    Status: lapsed

    Publication number: US07369992B1

    Publication date: 2008-05-06

    Application number: US11675813

    Filing date: 2007-02-16

    CPC classification number: G10L15/08 G10L13/07 G10L15/26 G10L2021/105 H04N19/00

    Abstract: A system and method for generating a video sequence having mouth movements synchronized with speech sounds are disclosed. The system utilizes a database of n-phones as the smallest selectable unit, wherein n is larger than 1 and preferably 3. The system calculates a target cost for each candidate n-phone for a target frame using a phonetic distance, coarticulation parameter, and speech rate. For each n-phone in a target sequence, the system searches for candidate n-phones that are visually similar according to the target cost. The system samples each candidate n-phone to get the same number of frames as in the target sequence and builds a video frame lattice of candidate video frames. The system assigns a joint cost to each pair of adjacent frames and searches the video frame lattice to construct the video sequence by finding the optimal path through the lattice according to the minimum of the sum of the target cost and the joint cost over the sequence.
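The last step in this abstract, finding the path through the candidate-frame lattice that minimizes the summed target and joint costs, is a dynamic-programming (Viterbi) search. A minimal sketch, assuming both costs are supplied as plain functions; how the patent derives them from phonetic distance, coarticulation parameters, and speech rate is not modeled, and `best_path` is a hypothetical name.

```python
from typing import Callable, Sequence, TypeVar

Frame = TypeVar("Frame")


def best_path(
    lattice: Sequence[Sequence[Frame]],
    target_cost: Callable[[int, Frame], float],
    joint_cost: Callable[[Frame, Frame], float],
) -> tuple[list[Frame], float]:
    """Pick one candidate per target frame, minimizing the sum of target costs
    plus joint costs between consecutive picks (Viterbi search)."""
    # cost[j] = cheapest total cost of any path ending at candidate j of this column
    cost = [target_cost(0, c) for c in lattice[0]]
    back: list[list[int]] = []  # back[t-1][j] = best predecessor index in column t-1

    for t in range(1, len(lattice)):
        new_cost, pointers = [], []
        for cand in lattice[t]:
            scores = [cost[j] + joint_cost(prev, cand) for j, prev in enumerate(lattice[t - 1])]
            j_best = min(range(len(scores)), key=scores.__getitem__)
            pointers.append(j_best)
            new_cost.append(scores[j_best] + target_cost(t, cand))
        cost = new_cost
        back.append(pointers)

    # Trace back from the cheapest candidate in the final column.
    j = min(range(len(cost)), key=cost.__getitem__)
    total = cost[j]
    path = [lattice[-1][j]]
    for t in range(len(lattice) - 1, 0, -1):
        j = back[t - 1][j]
        path.append(lattice[t - 1][j])
    return path[::-1], total
```

In a toy run, frames can simply be mouth-opening values: with targets `[0.0, 0.5, 1.0]`, candidate columns `[[0.0, 0.4], [0.5, 0.1], [0.9, 0.3]]`, target cost `abs(c - target)`, and joint cost `0.5 * abs(a - b)`, the search returns the path `[0.0, 0.5, 0.9]` with total cost 0.55. The work grows with the number of frames times the square of the candidates per frame, rather than exponentially as with exhaustive search.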

    SYSTEM FOR LOW-LATENCY ANIMATION OF TALKING HEADS
    65.
    Patent application
    Status: in force

    Publication number: US20080015861A1

    Publication date: 2008-01-17

    Application number: US11778228

    Filing date: 2007-07-16

    CPC classification number: G06F17/30905

    Abstract: Methods and apparatus for rendering a talking head on a client device are disclosed. The client device has a client cache capable of storing audio/visual data associated with rendering the talking head. The method comprises storing sentences in a client cache of a client device that relate to bridging delays in a dialog, storing sentence templates to be used in dialogs, generating a talking head response to a user inquiry from the client device, and determining whether sentences or stored templates stored in the client cache relate to the talking head response. If the stored sentences or stored templates relate to the talking head response, the method comprises instructing the client device to use the appropriate stored sentence or template from the client cache to render at least a part of the talking head response and transmitting a portion of the talking head response not stored in the client cache, if any, to the client device to render a complete talking head response. If the client cache has no stored data associated with the talking head response, the method comprises transmitting the talking head response to be rendered on the client device.

    Spread Kernel Support Vector Machine
    66.
    Patent application
    Status: in force

    Publication number: US20070094170A1

    Publication date: 2007-04-26

    Application number: US11276235

    Filing date: 2006-02-20

    CPC classification number: G06K9/6269 G06N99/005

    Abstract: Disclosed is a parallel support vector machine technique for solving problems with a large set of training data where the kernel computation, as well as the kernel cache and the training data, are spread over a number of distributed machines or processors. A plurality of processing nodes are used to train a support vector machine based on a set of training data. Each of the processing nodes selects a local working set of training data based on data local to the processing node, for example a local subset of gradients. Each node transmits selected data related to the working set (e.g., gradients having a maximum value) and receives an identification of a global working set of training data. The processing node optimizes the global working set of training data and updates a portion of the gradients of the global working set of training data. The updating of a portion of the gradients may include generating a portion of a kernel matrix. These steps are repeated until a convergence condition is met. Each of the local processing nodes may store all, or only a portion of, the training data. While the steps of optimizing the global working set of training data, and updating a portion of the gradients of the global working set, are performed in each of the local processing nodes, the function of generating a global working set of training data is performed in a centralized fashion based on the selected data (e.g., gradients of the local working set) received from the individual processing nodes.
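A minimal sketch of the division of labour described above. Each node proposes a local working set from its own gradients, a central step merges the proposals into a global working set, and each node then updates only its own gradients by computing its rows of the kernel matrix. Several details are assumptions: the RBF kernel and `gamma`, the dual gradient g = Qα − 1 with Q_iw = y_i·y_w·K(x_i, x_w), and selection by largest |gradient| (following the abstract's example). Practical solvers select by KKT violation and respect the box constraints, both of which this sketch ignores. Solving the sub-problem that produces `delta_alpha` is also omitted, and `Node`, `rbf`, and `global_working_set` are hypothetical names.

```python
import heapq
import math


def rbf(a, b, gamma=0.5):
    """Gaussian (RBF) kernel; the kernel choice and gamma are illustrative."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


class Node:
    """One processing node owning a slice of the gradients.

    For brevity every node here keeps the full data set; the abstract notes a
    node may instead hold only part of it.
    """

    def __init__(self, indices, X, y):
        self.indices = list(indices)  # global sample ids this node is responsible for
        self.X, self.y = X, y
        # Dual gradient g = Q @ alpha - 1 equals -1 everywhere while alpha = 0.
        self.grad = {i: -1.0 for i in self.indices}

    def local_candidates(self, k):
        """Local working set: the k owned ids with the largest |gradient|."""
        return heapq.nlargest(k, ((abs(g), i) for i, g in self.grad.items()))

    def update(self, delta_alpha):
        """Apply g_i += sum_w Q_iw * d_w for this node's ids only.

        Only the kernel rows for this node's own ids are computed, so each node
        generates just its portion of the kernel matrix.
        """
        for i in self.indices:
            self.grad[i] += sum(
                self.y[i] * self.y[w] * rbf(self.X[i], self.X[w]) * d
                for w, d in delta_alpha.items()
            )


def global_working_set(nodes, k):
    """Centralized step: merge every node's proposals and keep the global top k."""
    proposals = [c for node in nodes for c in node.local_candidates(k)]
    return [i for _, i in heapq.nlargest(k, proposals)]
```

Only k (value, id) pairs per node cross the network in the selection step, while the expensive kernel rows stay on the nodes that own the corresponding gradients.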

    System and method for triphone-based unit selection for visual speech synthesis
    67.
    Granted patent
    Status: in force

    Publication number: US07209882B1

    Publication date: 2007-04-24

    Application number: US10143717

    Filing date: 2002-05-10

    CPC classification number: G10L15/08 G10L13/07 G10L15/26 G10L2021/105 H04N19/00

    Abstract: A system and method for generating a video sequence having mouth movements synchronized with speech sounds are disclosed. The system utilizes a database of n-phones as the smallest selectable unit, wherein n is larger than 1 and preferably 3. The system calculates a target cost for each candidate n-phone for a target frame using a phonetic distance, coarticulation parameter, and speech rate. For each n-phone in a target sequence, the system searches for candidate n-phones that are visually similar according to the target cost. The system samples each candidate n-phone to get the same number of frames as in the target sequence and builds a video frame lattice of candidate video frames. The system assigns a joint cost to each pair of adjacent frames and searches the video frame lattice to construct the video sequence by finding the optimal path through the lattice according to the minimum of the sum of the target cost and the joint cost over the sequence.

    System and method of providing conversational visual prosody for talking heads
    68.
    Granted patent
    Status: in force

    Publication number: US07136818B1

    Publication date: 2006-11-14

    Application number: US10173184

    Filing date: 2002-06-17

    CPC classification number: G10L15/1807 G10L2021/105

    Abstract: A system and method of controlling the movement of a virtual agent while the agent is speaking to a human user during a conversation is disclosed. The method comprises receiving speech data to be spoken by the virtual agent, performing a prosodic analysis of the speech data, selecting matching prosody patterns from a speaking database and controlling the virtual agent movement according to the selected prosody patterns.
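The receive, analyze, match, and control pipeline in this abstract can be sketched in a few lines. The sketch assumes the speech front end already supplies one pitch peak (Hz) per word. The accent threshold, the three pattern labels, and the movement strings in the "speaking database" are all hypothetical stand-ins for whatever the patent actually stores.

```python
from statistics import mean

# Hypothetical speaking database: prosodic pattern -> facial/head movement.
SPEAKING_DB = {
    "accent": "nod + raise eyebrows",
    "phrase_end": "slight head tilt down",
    "neutral": "hold",
}


def prosodic_labels(pitches, accent_ratio=1.15):
    """Crude prosodic analysis over per-word pitch peaks (Hz)."""
    avg = mean(pitches)
    labels = []
    for i, f0 in enumerate(pitches):
        if f0 > accent_ratio * avg:
            labels.append("accent")        # pitch peak well above the utterance mean
        elif i == len(pitches) - 1:
            labels.append("phrase_end")    # final word of the utterance
        else:
            labels.append("neutral")
    return labels


def plan_movements(words, pitches):
    """Pair each word with the movement of its matching prosody pattern."""
    return [(w, SPEAKING_DB[label]) for w, label in zip(words, prosodic_labels(pitches))]
```

For "I really mean it" with pitch peaks `[120, 180, 125, 110]`, only "really" exceeds 1.15 times the mean (133.75 Hz), so the agent nods on that word and tilts its head on the final word.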

    Parallel support vector method and apparatus
    69.
    Patent application
    Status: pending (published)

    Publication number: US20060112026A1

    Publication date: 2006-05-25

    Application number: US10978129

    Filing date: 2004-10-29

    CPC classification number: G06K9/6292 G06K9/6269 G06N20/00

    Abstract: Disclosed is an improved technique for training a support vector machine using a distributed architecture. A training data set is divided into subsets, and the subsets are optimized in a first level of optimizations, with each optimization generating a support vector set. The support vector sets output from the first level optimizations are then combined and used as input to a second level of optimizations. This hierarchical processing continues for multiple levels, with the output of each prior level being fed into the next level of optimizations. In order to guarantee a global optimal solution, a final set of support vectors from a final level of optimization processing may be fed back into the first level of the optimization cascade so that the results may be processed along with each of the training data subsets. This feedback may continue in multiple iterations until the same final support vector set is generated during two sequential iterations through the cascade, thereby guaranteeing that the solution has converged to the global optimal solution. In various embodiments, various combinations of inputs may be used by the various optimizations. The individual optimizations may be processed in parallel.
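A minimal sketch of the cascade with feedback. It assumes pairwise merging at each level (a binary cascade, one of the input combinations the abstract leaves open) and a pluggable `solve` function standing in for any SVM optimizer. The levels run sequentially here, although the abstract notes the individual optimizations may run in parallel. The toy solver is exact only for separable 1-D data, where the hard-margin support vectors are the largest negative point and the smallest positive point. `cascade`, `hard_margin_1d`, and `POINTS` are hypothetical names.

```python
from typing import Callable, FrozenSet, Optional, Sequence

Solver = Callable[[FrozenSet[int]], FrozenSet[int]]


def cascade(subsets: Sequence[FrozenSet[int]], solve: Solver,
            max_passes: int = 10) -> Optional[FrozenSet[int]]:
    """Binary cascade of optimizations with feedback of the final support vectors.

    `solve` maps a set of training-point ids to the ids that become support vectors.
    """
    previous: Optional[FrozenSet[int]] = None
    feedback: FrozenSet[int] = frozenset()
    for _ in range(max_passes):
        # First level: optimize each subset together with the fed-back vectors.
        level = [solve(s | feedback) for s in subsets]
        # Later levels: merge support-vector sets pairwise until one remains.
        while len(level) > 1:
            merged = [level[i] | level[i + 1] for i in range(0, len(level) - 1, 2)]
            if len(level) % 2:
                merged.append(level[-1])   # odd set is carried into the next level unmerged
            level = [solve(m) for m in merged]
        final = level[0]
        if final == previous:              # identical on two consecutive passes: converged
            return final
        previous = feedback = final
    return previous  # max_passes reached without convergence


# Toy data: id -> x, with label = sign(x); separable at x = 0.
POINTS = {0: -3.0, 1: -1.0, 2: -2.0, 3: 0.5, 4: 2.0, 5: 1.0, 6: -0.5, 7: 3.0}


def hard_margin_1d(ids: FrozenSet[int]) -> FrozenSet[int]:
    """Exact hard-margin SVM for separable 1-D data."""
    neg = [i for i in ids if POINTS[i] < 0]
    pos = [i for i in ids if POINTS[i] > 0]
    if not neg or not pos:
        return frozenset(ids)              # single-class subset: keep all (simplification)
    return frozenset({max(neg, key=POINTS.get), min(pos, key=POINTS.get)})
```

Splitting the eight points into `{0,3}`, `{1,4}`, `{2,5}`, `{6,7}`, the first pass already ends with `{3, 6}`. The second pass, fed that set, reproduces it, so the loop stops with the same support vectors a single solve on all eight points would give.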

    Method for sending multi-media messages using customizable background images
    70.
    Granted patent
    Status: in force

    Publication number: US07035803B1

    Publication date: 2006-04-25

    Application number: US10003093

    Filing date: 2001-11-02

    CPC classification number: G10L21/06 G10L13/00 H04L51/10

    Abstract: A system and method of providing sender customization of multi-media messages through the use of inserted images or video. The images or video may be sender-created or predefined and available to the sender via a web server. The method relates to customizing a multi-media message created by a sender for a recipient, the multi-media message having an animated entity audibly presenting speech converted from text created by the sender. The method comprises receiving at least one image from the sender, associating each of the at least one image with a tag, presenting the sender with options to insert the tag associated with one of the at least one image into the sender text, and after the sender inserts the tag associated with one of the at least one image into the sender text, delivering the multi-media message with the at least one image presented as background to the animated entity according to a position of the tag associated with the at least one image in the sender text. In another embodiment of the invention, a template is provided to the sender to create multi-media messages using predefined static images or video clips. The method comprises providing the sender with a group of customizable multi-media message templates, each template of the group of templates including predefined parameters comprising a predefined text message, a predefined animated entity, a predefined background, predefined background music, and a predefined set of emoticons within the text of the message. The sender is further provided with options to accessorize the animated entity with various additional features such as glasses and the like for more creative presentation of the multi-media message.
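How a tag's position in the sender text can determine which background is shown can be sketched as a simple split of the text at each tag. The `<bg:name>` tag syntax, the image-name mapping, and `background_schedule` are hypothetical; the abstract does not specify a tag format.

```python
import re

TAG = re.compile(r"<bg:(\w+)>")   # hypothetical tag syntax inserted by the sender


def background_schedule(text, images, default=None):
    """Split sender text at background tags.

    Returns (spoken_text, background_image) segments: each segment is spoken by
    the animated entity while the image of the most recent tag is shown.
    """
    segments, current, pos = [], default, 0
    for match in TAG.finditer(text):
        chunk = text[pos:match.start()].strip()
        if chunk:
            segments.append((chunk, current))
        current = images[match.group(1)]   # KeyError if the tag has no uploaded image
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append((tail, current))
    return segments
```

For the text `"Hi Anna! <bg:beach> Greetings from the coast. <bg:city> Back home tomorrow."`, the greeting plays over the default background, the middle sentence over `beach.jpg`, and the last sentence over `city.png`.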
