Hybrid approach in voice conversion
    41.
    Type: Granted invention patent
    Status: Lapsed

    Publication No.: US08224648B2

    Publication date: 2012-07-17

    Application No.: US11966255

    Filing date: 2007-12-28

    IPC classification: G01L13/06

    CPC classification: G10L21/00 G10L2021/0135

    Abstract: A hybrid approach is described for combining frequency warping and Gaussian mixture modeling (GMM) to achieve better speaker identity and speech quality. To train the voice conversion GMM, line spectral frequencies and other features are extracted from a set of source sounds to generate a source feature vector and from a set of target sounds to generate a target feature vector. The GMM is estimated based on the aligned source and target feature vectors. A mixture-specific warping function is generated for each set of mixture mean pairs of the GMM, and a warping function is generated based on a weighting of the mixture-specific warping functions. The warping function can be used to convert sounds received from a source speaker to approximate the speech of a target speaker.
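The weighting step in the abstract above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each mixture-specific warping function is piecewise linear through the paired source and target mean line spectral frequencies (sorted, in radians between 0 and π), and that the caller supplies the per-mixture weights (for example GMM posterior probabilities). The names `mixture_warp` and `combined_warp` are hypothetical and are not taken from the patent.

```python
import numpy as np

def mixture_warp(freqs, src_mean, tgt_mean, nyquist=np.pi):
    """Piecewise-linear warp that maps each source mean LSF onto its paired
    target mean LSF, with 0 and the Nyquist frequency held fixed (assumption)."""
    xp = np.concatenate(([0.0], src_mean, [nyquist]))
    fp = np.concatenate(([0.0], tgt_mean, [nyquist]))
    return np.interp(freqs, xp, fp)

def combined_warp(freqs, src_means, tgt_means, weights):
    """Weighted sum of the mixture-specific warps; weights are normalized to sum to 1."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    warped = np.stack([mixture_warp(freqs, s, t)
                       for s, t in zip(src_means, tgt_means)])
    return weights @ warped
```

With two mixtures whose warps shift the same frequencies in opposite directions, equal weights cancel the shifts, while a weight of 1 on one mixture reproduces that mixture's warp alone.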


    USER FRIENDLY SPEAKER ADAPTATION FOR SPEECH RECOGNITION
    42.
    Type: Invention application
    Status: Under examination (published)

    Publication No.: US20100088097A1

    Publication date: 2010-04-08

    Application No.: US12244919

    Filing date: 2008-10-03

    IPC classification: G10L15/00

    CPC classification: G10L15/07 G10L2015/0631

    Abstract: The performance and user experience of speech recognition applications and systems are improved by using, for example, offline adaptation that requires no tedious effort from the user. Interactions with a user may take the form of a quiz, game, or other scenario in which the user implicitly provides vocal input as adaptation data. Queries with a plurality of candidate answers may be designed in an optimal and efficient way and presented to the user. Detected speech from the user is then matched to one of the candidate answers and may be used to adapt an acoustic model to the particular speaker for speech recognition.


    METHODS, APPARATUSES, AND COMPUTER PROGRAM PRODUCTS FOR MODELING CONTACT NETWORKS
    43.
    Type: Invention application
    Status: Under examination (published)

    Publication No.: US20090228513A1

    Publication date: 2009-09-10

    Application No.: US12043614

    Filing date: 2008-03-06

    Applicant: Jilei Tian

    Inventor: Jilei Tian

    IPC classification: G06F17/30

    CPC classification: G06Q10/10 G06F16/24573

    Abstract: An apparatus for modeling a contact network may include a processor. The processor may be configured to store a plurality of contacts lists, which collectively form a contact network. Each contacts list may comprise a plurality of contact entries and may be associated with a user of a remote device. The processor may further be configured to model the contact network using one or more modeling parameters. The processor may be configured to generate a plurality of suggested contact entries for a user based at least in part upon the one or more modeling parameters used to model the contact network. The suggested contact entries may be extracted from contact entries stored in the contact network. Corresponding methods, systems, and computer program products are also provided.


    Inverse Text Normalization
    44.
    Type: Invention application
    Status: Under examination (published)

    Publication No.: US20090157385A1

    Publication date: 2009-06-18

    Application No.: US11956910

    Filing date: 2007-12-14

    Applicant: Jilei Tian

    Inventor: Jilei Tian

    IPC classification: G06F17/28

    CPC classification: G06F17/28

    Abstract: Embodiments are directed to efficient multilingual inverse text normalization (ITN) of text in spoken form to produce normalized text for display. Embodiments are directed to preprocessing the multilingual text into a language-independent representation; tokenizing the text in spoken form; segmenting the tokenized text into ITN items by grouping consecutive words using an ITN lexicon; classifying the ITN items into ITN categories by using the ITN lexicon or tagged information from a language model; applying one or more ITN rules, selected based on the ITN categories into which the ITN items have been classified, to rewrite the ITN items; and post-processing the ITN items and outputting inversely normalized text in written form for display. The ITN lexicon may include ITN lexicon entries that are each located within an ITN category in the ITN lexicon.
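The tokenize, segment, classify, and rewrite steps listed in the abstract above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the patented method: the lexicon, the two categories (`cardinal`, `percentage`), the rules, and all function names are hypothetical; only English tens-plus-units numbers with an optional trailing "percent" are handled; and the language-independent preprocessing and the post-processing stages are omitted.

```python
# Hypothetical toy ITN lexicon: spoken word -> (word class, value)
ITN_LEXICON = {
    "two": ("number", 2), "five": ("number", 5),
    "twenty": ("number", 20), "thirty": ("number", 30),
    "percent": ("unit", "%"),
}

def segment(tokens):
    """Group consecutive lexicon words into ITN items (lists); other tokens pass through."""
    items, run = [], []
    for tok in tokens:
        if tok.lower() in ITN_LEXICON:
            run.append(tok.lower())
            continue
        if run:
            items.append(run)
            run = []
        items.append(tok)
    if run:
        items.append(run)
    return items

def classify(item):
    """Assign an ITN category from the lexicon class of the item's last word."""
    return "percentage" if ITN_LEXICON[item[-1]][0] == "unit" else "cardinal"

def number_value(words):
    """Toy rule: tens and units simply add up (e.g. twenty + five = 25)."""
    return sum(ITN_LEXICON[w][1] for w in words)

# Rewrite rules, selected by ITN category.
RULES = {
    "cardinal": lambda ws: str(number_value(ws)),
    "percentage": lambda ws: f"{number_value(ws[:-1])}%",
}

def inverse_normalize(spoken):
    out = []
    for item in segment(spoken.split()):
        out.append(RULES[classify(item)](item) if isinstance(item, list) else item)
    return " ".join(out)

print(inverse_normalize("Sales rose twenty five percent"))
```

Keying the rules by category keeps the rewrite logic separate from the lexicon, so supporting another category would mean adding lexicon entries and one rule rather than changing the pipeline.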


    Apparatus, method and computer program product providing a hierarchical approach to command-control tasks using a brain-computer interface
    45.
    Type: Invention application
    Status: In force

    Publication No.: US20080235164A1

    Publication date: 2008-09-25

    Application No.: US11787906

    Filing date: 2007-04-17

    IPC classification: G06F15/18 A61B5/0476

    Abstract: Disclosed are a method, a computer program product, and a device that are responsive to detected mental states of a user to perform selection processes to execute a task. The method includes providing a hierarchical multi-level decision tree structure comprising internal nodes and leaf nodes, where the decision tree structure represents a task. The method further includes navigating, using information derived from detected mental states of the user, through levels of the decision tree structure to reach a leaf node to accomplish the task. The step of navigating includes selecting, using the information derived from the detected mental states of the user, between attribute values associated with internal nodes of the decision tree structure. As non-limiting examples, the device may be a communication device, and the task may be a name dialing or a command/control task.
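The navigation described in the abstract above amounts to walking a decision tree, one selection per level. The sketch below assumes that some upstream classifier has already turned the detected mental states into a sequence of attribute-value selections; the `Node` class, the `navigate` function, and the name-dialing tree are hypothetical illustrations rather than anything specified in the application.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    # Maps an attribute value (one possible selection) to the child node it leads to.
    children: dict = field(default_factory=dict)

    @property
    def is_leaf(self):
        return not self.children

def navigate(root, selections):
    """Follow one decoded selection per level until a leaf (the task to run) is reached."""
    node = root
    for choice in selections:
        if node.is_leaf:
            break
        node = node.children[choice]
    if not node.is_leaf:
        raise ValueError("selections ended before reaching a leaf")
    return node.label

# Hypothetical name-dialing task: first pick a letter range, then a contact.
tree = Node("contacts", {
    "A-M": Node("A-M", {"first": Node("dial Alice"), "second": Node("dial Bob")}),
    "N-Z": Node("dial Zoe"),
})
print(navigate(tree, ["A-M", "second"]))
```

Because each level only asks the user to choose among a few attribute values, the number of distinct mental states the detector must distinguish stays small even when the task has many possible outcomes.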


    System, Method, Apparatus and Computer Program Product for Providing Dynamic Vocabulary Prediction for Speech Recognition
    46.
    Type: Invention application
    Status: Under examination (published)

    Publication No.: US20080154600A1

    Publication date: 2008-06-26

    Application No.: US11614159

    Filing date: 2006-12-21

    IPC classification: G10L15/00

    CPC classification: G10L15/083

    Abstract: An apparatus for providing dynamic vocabulary prediction for setting up a speech recognition network on resource-constrained portable devices may include a recognition network element. The recognition network element may be configured to determine a confidence measure for each candidate recognized word for a current word to be recognized. The recognition network element may also be configured to select a subset of the candidate recognized words as selected candidate words based on the confidence measure of each candidate recognized word, and to determine a recognition network for a next word to be recognized. The recognition network includes likely follower words for each of the selected candidate words, determined, for example, using a language model and highly frequent words.
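The selection and prediction steps in the abstract above can be sketched in a few lines. The sketch assumes the recognizer exposes per-candidate confidences as a dictionary and that the language model is reduced to a table of likely follower words; `next_word_network`, `top_k`, and the sample data are hypothetical.

```python
def next_word_network(candidates, followers, frequent_words, top_k=2):
    """Build the vocabulary for the next word to be recognized.

    candidates:     dict mapping each candidate word to its confidence measure
    followers:      dict mapping a word to its likely follower words (stand-in
                    for a language model; an assumption of this sketch)
    frequent_words: words that are always kept in the network
    """
    # Keep only the top_k most confident candidates for the current word.
    selected = sorted(candidates, key=candidates.get, reverse=True)[:top_k]
    network = set(frequent_words)
    for word in selected:
        network.update(followers.get(word, []))
    return selected, network

selected, network = next_word_network(
    candidates={"call": 0.9, "tall": 0.4, "fall": 0.1},
    followers={"call": ["mom", "home"], "tall": ["building"], "fall": ["down"]},
    frequent_words={"the"},
)
print(selected, sorted(network))
```

Restricting the next-word network to followers of a few confident candidates is what keeps the active vocabulary small enough for a resource-constrained device.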


    MEMORY-EFFICIENT METHOD FOR HIGH-QUALITY CODEBOOK BASED VOICE CONVERSION
    47.
    Type: Invention application
    Status: Under examination (published)

    Publication No.: US20080147385A1

    Publication date: 2008-06-19

    Application No.: US11611798

    Filing date: 2006-12-15

    IPC classification: G10L19/12

    CPC classification: G10L21/00 G10L2021/0135

    Abstract: An improved system and method enable codebook-based voice conversion while both significantly reducing the memory footprint and improving the continuity of the output. In various embodiments, the paired source-target codebook is implemented as a multi-stage vector quantizer. During the conversion, the N best candidates in a tree search are taken as the output from the quantizer. The N candidates for each vector to be converted are used in a dynamic programming-based approach that finds a smooth but accurate output sequence.
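The dynamic programming step in the abstract above can be illustrated with a Viterbi-style search over the N candidates per frame. This is a sketch under assumptions that the abstract does not state: accuracy is measured by a per-candidate quantization cost supplied by the caller, smoothness by the Euclidean distance between consecutive output vectors, and the two are traded off by a hypothetical `smooth_weight` parameter. The multi-stage quantizer and tree search that would produce the candidates are not shown.

```python
import numpy as np

def smooth_path(candidates, target_costs, smooth_weight=1.0):
    """Pick one candidate per frame, minimizing accuracy cost plus jump cost.

    candidates:   (T, N, D) array, N candidate output vectors for each of T frames
    target_costs: (T, N) array, quantization error of each candidate
    Returns a list of T chosen candidate indices.
    """
    T, N, _ = candidates.shape
    cost = target_costs[0].copy()
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        # jump[i, j]: distance from candidate i at frame t-1 to candidate j at frame t
        jump = np.linalg.norm(
            candidates[t - 1][:, None, :] - candidates[t][None, :, :], axis=2)
        total = cost[:, None] + smooth_weight * jump
        back[t] = total.argmin(axis=0)
        cost = total.min(axis=0) + target_costs[t]
    # Trace the best path backwards from the cheapest final candidate.
    path = [int(cost.argmin())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

Setting `smooth_weight` to 0 reduces the search to picking the most accurate candidate in every frame, while larger values favor sequences whose consecutive vectors stay close together.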


    Method, apparatus, mobile terminal and computer program product for providing efficient evaluation of feature transformation
    48.
    Type: Invention application
    Status: In force

    Publication No.: US20070239634A1

    Publication date: 2007-10-11

    Application No.: US11400629

    Filing date: 2006-04-07

    IPC classification: G06N3/02

    Abstract: An apparatus for providing efficient evaluation of feature transformation includes a training module and a transformation module. The training module is configured to train a Gaussian mixture model (GMM) using training source data and training target data. The transformation module is in communication with the training module. The transformation module is configured to produce a conversion function in response to the training of the GMM. The training module is further configured to determine a quality of the conversion function prior to use of the conversion function by calculating a trace measurement of the GMM.


    Method for compressing dictionary data
    49.
    Type: Invention application

    Publication No.: US20070073541A1

    Publication date: 2007-03-29

    Application No.: US11605655

    Filing date: 2006-11-29

    Applicant: Jilei Tian

    Inventor: Jilei Tian

    IPC classification: G10L15/04

    Abstract: The invention relates to pre-processing of a pronunciation dictionary for compression in a data processing device, the pronunciation dictionary comprising at least one entry, the entry comprising a sequence of character units and a sequence of phoneme units. According to one aspect of the invention, the sequence of character units and the sequence of phoneme units are aligned using a statistical algorithm. The aligned sequence of character units and the aligned sequence of phoneme units are interleaved by inserting each phoneme unit at a predetermined location relative to the corresponding character unit.
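The interleaving step in the abstract above can be sketched as follows. The sketch assumes the statistical alignment has already been performed and has produced two equal-length sequences, with a hypothetical epsilon symbol `_` standing in for a silent letter. It also assumes the "predetermined location" is directly after the corresponding character. The function names are illustrative only.

```python
def interleave(chars, phonemes):
    """Merge aligned character and phoneme units into one stream,
    placing each phoneme directly after its character (assumed position)."""
    if len(chars) != len(phonemes):
        raise ValueError("sequences must be aligned to equal length")
    stream = []
    for c, p in zip(chars, phonemes):
        stream.extend((c, p))
    return stream

def deinterleave(stream):
    """Recover the two aligned sequences from the interleaved stream."""
    return stream[0::2], stream[1::2]

# Hypothetical aligned entry for "fate": the final 'e' is silent, marked '_'.
entry = interleave(list("fate"), ["f", "ey", "t", "_"])
print(entry)
```

Placing each phoneme next to its character puts strongly correlated symbols side by side, which is the kind of local regularity a general-purpose compressor can exploit.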

    Correcting a pronunciation of a synthetically generated speech object
    50.
    Type: Invention application
    Status: Under examination (published)

    Publication No.: US20070016421A1

    Publication date: 2007-01-18

    Application No.: US11180316

    Filing date: 2005-07-12

    IPC classification: G10L13/08

    CPC classification: G10L13/08

    Abstract: This invention relates to a method, a device and a software application product for correcting a pronunciation of a speech object. The speech object is synthetically generated from a text object in dependence on a segmented representation of the text object. It is determined whether an initial pronunciation of the speech object, which is associated with an initial segmented representation of the text object, is incorrect. If the initial pronunciation is determined to be incorrect, a new segmented representation of the text object is determined, and this new segmented representation is associated with a new pronunciation of the speech object.
