Method and apparatus for improved duration modeling of phonemes
    1.
    Granted invention patent
    Status: In force

    Publication No.: US06366884B1

    Publication date: 2002-04-02

    Application No.: US09436048

    Filing date: 1999-11-08

    IPC classification: G10L13/00

    CPC classification: G10L13/10 G10L13/04 G10L13/08

    Abstract: A method and an apparatus for improved duration modeling of phonemes in a speech synthesis system are provided. According to one aspect, text is received into a processor of a speech synthesis system. The received text is processed using a sum-of-products phoneme duration model that is used in either the formant method or the concatenative method of speech generation. The phoneme duration model, which is used along with a phoneme pitch model, is produced by developing a non-exponential functional transformation form for use with a generalized additive model. The non-exponential functional transformation form comprises a root sinusoidal transformation that is controlled in response to a minimum phoneme duration and a maximum phoneme duration. The minimum and maximum phoneme durations are observed in training data. The received text is processed by specifying at least one of a number of contextual factors for the generalized additive model. An inverse of the non-exponential functional transformation is applied to duration observations, or training data. Coefficients are generated for use with the generalized additive model. The generalized additive model comprising the coefficients is applied to at least one phoneme of the received text resulting in the generation of at least one phoneme having a duration. An acoustic sequence is generated comprising speech signals that are representative of the received text.

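The duration-modeling steps in the abstract above can be illustrated with a minimal Python sketch. Several things here are assumptions and are not stated in the abstract: the exact root-sinusoidal formula (`sin(pi/2 * u**ALPHA) ** BETA` with `u` the duration rescaled by the observed minimum and maximum), the values of `ALPHA` and `BETA`, the backfitting procedure used to generate coefficients, and the toy observations. Which direction the abstract calls the "inverse" is also not spelled out; the sketch simply maps observed durations into the model domain, fits the additive model there, and maps predictions back.

```python
import math
from collections import defaultdict

ALPHA, BETA = 1.0, 0.5  # assumed shape parameters, not taken from the abstract

def to_model_domain(duration, lo, hi):
    """Map a duration in [lo, hi] onto [0, 1] with the assumed root-sinusoidal form."""
    u = (min(max(duration, lo), hi) - lo) / (hi - lo)
    return math.sin(0.5 * math.pi * u ** ALPHA) ** BETA

def to_duration(value, lo, hi):
    """Invert to_model_domain; out-of-range values are clamped to [0, 1] first."""
    y = min(max(value, 0.0), 1.0)
    u = (2.0 / math.pi * math.asin(min(1.0, y ** (1.0 / BETA)))) ** (1.0 / ALPHA)
    return lo + (hi - lo) * u

def fit_additive(samples, sweeps=10):
    """Backfit an intercept plus one coefficient per (factor, level) pair."""
    intercept = sum(target for _, target in samples) / len(samples)
    coeffs = defaultdict(float)
    names = sorted({name for factors, _ in samples for name in factors})
    for _ in range(sweeps):
        for name in names:
            sums, counts = defaultdict(float), defaultdict(int)
            for factors, target in samples:
                rest = sum(coeffs[(k, v)] for k, v in factors.items() if k != name)
                key = (name, factors[name])
                sums[key] += target - intercept - rest
                counts[key] += 1
            for key, total in sums.items():
                coeffs[key] = total / counts[key]
    return intercept, dict(coeffs)

def predict_duration(model, factors, lo, hi):
    """Sum the coefficients for the given context and map back to a duration."""
    intercept, coeffs = model
    score = intercept + sum(coeffs.get((k, v), 0.0) for k, v in factors.items())
    return to_duration(score, lo, hi)

# Hypothetical training observations: (contextual factors, duration in ms).
observations = [
    ({"stress": "stressed", "position": "final"}, 180.0),
    ({"stress": "stressed", "position": "medial"}, 120.0),
    ({"stress": "unstressed", "position": "final"}, 100.0),
    ({"stress": "unstressed", "position": "medial"}, 60.0),
]
lo = min(d for _, d in observations)  # minimum duration observed in training data
hi = max(d for _, d in observations)  # maximum duration observed in training data
model = fit_additive([(f, to_model_domain(d, lo, hi)) for f, d in observations])
long_ms = predict_duration(model, {"stress": "stressed", "position": "final"}, lo, hi)
short_ms = predict_duration(model, {"stress": "unstressed", "position": "medial"}, lo, hi)
```

Because the mapping back to a duration is bounded, every prediction stays between the minimum and maximum durations seen in training, which is one practical reason to prefer a bounded transformation over an exponential one.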

    Method and apparatus for improved duration modeling of phonemes

    Publication No.: US6064960A

    Publication date: 2000-05-16

    Application No.: US993940

    Filing date: 1997-12-18

    IPC classification: G10L13/08

    CPC classification: G10L13/10 G10L13/04 G10L13/08

    Abstract: A method and an apparatus for improved duration modeling of phonemes in a speech synthesis system are provided. According to one aspect, text is received into a processor of a speech synthesis system. The received text is processed using a sum-of-products phoneme duration model that is used in either the formant method or the concatenative method of speech generation. The phoneme duration model, which is used along with a phoneme pitch model, is produced by developing a non-exponential functional transformation form for use with a generalized additive model. The non-exponential functional transformation form comprises a root sinusoidal transformation that is controlled in response to a minimum phoneme duration and a maximum phoneme duration. The minimum and maximum phoneme durations are observed in training data. The received text is processed by specifying at least one of a number of contextual factors for the generalized additive model. An inverse of the non-exponential functional transformation is applied to duration observations, or training data. Coefficients are generated for use with the generalized additive model. The generalized additive model comprising the coefficients is applied to at least one phoneme of the received text resulting in the generation of at least one phoneme having a duration. An acoustic sequence is generated comprising speech signals that are representative of the received text.

    Method and apparatus for improved duration modeling of phonemes
    3.
    Granted invention patent
    Status: In force

    Publication No.: US06785652B2

    Publication date: 2004-08-31

    Application No.: US10325425

    Filing date: 2002-12-19

    IPC classification: G10L13/06

    CPC classification: G10L13/10 G10L13/04 G10L13/08

    Abstract: A method and an apparatus for improved duration modeling of phonemes in a speech synthesis system are provided. According to one aspect, text is received into a processor of a speech synthesis system. The received text is processed using a sum-of-products phoneme duration model that is used in either the formant method or the concatenative method of speech generation. The phoneme duration model, which is used along with a phoneme pitch model, is produced by developing a non-exponential functional transformation form for use with a generalized additive model. The non-exponential functional transformation form comprises a root sinusoidal transformation that is controlled in response to a minimum phoneme duration and a maximum phoneme duration. The minimum and maximum phoneme durations are observed in training data. The received text is processed by specifying at least one of a number of contextual factors for the generalized additive model. An inverse of the non-exponential functional transformation is applied to duration observations, or training data. Coefficients are generated for use with the generalized additive model. The generalized additive model comprising the coefficients is applied to at least one phoneme of the received text resulting in the generation of at least one phoneme having a duration. An acoustic sequence is generated comprising speech signals that are representative of the received text.


    Method and apparatus for improved duration modeling of phonemes

    Publication No.: US06553344B2

    Publication date: 2003-04-22

    Application No.: US10082438

    Filing date: 2002-02-22

    IPC classification: G10L13/06

    CPC classification: G10L13/10 G10L13/04 G10L13/08

    Abstract: A method and an apparatus for improved duration modeling of phonemes in a speech synthesis system are provided. According to one aspect, text is received into a processor of a speech synthesis system. The received text is processed using a sum-of-products phoneme duration model that is used in either the formant method or the concatenative method of speech generation. The phoneme duration model, which is used along with a phoneme pitch model, is produced by developing a non-exponential functional transformation form for use with a generalized additive model. The non-exponential functional transformation form comprises a root sinusoidal transformation that is controlled in response to a minimum phoneme duration and a maximum phoneme duration. The minimum and maximum phoneme durations are observed in training data. The received text is processed by specifying at least one of a number of contextual factors for the generalized additive model. An inverse of the non-exponential functional transformation is applied to duration observations, or training data. Coefficients are generated for use with the generalized additive model. The generalized additive model comprising the coefficients is applied to at least one phoneme of the received text resulting in the generation of at least one phoneme having a duration. An acoustic sequence is generated comprising speech signals that are representative of the received text.

    Methods and apparatuses for automatic speech recognition
    5.
    Granted invention patent
    Status: In force

    Publication No.: US09431006B2

    Publication date: 2016-08-30

    Application No.: US12497511

    Filing date: 2009-07-02

    Abstract: Exemplary embodiments of methods and apparatuses for automatic speech recognition are described. First model parameters associated with a first representation of an input signal are generated. The first representation of the input signal is a discrete parameter representation. Second model parameters associated with a second representation of the input signal are generated. The second representation of the input signal includes a continuous parameter representation of residuals of the input signal. The first representation of the input signal includes discrete parameters representing first portions of the input signal. The second representation includes discrete parameters representing second portions of the input signal that are smaller than the first portions. Third model parameters are generated to couple the first representation of the input signal with the second representation of the input signal. The first representation and the second representation of the input signal are mapped into a vector space.


    Combined statistical and rule-based part-of-speech tagging for text-to-speech synthesis
    6.
    Granted invention patent
    Status: Lapsed

    Publication No.: US08719006B2

    Publication date: 2014-05-06

    Application No.: US12870542

    Filing date: 2010-08-27

    IPC classification: G06F17/27 G06F17/20 G06F17/21

    CPC classification: G10L13/02 G10L13/10

    Abstract: In response to a word of a text sequence, a first part-of-speech (POS) tag is generated using a statistical part-of-speech (POS) tagger based on a corpus of trained text sequences, each representing a likely POS of a word for a given text sequence. A second POS tag is generated using a rule-based POS tagger based on a set of one or more rules associated with a type of an application associated with the text sequence. A final POS tag is assigned to the word of the text sequence for TTS synthesis based on the first POS tag and the second POS tag.

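A minimal Python sketch of the two-tagger arrangement described in the abstract above. The abstract does not say how the first and second tags are reconciled; the policy used here (the rule-based tag wins whenever a rule fires) is an assumption, as are the toy corpus, the single rule, and all function names.

```python
from collections import Counter, defaultdict

def train_statistical_tagger(tagged_corpus):
    """Learn word -> most frequent tag from (word, tag) pairs."""
    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def rule_based_tag(word, prev_word, rules):
    """Return the tag of the first rule that fires, or None if none does."""
    for predicate, tag in rules:
        if predicate(word, prev_word):
            return tag
    return None

def final_tags(words, stat_model, rules, default="NOUN"):
    """Assign a final tag per word from the first (statistical) and second (rule) tags."""
    tags, prev = [], None
    for word in words:
        first = stat_model.get(word.lower(), default)
        second = rule_based_tag(word, prev, rules)
        # Assumed policy: prefer the application-specific rule when one applies.
        tags.append(second if second is not None else first)
        prev = word
    return tags

# Hypothetical training corpus and one hypothetical application-specific rule.
corpus = [
    ("I", "PRON"), ("want", "VERB"), ("to", "PART"), ("a", "DET"),
    ("record", "NOUN"), ("record", "NOUN"), ("record", "VERB"),
]
rules = [
    (lambda word, prev: prev is not None and prev.lower() == "to"
        and word.lower() == "record", "VERB"),
]
stat_model = train_statistical_tagger(corpus)
tags = final_tags("I want to record a record".split(), stat_model, rules)
```

The distinction matters for TTS because "record" is pronounced differently as a noun and as a verb; the statistical tagger alone would label both occurrences as nouns here.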

    Unsupervised document clustering using latent semantic density analysis
    7.
    Granted invention patent
    Status: In force

    Publication No.: US08713021B2

    Publication date: 2014-04-29

    Application No.: US12831909

    Filing date: 2010-07-07

    IPC classification: G06F17/30

    Abstract: According to one embodiment, a latent semantic mapping (LSM) space is generated from a collection of a plurality of documents, where the LSM space includes a plurality of document vectors, each representing one of the documents in the collection. For each of the document vectors considered as a centroid document vector, a group of document vectors is identified in the LSM space that are within a predetermined hypersphere diameter from the centroid document vector. As a result, multiple groups of document vectors are formed. The predetermined hypersphere diameter represents a predetermined closeness measure among the document vectors in the LSM space. Thereafter, a group from the plurality of groups is designated as a cluster of document vectors, where the designated group contains a maximum number of document vectors among the plurality of groups.

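The group-formation and selection steps in the abstract above can be sketched in a few lines of Python. The sketch assumes the document vectors have already been mapped into the LSM space (the mapping itself is not shown), uses Euclidean distance as the closeness measure, and treats the predetermined diameter as the distance threshold from the centroid; all three are assumptions, and the vectors are made up for illustration.

```python
import math

def distance(a, b):
    """Euclidean distance between two vectors (assumed closeness measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def densest_group(doc_vectors, diameter):
    """Treat each vector as a centroid and return the most populated group.

    Returns (centroid_index, member_indices); ties keep the earliest centroid.
    """
    best_center, best_members = None, []
    for i, center in enumerate(doc_vectors):
        members = [j for j, vec in enumerate(doc_vectors)
                   if distance(center, vec) <= diameter]
        if len(members) > len(best_members):
            best_center, best_members = i, members
    return best_center, best_members

# Hypothetical 2-D document vectors: three close together, two elsewhere.
vectors = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
center, cluster = densest_group(vectors, diameter=0.2)
```

With these values the first three documents form the largest group, so they are designated as the cluster. Repeating the procedure on the remaining documents would be one way to extract further clusters, though the abstract does not describe that step.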

    Method for dynamic context scope selection in hybrid N-GRAM+LSA language modeling
    8.
    Granted invention patent
    Status: In force

    Publication No.: US07720673B2

    Publication date: 2010-05-18

    Application No.: US11710098

    Filing date: 2007-02-23

    IPC classification: G06F17/20

    Abstract: A method and system for dynamic language modeling of a document are described. In one embodiment, a number of local probabilities of a current document are computed and a vector representation of the current document in a latent semantic analysis (LSA) space is determined. In addition, a number of global probabilities based upon the vector representation of the current document in an LSA space is computed. Further, the local probabilities and the global probabilities are combined to produce the language modeling.

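As an illustration of combining local and global probabilities, the following Python sketch multiplies an n-gram (local) probability by an LSA-derived (global) probability, divides by the unigram prior, and renormalizes. The abstract only says the two are combined; this particular rule, the way cosine closeness is turned into a probability, and all vectors and numbers below are assumptions chosen for the example.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def global_probabilities(word_vectors, doc_vector):
    """Turn closeness between each word and the document into a distribution."""
    # Assumed mapping: shift the cosine into [0, 1], then normalize.
    scores = {w: (1.0 + cosine(v, doc_vector)) / 2.0 for w, v in word_vectors.items()}
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()}

def combine(local_probs, global_probs, unigram_probs):
    """Assumed combination: local * global / prior, renormalized over the vocabulary."""
    scores = {w: local_probs[w] * global_probs[w] / unigram_probs[w]
              for w in local_probs}
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()}

# Hypothetical 2-D LSA word vectors and a finance-flavoured document vector.
word_vectors = {"stock": (1.0, 0.0), "bank": (0.7, 0.7), "river": (0.0, 1.0)}
doc_vector = (1.0, 0.1)
local_probs = {"stock": 0.3, "bank": 0.4, "river": 0.3}   # from an n-gram model
unigram_probs = {w: 1.0 / 3.0 for w in word_vectors}      # flat prior for simplicity

global_probs = global_probabilities(word_vectors, doc_vector)
combined = combine(local_probs, global_probs, unigram_probs)
```

Here "stock" and "river" have the same local probability, but the document-level evidence raises "stock" above "river" in the combined distribution.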

    Unsupervised data-driven pronunciation modeling
    9.
    Granted invention patent
    Status: Lapsed

    Publication No.: US07702509B2

    Publication date: 2010-04-20

    Application No.: US11603586

    Filing date: 2006-11-21

    IPC classification: G10L13/04

    CPC classification: G10L15/187 G10L15/063

    Abstract: Pronunciation for an input word is modeled by generating a set of candidate phoneme strings having pronunciations close to the input word in an orthographic space. Phoneme sub-strings in the set are selected as the pronunciation. In one aspect, a first closeness measure between phoneme strings for words chosen from a dictionary and contexts within the input word is used to determine the candidate phoneme strings. The words are chosen from the dictionary based on a second closeness measure between a representation of the input word in the orthographic space and orthographic anchors corresponding to the words in the dictionary. In another aspect, the phoneme sub-strings are selected by aligning the candidate phoneme strings on common phoneme sub-strings to produce an occurrence count, which is used to choose the phoneme sub-strings for the pronunciation.


    Method and apparatus for assigning word prominence to new or previous information in speech synthesis
    10.
    Granted invention patent
    Status: In force

    Publication No.: US07313523B1

    Publication date: 2007-12-25

    Application No.: US10439217

    Filing date: 2003-05-14

    IPC classification: G10L13/04

    CPC classification: G10L13/033 G10L13/04

    Abstract: A method and apparatus is provided for generating speech that sounds more natural. In one embodiment, word prominence and latent semantic analysis are used to generate more natural sounding speech. A method for generating speech that sounds more natural may comprise generating synthesized speech having certain word prominence characteristics and applying a semantically-driven word prominence assignment model to specify word prominence consistent with the way humans assign word prominence. A speech representative of a current sentence is generated. The determination is made whether information in the current sentence is new or previously given in accordance with a semantic relationship between the current sentence and a number of preceding sentences. A word prominence is assigned to a word in the current sentence in accordance with the information determination.
