METHODS AND COMPUTER PROGRAM PRODUCTS FOR PROVIDING PARAPHRASING IN A TEXT-TO-SPEECH SYSTEM
    1.
    Type: Invention patent application
    Status: Under examination (published)

    Publication number: US20080167876A1

    Publication date: 2008-07-10

    Application number: US11619682

    Filing date: 2007-01-04

    IPC classification: G10L21/06

    Abstract: A method and computer program product for providing paraphrasing in a text-to-speech (TTS) system is provided. The method includes receiving an input text, parsing the input text, and determining a paraphrase of the input text. The method also includes synthesizing the paraphrase into synthesized speech. The method further includes selecting synthesized speech to output, which includes: assigning a score to each synthesized speech associated with each paraphrase, comparing the score of each synthesized speech associated with each paraphrase, and selecting the top-scoring synthesized speech to output. Furthermore, the method includes outputting the selected synthesized speech.
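
    The selection step in the abstract above (score each synthesized paraphrase, compare the scores, output the top scorer) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: `Candidate`, `select_best`, and the toy `synthesize` and `score` callables are hypothetical names introduced here, and the scoring rule is an arbitrary stand-in.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Candidate:
    paraphrase: str  # text of the paraphrase
    audio: bytes     # synthesized speech for that paraphrase
    score: float     # score assigned to the synthesized speech


def select_best(
    paraphrases: Iterable[str],
    synthesize: Callable[[str], bytes],
    score: Callable[[bytes], float],
) -> Candidate:
    """Synthesize every paraphrase, score each result, return the top scorer."""
    candidates: List[Candidate] = []
    for text in paraphrases:
        audio = synthesize(text)
        candidates.append(Candidate(text, audio, score(audio)))
    if not candidates:
        raise ValueError("at least one paraphrase is required")
    return max(candidates, key=lambda c: c.score)


# Toy stand-ins (assumptions): "synthesis" just encodes the text, and the
# score simply prefers shorter output.
best = select_best(
    ["please hold on", "wait", "kindly remain on the line"],
    synthesize=lambda text: text.encode("utf-8"),
    score=lambda audio: -float(len(audio)),
)
```

    In a real system the score would presumably come from the synthesizer itself or from a separate quality model; the abstract does not say which.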

Speech coding apparatus and method for generating acoustic feature vector component values by combining values of the same features for multiple time intervals
    2.
    Type: Granted invention patent
    Status: Lapsed

    Publication number: US5544277A

    Publication date: 1996-08-06

    Application number: US98682

    Filing date: 1993-07-28

    CPC classification: G10L15/02 G10L15/20

    Abstract: A speech coding apparatus and method measures the values of at least first and second different features of an utterance during each of a series of successive time intervals. For each time interval, a feature vector signal has a first component value equal to a first weighted combination of the values of only one feature of the utterance for at least two time intervals. The feature vector signal has a second component value equal to a second weighted combination, different from the first weighted combination, of the values of only one feature of the utterance for at least two time intervals. The resulting feature vector signals for a series of successive time intervals form a coded representation of the utterance. In one embodiment, a first weighted mixture signal has a value equal to a first weighted mixture of the values of the features of the utterance during a single time interval. A second weighted mixture signal has a value equal to a second weighted mixture, different from the first weighted mixture, of the values of the features of the utterance during a single time interval. The first component value of each feature vector signal is equal to a first weighted combination of the values of only the first weighted mixture signals for at least two time intervals, and the second component value of each feature vector signal is equal to a second weighted combination, different from the first weighted combination, of the values of only the second weighted mixture for at least two time intervals.
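
    As a rough illustration of the idea in the abstract above, each component of the feature vector can be computed as a weighted combination of a single feature's values over two or more consecutive time intervals. The function names, feature names, and weights below are assumptions made for this sketch; the patent's actual weights and features are not given in the abstract.

```python
from typing import Dict, List, Sequence, Tuple


def combine_over_time(
    values: Sequence[float], weights: Sequence[float], t: int
) -> float:
    """Weighted combination of one feature's values over the consecutive
    time intervals t, t+1, ..., t+len(weights)-1."""
    if t < 0:
        raise IndexError("time index must be non-negative")
    window = values[t : t + len(weights)]
    if len(window) != len(weights):
        raise IndexError("not enough time intervals for the given weights")
    return sum(w * v for w, v in zip(weights, window))


def feature_vector(
    features: Dict[str, Sequence[float]],
    specs: List[Tuple[str, Sequence[float]]],
    t: int,
) -> List[float]:
    """Build one feature vector; each component uses exactly one feature
    (named in its spec) and its own weight set."""
    return [combine_over_time(features[name], w, t) for name, w in specs]


# Hypothetical measurements for three time intervals.
measured = {"energy": [1.0, 2.0, 4.0], "pitch": [10.0, 20.0, 40.0]}
# Two different weightings of the same feature: a smoothed value and a delta.
specs = [("energy", [0.5, 0.5]), ("energy", [-1.0, 1.0])]

coded = [feature_vector(measured, specs, t) for t in range(2)]
```

    With these toy inputs, `coded` holds one two-component vector per usable time interval; the sequence of such vectors plays the role of the coded representation of the utterance.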

Method, apparatus and computer program providing a multi-speaker database for concatenative text-to-speech synthesis
    4.
    Type: Granted invention patent
    Status: In force

    Publication number: US07716052B2

    Publication date: 2010-05-11

    Application number: US11101223

    Filing date: 2005-04-07

    IPC classification: G10L13/00 G10L13/08 G10L13/06

    CPC classification: G10L13/07 G10L2021/0135

    Abstract: A method, apparatus and a computer program product to generate an audible speech word that corresponds to text. The method includes providing a text word and, in response to the text word, processing pre-recorded speech segments that are derived from a plurality of speakers to selectively concatenate together speech segments based on at least one cost function to form audio data for generating an audible speech word that corresponds to the text word. A data structure is also provided for use in a concatenative text-to-speech system that includes a plurality of speech segments derived from a plurality of speakers, where each speech segment includes an associated attribute vector each of which is comprised of at least one attribute vector element that identifies the speaker from which the speech segment was derived.
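
    One common way to realize cost-based concatenation of the kind the abstract above describes is a dynamic-programming (Viterbi) search over candidate segments. The sketch below assumes that approach; the `Segment` fields, the `join_cost` formula, and the speaker-switch penalty of 5.0 are invented for illustration and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Segment:
    unit: str      # phonetic unit this segment realizes
    speaker: str   # attribute identifying the source speaker
    pitch: float   # illustrative acoustic attribute


def join_cost(a: Segment, b: Segment) -> float:
    """Assumed cost: pitch mismatch plus a penalty for switching speakers."""
    return abs(a.pitch - b.pitch) + (0.0 if a.speaker == b.speaker else 5.0)


def select_segments(units: List[str], db: Dict[str, List[Segment]]) -> List[Segment]:
    """Pick one segment per unit so that the summed join cost is minimal."""
    if not units:
        return []
    # Each history entry holds (best path cost, back-pointer) per candidate.
    prev = [(0.0, None) for _ in db[units[0]]]
    history = [prev]
    for i in range(1, len(units)):
        cur = []
        for seg in db[units[i]]:
            costs = [
                prev[k][0] + join_cost(p, seg)
                for k, p in enumerate(db[units[i - 1]])
            ]
            k_best = min(range(len(costs)), key=costs.__getitem__)
            cur.append((costs[k_best], k_best))
        history.append(cur)
        prev = cur
    # Backtrack from the cheapest final candidate.
    j = min(range(len(prev)), key=lambda k: prev[k][0])
    path = []
    for i in range(len(units) - 1, -1, -1):
        path.append(db[units[i]][j])
        j = history[i][j][1]
    path.reverse()
    return path


# Hypothetical two-speaker database for the units "h" and "i".
db = {
    "h": [Segment("h", "A", 100.0), Segment("h", "B", 120.0)],
    "i": [Segment("i", "A", 130.0), Segment("i", "B", 121.0)],
}
chosen = select_segments(["h", "i"], db)
```

    Storing the speaker in each segment's attributes lets the cost function decide how strongly to discourage mixing speakers within one word.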

SYSTEM AND METHOD FOR DYNAMICALLY SELECTING AMONG TTS SYSTEMS
    5.
    Type: Invention patent application
    Status: In force

    Publication number: US20080172234A1

    Publication date: 2008-07-17

    Application number: US11622683

    Filing date: 2007-01-12

    IPC classification: G10L13/02

    CPC classification: G10L13/047

    Abstract: Systems and methods for dynamically selecting among text-to-speech (TTS) systems. Exemplary embodiments of the systems and methods include identifying text for converting into a speech waveform, synthesizing said text by three TTS systems, generating a candidate waveform from each of the three systems, generating a score from each of the three systems, comparing each of the three scores, selecting a score based on a criteria and selecting one of the three waveforms based on the selected of the three scores.

System and method for dynamically selecting among TTS systems
    6.
    Type: Granted invention patent
    Status: In force

    Publication number: US07702510B2

    Publication date: 2010-04-20

    Application number: US11622683

    Filing date: 2007-01-12

    IPC classification: G10L13/08 G10L13/00

    CPC classification: G10L13/047

    Abstract: Systems and methods for dynamically selecting among text-to-speech (TTS) systems. Exemplary embodiments of the systems and methods include identifying text for converting into a speech waveform, synthesizing said text by three TTS systems, generating a candidate waveform from each of the three systems, generating a score from each of the three systems, comparing each of the three scores, selecting a score based on a criteria and selecting one of the three waveforms based on the selected of the three scores.

Speech recognizer having a speech coder for an acoustic match based on context-dependent speech-transition acoustic models
    7.
    Type: Granted invention patent
    Status: Lapsed

    Publication number: US5333236A

    Publication date: 1994-07-26

    Application number: US942862

    Filing date: 1992-09-10

    CPC classification: G10L19/06

    Abstract: A speech coding apparatus compares the closeness of the feature value of a feature vector signal of an utterance to the parameter values of prototype vector signals to obtain prototype match scores for the feature vector signal and each prototype vector signal. The speech coding apparatus stores a plurality of speech transition models representing speech transitions. At least one speech transition is represented by a plurality of different models. Each speech transition model has a plurality of model outputs, each comprising a prototype match score for a prototype vector signal. Each model output has an output probability. A model match score for a first feature vector signal and each speech transition model comprises the output probability for at least one prototype match score for the first feature vector signal and a prototype vector signal. A speech transition match score for the first feature vector signal and each speech transition comprises the best model match score for the first feature vector signal and all speech transition models representing the speech transition. The identification value of each speech transition and the speech transition match score for the first feature vector signal and each speech transition are output as a coded utterance representation signal of the first feature vector signal.

Speech coding apparatus having speaker dependent prototypes generated from nonuser reference data
    8.
    Type: Granted invention patent
    Status: Lapsed

    Publication number: US5278942A

    Publication date: 1994-01-11

    Application number: US802678

    Filing date: 1991-12-05

    CPC classification: G10L15/063 G10L15/02

    Abstract: A speech coding apparatus and method for use in a speech recognition apparatus and method. The value of at least one feature of an utterance is measured during each of a series of successive time intervals to produce a series of feature vector signals representing the feature values. A plurality of prototype vector signals, each having at least one parameter value and a unique identification value are stored. The closeness of the feature vector signal is compared to the parameter values of the prototype vector signals to obtain prototype match scores for the feature value signal and each prototype vector signal. The identification value of the prototype vector signal having the best prototype match score is output as a coded representation signal of the feature vector signal. Speaker-dependent prototype vector signals are generated from both synthesized training vector signals and measured training vector signals. The synthesized training vector signals are transformed reference feature vector signals representing the values of features of one or more utterances of one or more speakers in a reference set of speakers. The measured training feature vector signals represent the values of features of one or more utterances of a new speaker/user not in the reference set.
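
    The coding step in the abstract above (compare each feature vector with the stored prototypes and output the identifier of the best match) amounts to vector quantization. The sketch below covers only that step, not the generation of speaker-dependent prototypes. Euclidean distance is assumed as the closeness measure, and the prototype identifiers and values are made up for the example.

```python
import math
from typing import Dict, List, Sequence


def match_score(feature: Sequence[float], prototype: Sequence[float]) -> float:
    """Assumed closeness measure: Euclidean distance (smaller is better)."""
    return math.dist(feature, prototype)


def encode(
    features: List[Sequence[float]],
    prototypes: Dict[str, Sequence[float]],
) -> List[str]:
    """Replace each feature vector with the ID of its best-matching prototype."""
    return [
        min(prototypes, key=lambda pid: match_score(f, prototypes[pid]))
        for f in features
    ]


# Hypothetical prototypes and measured feature vectors.
prototypes = {"P0": (0.0, 0.0), "P1": (1.0, 1.0)}
labels = encode([(0.1, 0.2), (0.9, 0.8), (0.4, 0.4)], prototypes)
```

    The resulting list of identifiers is the coded representation that a recognizer would consume in place of the raw feature vectors.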

Constructing Markov models of words from multiple utterances
    9.
    Type: Granted invention patent
    Status: Lapsed

    Publication number: US4759068A

    Publication date: 1988-07-19

    Application number: US738933

    Filing date: 1985-05-29

    IPC classification: G10L15/14 G10L5/00

    CPC classification: G10L15/14

    Abstract: Speech recognition is improved by splitting each feneme string at a consistent point into a left portion and a right portion. The present invention addresses the problem of constructing fenemic baseforms which take into account variations in pronunciation of words from one utterance thereof to another. Specifically, the invention relates to a method of constructing a fenemic baseform for a word in a vocabulary of word segments including the steps of: (a) transforming multiple utterances of the word into respective strings of fenemes; (b) defining a set of fenemic Markov model phone machines; (c) determining the best single phone machine P1 for producing the multiple feneme strings; (d) determining the best two phone baseform of the form P1 P2 or P2 P1 for producing the multiple feneme strings; (e) aligning the best two phone baseform against each feneme string; (f) splitting each feneme string into a left portion and a right portion with the left portion corresponding to the first phone machine of the two phone baseform and the right portion corresponding to the second phone machine of the two phone baseform; (g) identifying each left portion as a left substring and each right portion as a right substring; (h) processing the set of left substrings and the set of right substrings in the same manner as the set of feneme strings corresponding to the multiple utterances including the further step of inhibiting further splitting of a substring when the single phone baseform thereof has a higher probability of producing the substring than does the best two phone baseform; and (k) concatenating the unsplit single phones in an order corresponding to the order of the feneme substrings to which they correspond.

Methods and apparatus for conversational name dialing systems
    10.
    Type: Granted invention patent
    Status: In force

    Publication number: US06925154B2

    Publication date: 2005-08-02

    Application number: US10139255

    Filing date: 2002-05-03

    Abstract: Techniques for providing an automated conversational name dialing system for placing a call in response to an input by a user. One technique begins with the step of analyzing an input from a user, wherein the input includes information directed to identifying an intended recipient of a telephone call from the user. At least one candidate for the intended recipient is identified in response to the input, wherein the at least one candidate represents at least one potential match between the intended recipient and a predetermined vocabulary. A confidence measure indicative of a likelihood that the at least one candidate is the intended recipient is determined, and additional information is obtained from the user to increase the likelihood that the at least one candidate is the intended recipient, based on the determined confidence measure.
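
    The confidence-driven dialog in the abstract above can be pictured as a simple decision rule: place the call when the best candidate is confident enough, otherwise ask the user for more information. The sketch below is one possible reading; the function name, the 0.8 threshold, and the prompt wording are assumptions, not details from the patent.

```python
from typing import List, Tuple


def next_action(
    candidates: List[Tuple[str, float]], dial_threshold: float = 0.8
) -> str:
    """Decide whether to dial or to ask the user for more information.

    `candidates` pairs each recognized name with a confidence in [0, 1].
    """
    if not candidates:
        return "ask: Who would you like to call?"
    name, confidence = max(candidates, key=lambda c: c[1])
    if confidence >= dial_threshold:
        return f"dial: {name}"
    # Low confidence: obtain additional information before dialing.
    return f"ask: Did you mean {name}?"


confident = next_action([("Ann Lee", 0.92), ("Anne Li", 0.40)])
uncertain = next_action([("Ann Lee", 0.55), ("Anne Li", 0.50)])
```

    A fuller system would likely fold the user's answer back into the confidence measure and repeat the decision, as the abstract suggests.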
