Method for creating and using multiple-word sound models in speech recognition
    1.
    Granted patent
    Status: Lapsed

    Publication number: US4837831A

    Publication date: 1989-06-06

    Application number: US919885

    Filing date: 1986-10-15

    IPC classification: G10L15/06

    CPC classification: G10L15/063

    Abstract: A first speech recognition method receives an acoustic description of an utterance to be recognized and scores a portion of that description against each of a plurality of cluster models representing similar sounds from different words. The resulting score for each cluster is used to calculate a word score for each word represented by that cluster. Preferably these word scores are used to prefilter vocabulary words, and the description of the utterance includes a succession of acoustic descriptions which are compared by linear time alignment against a succession of acoustic models. A second speech recognition method is also provided which matches an acoustic model with each of a succession of acoustic descriptions of an utterance to be recognized. Each of these models has a probability score for each vocabulary word. The probability scores for each word associated with the matching acoustic models are combined to form a total score for that word. The preferred speech recognition method calculates two separate word scores for each currently active vocabulary word from a common succession of sounds. Preferably the first score is calculated by a time alignment method, while the second score is calculated by a time-independent method. Preferably this calculation of two separate word scores is used in one of multiple word-selecting phases of a recognition process, such as the prefiltering phase.
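
    The second, time-independent scoring method can be illustrated with a short Python sketch. It is not the patent's implementation: it assumes that frames and acoustic models are plain feature vectors, that a frame "matches" the model at the smallest squared Euclidean distance, and that each model stores a log-probability for every vocabulary word, so that combining scores means adding them. All helper names are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_model(frame: Vector, models: Dict[str, Vector]) -> str:
    # Assumption: a frame "matches" the model with the closest vector.
    return min(models, key=lambda name: squared_distance(frame, models[name]))


def time_independent_word_scores(
    frames: List[Vector],
    models: Dict[str, Vector],
    word_logprob: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Add up, frame by frame, the per-word log-probability stored with the
    matched model. Frame order plays no role in the result."""
    totals: Dict[str, float] = {}
    for frame in frames:
        matched = nearest_model(frame, models)
        for word, logprob in word_logprob[matched].items():
            totals[word] = totals.get(word, 0.0) + logprob
    return totals


def prefilter(scores: Dict[str, float], keep: int) -> List[str]:
    # Pass only the best-scoring words on to the more expensive recognition stage.
    return sorted(scores, key=lambda w: scores[w], reverse=True)[:keep]


if __name__ == "__main__":
    models = {"ah": [1.0, 0.0], "ss": [0.0, 1.0]}
    word_logprob = {
        "ah": {"father": -0.5, "sister": -2.0},
        "ss": {"father": -3.0, "sister": -0.3},
    }
    frames = [[0.9, 0.1], [0.1, 0.9], [0.0, 1.1]]
    scores = time_independent_word_scores(frames, models, word_logprob)
    print(prefilter(scores, keep=1))  # ['sister']
```

    Because frame order is ignored, this score is cheap to compute, which is what makes a time-independent score a reasonable fit for a prefiltering pass ahead of a time-aligned comparison.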

    Method for representing word models for use in speech recognition
    2.
    Granted patent
    Status: Lapsed

    Publication number: US4903305A

    Publication date: 1990-02-20

    Application number: US328738

    Filing date: 1989-03-23

    IPC classification: G10L15/06 G10L15/14

    Abstract: A method is provided for deriving acoustic word representations for use in speech recognition. Initial word models are created, each formed of a sequence of acoustic sub-models. The acoustic sub-models from a plurality of word models are clustered, so as to group acoustically similar sub-models from different words, using, for example, the Kullback-Leibler information as a metric of similarity. Each word is then represented by a cluster spelling that identifies the clusters into which its acoustic sub-models were placed. Speech recognition is performed by comparing sequences of frames from speech to be recognized against sequences of acoustic models associated with the clusters of the cluster spelling of individual word models. The invention also provides a method for deriving a word representation which involves receiving a first set of frame sequences for a word, using dynamic programming to derive a corresponding initial sequence of probabilistic acoustic sub-models for the word independently of any previously derived acoustic model particular to the word, using dynamic programming to time-align each of a second set of frame sequences for the word into a succession of new sub-sequences corresponding to the initial sequence of models, and using these new sub-sequences to calculate new probabilistic sub-models.
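
    The clustering step can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented procedure: each acoustic sub-model is taken to be a discrete probability distribution, the Kullback-Leibler information is symmetrized, and a simple greedy ("leader") clustering with a hypothetical `threshold` parameter stands in for whatever clustering algorithm the patent actually specifies.

```python
import math
from typing import Dict, List, Sequence, Tuple

Dist = Sequence[float]  # assumption: each sub-model is a discrete distribution


def kl(p: Dist, q: Dist) -> float:
    """Kullback-Leibler information D(p || q); assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def divergence(p: Dist, q: Dist) -> float:
    # Symmetrized so the similarity metric does not depend on argument order.
    return kl(p, q) + kl(q, p)


def cluster_spellings(
    word_models: Dict[str, List[Dist]], threshold: float
) -> Tuple[List[Dist], Dict[str, List[int]]]:
    """Greedy leader clustering: each sub-model joins the closest existing
    cluster if within `threshold`, otherwise it starts a new cluster.
    Returns the cluster representatives and each word's cluster spelling."""
    leaders: List[Dist] = []
    spellings: Dict[str, List[int]] = {}
    for word, sub_models in word_models.items():
        spelling: List[int] = []
        for sub in sub_models:
            best = min(range(len(leaders)),
                       key=lambda c: divergence(sub, leaders[c]),
                       default=None)
            if best is not None and divergence(sub, leaders[best]) <= threshold:
                spelling.append(best)
            else:
                leaders.append(sub)
                spelling.append(len(leaders) - 1)
        spellings[word] = spelling
    return leaders, spellings
```

    A word's cluster spelling is then just a list of cluster indices, so words that share similar sounds share models. A real system would re-estimate each cluster's model from all of its members; this sketch keeps the first member as the representative for brevity.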

    Method for speech recognition
    3.
    Granted patent
    Status: Lapsed

    Publication number: US4805219A

    Publication date: 1989-02-14

    Application number: US035628

    Filing date: 1987-04-03

    IPC classification: G10L15/00 G10L5/00

    CPC classification: G10L15/00

    摘要: A method determines if a portion of speech corresponds to a speech pattern by time aligning both the speech and a plurality of speech pattern models against a common time-aligning model. This compensates for speech variation between the speech and the pattern models. The method then compares the resulting time-aligned speech model against the resulting time-aligned pattern models to determine which of the patterns most probably corresponds to the speech. Preferably there are a plurality of time-aligning models, each representing a group of somewhat similar sound sequences which occur in different words. Each of these time-aligning models is scored for similarity against a portion of speech, and the time-aligned speech model and time-aligned pattern models produced by time alignment with the best scoring time-aligning model are compared to determine the likelihood that each speech pattern corresponds to the portion of speech. This is performed for each successive portion of speech. When a portion of speech appears to correspond to a given speech pattern model, a range of likely start times is calculated for the vocabulary word associated with that model, and a word score is calculated to indicate the likelihood of that word starting in that range. The method uses a more computationally intensive comparison between the speech and selected vocabulary words, so as to more accurately determine which words correspond with which portions of the speech. When this more intensive comparison indicates the ending of a word at a given point in the speech, the method selects the best scoring vocabulary words whose range of start times overlaps that ending time, and performs the computationally intensive comparison on those selected words starting at that point in the speech.
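
    The final selection step, choosing which words to start once the intensive comparison detects a word ending, can be sketched in Python as below. The `Candidate` record, the use of frame indices for times and the "higher score is better" convention are assumptions made for illustration; the computationally intensive comparison itself is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    word: str
    earliest_start: int  # frame index (assumed unit)
    latest_start: int
    score: float  # assumption: higher means more likely


def words_to_start(candidates: List[Candidate], end_frame: int, max_words: int) -> List[str]:
    """Return the best-scoring words whose range of likely start times
    overlaps the frame at which a word ending was detected."""
    overlapping = [c for c in candidates
                   if c.earliest_start <= end_frame <= c.latest_start]
    overlapping.sort(key=lambda c: c.score, reverse=True)
    return [c.word for c in overlapping[:max_words]]
```

    Only the words returned here would then be compared in detail starting at `end_frame`, which keeps the expensive comparison limited to plausible continuations.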

    Method for speech analysis and speech recognition
    4.
    Granted patent
    Status: Lapsed

    Publication number: US4805218A

    Publication date: 1989-02-14

    Application number: US34842

    Filing date: 1987-04-03

    IPC classification: G10L15/00 G10L1/00

    CPC classification: G10L15/00

    摘要: A method of speech analysis calculates one or more difference parameters for each of a sequence of acoustic frames, where each difference parameter is a function of the difference between an acoustic parameter in one frame and an acoustic parameter in a nearby frame. The method is used in speech recognition which compares the difference parameters of each frame against acoustic models representing speech units, where each speech-unit model has a model of the difference parameters associated with the frames of its speech unit. The difference parameters can be slope parameters or energy difference parameters. Slope parameters are derived by finding the difference between the energy of a given spectral parameter of a given frame and the energy, in a nearby frame, of a spectral parameter associated with a different frequency band. The resulting parameter indicates the extent to which the frequency of energy in the part of the spectrum represented by the given parameter is going up or going down. Energy difference parameters are calculated as a function of the difference between a given spectral parameter in one frame and a spectral parameter in a nearby frame representing the same frequency band. In one embodiment of the invention, dynamic programming compares the difference parameters of a sequence of frames to be recognized against a sequence of dynamic programming elements associated with each of a plurality of speech-unit models. In another embodiment of the invention, each speech-unit model represents one phoneme, and the speech-unit models for a plurality of phonemes are compared against individual frames, to associate with each such frame the one or more phonemes whose models compare most closely with it.
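
    The two kinds of difference parameter can be sketched in Python as follows. The frame layout (one list of per-band log energies per frame), the default offsets, the subtraction order and the clamping at the edges of the utterance and of the spectrum are all assumptions; the abstract fixes only which quantities are subtracted from which.

```python
from typing import List

Frame = List[float]  # assumption: one log-energy value per frequency band


def _clamp(i: int, n: int) -> int:
    return max(0, min(n - 1, i))


def energy_difference(frames: List[Frame], t: int, band: int, offset: int = 1) -> float:
    """Same band, nearby frame: S[t + offset][band] - S[t][band]."""
    nearby = frames[_clamp(t + offset, len(frames))]
    return nearby[band] - frames[t][band]


def slope(frames: List[Frame], t: int, band: int, offset: int = 1, band_shift: int = 1) -> float:
    """Different band, nearby frame: S[t + offset][band + band_shift] - S[t][band]."""
    nearby = frames[_clamp(t + offset, len(frames))]
    return nearby[_clamp(band + band_shift, len(nearby))] - frames[t][band]


def difference_parameters(frames: List[Frame], offset: int = 1) -> List[List[float]]:
    """Append an energy-difference and a slope parameter for every band."""
    augmented = []
    for t, frame in enumerate(frames):
        bands = range(len(frame))
        extra = [energy_difference(frames, t, k, offset) for k in bands]
        extra += [slope(frames, t, k, offset) for k in bands]
        augmented.append(list(frame) + extra)
    return augmented
```

    Because of the clamping, the "nearby frame" for the last frame is the frame itself, so its energy differences are zero. Appending the new values to each frame lets the speech-unit models, or the dynamic-programming elements, model them alongside the original spectral parameters.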

    Methods and apparatus for replaceable customization of multimodal embedded interfaces
    5.
    Patent application
    Status: Pending (published)

    Publication number: US20050203729A1

    Publication date: 2005-09-15

    Application number: US11058407

    Filing date: 2005-02-15

    IPC classification: G06F17/28 H04M1/725

    CPC classification: H04M1/72563

    Abstract: According to certain aspects of the invention, a mobile voice communication device includes a wireless transceiver circuit for transmitting and receiving auditory information and data, a processor, and a memory storing executable instructions which, when executed on the processor, cause the mobile voice communication device to provide a selectable personality associated with a user interface to a user of the mobile voice communication device. The executable instructions include implementing on the device a user interface that employs different user prompts having the selectable personality, wherein each selectable personality of the different user prompts is defined and mapped to data stored in at least one database in the mobile voice communication device. The mobile voice communication device may include a decoder that recognizes a spoken user input and provides a corresponding recognized word, and a speech synthesizer that synthesizes a word corresponding to the recognized word. The device includes user-selectable personalities that are either transmitted wirelessly to the device, transmitted through a computer interface, or provided to the device on memory cards.

    Method of producing alternate utterance hypotheses using auxiliary information on close competitors
    6.
    Patent application
    Status: In force

    Publication number: US20050108012A1

    Publication date: 2005-05-19

    Application number: US10783518

    Filing date: 2004-02-20

    Abstract: A method of constructing a list of alternate transcripts from a recognized transcript includes generating a list of close call records, matching partial sub-histories from the recognized transcript with one history of the pair stored in each record, and substituting the other history of the pair for the partial sub-history of the recognized transcript. A close call record is generated each time a pair of partial hypotheses attempts to seed a common word. Each close call record includes history information and scoring information associated with a particular pair of partial hypotheses seeding a common word. Alternate transcripts are constructed by substituting close call histories for partial histories of the recognized transcript, and also by substituting close call histories for partial histories of other alternate transcripts.
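
    The substitution step can be sketched in Python. The `CloseCall` record, the matching rule (a partial history matches when it is a prefix of the transcript that is immediately followed by the commonly seeded word) and the ordering of records by a hypothetical `score_gap` are assumptions for illustration. Each new alternate is itself re-expanded, mirroring the substitution into other alternate transcripts.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Tuple

Words = Tuple[str, ...]


@dataclass(frozen=True)
class CloseCall:
    history_a: Words  # partial hypothesis that tried to seed `word`
    history_b: Words  # its close competitor, seeding the same word
    word: str
    score_gap: float  # assumption: smaller gap = closer competitors


def substitute(transcript: Words, record: CloseCall) -> List[Words]:
    """Swap in the competing history wherever the transcript starts with one
    history of the pair and continues with the commonly seeded word."""
    alternates = []
    pairs = ((record.history_a, record.history_b), (record.history_b, record.history_a))
    for mine, other in pairs:
        n = len(mine)
        if transcript[:n] == mine and transcript[n:n + 1] == (record.word,):
            alternates.append(other + transcript[n:])
    return alternates


def alternate_transcripts(recognized: Words, records: List[CloseCall], limit: int = 10) -> List[Words]:
    ordered = sorted(records, key=lambda r: r.score_gap)  # closest calls first
    seen = {recognized}
    queue = deque([recognized])
    result: List[Words] = []
    while queue and len(result) < limit:
        current = queue.popleft()
        for record in ordered:
            for alt in substitute(current, record):
                if alt not in seen:
                    seen.add(alt)
                    result.append(alt)
                    queue.append(alt)  # alternates can be expanded further
    return result[:limit]
```

    For example, with a recognized transcript "recognize speech today", a record in which "recognize" and "wreck a nice" both seeded "speech" yields "wreck a nice speech today"; a second record in which "wreck a nice speech" and "wreck a nice beach" both seeded "today" then yields "wreck a nice beach today" from that alternate.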

    Methods and apparatus for formant-based voice systems
    7.
    Granted patent
    Status: In force

    Publication number: US08447592B2

    Publication date: 2013-05-21

    Application number: US11225524

    Filing date: 2005-09-13

    IPC classification: G10L11/04

    摘要: In one aspect, a method of processing a voice signal to extract information to facilitate training a speech synthesis model is provided. The method comprises acts of detecting a plurality of candidate features in the voice signal, performing at least one comparison between one or more combinations of the plurality of candidate features and the voice signal, and selecting a set of features from the plurality of candidate features based, at least in part, on the at least one comparison. In another aspect, the method is performed by executing a program encoded on a computer readable medium. In another aspect, a speech synthesis model is provided by, at least in part, performing the method.

    Pronunciation discovery for spoken words
    9.
    Patent application
    Status: In force

    Publication number: US20050143970A1

    Publication date: 2005-06-30

    Application number: US10939942

    Filing date: 2004-09-13

    IPC classification: G10L15/18 G06F17/28

    摘要: A method of generating an alternative pronunciation for a word or phrase, given an initial pronunciation and a spoken example of the word or phrase, includes providing the initial pronunciation of the word or phrase, and generating the alternative pronunciation by searching a neighborhood of pronunciations about the initial pronunciation via a constrained hypothesis, wherein the neighborhood includes pronunciations that differ from the initial pronunciation by at most one phoneme. The method further includes selecting a highest scoring pronunciation within the neighborhood of pronunciations.
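
    The neighborhood search can be sketched as an exhaustive enumeration, which is a simplification: the abstract describes a search via a constrained hypothesis, whereas this Python sketch simply lists every pronunciation reachable by one phoneme substitution, deletion or insertion (treating all three as "differing by one phoneme" is an assumption) and keeps the one that a caller-supplied `score` function rates highest. In practice `score` would measure the acoustic match against the spoken example; here it is a hypothetical placeholder.

```python
from typing import Callable, Sequence, Set, Tuple

Pron = Tuple[str, ...]


def one_phoneme_neighborhood(initial: Sequence[str], inventory: Sequence[str]) -> Set[Pron]:
    """All pronunciations within one substitution, deletion or insertion."""
    base = tuple(initial)
    hood = {base}
    for i in range(len(base)):
        hood.add(base[:i] + base[i + 1:])  # delete phoneme i
        for p in inventory:
            hood.add(base[:i] + (p,) + base[i + 1:])  # substitute phoneme i
    for i in range(len(base) + 1):
        for p in inventory:
            hood.add(base[:i] + (p,) + base[i:])  # insert before position i
    return hood


def best_pronunciation(initial: Sequence[str], inventory: Sequence[str],
                       score: Callable[[Pron], float]) -> Pron:
    # Sorting first makes tie-breaking deterministic.
    return max(sorted(one_phoneme_neighborhood(initial, inventory)), key=score)
```

    Because the neighborhood always contains the initial pronunciation, the result can never score worse than the starting point.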

    Method and apparatus for back-up of customized application information
    10.
    Patent application
    Status: Pending (published)

    Publication number: US20050164692A1

    Publication date: 2005-07-28

    Application number: US10936882

    Filing date: 2004-09-09

    摘要: A method of operating a mobile communication device having a set of one or more applications, each with its own associated user-configurable customization, the method comprising detecting whether the user-configurable customization of any of the applications has changed since an earlier time, and for all applications for which the user-configurable customization has changed since said earlier time, wirelessly transmitting those changes to a remote server. The method further comprises maintaining a set of flags indicating whether changes have occurred to the user-configurable customization, wherein detecting whether the user-configurable customization of any of the applications has changed since said earlier time includes reading the set of flags. The remote server is one of a carrier server and a third party provider server.
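
    The flag-based change detection can be sketched in Python. `CustomizationStore` and the `send` callback are hypothetical stand-ins: `send` represents the wireless upload to the carrier or third-party server, and each flag is cleared only after `send` returns, so a failed upload (one that raises) leaves the change pending for the next backup.

```python
from typing import Callable, Dict, List

Settings = Dict[str, str]


class CustomizationStore:
    """Per-application customizations plus one "changed" flag per application."""

    def __init__(self) -> None:
        self.settings: Dict[str, Settings] = {}
        self.changed: Dict[str, bool] = {}

    def update(self, app: str, key: str, value: str) -> None:
        self.settings.setdefault(app, {})[key] = value
        self.changed[app] = True  # raise the flag on every user edit

    def back_up(self, send: Callable[[str, Settings], None]) -> List[str]:
        """Read the flags and upload only applications changed since the last backup."""
        uploaded = []
        for app in sorted(self.changed):
            if self.changed[app]:
                send(app, dict(self.settings[app]))  # send a copy of the settings
                self.changed[app] = False  # cleared only after send() returns
                uploaded.append(app)
        return uploaded
```

    Reading a flag per application avoids comparing full customization data against the server copy; only applications edited since the last backup generate traffic.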