ADAPTING A COMPRESSED MODEL FOR USE IN SPEECH RECOGNITION
    61.
    Patent application
    Status: Active

    Publication No.: US20100076757A1

    Publication date: 2010-03-25

    Application No.: US12235748

    Filing date: 2008-09-23

    IPC classification: G10L15/20

    CPC classification: G10L15/20 G10L15/065

    Abstract: A speech recognition system includes a receiver component that receives a distorted speech utterance. The speech recognition system also includes an adaptor component that selectively adapts parameters of a compressed model used to recognize at least a portion of the distorted speech utterance, wherein the adaptor component selectively adapts the parameters of the compressed model based at least in part upon the received distorted speech utterance.

    System and method for identifying semantic intent from acoustic information
    62.
    Granted patent
    Status: Active

    Publication No.: US07634406B2

    Publication date: 2009-12-15

    Application No.: US11009630

    Filing date: 2004-12-10

    IPC classification: G10L15/06

    CPC classification: G10L15/19 G10L15/1815

    Abstract: In accordance with one embodiment of the present invention, unanticipated semantic intents are discovered in audio data in an unsupervised manner. For instance, the audio acoustics are clustered based on semantic intent, and representative acoustics are chosen for each cluster. A human reviewer then need only listen to a small number of representative acoustics for each cluster (possibly only one per cluster) in order to identify the unforeseen semantic intents.
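
The abstract does not say how the representative acoustics are chosen. One common choice, assumed here purely for illustration, is the medoid: the cluster member with the smallest total distance to the other members. The function name `pick_representatives`, the scalar "features", and the distance function below are all hypothetical stand-ins for real acoustic representations.

```python
def pick_representatives(items, labels, distance):
    """Return {cluster label: index of that cluster's medoid item}.

    Assumption: the representative is the medoid, i.e. the member that
    minimizes the summed distance to all members of its cluster.
    """
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    representatives = {}
    for label, members in clusters.items():
        representatives[label] = min(
            members,
            key=lambda i: sum(distance(items[i], items[j]) for j in members),
        )
    return representatives


if __name__ == "__main__":
    # Toy one-dimensional "utterance features" (hypothetical data).
    features = [0.0, 1.0, 2.0, 10.0, 11.0, 15.0]
    cluster_labels = [0, 0, 0, 1, 1, 1]
    reps = pick_representatives(features, cluster_labels, lambda a, b: abs(a - b))
    print(reps)
```

A reviewer would then listen only to the items at the returned indices, one per cluster.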

    Acoustic models with structured hidden dynamics with integration over many possible hidden trajectories
    63.
    Granted patent
    Status: Active

    Publication No.: US07565284B2

    Publication date: 2009-07-21

    Application No.: US11071904

    Filing date: 2005-03-01

    IPC classification: G10L15/14

    CPC classification: G10L15/02 G10L2015/025

    Abstract: A method is provided for producing at least one possible sequence of vocal tract resonances (VTRs) for a fixed sequence of phonetic units, and for producing the acoustic observation probability by integrating over the distributions of possible VTR trajectories. The method includes identifying a sequence of target distributions for a VTR sequence corresponding to a phone sequence with a given segmentation. The sequence of target distributions is applied to a finite impulse response filter to produce distributions for possible VTR trajectories. These distributions are then applied to a linearized nonlinear function to produce the acoustic observation probability for the given sequence of phonetic units. This acoustic observation probability is used for phonetic recognition.

    Two-stage implementation for phonetic recognition using a bi-directional target-filtering model of speech coarticulation and reduction
    65.
    Patent application
    Status: Active

    Publication No.: US20060200351A1

    Publication date: 2006-09-07

    Application No.: US11069474

    Filing date: 2005-03-01

    IPC classification: G10L15/04

    Abstract: A structured generative model of speech coarticulation and reduction is described with a novel two-stage implementation. At the first stage, the dynamics of formants or vocal tract resonance (VTR) are generated using prior information of resonance targets in the phone sequence. Bi-directional temporal filtering with finite impulse response (FIR) is applied to the segmental target sequence as the FIR filter's input. At the second stage, the dynamics of speech cepstra are predicted analytically based on the FIR-filtered VTR targets. The combined system of these two stages thus generates correlated and causally related VTR and cepstral dynamics, where phonetic reduction is represented explicitly in the hidden resonance space and implicitly in the observed cepstral space. The combined system also gives the acoustic observation probability given a phone sequence. Using this probability, different phone sequences can be compared and ranked in terms of their respective probability values. This then permits the use of the model for phonetic recognition.
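
The abstract does not state the analytic mapping used at the second stage. As a rough sketch, assume each resonance is a complex-conjugate pole pair of an all-pole filter with frequency f_k, bandwidth b_k, and sampling rate f_s; the n-th cepstral coefficient of such a filter is then c_n = (2/n) Σ_k exp(-π n b_k / f_s) cos(2π n f_k / f_s). The function name and the example values below are hypothetical, and the patent's own mapping may differ.

```python
import math


def vtr_to_cepstrum(freqs_hz, bandwidths_hz, sample_rate_hz, num_coeffs):
    """Cepstral coefficients c_1..c_N of an all-pole model with given resonances.

    Assumption: each resonance (f, b) is one conjugate pole pair with radius
    exp(-pi * b / fs) and angle 2 * pi * f / fs.
    """
    cepstrum = []
    for n in range(1, num_coeffs + 1):
        total = 0.0
        for f, b in zip(freqs_hz, bandwidths_hz):
            decay = math.exp(-math.pi * n * b / sample_rate_hz)
            total += decay * math.cos(2.0 * math.pi * n * f / sample_rate_hz)
        cepstrum.append(2.0 * total / n)
    return cepstrum


if __name__ == "__main__":
    # Illustrative resonance values only; not taken from the patent.
    coeffs = vtr_to_cepstrum(
        freqs_hz=[500.0, 1500.0, 2500.0],
        bandwidths_hz=[60.0, 90.0, 120.0],
        sample_rate_hz=8000.0,
        num_coeffs=12,
    )
    print([round(c, 4) for c in coeffs])
```

Because the mapping is a smooth function of the resonance frequencies, any reduction in the filtered VTR trajectory carries over to the predicted cepstra without being modeled separately.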

    Interactive clustering method for identifying problems in speech applications
    66.
    Patent application
    Status: Active

    Publication No.: US20060178884A1

    Publication date: 2006-08-10

    Application No.: US11054301

    Filing date: 2005-02-09

    IPC classification: G10L15/06

    Abstract: A method is provided for aiding a speech recognition program developer by grouping calls that pass through an identified question-answer (QA) state or transition into clusters based on the causes of problems associated with the calls. The method includes determining a number of clusters into which a plurality of calls will be grouped. Then, the plurality of calls is at least partially randomly assigned to the different clusters. Model parameters are estimated using clustering information based upon the assignment of the plurality of calls to the different clusters. Individual probabilities are calculated for each of the plurality of calls using the estimated model parameters. The individual probabilities are indicative of a likelihood that the corresponding call belongs to a particular cluster. The plurality of calls is then re-assigned to the different clusters based upon the calculated probabilities. These steps are then repeated until the grouping of the plurality of calls achieves a desired stability.
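
The loop described in the abstract (random assignment, parameter estimation, per-call probabilities, re-assignment, repeat until stable) can be sketched as follows. This is a minimal illustration under strong assumptions that are not in the patent: each call is reduced to a single numeric feature, each cluster is modeled as a one-dimensional Gaussian with a prior, and "stability" means no call changes cluster. The name `cluster_calls` and the `var_floor` parameter are hypothetical.

```python
import math
import random


def cluster_calls(features, num_clusters, max_iters=100, var_floor=0.05, seed=0):
    """Hard-assignment clustering: random init, estimate, score, re-assign."""
    rng = random.Random(seed)
    # Random initial assignment of calls to clusters.
    assign = [rng.randrange(num_clusters) for _ in features]

    for _ in range(max_iters):
        # Estimate (mean, variance, prior) for each cluster from its members.
        params = []
        for k in range(num_clusters):
            members = [x for x, a in zip(features, assign) if a == k]
            if not members:
                # Illustrative choice: re-seed an empty cluster from a random call.
                members = [rng.choice(features)]
            mean = sum(members) / len(members)
            var = sum((x - mean) ** 2 for x in members) / len(members) + var_floor
            params.append((mean, var, len(members) / len(features)))

        # Score every call under every cluster and pick the most probable one.
        new_assign = []
        for x in features:
            scores = [
                math.log(prior)
                - 0.5 * math.log(2.0 * math.pi * var)
                - (x - mean) ** 2 / (2.0 * var)
                for mean, var, prior in params
            ]
            new_assign.append(max(range(num_clusters), key=scores.__getitem__))

        # Stop once the grouping no longer changes.
        if new_assign == assign:
            break
        assign = new_assign
    return assign


if __name__ == "__main__":
    # Hypothetical per-call feature, e.g. a normalized confidence score.
    calls = [0.1, 0.2, 0.15, 5.0, 5.1, 4.9]
    print(cluster_calls(calls, num_clusters=2))
```

A softer variant would keep the per-call probabilities as weights instead of taking the single most probable cluster; the hard re-assignment above follows the abstract's wording more directly.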

    Quantitative model for formant dynamics and contextually assimilated reduction in fluent speech

    Publication No.: US20060074676A1

    Publication date: 2006-04-06

    Application No.: US10944262

    Filing date: 2004-09-17

    IPC classification: G10L13/04

    CPC classification: G10L13/02 G10L25/15

    Abstract: A method of identifying a sequence of formant trajectory values is provided in which a sequence of target values is identified for a formant as step functions. The target values and the duration for each segment target for the formant are applied to a finite impulse response filter to form a sequence of formant trajectory values. The parameters of this filter, as well as the duration of the targets for each phone, can be modified to produce many kinds of target undershooting effects in a contextually assimilated manner. The procedure for producing the formant trajectory values does not require any acoustic data from speech.
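
A minimal sketch of the idea: expand per-segment targets into a frame-level step function, then smooth it with an FIR filter. The kernel shape is an assumption made here for illustration (symmetric, exponentially decaying taps normalized to unit gain); the abstract does not specify it. The names `step_targets`, `fir_smooth`, `gamma`, and `half_width` are hypothetical.

```python
def step_targets(segments):
    """Expand (target_hz, duration_frames) pairs into a frame-level step function."""
    frames = []
    for target, duration in segments:
        frames.extend([float(target)] * duration)
    return frames


def fir_smooth(targets, gamma=0.6, half_width=3):
    """Filter the step function with a symmetric, unit-gain FIR kernel.

    Assumption: taps are proportional to gamma ** |k| for k in
    [-half_width, half_width]; edge frames are held constant past the ends.
    """
    offsets = range(-half_width, half_width + 1)
    taps = [gamma ** abs(k) for k in offsets]
    norm = sum(taps)
    taps = [t / norm for t in taps]

    n = len(targets)
    trajectory = []
    for t in range(n):
        acc = 0.0
        for tap, k in zip(taps, offsets):
            idx = min(max(t + k, 0), n - 1)  # clamp to hold the edge values
            acc += tap * targets[idx]
        trajectory.append(acc)
    return trajectory


if __name__ == "__main__":
    # A short 1500 Hz segment between two long 500 Hz segments (made-up values).
    targets = step_targets([(500, 10), (1500, 2), (500, 10)])
    trajectory = fir_smooth(targets)
    print(max(trajectory))  # stays well below the 1500 Hz target
```

Shortening the middle segment or widening the kernel increases the undershoot, which matches the abstract's point that filter parameters and target durations together control the reduction effect. No acoustic data is involved at any step.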

    Parameter learning in a hidden trajectory model
    69.
    Granted patent
    Status: Active

    Publication No.: US08942978B2

    Publication date: 2015-01-27

    Application No.: US13182971

    Filing date: 2011-07-14

    IPC classification: G10L15/00 G10L15/06

    CPC classification: G10L15/063 G10L2015/025

    Abstract: Parameters for distributions of a hidden trajectory model, including means and variances, are estimated using an acoustic likelihood function for observation vectors as an objective function for optimization. The estimation uses only acoustic data and no intermediate estimate of hidden dynamic variables. Gradient ascent methods can be developed for optimizing the acoustic likelihood function.
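
To illustrate the general technique only, the sketch below runs gradient ascent on the log-likelihood of a single one-dimensional Gaussian, updating its mean and log-variance directly from observed data. This is a deliberately simplified stand-in: the hidden trajectory model's likelihood is more involved, and the function names, learning rate, and step count here are assumptions.

```python
import math


def log_likelihood(data, mean, log_var):
    """Gaussian log-likelihood of the data, used as the objective function."""
    var = math.exp(log_var)
    return sum(
        -0.5 * (math.log(2.0 * math.pi) + log_var + (x - mean) ** 2 / var)
        for x in data
    )


def gradient_ascent(data, mean=0.0, log_var=0.0, lr=0.05, steps=2000):
    """Maximize the average log-likelihood over the mean and log-variance."""
    n = len(data)
    for _ in range(steps):
        var = math.exp(log_var)
        # Gradient of the average log-likelihood with respect to the mean.
        g_mean = sum(x - mean for x in data) / (var * n)
        # Gradient with respect to log-variance (keeps the variance positive).
        g_log_var = sum(-0.5 + 0.5 * (x - mean) ** 2 / var for x in data) / n
        mean += lr * g_mean
        log_var += lr * g_log_var
    return mean, math.exp(log_var)


if __name__ == "__main__":
    observations = [1.0, 2.0, 3.0, 4.0]  # made-up data
    mean, var = gradient_ascent(observations)
    print(mean, var)
```

Optimizing the log-variance rather than the variance itself is a design choice that avoids negative variances without an explicit constraint.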

    Automatic speech recognition learning using categorization and selective incorporation of user-initiated corrections
    70.
    Granted patent
    Status: Active

    Publication No.: US08280733B2

    Publication date: 2012-10-02

    Application No.: US12884434

    Filing date: 2010-09-17

    Abstract: An automatic speech recognition system recognizes user changes to dictated text and infers whether such changes result from the user changing his/her mind, or whether such changes are a result of a recognition error. If a recognition error is detected, the system uses the type of user correction to modify itself to reduce the chance that such recognition error will occur again. Accordingly, the system and methods provide for significant speech recognition learning with little or no additional user interaction.