Adaptive probabilistic query expansion
    1.
    Granted patent
    Adaptive probabilistic query expansion (In force)

    Publication No.: US07437349B2

    Publication Date: 2008-10-14

    Application No.: US10143146

    Filing Date: 2002-05-10

    Abstract: A method, system, and computer program for adaptively processing a search query. An expanding operation expands the query into sub-queries, at least one of which is expanded probabilistically. A retrieving operation retrieves the results of the sub-queries, and a merging operation merges the sub-query results into a search result. An adapting operation modifies the search so that the relevance of the search result increases when the search is repeated.

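    The expand/retrieve/merge/adapt loop described in the abstract can be sketched as follows. All names, the synonym table, and the weight-update rule are illustrative assumptions, not the patented method:

```python
import random

def expand_query(query, synonyms, weights, rng):
    """Expand a query into sub-queries; alternatives are included
    probabilistically according to learned weights (an assumption)."""
    subs = [query]  # the literal query is always one sub-query
    for alt in synonyms.get(query, []):
        if rng.random() < weights.get(alt, 0.5):
            subs.append(alt)
    return subs

def retrieve(sub_query, index):
    """Toy retrieval: look the sub-query up in an inverted index."""
    return index.get(sub_query, [])

def merge(result_lists):
    """Merge sub-query results, de-duplicating while keeping order."""
    seen, merged = set(), []
    for results in result_lists:
        for doc in results:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged

def adapt(weights, clicked_docs, sub_results):
    """Raise the weight of expansions whose results proved relevant,
    so a repeated search favors them (one plausible adapting operation)."""
    for alt, results in sub_results.items():
        if any(doc in clicked_docs for doc in results):
            weights[alt] = min(1.0, weights.get(alt, 0.5) + 0.1)
    return weights
```

    With a weight of 1.0, expanding "car" over a toy index that also knows "automobile" yields the merged result list for both sub-queries.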

    Late integration in audio-visual continuous speech recognition
    2.
    Granted patent
    Late integration in audio-visual continuous speech recognition (In force)

    Publication No.: US06633844B1

    Publication Date: 2003-10-14

    Application No.: US09452919

    Filing Date: 1999-12-02

    CPC classification number: G10L15/25

    Abstract: Audio and video speech recognition are combined in a manner that improves the robustness of speech recognition systems in noisy environments. Contemplated are methods and apparatus in which a video signal associated with a video source, and an audio signal associated with that video signal, are processed; the most likely viseme associated with the audio and video signals is determined; and, thereafter, the most likely phoneme associated with the audio and video signals is determined.

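    The two-stage decision in the abstract (viseme first, then phoneme within that viseme class) can be sketched like this. The viseme inventory, scores, and mapping are toy assumptions, not the patent's:

```python
# Hypothetical viseme classes grouping visually similar phonemes.
VISEME_TO_PHONEMES = {
    "bilabial": ["p", "b", "m"],
    "open":     ["aa", "ae"],
}

def best_viseme(av_viseme_scores):
    """Stage 1: most likely viseme given joint audio-visual scores."""
    return max(av_viseme_scores, key=av_viseme_scores.get)

def best_phoneme(viseme, audio_phoneme_scores):
    """Stage 2: most likely phoneme, restricted to the chosen viseme class."""
    candidates = VISEME_TO_PHONEMES[viseme]
    return max(candidates, key=lambda p: audio_phoneme_scores.get(p, float("-inf")))
```

    Restricting stage 2 to one viseme class is what lets the (noise-robust) visual evidence prune acoustically confusable phonemes.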

    Nongaussian density estimation for the classification of acoustic feature vectors in speech recognition
    3.
    Granted patent
    Nongaussian density estimation for the classification of acoustic feature vectors in speech recognition (Expired)

    Publication No.: US06269334B1

    Publication Date: 2001-07-31

    Application No.: US09104553

    Filing Date: 1998-06-25

    CPC classification number: G10L15/02

    Abstract: A statistical modeling paradigm for automatic machine recognition of speech uses mixtures of non-Gaussian statistical probability densities, which provides improved recognition accuracy. Speech is modeled by building probability densities from functions of the form exp(−t^α/2) for t ≥ 0 and α > 0. Mixture components are constructed from different univariate functions. The mixture model is used in a maximum likelihood model of speech data.

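    A minimal sketch of such a density family, under one reading of the abstract's formula: with t = x², exp(−(x²)^(α/2)) = exp(−|x|^α), the power-exponential family (α = 2 recovers a Gaussian shape). The normalization and the mixture form are standard results, not the patented estimator:

```python
import math

def pe_density(x, alpha):
    """Normalized univariate power-exponential density exp(-|x|**alpha).
    The integral of exp(-|x|**a) over the real line is 2*Gamma(1/a)/a."""
    z = 2.0 * math.gamma(1.0 / alpha) / alpha
    return math.exp(-abs(x) ** alpha) / z

def mixture_log_likelihood(data, components):
    """Log-likelihood of data under a mixture of (weight, alpha) components."""
    ll = 0.0
    for x in data:
        p = sum(w * pe_density(x, a) for w, a in components)
        ll += math.log(p)
    return ll
```

    For α = 2 the density at 0 is 1/√π, matching a normal density with variance 1/2.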

    Wavelet-based energy binning cepstral features for automatic speech recognition
    5.
    Granted patent
    Wavelet-based energy binning cepstral features for automatic speech recognition (In force)

    Publication No.: US06253175B1

    Publication Date: 2001-06-26

    Application No.: US09201055

    Filing Date: 1998-11-30

    CPC classification number: G10L15/02 G10L25/27 G10L2015/0631

    Abstract: Systems and methods for processing acoustic speech signals that utilize the wavelet transform (or, alternatively, the Fourier transform) as a fundamental tool. The method essentially involves “synchrosqueezing” spectral component data obtained by performing a wavelet transform (or Fourier transform) on digitized speech signals. In one aspect, spectral components of the synchrosqueezed plane are dynamically tracked via a K-means clustering algorithm; the amplitude, frequency, and bandwidth of each component are thus extracted. The cepstrum generated from this information is referred to as the “K-mean Wastrum.” In another aspect, the result of the K-means clustering process is further processed to limit the set of primary components to formants; the resulting features are referred to as the “formant-based wastrum.” Formants are interpolated in unvoiced regions, and the contribution of the unvoiced turbulent part of the spectrum is added. This method requires adequate formant tracking. The resulting robust formant extraction has a number of applications in speech processing and analysis, including vocal tract normalization.

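    The K-means tracking step can be illustrated with plain Lloyd iterations over scalar frequency samples. The real system clusters time-frequency data from the synchrosqueezed plane; this one-dimensional toy only shows the clustering mechanic:

```python
def kmeans_1d(points, centers, iters=20):
    """Lloyd's algorithm on scalars; returns the final centers, sorted."""
    centers = list(centers)
    for _ in range(iters):
        # Assignment step: put each point in the bucket of its nearest center.
        buckets = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda j: abs(p - centers[j]))
            buckets[i].append(p)
        # Update step: move each center to the mean of its bucket.
        for i, b in enumerate(buckets):
            if b:
                centers[i] = sum(b) / len(b)
    return sorted(centers)
```

    Two well-separated groups of instantaneous-frequency samples converge to their group means, which would then serve as per-frame component frequencies.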

    Methods and apparatus for audio-visual speaker recognition and utterance verification
    6.
    Granted patent
    Methods and apparatus for audio-visual speaker recognition and utterance verification (In force)

    Publication No.: US06219640B1

    Publication Date: 2001-04-17

    Application No.: US09369706

    Filing Date: 1999-08-06

    Abstract: Methods and apparatus for performing speaker recognition comprise processing a video signal associated with an arbitrary content video source and processing an audio signal associated with the video signal. Then, an identification and/or verification decision is made based on the processed audio signal and the processed video signal. Various decision making embodiments may be employed including, but not limited to, a score combination approach, a feature combination approach, and a re-scoring approach. In another aspect of the invention, a method of verifying a speech utterance comprises processing a video signal associated with a video source and processing an audio signal associated with the video signal. Then, the processed audio signal is compared with the processed video signal to determine a level of correlation between the signals. This is referred to as unsupervised utterance verification. In a supervised utterance verification embodiment, the processed video signal is compared with a script representing an audio signal associated with the video signal to determine a level of correlation between the signals.

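    The score-combination approach named in the abstract can be sketched as a weighted sum of per-modality scores compared to a threshold. The weight and threshold values are illustrative assumptions:

```python
def combined_score(audio_score, video_score, audio_weight=0.7):
    """Late fusion: convex combination of the two modality scores."""
    return audio_weight * audio_score + (1.0 - audio_weight) * video_score

def verify(audio_score, video_score, threshold=0.5):
    """Accept the claimed identity if the fused score clears the threshold."""
    return combined_score(audio_score, video_score) >= threshold
```

    The feature-combination and re-scoring approaches mentioned in the abstract would fuse earlier (concatenated features) or later (re-ranking hypotheses), respectively.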

    Impulsivity estimates of mixtures of the power exponential distributions in speech modeling
    7.
    Granted patent
    Impulsivity estimates of mixtures of the power exponential distributions in speech modeling (Expired)

    Publication No.: US06804648B1

    Publication Date: 2004-10-12

    Application No.: US09275782

    Filing Date: 1999-03-25

    CPC classification number: G10L15/144

    Abstract: A parametric family of multivariate density functions, formed by mixture models from univariate functions of the type exp(−|x|^β), is used to model acoustic feature vectors in automatic recognition of speech. The parameter β measures the non-Gaussian nature of the data and is estimated from the input data using a maximum likelihood criterion. There is a balance between β and the number of data points that must be satisfied for efficient estimation.

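    Maximum-likelihood estimation of β can be sketched as a grid search over candidate shapes of the normalized exp(−|x|^β) density. The grid, the fixed unit scale, and the search itself are simplifications standing in for the criterion the abstract describes:

```python
import math

def pe_log_density(x, beta):
    """Log density of the normalized power-exponential exp(-|x|**beta);
    the normalizer is 2*Gamma(1/beta)/beta."""
    log_z = math.log(2.0 * math.gamma(1.0 / beta) / beta)
    return -abs(x) ** beta - log_z

def fit_beta(data, grid=(0.5, 1.0, 1.5, 2.0, 3.0)):
    """Pick the beta on the grid that maximizes the data log-likelihood."""
    def ll(beta):
        return sum(pe_log_density(x, beta) for x in data)
    return max(grid, key=ll)
```

    Heavy-tailed data (a few large-magnitude points among many small ones) pushes the estimate toward β = 1, the Laplacian member of the family.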

    Maximum entropy and maximum likelihood criteria for feature selection from multivariate data
    8.
    Granted patent
    Maximum entropy and maximum likelihood criteria for feature selection from multivariate data (In force)

    Publication No.: US06609094B1

    Publication Date: 2003-08-19

    Application No.: US09576429

    Filing Date: 2000-05-22

    CPC classification number: G06K9/623 G10L15/02

    Abstract: Improvements in speech recognition systems are achieved by considering projections of the high-dimensional data onto lower-dimensional subspaces, then estimating the univariate probability densities via known univariate techniques, and then reconstructing the density in the original higher-dimensional space from the collection of univariate densities so obtained. The reconstructed density is by no means unique unless further restrictions on the estimated density are imposed. The variety of choices of candidate univariate densities, as well as the choices of subspaces onto which to project the data (including their number), further add to this non-uniqueness. Probability density functions that maximize a certain optimality criterion are therefore considered as a solution to this problem: specifically, those that maximize either the entropy functional or, alternatively, the likelihood associated with the data.

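    The projection idea can be illustrated in two dimensions: estimate univariate histogram densities along the two axes, then form one (non-unique) reconstruction as their product, which is the maximum-entropy choice when only those two marginals are constrained. The axis-aligned projections, bin edges, and histogram estimator are all illustrative assumptions:

```python
def histogram_density(samples, lo, hi, bins):
    """Univariate histogram density estimate on [lo, hi)."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for s in samples:
        if lo <= s < hi:
            counts[int((s - lo) / width)] += 1
    n = len(samples)
    return lambda x: counts[int((x - lo) / width)] / (n * width) if lo <= x < hi else 0.0

def product_density(points, lo, hi, bins):
    """Reconstruct a 2-D density as the product of its two axis marginals."""
    fx = histogram_density([p[0] for p in points], lo, hi, bins)
    fy = histogram_density([p[1] for p in points], lo, hi, bins)
    return lambda x, y: fx(x) * fy(y)
```

    More projections (and non-axis directions) constrain the reconstruction further, which is where the entropy or likelihood criterion arbitrates among the remaining candidates.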

    Methods and apparatus for audio-visual speech detection and recognition
    9.
    Granted patent
    Methods and apparatus for audio-visual speech detection and recognition (In force)

    Publication No.: US06594629B1

    Publication Date: 2003-07-15

    Application No.: US09369707

    Filing Date: 1999-08-06

    CPC classification number: G06K9/00228 G06K9/00335 G10L15/25 G10L25/78

    Abstract: In a first aspect of the invention, methods and apparatus for providing speech recognition comprise the steps of processing a video signal associated with an arbitrary content video source, processing an audio signal associated with the video signal, and decoding the processed audio signal in conjunction with the processed video signal to generate a decoded output signal representative of the audio signal. In a second aspect of the invention, methods and apparatus for providing speech detection in accordance with a speech recognition system comprise the steps of processing a video signal associated with a video source to detect whether one or more features associated with the video signal are representative of speech, and processing an audio signal associated with the video signal in accordance with the speech recognition system to generate a decoded output signal representative of the audio signal when the one or more features associated with the video signal are representative of speech. Speech detection may also be performed using information from both the video path and the audio path simultaneously.


    Speech driven lip synthesis using viseme based hidden markov models
    10.
    Granted patent
    Speech driven lip synthesis using viseme based hidden markov models (In force)

    Publication No.: US06366885B1

    Publication Date: 2002-04-02

    Application No.: US09384763

    Filing Date: 1999-08-27

    CPC classification number: G11B27/10 G10L2021/105 G11B27/031

    Abstract: A method of speech driven lip synthesis which applies viseme based training models to units of visual speech. The audio data is grouped into a smaller number of visually distinct visemes rather than the larger number of phonemes. These visemes then form the basis for a Hidden Markov Model (HMM) state sequence or the output nodes of a neural network. During the training phase, audio and visual features are extracted from input speech, which is then aligned according to the apparent viseme sequence, with the corresponding audio features being used to calculate the HMM state output probabilities or the output of the neural network. During the synthesis phase, the acoustic input is aligned with the most likely viseme HMM sequence (in the case of an HMM based model) or with the nodes of the network (in the case of a neural network based system), which is then used for animation.

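    The abstract's first step, grouping many phonemes onto fewer visually distinct visemes, can be sketched as a lookup that also merges consecutive duplicates, the way an HMM state sequence would. The mapping below is a toy subset, not the patent's inventory:

```python
# Hypothetical phoneme-to-viseme table: visually indistinguishable phonemes
# (e.g. the bilabials p/b/m) share one viseme.
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "aa": "open", "ae": "open",
}

def to_viseme_sequence(phonemes):
    """Map phonemes to visemes, merging consecutive duplicates into one
    state, as a viseme HMM state sequence would."""
    out = []
    for ph in phonemes:
        v = PHONEME_TO_VISEME[ph]
        if not out or out[-1] != v:
            out.append(v)
    return out
```

    Training and synthesis then operate on this shorter viseme sequence rather than on the full phoneme sequence.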
