Adaptation of exponential models
    101.
    Patent application
    Legal status: In force

    Publication No.: US20060018541A1

    Publication date: 2006-01-26

    Application No.: US10977871

    Filing date: 2004-10-29

    IPC classification: G06K9/00

    CPC classification: G06F17/273 G06K9/6297

    Abstract: A method and apparatus are provided for adapting an exponential probability model. In a first stage, a general-purpose background model is built by determining a set of model parameters for the probability model from a set of background data. The background model parameters are then used to define a prior model for the parameters of an adapted probability model that is more specific to an adaptation data set of interest. The adaptation data set is generally much smaller than the background data set. A second set of model parameters is then determined for the adapted probability model based on the adaptation data and the prior model.
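
    The two-stage idea in this abstract can be sketched in a few lines. The sketch below is only an illustration under stated assumptions, not the patented method: it assumes a binary logistic model as the exponential model, a Gaussian prior centered on the background weights (the abstract does not say which prior is used), and plain gradient ascent. The names map_adapt, background_w, sigma2, and the toy data are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def map_adapt(background_w, data, sigma2=1.0, lr=0.1, steps=500):
    """MAP-adapt the weights of a binary exponential (logistic) model.

    Maximizes  sum_i log p(y_i | x_i; w) - sum_j (w_j - b_j)^2 / (2 * sigma2)
    where b is the background weight vector (assumed Gaussian prior mean).
    """
    w = list(background_w)  # start from the background model
    for _ in range(steps):
        # The prior term pulls each weight back toward its background value.
        grad = [-(wj - bj) / sigma2 for wj, bj in zip(w, background_w)]
        # The adaptation-data log-likelihood term.
        for x, y in data:
            err = y - sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wj + lr * gj for wj, gj in zip(w, grad)]
    return w

# Hypothetical background weights and a tiny adaptation set of (features, label).
background_w = [2.0, -1.0]
adaptation_data = [([1.0, 0.0], 0), ([1.0, 1.0], 0)]

tight = map_adapt(background_w, adaptation_data, sigma2=0.5)    # stays near background
loose = map_adapt(background_w, adaptation_data, sigma2=100.0)  # follows the data more
```

    A small prior variance keeps the adapted weights close to the background model, which is the useful behavior when the adaptation set is much smaller than the background set; a large variance lets the adaptation data dominate.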

    Removing noise from feature vectors
    102.
    Patent application
    Legal status: In force

    Publication No.: US20050256706A1

    Publication date: 2005-11-17

    Application No.: US11185522

    Filing date: 2005-07-20

    IPC classification: G10L15/02 G10L15/20 G10L21/00

    CPC classification: G10L15/02 G10L15/20

    Abstract: A method and computer-readable medium are provided for identifying clean signal feature vectors from noisy signal feature vectors. One aspect of the invention includes using an iterative approach to identify the clean signal feature vector. Another aspect of the invention includes using the variance of a set of noise feature vectors and/or channel distortion feature vectors when identifying the clean signal feature vectors.

    Method and apparatus for continuous valued vocal tract resonance tracking using piecewise linear approximations
    103.
    Patent application
    Legal status: Under examination (published)

    Publication No.: US20050114134A1

    Publication date: 2005-05-26

    Application No.: US10723995

    Filing date: 2003-11-26

    CPC classification: G10L25/48 G10L25/15

    Abstract: A method and apparatus track vocal tract resonance components, including both frequencies and bandwidths, in a speech signal. The components are tracked by defining a state equation that is linear with respect to a past vocal tract resonance vector and that predicts a current vocal tract resonance vector. An observation equation is also defined that is linear with respect to a current vocal tract resonance vector and that predicts at least one component of an observation vector. The state equation, the observation equation, and a sequence of observation vectors are used to identify a sequence of vocal tract resonance vectors using a Kalman filter algorithm. Under one embodiment, the observation equation is defined based on a piecewise linear approximation to a non-linear function. The parameters of the linear approximation are selected based on pre-defined regions, which are determined from a crude estimate of a vocal tract resonance vector.
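
    As a rough illustration of the tracking scheme described here, the sketch below runs one predict/update step of a scalar Kalman filter and picks the observation-equation parameters from pre-defined regions. It is a simplification under stated assumptions: the patent tracks a resonance vector, while this uses a single scalar state, and the function names, region table, and noise variances are hypothetical values chosen for illustration.

```python
def select_linearization(crude_estimate, regions):
    """Pick the slope h and offset c of the region containing the crude estimate.

    regions: list of (upper_bound, h, c) tuples sorted by upper_bound.
    """
    for upper_bound, h, c in regions:
        if crude_estimate <= upper_bound:
            return h, c
    return regions[-1][1], regions[-1][2]

def kalman_step(mean, var, y, a=1.0, q=0.01, h=1.0, c=0.0, r=0.1):
    """One predict/update step of a scalar Kalman filter.

    State equation:        x_t = a * x_{t-1} + w,  w ~ N(0, q)
    Observation equation:  y_t = h * x_t + c + v,  v ~ N(0, r)
    """
    # Predict the current state from the past state.
    pred_mean = a * mean
    pred_var = a * a * var + q
    # Correct the prediction with the new observation.
    innovation = y - (h * pred_mean + c)
    gain = pred_var * h / (h * h * pred_var + r)
    return pred_mean + gain * innovation, (1.0 - gain * h) * pred_var

# Hypothetical piecewise-linear observation function with two regions.
regions = [(1.0, 2.0, 0.0), (float("inf"), 0.5, 1.5)]

mean, var = 0.5, 1.0
for y in [1.2, 1.6, 2.1]:
    # The current mean stands in for the crude estimate that selects the region.
    h, c = select_linearization(mean, regions)
    mean, var = kalman_step(mean, var, y, h=h, c=c)
```

    Because each region supplies its own linear observation equation, the standard Kalman update applies unchanged within a region even though the underlying observation function is non-linear.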

    Sound source separation using convolutional mixing and a priori sound source knowledge
    104.
    Granted patent
    Legal status: In force

    Publication No.: US06879952B2

    Publication date: 2005-04-12

    Application No.: US09842416

    Filing date: 2001-04-25

    IPC classification: G10L11/02 G10L21/02 G10L19/12

    Abstract: Sound source separation, without permutation, using convolutional mixing independent component analysis based on a priori knowledge of the target sound source is disclosed. The target sound source can be a human speaker. The reconstruction filters used in the sound source separation take into account the a priori knowledge of the target sound source, such as an estimate of the spectra of the target sound source. The filters may be generally constructed based on a speech recognition system. Matching the words of the dictionary of the speech recognition system to a reconstructed signal indicates whether proper separation has occurred. More specifically, the filters may be constructed based on a vector quantization codebook of vectors representing typical sound source patterns. Matching the vectors of the codebook to a reconstructed signal indicates whether proper separation has occurred. The vectors may be linear prediction vectors, among others.

    Method and apparatus for vocal tract resonance tracking using nonlinear predictor and target-guided temporal constraint
    105.
    Patent application
    Legal status: In force

    Publication No.: US20050049866A1

    Publication date: 2005-03-03

    Application No.: US10652976

    Filing date: 2003-08-29

    CPC classification: G10L25/48 G10L25/15

    Abstract: A method and apparatus map a set of vocal tract resonant frequencies, together with their corresponding bandwidths, into a simulated acoustic feature vector in the form of LPC cepstrum by calculating a separate function for each individual vocal tract resonant frequency/bandwidth and summing the results to form an element of the simulated feature vector. The simulated feature vector is applied to a model along with an input feature vector to determine a probability that the set of vocal tract resonant frequencies is present in a speech signal. Under one embodiment, the model includes a target-guided transition model that provides a probability of a vocal tract resonant frequency based on a past vocal tract resonant frequency and a target for the vocal tract resonant frequency. Under another embodiment, the phone segmentation is provided by an HMM system and is used to precisely determine which target value to use at each frame.
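
    The mapping from resonances to an LPC cepstrum can be illustrated with the textbook cepstrum of an all-pole model, in which each resonance contributes one term and the terms are summed per cepstral coefficient. The formula below is that standard result, assumed here for illustration; the listing itself does not state the exact function the patent uses, and the function and parameter names are hypothetical.

```python
import math

def vtr_to_lpc_cepstrum(freqs_hz, bandwidths_hz, sample_rate_hz, order):
    """Simulated LPC cepstrum for a set of resonance frequencies and bandwidths.

    Coefficient n is the sum over resonances k of
        (2 / n) * exp(-pi * n * b_k / fs) * cos(2 * pi * n * f_k / fs)
    which is the cepstrum of a cascade of two-pole resonators.
    """
    cepstrum = []
    for n in range(1, order + 1):
        total = 0.0
        for f, b in zip(freqs_hz, bandwidths_hz):
            decay = math.exp(-math.pi * n * b / sample_rate_hz)
            total += (2.0 / n) * decay * math.cos(2.0 * math.pi * n * f / sample_rate_hz)
        cepstrum.append(total)
    return cepstrum

# Hypothetical resonances (Hz) for one frame of speech sampled at 8 kHz.
features = vtr_to_lpc_cepstrum([500.0, 1500.0, 2500.0], [80.0, 120.0, 160.0], 8000.0, 12)
```

    A simulated vector built this way can then be compared with the cepstrum measured from the input frame to score how well a candidate set of resonances explains the observation.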

    Method and apparatus for using formant models in resonance control for speech systems
    106.
    Granted patent
    Legal status: Lapsed

    Publication No.: US06708154B2

    Publication date: 2004-03-16

    Application No.: US10294129

    Filing date: 2002-11-14

    Applicant: Alejandro Acero

    Inventor: Alejandro Acero

    IPC classification: G10L13/00

    CPC classification: G10L13/04 G10L25/15

    Abstract: A model is provided for formants found in human speech. Under one aspect of the invention, the model is used to synthesize speech. Under this aspect of the invention, the formant model is used to identify a most likely formant track for the synthesized speech. Based on this track, a series of resonators are used to introduce the formants into the speech signal.

    Method and system of runtime acoustic unit selection for speech synthesis
    107.
    Granted patent
    Legal status: Lapsed

    Publication No.: US5913193A

    Publication date: 1999-06-15

    Application No.: US648808

    Filing date: 1996-04-30

    CPC classification: G10L13/07

    Abstract: The present invention pertains to a concatenative speech synthesis system and method which produces more natural-sounding speech. The system provides for multiple instances of each acoustic unit which can be used to generate a speech waveform representing a linguistic expression. The multiple instances are formed during an analysis or training phase of the synthesis process and are limited to a robust representation of the highest probability instances. The provision of multiple instances enables the synthesizer to select the instance which closely resembles the desired instance, thereby eliminating the need to alter the stored instance to match the desired instance. This in essence minimizes the spectral distortion between the boundaries of adjacent instances, thereby producing more natural-sounding speech.

    Factored transforms for separable adaptation of acoustic models

    Publication No.: US09984678B2

    Publication date: 2018-05-29

    Application No.: US13427907

    Filing date: 2012-03-23

    IPC classification: G10L15/06 G10L15/20 G10L15/07

    Abstract: Various technologies described herein pertain to adapting a speech recognizer to input speech data. A first linear transform can be selected from a first set of linear transforms based on a value of a first variability source corresponding to the input speech data, and a second linear transform can be selected from a second set of linear transforms based on a value of a second variability source corresponding to the input speech data. The linear transforms in the first and second sets can compensate for the first variability source and the second variability source, respectively. Moreover, the first linear transform can be applied to the input speech data to generate intermediate transformed speech data, and the second linear transform can be applied to the intermediate transformed speech data to generate transformed speech data. Further, speech can be recognized based on the transformed speech data to obtain a result.
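
    The cascade of two selected transforms can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the two variability sources are speaker and environment (the abstract does not name them), stores each transform as a matrix plus a bias vector (the bias is an assumption; set it to zeros for a purely linear transform), and uses hypothetical names and values throughout.

```python
def apply_transform(transform, x):
    """Apply y = A x + b, with the matrix A given as a list of rows."""
    matrix, bias = transform
    return [sum(a * xi for a, xi in zip(row, x)) + b for row, b in zip(matrix, bias)]

def adapt_features(x, speaker, environment, speaker_transforms, environment_transforms):
    """Select one transform per variability source and apply them in sequence."""
    first = speaker_transforms[speaker]            # chosen by the first source's value
    second = environment_transforms[environment]   # chosen by the second source's value
    intermediate = apply_transform(first, x)
    return apply_transform(second, intermediate)

# Hypothetical transform sets, keyed by the value of each variability source.
speaker_transforms = {"spk1": ([[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0])}
environment_transforms = {"car": ([[1.0, 0.0], [0.0, 1.0]], [1.0, -1.0])}

transformed = adapt_features([1.0, 2.0], "spk1", "car",
                             speaker_transforms, environment_transforms)
```

    Keeping the two sets separate means a transform estimated for one speaker can be reused with any environment transform, instead of estimating one transform for every speaker and environment pair.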

    Search lexicon expansion
    109.
    Granted patent

    Publication No.: US09928296B2

    Publication date: 2018-03-27

    Application No.: US12970477

    Filing date: 2010-12-16

    IPC classification: G06F17/30 G06F17/27

    Abstract: One or more techniques and/or systems are disclosed for creating an expanded or improved lexicon for use in search-based semantic tagging. A set of first documents can be identified using a set of first lexicon elements as queries, and one or more first document patterns can be extracted from the set of first documents. The document patterns can be used to find one or more second documents in a query log that comprise the document patterns, which are associated with query terms used to return the second documents. The query terms for the second documents can be extracted and used to expand the lexicon. Elements within the lexicon may be weighted based upon relevance to different query domains, for example.
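
    The expansion loop described above can be sketched on a toy query log. The sketch rests on assumptions that are not in the abstract: the query log is a list of (query, clicked URL) pairs, and a "document pattern" is simply the URL's host name. The function name, the log entries, and the example.com hosts are all hypothetical.

```python
from urllib.parse import urlparse

def expand_lexicon(seed_lexicon, query_log):
    """Expand a lexicon using documents that share a pattern with seed results.

    query_log: list of (query, clicked_url) pairs.
    """
    seeds = set(seed_lexicon)
    # First documents: those returned for queries that are seed lexicon elements.
    first_docs = [url for query, url in query_log if query in seeds]
    # Document patterns: here, just the host name of each first document.
    patterns = {urlparse(url).netloc for url in first_docs}
    # Second documents match a pattern; their queries become new lexicon elements.
    expanded = set(seeds)
    for query, url in query_log:
        if urlparse(url).netloc in patterns:
            expanded.add(query)
    return expanded

query_log = [
    ("casablanca", "http://movies.example.com/title/1"),
    ("vertigo", "http://movies.example.com/title/2"),
    ("weather", "http://news.example.com/forecast"),
]
lexicon = expand_lexicon(["casablanca"], query_log)
```

    In this toy log, "vertigo" joins the lexicon because its clicked document shares a host with a seed result, while "weather" does not.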

    Speaker identification
    110.
    Granted patent
    Legal status: In force

    Publication No.: US08719019B2

    Publication date: 2014-05-06

    Application No.: US13093680

    Filing date: 2011-04-25

    IPC classification: G01L15/00

    CPC classification: G10L17/02

    Abstract: Speaker identification techniques are described. In one or more implementations, sample data is received at a computing device of one or more user utterances captured using a microphone. The sample data is processed by the computing device to identify a speaker of the one or more user utterances. The processing involves use of a feature set that includes features obtained using a filterbank having filters that space linearly at higher frequencies and logarithmically at lower frequencies, respectively, features that model the speaker's vocal tract transfer function, and features that indicate a vibration rate of vocal folds of the speaker of the sample data.
