Fast frequency-domain pitch estimation
    1.
    Type: Granted invention patent
    Legal status: In force

    Publication No.: US06587816B1

    Publication date: 2003-07-01

    Application No.: US09617582

    Filing date: 2000-07-14

    IPC classification: G10L11/04

    CPC classification: G10L25/90

    Abstract: A method for estimating a pitch frequency of an audio signal includes computing a first transform of the signal to a frequency domain over a first time interval, and computing a second transform of the signal to the frequency domain over a second time interval, which contains the first time interval. A line spectrum of the signal is found, based on the first and second transforms, the spectrum including spectral lines having respective line amplitudes and line frequencies. A utility function that is periodic in the frequencies of the lines in the spectrum is then computed. This function is indicative, for each candidate pitch frequency in a given pitch frequency range, of a compatibility of the spectrum with the candidate pitch frequency. The pitch frequency of the speech signal is estimated responsive to the utility function.
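
The last stage of the abstract, scoring candidate pitch frequencies against a line spectrum, might be sketched as follows. This is a minimal illustration, not the patented function: the cosine weighting, the 1 Hz candidate grid, and the names `utility` and `estimate_pitch` are all assumptions, and the two transforms that produce the line spectrum are not shown.

```python
import math

def utility(lines, f0):
    """Score how well a line spectrum fits the candidate pitch f0 (Hz).

    `lines` is a list of (frequency_hz, amplitude) pairs. The cosine term is
    periodic in the line frequency with period f0 and peaks at multiples of
    f0. This particular weighting is an assumption made for illustration.
    """
    return sum(a * math.cos(2.0 * math.pi * f / f0) for f, a in lines)

def estimate_pitch(lines, f_min, f_max, step=1.0):
    """Return the candidate in [f_min, f_max] with the highest utility."""
    n = int((f_max - f_min) / step) + 1
    candidates = [f_min + i * step for i in range(n)]
    return max(candidates, key=lambda f0: utility(lines, f0))

# Hypothetical line spectrum: three harmonics of a 220 Hz tone.
lines = [(220.0, 1.0), (440.0, 0.6), (660.0, 0.3)]
pitch = estimate_pitch(lines, 150.0, 400.0)
print(pitch)
```

A purely periodic score like this one rates 110 Hz as highly as 220 Hz for the same lines, which is one reason the search is restricted to a given pitch frequency range.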

    Method and system for low bit rate speech coding with speech recognition features and pitch providing reconstruction of the spectral envelope
    2.
    Type: Granted invention patent
    Legal status: In force

    Publication No.: US06678655B2

    Publication date: 2004-01-13

    Application No.: US10291590

    Filing date: 2002-11-12

    IPC classification: G10L19/12

    CPC classification: G10L19/02 G10L15/02

    Abstract: A method for encoding a digitized speech signal so as to generate data capable of being decoded as speech. A digitized speech signal is first converted to a series of feature vectors using, for example, known Mel-frequency cepstral coefficient (MFCC) techniques. At successive instances of time a respective pitch value of the digitized speech signal is computed, and successive acoustic vectors each containing the respective pitch value and feature vector are compressed so as to derive therefrom a bit stream. A suitable decoder reverses the operation so as to extract the feature vectors and pitch values, thus allowing speech reproduction and playback. In addition, speech recognition is possible using the decompressed feature vectors, with no impairment of the recognition accuracy and no computational overhead.
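
As a rough sketch of the encode/decode round trip, each acoustic vector (one pitch value plus a feature vector) can be quantized and packed into a byte stream. The abstract does not say which compression scheme is used, so the 8-bit uniform quantizer, the value ranges, and all function names below are assumptions.

```python
def quantize(x, lo, hi, levels=256):
    """Map x, clamped to [lo, hi], to an integer code in [0, levels - 1]."""
    x = min(max(x, lo), hi)
    return round((x - lo) / (hi - lo) * (levels - 1))

def dequantize(q, lo, hi, levels=256):
    """Map an integer code back to a value in [lo, hi]."""
    return lo + q / (levels - 1) * (hi - lo)

PITCH_RANGE = (0.0, 500.0)     # Hz; assumed range, not from the source
FEATURE_RANGE = (-50.0, 50.0)  # assumed range for feature coefficients

def encode(frames):
    """Pack (pitch_hz, features) frames into bytes, one byte per value."""
    out = bytearray()
    for pitch, feats in frames:
        out.append(quantize(pitch, *PITCH_RANGE))
        out.extend(quantize(c, *FEATURE_RANGE) for c in feats)
    return bytes(out)

def decode(stream, n_features):
    """Reverse encode(): recover approximate pitch values and features."""
    size = 1 + n_features
    frames = []
    for i in range(0, len(stream), size):
        chunk = stream[i:i + size]
        pitch = dequantize(chunk[0], *PITCH_RANGE)
        feats = [dequantize(q, *FEATURE_RANGE) for q in chunk[1:]]
        frames.append((pitch, feats))
    return frames

# Hypothetical frames: pitch in Hz plus three feature coefficients each.
frames = [(120.0, [12.5, -3.0, 0.25]), (125.0, [11.0, -2.5, 0.5])]
stream = encode(frames)
decoded = decode(stream, n_features=3)
```

The decoded feature vectors could be passed to a recognizer, and the decoded pitch values together with the features to a speech reconstruction stage.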

    Feature-domain concatenative speech synthesis
    3.
    Type: Granted invention patent
    Legal status: In force

    Publication No.: US07035791B2

    Publication date: 2006-04-25

    Application No.: US09901031

    Filing date: 2001-07-10

    Applicants: Dan Chazan, Ron Hoory

    Inventors: Dan Chazan, Ron Hoory

    IPC classification: G10L11/04

    CPC classification: G10L13/07 G10L25/18

    Abstract: A method for speech synthesis includes receiving an input speech signal containing a set of speech segments, and estimating spectral envelopes of the input speech signal in a succession of time intervals during each of the speech segments. The spectral envelopes are integrated over a plurality of window functions in a frequency domain so as to determine elements of feature vectors corresponding to the speech segments. An output speech signal is reconstructed by concatenating the feature vectors corresponding to a sequence of the speech segments.
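
The feature computation step, integrating a spectral envelope over frequency-domain window functions, can be approximated numerically. The sketch below assumes triangular windows on a uniform frequency grid and a Riemann-sum integral; the window shape, the spacing, and the function names are illustrative choices, not details taken from the patent.

```python
def triangular_window(f, center, half_width):
    """Triangle that is 1 at `center` and 0 beyond `half_width` away."""
    return max(0.0, 1.0 - abs(f - center) / half_width)

def binned_spectrum(envelope, freqs, centers, half_width):
    """Integrate envelope(f) * window(f) over f for each window.

    `freqs` must be uniformly spaced; the integral is a Riemann sum.
    """
    df = freqs[1] - freqs[0]
    return [
        sum(e * triangular_window(f, c, half_width)
            for f, e in zip(freqs, envelope)) * df
        for c in centers
    ]

# Hypothetical input: a flat envelope sampled every 10 Hz up to 4 kHz.
freqs = [10.0 * i for i in range(401)]
envelope = [1.0] * len(freqs)
centers = [500.0 * k for k in range(1, 8)]  # 500, 1000, ..., 3500 Hz
features = binned_spectrum(envelope, freqs, centers, half_width=500.0)
```

For a flat envelope of height 1, each feature equals the area of its window, here 500.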

    Method and system for speech reconstruction from speech recognition features, pitch and voicing with resampled basis functions providing reconstruction of the spectral envelope
    5.
    Type: Granted invention patent
    Legal status: In force

    Publication No.: US06725190B1

    Publication date: 2004-04-20

    Application No.: US09432081

    Filing date: 1999-11-02

    IPC classification: G10L19/02

    CPC classification: G10L13/07 G10L25/18

    Abstract: A speech reconstruction method and system for converting a series of binned spectra or functions thereof, such as the Mel Frequency Cepstral Coefficients (MFCC), of an original digitized speech signal, into a reconstructed speech signal, where each binned spectrum has a respective pitch value and voicing decision. The binned spectra are derived from the original digitized speech signal at successive instances by multiplying each estimate of the spectral envelope by a predetermined set of frequency domain window functions and computing the integrals thereof. At each respective time instance, harmonic frequencies and weights are generated according to the respective pitch value and voicing decision. Basis functions having bounded supports on the frequency axis are each sampled at all said harmonic frequencies that lie within their support and multiplied by respective harmonic weights. The sampled basis functions are combined with respective phases, generated according to the pitch value, voicing decision and possibly the binned spectrum, resulting in a complex line spectrum corresponding to each basis function. Coefficients of the basis functions are generated, and each of the points of the respective complex line spectra is multiplied by the respective basis function coefficient. The complex line spectra are summed up to generate for each time instance a single complex line spectrum with values for all harmonic frequencies. A time signal is generated from complex line spectra computed at successive instances of time.
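
A heavily simplified sketch of the reconstruction idea: sample bounded-support basis functions at the harmonics of the pitch, weight them by coefficients, sum the results into one line spectrum, and synthesize a time signal. It assumes a single voiced frame, triangular basis functions, unit harmonic weights, and zero phases; the coefficient estimation and phase generation described in the abstract are not shown, and all names are hypothetical.

```python
import math

def triangular_basis(f, center, half_width):
    """Basis function with bounded support [center - hw, center + hw]."""
    return max(0.0, 1.0 - abs(f - center) / half_width)

def harmonic_amplitudes(f0, f_max, centers, half_width, coeffs):
    """Sample each basis function at the harmonics of f0 and sum them,
    each weighted by its coefficient, into one line spectrum."""
    n = int(f_max / f0)
    freqs = [k * f0 for k in range(1, n + 1)]
    amps = [
        sum(c * triangular_basis(f, ctr, half_width)
            for c, ctr in zip(coeffs, centers))
        for f in freqs
    ]
    return freqs, amps

def synthesize(freqs, amps, sample_rate, n_samples):
    """Sum of zero-phase cosines, one per spectral line (an assumption)."""
    return [
        sum(a * math.cos(2.0 * math.pi * f * t / sample_rate)
            for f, a in zip(freqs, amps))
        for t in range(n_samples)
    ]

# Hypothetical frame: 250 Hz pitch, two basis functions, 8 kHz sampling.
freqs, amps = harmonic_amplitudes(
    f0=250.0, f_max=1500.0,
    centers=[500.0, 1000.0], half_width=500.0, coeffs=[2.0, 4.0],
)
frame = synthesize(freqs, amps, sample_rate=8000, n_samples=160)
```

With overlapping triangular basis functions, the summed line spectrum is a piecewise-linear interpolation of the coefficients, evaluated at the harmonic frequencies.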

    VOCAL SOURCE EXTRACTION BY MAXIMUM PHASE DETECTION
    8.
    Type: Invention patent application
    Legal status: In force

    Publication No.: US20130325455A1

    Publication date: 2013-12-05

    Application No.: US13487275

    Filing date: 2012-06-04

    IPC classification: G10L11/04

    CPC classification: G10L25/75 G10L25/03 G10L25/45

    Abstract: Methods, apparatus and computer program products implement embodiments of the present invention that include receiving a time domain voice signal, and extracting a single pitch cycle from the received signal. The extracted single pitch cycle is transformed to a frequency domain, and the misclassified roots of the frequency domain are identified and corrected. Using the corrected roots, an indication of a maximum phase of the frequency domain is generated.

    Dictionary lookup for mobile devices using spelling recognition
    9.
    Type: Invention patent application
    Legal status: Under examination (published)

    Publication No.: US20070016420A1

    Publication date: 2007-01-18

    Application No.: US11176154

    Filing date: 2005-07-07

    IPC classification: G10L15/04 G10L15/00

    CPC classification: G10L15/19

    Abstract: A method for querying an electronic dictionary using letters of an alphabet enunciated by a user includes accepting a speech input from the user. The speech input includes a sequence of spelled letters enunciated by the user that spell a query word. The speech input is analyzed to determine one or more sequences of the letters that approximate the sequence of spelled letters. The one or more sequences of the letters are post-processed so as to produce a plurality of recognized words approximating the query word. The electronic dictionary is queried with the plurality of recognized words so as to retrieve a respective plurality of dictionary entries. A list of results including the plurality of recognized words and the respective plurality of dictionary entries is presented to the user.
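
The post-processing and lookup stages can be illustrated with Python's standard `difflib` module. The sketch assumes the recognizer has already produced candidate letter sequences; fuzzy matching with `difflib.get_close_matches` is a stand-in for whatever post-processing the application actually describes, and the tiny dictionary is made up.

```python
import difflib

def lookup(letter_sequences, dictionary, per_sequence=3, cutoff=0.6):
    """Map recognized letter sequences to dictionary entries.

    Each sequence is fuzzily matched against the dictionary headwords,
    the matches are merged without duplicates, and every surviving word
    is returned with its entry.
    """
    words = []
    for seq in letter_sequences:
        matches = difflib.get_close_matches(
            seq, list(dictionary), n=per_sequence, cutoff=cutoff)
        for word in matches:
            if word not in words:
                words.append(word)
    return [(word, dictionary[word]) for word in words]

# Hypothetical dictionary and recognizer output for a user spelling "cat".
dictionary = {
    "cat": "a small domesticated feline",
    "cab": "a taxi",
    "dog": "a domesticated canine",
}
results = lookup(["kat", "cat"], dictionary)
for word, entry in results:
    print(f"{word}: {entry}")
```

Returning several near matches rather than one gives the user a list to choose from when letters such as "k" and "c" are confused.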

    Voice transformation with encoded information
    10.
    Type: Granted invention patent
    Legal status: In force

    Publication No.: US08930182B2

    Publication date: 2015-01-06

    Application No.: US13049924

    Filing date: 2011-03-17

    CPC classification: G10L21/003 G10L19/018

    Abstract: A method, system, and computer program product for voice transformation are provided. The method includes transforming a source speech using transformation parameters, and encoding information on the transformation parameters in an output speech using steganography, wherein the source speech can be reconstructed using the output speech and the information on the transformation parameters. A method for reconstructing voice transformation is also provided including: receiving an output speech of a voice transformation system wherein the output speech is transformed speech which has encoded information on the transformation parameters using steganography; extracting the information on the transformation parameters; and carrying out an inverse transformation of the output speech to obtain an approximation of an original source speech.
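
A toy version of the transform, embed, extract, and invert cycle is sketched below. Two stand-ins are used that the abstract does not specify: the "voice transformation" is a plain integer gain applied to PCM samples, and the steganography is least-significant-bit embedding of that gain in the first eight samples. All names are hypothetical.

```python
def transform(samples, gain):
    """Stand-in voice transformation: scale every sample by `gain`."""
    return [s * gain for s in samples]

def embed_byte(samples, value):
    """Hide one byte in the least significant bits of the first 8 samples."""
    out = list(samples)
    for i in range(8):
        bit = (value >> i) & 1
        out[i] = (out[i] & ~1) | bit
    return out

def extract_byte(samples):
    """Read the hidden byte back from the first 8 samples."""
    return sum((samples[i] & 1) << i for i in range(8))

def inverse_transform(samples, gain):
    """Undo the gain; rounding absorbs the small change from embedding."""
    return [round(s / gain) for s in samples]

# Hypothetical source speech as integer PCM samples.
source = [100, -200, 300, -400, 500, -600, 700, -800, 900, -1000]
gain = 3
output = embed_byte(transform(source, gain), gain)

# Receiver side: recover the parameter, then invert the transformation.
recovered_gain = extract_byte(output)
reconstructed = inverse_transform(output, recovered_gain)
```

The receiver needs only the output samples: the parameter travels inside the signal itself, and embedding changes each carrier sample by at most one quantization step.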
