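Speech analysis device, speech analysis method, and speech analysis program for detecting pitch frequency (original title: ピッチ周波数を検出する音声解析装置、音声解析方法、および音声解析プログラム)
    11.
    Patent application
    Speech analysis device, speech analysis method, and speech analysis program for detecting pitch frequency [under examination, published]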

    Publication number: WO2006132159A1

    Publication date: 2006-12-14

    Application number: PCT/JP2006/311123

    Filing date: 2006-06-02

    CPC classification number: G10L25/90

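    Abstract: The speech analysis device of the present invention comprises a speech acquisition unit, a frequency conversion unit, an autocorrelation unit, and a pitch detection unit. The frequency conversion unit converts the speech signal captured by the speech acquisition unit into a frequency spectrum. The autocorrelation unit obtains an autocorrelation waveform while shifting the frequency spectrum along the frequency axis. The pitch detection unit obtains the pitch frequency from the spacing between local peaks, or between local valleys, of the autocorrelation waveform.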
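
The processing chain in this abstract (spectrum, autocorrelation along the frequency axis, peak spacing) can be illustrated with a minimal sketch. It is not the applicant's implementation: the function names, the naive DFT, and the `min_relative_height` threshold used to ignore numerical noise are all assumptions made for illustration, and only peak spacing (not valley spacing) is used.

```python
import cmath
import math

def magnitude_spectrum(signal):
    """Naive O(N^2) DFT magnitudes for bins 0..N/2-1 (adequate for one short frame)."""
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2)]

def spectral_autocorrelation(spectrum, max_shift):
    """Correlate the spectrum with a copy of itself shifted along the frequency axis."""
    return [sum(spectrum[k] * spectrum[k + s] for k in range(len(spectrum) - s))
            for s in range(max_shift)]

def estimate_pitch_hz(signal, sample_rate, min_relative_height=0.05):
    """Estimate pitch from the mean spacing of local peaks in the spectral autocorrelation.

    min_relative_height is a hypothetical threshold (relative to the zero-shift value)
    that discards peaks caused by rounding noise.
    """
    spectrum = magnitude_spectrum(signal)
    acf = spectral_autocorrelation(spectrum, len(spectrum) // 2)
    floor = min_relative_height * acf[0]
    peaks = [i for i in range(1, len(acf) - 1)
             if acf[i - 1] < acf[i] >= acf[i + 1] and acf[i] > floor]
    if len(peaks) < 2:
        return None  # not enough periodic structure in the spectrum
    spacing_bins = (peaks[-1] - peaks[0]) / (len(peaks) - 1)
    return spacing_bins * sample_rate / len(signal)  # bins -> Hz
```

Because the estimate comes from the spacing between harmonics rather than from the position of the lowest one, a sketch of this kind does not depend on the fundamental itself being present in the spectrum.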

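Pitch estimation method and device, and pitch estimation program (original title: 音高推定方法及び装置並びに音高推定用プログラム)
    12.
    Patent application
    Pitch estimation method and device, and pitch estimation program [under examination, published]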

    Publication number: WO2006106946A1

    Publication date: 2006-10-12

    Application number: PCT/JP2006/306899

    Filing date: 2006-03-31

    Inventor: 後藤 真孝

    CPC classification number: G10L25/90 G10G3/04 G10H2210/066

    Abstract: A pitch estimating method and device, and a pitch estimating program, for estimating the weight of the probability density function of the fundamental frequency and the amplitudes of the harmonic components with fewer operations than conventional methods. In the improved pitch estimating method, the terms $1200\log_2 h$ and $\exp[-(x-(F+1200\log_2 h))^2/(2W^2)]$ of Equation 121 are computed in advance. [Eq. 121] (61) The computation of Eq. 121 is executed only for the fundamental frequencies F at which $x-(F+1200\log_2 h)$ is close to 0, and the result is stored in the memory of the computer. As a result, far fewer operations are needed than in conventional methods, and the computation time is shortened.

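
The precomputation idea in this abstract can be sketched as follows. This is a hedged illustration, not the published method: Equation 121 itself is not reproduced in the listing, so the sketch only tabulates the two named terms. The symbols x, F, h, and W follow the abstract; the function names, the table step, and the cutoff of three widths used to decide what counts as "close to 0" are assumptions.

```python
import math

def harmonic_offsets_cents(num_harmonics):
    """Precompute 1200*log2(h) for h = 1..num_harmonics (offset of each harmonic in cents)."""
    return [1200.0 * math.log2(h) for h in range(1, num_harmonics + 1)]

def gaussian_table(width_cents, step_cents, cutoff_widths=3.0):
    """Precompute exp(-d^2 / (2 W^2)) for d = 0, step, 2*step, ... up to cutoff_widths * W."""
    count = int(cutoff_widths * width_cents / step_cents) + 1
    return [math.exp(-((i * step_cents) ** 2) / (2.0 * width_cents ** 2))
            for i in range(count)]

def harmonic_terms(x_cents, f_cents, offsets, table, step_cents):
    """Return {h: Gaussian term} only for harmonics where x - (F + 1200*log2 h) is near 0.

    Harmonics farther away than the table covers are skipped entirely,
    which is where the saving in operations comes from.
    """
    terms = {}
    for h, offset in enumerate(offsets, start=1):
        index = round(abs(x_cents - (f_cents + offset)) / step_cents)
        if index < len(table):
            terms[h] = table[index]  # table lookup instead of a fresh exp()
    return terms
```

For a frequency x lying exactly one octave above a candidate F, only the second harmonic contributes, and its term is read from the table rather than recomputed.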

    METHOD AND APPARATUS FOR SPEECH CODING
    13.
    Patent application
    METHOD AND APPARATUS FOR SPEECH CODING [under examination, published]

    Publication number: WO2005064591A1

    Publication date: 2005-07-14

    Application number: PCT/US2004/042642

    Filing date: 2004-12-17

    CPC classification number: G10L19/09

    Abstract: A method (Fig. 9) and apparatus (500, 600) for prediction in a speech-coding system extends a 1st order long-term predictor (LTP) filter, using a sub-sample resolution delay, to a multi-tap LTP filter (504, 604). From another perspective, a conventional integer-sample resolution multi-tap LTP filter is extended to use sub-sample resolution delay. Such a multi-tap LTP filter offers a number of advantages over the prior art. Particularly, defining the lag with sub-sample resolution makes it possible to explicitly model the delay values that have a fractional component, within the limits of resolution of the over-sampling factor used by the interpolation filter. The coefficients (βi's) of the multi-tap LTP filter are thus largely freed from modeling the effect of delays that have a fractional component. Consequently their main function is to maximize the prediction gain of the LTP filter via modeling the degree of periodicity that is present and by imposing spectral shaping.

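
A multi-tap long-term predictor with a fractional lag can be sketched as below. This is only an illustration of the general idea: linear interpolation stands in for the interpolation filter and over-sampling factor mentioned in the abstract, and the function names, the tap indexing convention, and the example coefficients are assumptions rather than anything taken from the application.

```python
import math

def sample_at(history, position):
    """Read the past signal at a possibly non-integer index.

    Linear interpolation is a simple stand-in for a proper interpolation filter.
    """
    base = math.floor(position)
    frac = position - base
    if frac == 0.0:
        return history[base]
    return (1.0 - frac) * history[base] + frac * history[base + 1]

def multitap_ltp_predict(history, lag, taps):
    """Predict the next sample as sum_i beta_i * x(n - lag + i), taps centred on the lag.

    history: past samples x(0..n-1); lag: delay in samples, may be fractional;
    taps: the beta_i coefficients (hypothetical values in the usage below).
    """
    n = len(history)
    half = len(taps) // 2
    if lag - half < 1 or lag + half > n:
        raise ValueError("lag and tap span must stay inside the available history")
    return sum(beta * sample_at(history, n - lag + (i - half))
               for i, beta in enumerate(taps))
```

With the fractional part of the delay handled by `sample_at`, the taps no longer need to approximate it and can be chosen for prediction gain, which is the division of labour the abstract describes.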

    PITCH QUANTIZATION FOR DISTRIBUTED SPEECH RECOGNITION
    14.
    Patent application
    PITCH QUANTIZATION FOR DISTRIBUTED SPEECH RECOGNITION [under examination, published]

    Publication number: WO2004072949A3

    Publication date: 2004-12-09

    Application number: PCT/US2004003425

    Filing date: 2004-02-05

    CPC classification number: G10L19/09 G10L15/30

    Abstract: A system, method and computer readable medium for quantizing pitch information of audio is disclosed. The method includes capturing audio representing a numbered frame of a plurality of numbered frames. The method further includes calculating a class of the frame, wherein a class is any one of a voiced or unvoiced class. If the frame is a voiced class, a pitch is calculated for the frame (903). If the frame is an even numbered frame and a voiced class, a codeword of first length is calculated by absolutely quantizing the frame pitch (910). If the frame is an odd numbered frame and a voiced class and a reliable frame is available, a codeword of a second length is calculated by differentially quantizing the frame pitch (905). If there is no reliable frame available, a codeword of the second length is calculated by absolutely quantizing the frame pitch.

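
The frame-by-frame decision described in this abstract can be sketched as a small dispatcher. The branching follows the abstract; everything numeric is hypothetical: the pitch range, the 7-bit and 5-bit codeword lengths, the uniform quantizers, and the `max_delta` range are placeholders, and the abstract does not say how a "reliable frame" is identified, so the caller passes its pitch in directly. Returning `None` for unvoiced frames is likewise an assumption.

```python
def quantize_absolute(pitch, low, high, bits):
    """Uniformly quantize pitch over [low, high] into a codeword of the given length."""
    levels = (1 << bits) - 1
    clamped = min(max(pitch, low), high)
    return round((clamped - low) / (high - low) * levels)

def quantize_differential(pitch, reference, max_delta, bits):
    """Uniformly quantize the difference from a reference pitch over [-max_delta, max_delta]."""
    levels = (1 << bits) - 1
    delta = min(max(pitch - reference, -max_delta), max_delta)
    return round((delta + max_delta) / (2.0 * max_delta) * levels)

def encode_frame(frame_index, voiced, pitch, reliable_reference=None,
                 low=50.0, high=400.0, first_bits=7, second_bits=5, max_delta=20.0):
    """Return (mode, codeword_length, codeword), or None for an unvoiced frame."""
    if not voiced:
        return None  # assumption: no pitch codeword is produced here
    if frame_index % 2 == 0:
        # Even-numbered voiced frame: absolute quantization, first length.
        return ("abs", first_bits, quantize_absolute(pitch, low, high, first_bits))
    if reliable_reference is not None:
        # Odd-numbered voiced frame with a reliable frame: differential, second length.
        code = quantize_differential(pitch, reliable_reference, max_delta, second_bits)
        return ("diff", second_bits, code)
    # Odd-numbered voiced frame without a reliable frame: absolute, second length.
    return ("abs", second_bits, quantize_absolute(pitch, low, high, second_bits))
```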

    METHODS AND APPARATUS FOR PITCH DETERMINATION
    15.
    Patent application
    METHODS AND APPARATUS FOR PITCH DETERMINATION [under examination, published]

    Publication number: WO2003038806A1

    Publication date: 2003-05-08

    Application number: PCT/US2002/033895

    Filing date: 2002-10-23

    CPC classification number: G10L25/90

    Abstract: Methods and apparatus for detecting periodicity and/or for determining the fundamental period of a signal such as speech. The methods include (104) embedding a portion of a sampled digitized signal into an m-dimensional state space to obtain a sequence of m-dimensional vectors (106), selecting closest pairs of vectors in state space from a plurality of possible pairs of m-dimensional vectors in said sequence of m-dimensional vectors (108), accumulating total numbers of selected closest pairs of vectors having the same time separation values to produce a histogram of accumulated numbers, and (110) locating at least a highest peak in a portion of said histogram to obtain a value indicating the fundamental period of the signal. Various embodiments are directed to speech and audio signal processing and other speech related applications. However, the methods have a general nature and can be applied to other types of periodic or quasi-periodic signals as well.

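
The four steps named in this abstract (embedding, closest-pair selection, histogram of time separations, peak picking) can be sketched in a few lines. This is a minimal illustration under stated assumptions: "closest pairs" is interpreted as pairs lying within a fixed `radius` in state space, and the embedding dimension, delay, radius, and `min_lag` defaults are hypothetical values rather than parameters from the application.

```python
import math
from collections import Counter

def embed(signal, dim, delay=1):
    """Embed the sampled signal into dim-dimensional delay vectors."""
    span = (dim - 1) * delay
    return [tuple(signal[i + j * delay] for j in range(dim))
            for i in range(len(signal) - span)]

def separation_histogram(vectors, radius, min_lag):
    """Count close vector pairs by their time separation (in samples)."""
    histogram = Counter()
    radius_sq = radius * radius
    for i in range(len(vectors)):
        for j in range(i + min_lag, len(vectors)):
            dist_sq = sum((a - b) ** 2 for a, b in zip(vectors[i], vectors[j]))
            if dist_sq <= radius_sq:
                histogram[j - i] += 1
    return histogram

def fundamental_period(signal, dim=3, delay=1, radius=0.05, min_lag=2):
    """Return the time separation with the highest count, or None if no pair is close."""
    histogram = separation_histogram(embed(signal, dim, delay), radius, min_lag)
    if not histogram:
        return None
    # Highest peak wins; ties are resolved in favour of the shorter separation.
    return max(histogram, key=lambda lag: (histogram[lag], -lag))
```

The pairwise search is quadratic in the frame length, which is acceptable for a short frame but would need a neighbour-search structure for longer signals.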

    APPARATUS, SYSTEM AND METHOD FOR SPEECH COMPRESSION AND DECOMPRESSION
    16.
    Patent application
    APPARATUS, SYSTEM AND METHOD FOR SPEECH COMPRESSION AND DECOMPRESSION [under examination, published]

    Publication number: WO0054253A9

    Publication date: 2002-07-04

    Application number: PCT/US0005992

    Filing date: 2000-03-08

    Applicant: INFOLIO INC

    Inventor: GUBERMAN SHELIA

    CPC classification number: G10L19/02

    Abstract: The invention provides system, apparatus, and method for compressing a speech signal by decimating or removing somewhat redundant portions of the signal while retaining reference signal portions sufficient to reconstruct the signal (170) without noticeable loss in quality, thereby permitting a storage and transmission of high quality speech with minimal storage volume or transmission bandwidth requirements. Speech pitch waveform decimation is used to reduce data to produce an encoded speech signal during compression (162), and time based interpolative speech reconstruction is used on the encoded signal to reconstruct the original speech signal (160). In another aspect an internet (180) voice electronic mail system (174) is provided which has minimal voice message storage and transmission requirements while retaining high fidelity voice quality.


    METHOD AND APPARATUS FOR ROBUST SPEECH CLASSIFICATION
    17.
    Patent application
    METHOD AND APPARATUS FOR ROBUST SPEECH CLASSIFICATION [under examination, published]

    Publication number: WO02047068A2

    Publication date: 2002-06-13

    Application number: PCT/US2001/046971

    Filing date: 2001-12-04

    CPC classification number: G10L25/93 G10L19/025 G10L19/22 G10L25/78

    Abstract: A speech classification technique (502-530) for robust classification of varying modes of speech to enable maximum performance of multi-mode variable bit rate encoding techniques. A speech classifier accurately classifies a high percentage of speech segments for encoding at minimal bit rates, meeting lower bit rate requirements. Highly accurate speech classification produces a lower average encoded bit rate, and higher quality decoded speech. The speech classifier considers a maximum number of parameters for each frame of speech, producing numerous and accurate speech mode classifications for each frame. The speech classifier correctly classifies numerous modes of speech under varying environmental conditions. The speech classifier inputs classification parameters from external components, generates internal classification parameters from the input parameters, sets a Normalized Auto-correlation Coefficient Function threshold and selects a parameter analyzer according to the signal environment, and then analyzes the parameters to produce a speech mode classification.


    CODEBOOK STRUCTURE AND SEARCH FOR SPEECH CODING
    18.
    Patent application
    CODEBOOK STRUCTURE AND SEARCH FOR SPEECH CODING [under examination, published]

    Publication number: WO0225638A3

    Publication date: 2002-06-13

    Application number: PCT/IB0101729

    Filing date: 2001-09-17

    Inventor: GAO YANG

    Abstract: A speech compression system with a special fixed codebook structure and a new search routine is proposed for speech coding. The system is capable of encoding a speech signal into a bitstream for subsequent decoding to generate synthesized speech. The codebook structure uses a plurality of subcodebooks. Each subcodebook is designed to fit a specific group of speech signals. A criterion value is calculated for each subcodebook to minimize an error signal in a minimization loop as part of the coding system. An external signal sets a maximum bitstream rate for delivering encoded speech into a communications system. The speech compression system comprises a full-rate codec, a half-rate codec, a quarter-rate codec and an eighth-rate codec. Each codec is selectively activated to encode and decode the speech signals at different bit rates to enhance overall quality of the synthesized speech at a limited average bit rate.


    FRAME ERASURE COMPENSATION METHOD IN A VARIABLE RATE SPEECH CODER
    19.
    Patent application
    FRAME ERASURE COMPENSATION METHOD IN A VARIABLE RATE SPEECH CODER [under examination, published]

    Publication number: WO0182289A3

    Publication date: 2002-01-10

    Application number: PCT/US0112665

    Filing date: 2001-04-18

    Applicant: QUALCOMM INC

    CPC classification number: G10L21/02 G10L19/005 G10L19/097

    Abstract: A frame erasure compensation method in a variable-rate speech coder includes quantizing, with a first encoder, a pitch lag value for a current frame and a first delta pitch lag value equal to the difference between the pitch lag value for the current frame and the pitch lag value for the previous frame. A second, predictive encoder quantizes only a second delta pitch lag value for the previous frame (equal to the difference between the pitch lag value for the previous frame and the pitch lag value for the frame prior to that frame). If the frame prior to the previous frame is processed as a frame erasure, the pitch lag value for the previous frame is obtained by subtracting the first delta pitch lag value from the pitch lag value for the current frame. The pitch lag value for the erasure frame is then obtained by subtracting the second delta pitch lag value from the pitch lag value for the previous frame. Additionally, a waveform interpolation method may be used to smooth discontinuities caused by changes in the coder pitch memory.

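
The two subtractions described in this abstract amount to walking backwards through the pitch lag history. The sketch below shows only that arithmetic; the function and parameter names are hypothetical, the lag values in the usage note are invented for illustration, and quantization and waveform interpolation are left out.

```python
def recover_pitch_lags(current_lag, first_delta, second_delta):
    """Recover the two earlier pitch lags when the frame before the previous one is erased.

    current_lag:  pitch lag decoded for the current frame
    first_delta:  current_lag minus the previous frame's lag (sent with the current frame)
    second_delta: previous frame's lag minus the lag of the frame before it
                  (sent with the previous, predictively coded frame)
    Returns (previous_lag, erased_lag).
    """
    previous_lag = current_lag - first_delta
    erased_lag = previous_lag - second_delta
    return previous_lag, erased_lag
```

For example, with a current lag of 60 samples, a first delta of 4, and a second delta of -3, the previous frame's lag is 56 and the erased frame's lag is 59.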
