Multi-mode speech encoding system
    21.
    Patent application
    Status: in force

    Publication No.: US20090024386A1

    Publication date: 2009-01-22

    Application No.: US12229324

    Filing date: 2008-08-20

    Applicant: Huan-Yu Su, Yang Gao

    Inventor: Huan-Yu Su, Yang Gao

    IPC classification: G10L21/00

    Abstract: A method comprises analyzing each frame of a plurality of frames of a speech signal to determine one or more speech parameters for the speech signal; deciding, for each frame, based on the one or more speech parameters, to select one of a plurality of encoding modes, including a first encoding mode and a second encoding mode, for encoding that frame; and encoding each frame according to the encoding mode selected for it in the deciding. The first encoding mode supports a first encoding rate and the second encoding mode supports a second encoding rate, wherein the first encoding rate is the same as the second encoding rate.
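
    The per-frame analyze, decide, encode loop described in this abstract can be illustrated with a minimal sketch. The abstract does not say which speech parameters drive the decision, so the zero-crossing rate, the 0.25 threshold, and the mode names below are all assumptions chosen for illustration; both modes are taken to run at the same bit rate, as the abstract states.

```python
import math

def frame_parameters(frame):
    """Return (energy, zero_crossing_rate) for one frame of samples."""
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return energy, crossings / (len(frame) - 1)

def select_mode(frame, zcr_threshold=0.25):
    """Pick one of two same-rate modes from a speech parameter (hypothetical rule)."""
    _, zcr = frame_parameters(frame)
    # A low zero-crossing rate is used here as a rough cue for periodic speech.
    return "mode_1_periodic" if zcr < zcr_threshold else "mode_2_nonperiodic"

def encode(frames):
    """Tag each frame with its selected mode; the actual coding step is omitted."""
    return [(select_mode(f), f) for f in frames]

# Two synthetic 160-sample frames: a slow sinusoid and a sign-alternating signal.
voiced = [math.sin(2 * math.pi * 5 * n / 160) for n in range(160)]
noisy = [(-1) ** n * 0.1 for n in range(160)]
modes = [mode for mode, _ in encode([voiced, noisy])]
```

    Because the mode is chosen independently for every frame, the bit rate stays constant while the coding strategy follows the signal.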

    System for speech encoding having an adaptive encoding arrangement
    22.
    Granted patent
    Status: in force

    Publication No.: US07072832B1

    Publication date: 2006-07-04

    Application No.: US09663002

    Filing date: 2000-09-15

    Applicant: Huan-Yu Su, Yang Gao

    Inventor: Huan-Yu Su, Yang Gao

    IPC classification: G10L19/00

    Abstract: In accordance with one aspect of the invention, a selector supports the selection of a first encoding scheme or a second encoding scheme based upon the detection or absence of a triggering characteristic in an interval of the input speech signal. The first encoding scheme has a pitch pre-processing procedure for processing the input speech signal to form a revised speech signal biased toward an ideal voiced and stationary characteristic. The pre-processing procedure allows the encoder to fully capture the benefits of a bandwidth-efficient, long-term predictive procedure for a greater amount of speech components of an input speech signal than would otherwise be possible. In accordance with another aspect of the invention, the second encoding scheme entails a long-term prediction mode for encoding the pitch on a sub-frame by sub-frame basis. The long-term prediction mode is tailored to cases where the generally periodic component of the speech is not stationary or is less than completely periodic, and therefore requires more frequent updates from the adaptive codebook to achieve a desired perceptual quality of the reproduced speech under a long-term predictive procedure.

    Speech encoder adaptively applying pitch preprocessing with warping of target signal
    23.
    Granted patent
    Status: in force

    Publication No.: US06330533B2

    Publication date: 2001-12-11

    Application No.: US09154660

    Filing date: 1998-09-18

    Applicant: Huan-Yu Su, Yang Gao

    Inventor: Huan-Yu Su, Yang Gao

    IPC classification: G10L21/00

    Abstract: A multi-rate speech codec supports a plurality of encoding bit rate modes by adaptively selecting encoding bit rate modes to match communication channel restrictions. In higher bit rate encoding modes, an accurate representation of speech through CELP (code excited linear prediction) and other associated modeling parameters is generated for higher quality decoding and reproduction. The speech encoder employs various encoding schemes based upon parameters including an available transmission bit rate. In addition, the speech encoder is operable to identify and apply an optimal encoding scheme for a given speech signal. The speech encoder may apply code-excited linear prediction when the available bit rate is above a predetermined upper threshold. Pitch preprocessing, including continuous warping, may be applied when the available bit rate is below a predetermined lower threshold. The encoder considers varying characteristics of the speech signal, including the long term prediction mode of a previous frame, a spectral difference between the line spectral frequencies of a current and a previous frame, a predicted pitch lag, an open loop pitch lag, a closed loop pitch lag, a pitch gain, and a pitch correlation.

    Adaptive multi-microphone beamforming

    Publication No.: US10366701B1

    Publication date: 2019-07-30

    Application No.: US15681395

    Filing date: 2017-08-20

    Applicant: Huan-Yu Su

    Inventor: Huan-Yu Su

    Abstract: Provided are a method and computer program product for producing an enhanced audio signal for an output device from audio signals received by two or more microphones in close proximity to each other. For example, one embodiment of the present invention comprises the steps of receiving a first input audio signal from the first microphone, digitizing the first input audio signal to produce a first digitized audio input signal, receiving a second input audio signal from the second microphone, digitizing the second input audio signal to produce a second digitized audio input signal, using the first digitized audio input signal as a reference signal to an adaptive prediction filter, using the second digitized audio input signal as input to said adaptive prediction filter, and finally adding a prediction result signal from the adaptive prediction filter to the first digitized audio input signal to produce the enhanced audio signal. In other embodiments, any number of microphones can be used, and in all embodiments there is no requirement to detect or locate the source or direction of arrival of the input audio signals.
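
    The two-microphone embodiment can be sketched as follows. The abstract only says "adaptive prediction filter"; the normalized LMS (NLMS) update, the tap count, and the step size used here are assumptions, not details taken from the patent. The first microphone serves as the reference, the second as the filter input, and the prediction is added back onto the first signal.

```python
import math

def enhance(mic1, mic2, taps=4, mu=0.5, eps=1e-8):
    """Add an NLMS prediction of mic1 (formed from mic2) back onto mic1."""
    w = [0.0] * taps
    history = [0.0] * taps                 # recent mic2 samples, newest first
    out = []
    for d, x in zip(mic1, mic2):
        history = [x] + history[:-1]
        y = sum(wi * hi for wi, hi in zip(w, history))    # prediction of d
        e = d - y                                         # prediction error
        norm = eps + sum(h * h for h in history)
        w = [wi + mu * e * hi / norm for wi, hi in zip(w, history)]
        out.append(d + y)                                 # enhanced sample
    return out

# Synthetic test input: the same sinusoid reaches the second microphone
# one sample later than the first.
n = 2000
mic1 = [math.sin(0.2 * math.pi * k) for k in range(n)]
mic2 = [0.0] + mic1[:-1]
enhanced = enhance(mic1, mic2)
```

    Once the filter has adapted, the prediction lines up with the first microphone's signal, so the coherent source roughly doubles in amplitude. No direction-of-arrival estimate is used anywhere in the loop, in keeping with the abstract.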

    Detecting and reporting a loss of connection by a telephone
    25.
    Granted patent
    Status: in force

    Publication No.: US07796623B2

    Publication date: 2010-09-14

    Application No.: US12384019

    Filing date: 2009-03-30

    IPC classification: H04L12/28, H04M3/22

    Abstract: There is provided a method of detecting and reporting poor voice quality for use by a gateway device. The method comprises facilitating a connection between a telephone and a remote telephone via a network, and detecting a poor voice quality indicator during the connection. The method further comprises capturing, for a pre-determined period of time, telephone voice data being exchanged between the gateway and the telephone, network voice data being exchanged between the gateway and the network, and gateway parameters. The method also comprises packetizing the telephone voice data, the network voice data and the gateway parameters into a plurality of packets having a network address of a network storage, and transmitting the plurality of packets destined for the network storage via the network. In one aspect, the poor voice quality indicator may be generated by a user of the telephone in response to a poor voice quality of the connection.

    Complexity resource manager for multi-channel speech processing
    26.
    Granted patent
    Status: in force

    Publication No.: US07080010B2

    Publication date: 2006-07-18

    Application No.: US10911118

    Filing date: 2004-08-03

    IPC classification: G10L19/02

    CPC classification: G10L15/285

    Abstract: A multi-channel speech processor for encoding speech in a packet network environment is disclosed. In one illustrative aspect, a complexity resource manager (CRM) is executed by a controller or processor. The CRM manages the level of encoding complexity that a signal processing unit (SPU) uses to convert the speech signal into packet data. In general, the CRM determines the level of encoding complexity based on a calculated complexity budget, where the complexity budget is determined based on the time required to process prior speech signal channels and the time available to process the remaining channels. In this way, the CRM is able to control the overall complexity of the speech processor by signaling the SPU, under certain conditions, to encode the speech signal in a reduced-complexity mode based on the calculated complexity budget.
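
    The complexity budget described in this abstract can be illustrated with a small scheduling sketch. The budget formula (time left in the frame period divided by the number of channels still to process) and the fixed per-channel costs are assumptions made for illustration; the abstract does not give the actual calculation.

```python
def process_channels(num_channels, frame_period_ms, full_cost_ms, reduced_cost_ms):
    """Choose full or reduced complexity per channel from a running time budget."""
    elapsed = 0.0
    modes = []
    for ch in range(num_channels):
        remaining = num_channels - ch
        # Hypothetical budget: time still available, shared by remaining channels.
        budget = (frame_period_ms - elapsed) / remaining
        mode = "full" if budget >= full_cost_ms else "reduced"
        elapsed += full_cost_ms if mode == "full" else reduced_cost_ms
        modes.append(mode)
    return modes, elapsed

# Four channels, a 20 ms frame period, 6 ms for full and 4 ms for reduced encoding.
modes, elapsed = process_channels(4, 20.0, 6.0, 4.0)
```

    In this example the first two channels fall back to the reduced mode, which frees enough time for the last two to run at full complexity within the 20 ms period.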

    Using signal to noise ratio of a speech signal to adjust thresholds for extracting speech parameters for coding the speech signal
    27.
    Granted patent
    Status: in force

    Publication No.: US06898566B1

    Publication date: 2005-05-24

    Application No.: US09640841

    Filing date: 2000-08-16

    Abstract: There are provided speech coding methods and systems for estimating a plurality of speech parameters of a speech signal for coding the speech signal using one of a plurality of speech coding algorithms. The plurality of speech parameters includes pitch information and is calculated using a plurality of thresholds. An example method includes estimating a background noise level in the speech signal to determine a signal to noise ratio (SNR) for the speech signal, adjusting one or more of the plurality of thresholds based on the SNR to generate one or more SNR adjusted thresholds, analyzing the speech signal to extract the pitch information using the one or more SNR adjusted thresholds, and repeating the estimating, the adjusting and the analyzing to code the speech signal using one of the plurality of speech coding algorithms.
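
    The estimate, adjust, analyze sequence in this abstract can be sketched as below. Everything specific is an assumption: the noise level is taken from the quietest frame, the threshold is relaxed linearly as the SNR falls below a hypothetical 30 dB "clean" level, and pitch is found by a plain normalized autocorrelation search. The abstract states neither the direction nor the form of the adjustment.

```python
import math

def frame_energy(frame):
    return sum(s * s for s in frame) / len(frame)

def estimate_snr_db(frames, floor=1e-12):
    """Treat the quietest frame as background noise (a crude assumption)."""
    energies = [frame_energy(f) for f in frames]
    noise = max(min(energies), floor)
    signal = max(sum(energies) / len(energies), floor)
    return 10.0 * math.log10(signal / noise)

def adjusted_threshold(base, snr_db, clean_db=30.0, relax=0.2):
    """Lower the correlation threshold linearly as SNR drops below clean_db."""
    shortfall = min(max(clean_db - snr_db, 0.0), clean_db) / clean_db
    return base * (1.0 - relax * shortfall)

def extract_pitch_lag(frame, threshold, min_lag=20, max_lag=120):
    """Return the lag with the highest autocorrelation above threshold, else None."""
    energy = sum(s * s for s in frame) or 1e-12
    best_lag, best_corr = None, threshold
    for lag in range(min_lag, max_lag + 1):
        corr = sum(frame[i] * frame[i - lag] for i in range(lag, len(frame))) / energy
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag

# A 240-sample tone with a 40-sample period, analyzed at an assumed 10 dB SNR.
tone = [math.sin(2 * math.pi * n / 40) for n in range(240)]
threshold = adjusted_threshold(0.5, snr_db=10.0)
lag = extract_pitch_lag(tone, threshold)
```

    In a real coder these three steps would repeat frame by frame, so the thresholds track the noise conditions as they change.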

    Flexible variable rate vocoder for wireless communication systems
    28.
    Granted patent
    Status: in force

    Publication No.: US06856954B1

    Publication date: 2005-02-15

    Application No.: US09627375

    Filing date: 2000-07-28

    Applicant: Huan-Yu Su

    Inventor: Huan-Yu Su

    CPC classification: H04L1/0014

    Abstract: A flexible variable rate vocoder and related method of operation. The vocoder selects a target average data rate responsive to at least one network parameter and at least one external parameter.

    Intelligent discontinuous transmission and comfort noise generation scheme for pulse code modulation speech coders
    29.
    Granted patent
    Status: in force

    Publication No.: US06510409B1

    Publication date: 2003-01-21

    Application No.: US09484731

    Filing date: 2000-01-18

    Applicant: Huan-Yu Su

    Inventor: Huan-Yu Su

    IPC classification: G10L11/02

    CPC classification: G10L19/012, G10L25/78

    Abstract: A fully backward compatible intelligent discontinuous transmission (DTX) and comfort noise generation (CNG) scheme that is operable in pulse code modulation (PCM) speech coding systems. The scheme, for example, provides a speech encoder comprising speech signal analysis circuitry configured to calculate a predetermined plurality of parameters from the speech signal; a voice activity detector configured to determine voice activity in the speech signal, where the speech encoder enters a discontinuous transmission mode if the voice activity detector does not detect voice activity; and a transmitter configured to transmit one or more speech samples of the speech signal after the speech encoder enters the discontinuous transmission mode, where the one or more speech samples can be used by a remote speech decoder to extract a parameter in order to generate background noise based on the parameter.
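
    A rough sketch of the scheme in this abstract: the encoder stops transmitting when no voice activity is detected, but first sends a short run of ordinary samples from which the decoder derives a noise parameter. The energy-based detector, the eight-sample excerpt, the RMS parameter, and the Gaussian noise generator are all assumptions for illustration only.

```python
import math
import random

def is_speech(frame, energy_threshold=1e-3):
    """Crude energy-based voice activity detector (hypothetical threshold)."""
    return sum(s * s for s in frame) / len(frame) > energy_threshold

def encode_stream(frames, samples_in_dtx=8):
    """Send speech frames; on entering DTX send a short excerpt, then stay silent."""
    in_dtx = False
    packets = []
    for frame in frames:
        if is_speech(frame):
            in_dtx = False
            packets.append(("speech", frame))
        elif not in_dtx:
            in_dtx = True
            packets.append(("noise_samples", frame[:samples_in_dtx]))
        # While already in DTX, nothing is transmitted.
    return packets

def comfort_noise(samples, length, rng):
    """Decoder side: extract an RMS level and synthesize matching noise."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return [rng.gauss(0.0, rms) for _ in range(length)]

frames = [[0.5] * 16, [0.001] * 16, [0.001] * 16, [0.5] * 16]
packets = encode_stream(frames)
noise = comfort_noise(packets[1][1], 160, random.Random(0))
```

    Because the excerpt consists of ordinary samples, a decoder unaware of the scheme can still play it back as audio. That is one plausible reading of the backward compatibility claimed in the abstract.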

    Comb codebook structure
    30.
    Granted patent
    Status: in force

    Publication No.: US06330531B1

    Publication date: 2001-12-11

    Application No.: US09156649

    Filing date: 1998-09-18

    Applicant: Huan-Yu Su

    Inventor: Huan-Yu Su

    IPC classification: G10L19/12

    Abstract: A speech encoding comb codebook structure for providing good quality reproduced low bit-rate speech signals in a speech encoding system. The codebook structure requires minimal training, if any, and allows for reduced complexity and memory requirements. The codebook includes a first sub-codebook and at least one additional sub-codebook, each having a plurality of code-vectors. The codebook may be randomly populated. All even elements may be set to zero in a first codebook, and all odd elements may be set to zero in a second codebook. The resulting comb codebook includes code-vector combinations of the code-vectors from the sub-codebooks. In certain embodiments, the code-vectors of the sub-codebooks may contain zero valued elements. In other embodiments where the code-vectors of the sub-codebooks contain only non-zero elements, zero valued elements may be inserted in between the non-zero elements of the sub-codebooks during the forming of the resultant comb codebook. In such an embodiment, the memory requirements would be further reduced in that the zero valued elements need not be stored.
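
    The comb structure can be shown with a short sketch. The zero-based indexing, the codebook sizes, the uniform random population, and the use of pairwise sums as the "combination" are assumptions; the abstract only states that even elements are zeroed in one sub-codebook and odd elements in the other, and that zeros need not be stored.

```python
import itertools
import random

def make_sub_codebook(size, dim, keep_parity, rng):
    """Random code-vectors that are non-zero only at indices of one parity."""
    return [[rng.uniform(-1, 1) if i % 2 == keep_parity else 0.0 for i in range(dim)]
            for _ in range(size)]

def comb_codebook(cb_a, cb_b):
    """Every pairwise sum of one code-vector from each sub-codebook."""
    return [[a + b for a, b in zip(va, vb)]
            for va, vb in itertools.product(cb_a, cb_b)]

def expand(compact, dim, keep_parity):
    """Rebuild a full vector from stored non-zero elements by inserting zeros."""
    full = [0.0] * dim
    full[keep_parity::2] = compact
    return full

rng = random.Random(0)
cb1 = make_sub_codebook(4, 8, keep_parity=1, rng=rng)   # even indices zeroed
cb2 = make_sub_codebook(4, 8, keep_parity=0, rng=rng)   # odd indices zeroed
comb = comb_codebook(cb1, cb2)
```

    Two 4-entry sub-codebooks yield 16 combined code-vectors. With the `expand` helper only half of each sub-codebook vector needs to be stored, which is the memory saving the abstract describes.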
