    Multi-segment vector quantizer for a speech coder suitable for use in a radiotelephone
    11.
    Granted invention patent
    Legal status: lapsed

    Publication number: US5675702A

    Publication date: 1997-10-07

    Application number: US611608

    Filing date: 1996-03-08

    Abstract: A Vector-Sum Excited Linear Predictive Coding (VSELP) speech coder provides improved quality and reduced complexity compared with a typical speech coder. VSELP uses a codebook with a predefined structure, so the computations required for the codebook search can be significantly reduced. This VSELP speech coder uses a single- or multi-segment vector quantizer of the reflection coefficients based on a Fixed-Point-Lattice-Technique (FLAT). Additionally, the coder uses a pre-quantizer to reduce the vector codebook search complexity and a high-resolution scalar quantizer to reduce the memory needed to store the reflection coefficient vector codebooks. The result is a high-quality speech coder with reduced computation and storage requirements.
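    A minimal Python sketch of a two-stage (pre-quantizer, then restricted full search) segment quantizer, for illustration only. Assumptions not taken from the abstract: squared error replaces the FLAT-based error measure, codebook entries are stored as indices into a hypothetical 8-bit scalar table (`SCALAR_TABLE`), and each hypothetical pre-quantizer cell lists the main-codebook entries worth searching. All function names are illustrative.

```python
# Hypothetical 8-bit high-resolution scalar table: each codebook entry is stored
# as small integer indices into this table instead of full-precision values.
SCALAR_TABLE = [-0.99 + i * (1.98 / 255) for i in range(256)]


def sq_dist(a, b):
    """Squared Euclidean distance (assumed stand-in for the FLAT error measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def decode_entry(entry):
    """Expand an index-encoded codebook entry into reflection coefficients."""
    return [SCALAR_TABLE[i] for i in entry]


def quantize_segment(target, index_codebook, centroids, candidate_lists):
    """Pick the nearest pre-quantizer cell, then search only that cell's candidates."""
    cell = min(range(len(centroids)), key=lambda c: sq_dist(centroids[c], target))
    return min(candidate_lists[cell],
               key=lambda j: sq_dist(decode_entry(index_codebook[j]), target))


def quantize_multi_segment(k, segments):
    """Quantize reflection-coefficient vector k one segment at a time.

    segments: list of (start, stop, index_codebook, centroids, candidate_lists).
    Returns one codebook index per segment.
    """
    return [quantize_segment(k[a:b], cb, cen, cl) for a, b, cb, cen, cl in segments]
```

    Storing one byte per coefficient instead of a full-precision value is what saves codebook memory in this sketch; the pre-quantizer limits the distortion computations to one cell's candidate list.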

    Method for generating a spectral noise weighting filter for use in a speech coder
    12.
    Granted invention patent
    Legal status: lapsed

    Publication number: US5570453A

    Publication date: 1996-10-29

    Application number: US434868

    Filing date: 1995-05-04

    CPC classification: G10L19/12

    Abstract: A digital speech coding method uses an Rth-order filter to model the frequency response of multiple filters, thereby providing a single filter that offers the control of multiple filters without their complexity. Depending on the embodiment, the Rth-order filter can be used as a spectral noise weighting filter or as a combination of a short-term predictor filter and a spectral noise weighting filter, referred to as the spectrally noise weighted synthesis filter. In general, the method models the frequency response of L Pth-order filters by a single Rth-order filter, where the order R

    Word spotting in a speech recognition system without predetermined endpoint detection
    13.
    Granted invention patent
    Legal status: lapsed

    Publication number: US5023911A

    Publication date: 1991-06-11

    Application number: US266293

    Filing date: 1988-10-31

    Applicant: Ira A. Gerson

    Inventor: Ira A. Gerson

    IPC classification: G10L15/00 G10L15/10

    CPC classification: G10L15/10 G10L2015/088

    Abstract: Word spotting in a speech recognition system without predetermining the endpoints of the input speech. The invention is intended to be implemented in a system that has word templates stored in template memory and is capable of accumulating distance measures for states within each word template. The following steps are used to generate a measure of similarity between a subset of the input frames and a word template: a) recording a beginning input frame number for each state to identify the potential beginning of the word; b) accumulating distance measures for at least one state for each input frame; c) normalizing the distance measures by subtracting a normalization amount from each distance measure; d) recording normalization information corresponding to the normalization amount for each input frame; and e) determining a similarity measure between the word template and a subset of input frames after a given input frame has been processed. The subset extends from the beginning input frame number recorded for the end state of the template through the given input frame number. The similarity measure is based on the normalized distance measure recorded for the end state and on the normalization information.
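    Steps a) through e) can be illustrated with a small dynamic-programming sketch in Python. Assumptions not taken from the abstract: a left-to-right template in which each state may repeat or advance by one, a caller-supplied frame distance `dist`, and an entry rule that lets a new word hypothesis enter the first state on any frame with normalized cost 0 (ties favor the newer start). `word_spot` is a hypothetical name, and this is not presented as the patented procedure.

```python
def word_spot(frames, template, dist):
    """Score a word template against input spans ending at each frame.

    Returns one (begin_frame, distance) pair per input frame: distance is the
    accumulated (unnormalized) distance of the best path ending in the template's
    end state, and begin_frame is the input frame where that path started.
    """
    inf = float("inf")
    n = len(template)
    acc = [inf] * n          # normalized accumulated distance per state
    begin = [None] * n       # a) beginning input frame recorded per state
    cum_norm = [0.0]         # d) cum_norm[t + 1] = total normalization for frames 0..t
    results = []
    for t, x in enumerate(frames):
        new_acc, new_begin = [inf] * n, [None] * n
        for j in range(n):
            if j == 0:
                # A new word may start here; listed first so ties favor the new start.
                options = [(0.0, t), (acc[0], begin[0])]
            else:
                # Stay in state j, or advance from state j - 1.
                options = [(acc[j], begin[j]), (acc[j - 1], begin[j - 1])]
            cost, start = min(options, key=lambda o: o[0])
            if cost < inf:
                # b) accumulate the local distance for this state.
                new_acc[j] = cost + dist(x, template[j])
                new_begin[j] = start
        norm = min(new_acc)                      # c) normalization amount
        acc = [a - norm for a in new_acc]
        begin = new_begin
        cum_norm.append(cum_norm[-1] + norm)     # d) record normalization info
        b = begin[-1]
        if b is None:
            results.append((None, inf))
        else:
            # e) add back the normalization removed since the recorded begin frame.
            results.append((b, acc[-1] + cum_norm[t + 1] - cum_norm[b]))
    return results
```

    For example, with template `[0, 10]`, input `[5, 0, 10, 5]` and `dist = lambda a, b: abs(a - b)`, the result at frame 2 is a path beginning at frame 1 with distance 0. Subtracting the per-frame minimum keeps the stored values small (useful in fixed-point arithmetic), while the cumulative record lets the true distance of any span be recovered. A practical spotter would also compare the score against a threshold, possibly after dividing by the span length.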

    Digital speech coder having optimized signal energy parameters
    14.
    Granted invention patent
    Legal status: lapsed

    Publication number: US5490230A

    Publication date: 1996-02-06

    Application number: US361474

    Filing date: 1994-12-22

    CPC classification: G10L19/083 G10L19/125

    Abstract: A speech coder and decoder methodology wherein the pitch excitation and codebook excitation source energies are represented by parameters that can be transmitted with minimal transmission capacity. The parameters are the long-term energy value; a short-term correction factor, which is applied to the long-term energy value to match the short-term energy; and proportionality factor(s) that specify the relative energy contribution of the excitation sources to the short-term energy value.
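    The three parameters can be related to the excitation gains as in the following Python sketch. Assumptions not taken from the abstract: exactly two excitation sources, their scaled energies simply add, quantization is omitted, and gain signs would have to be conveyed separately. Function names are hypothetical.

```python
import math


def energy(v):
    """Sum of squares of a vector."""
    return sum(x * x for x in v)


def encode_energy_params(long_term_energy, pitch_vec, code_vec, beta, gamma):
    """Express the subframe excitation energies as the three parameters.

    Returns (long_term_energy, correction, proportion), where
    correction = short-term energy / long-term energy and
    proportion = pitch excitation's share of the short-term energy.
    """
    e_pitch = beta ** 2 * energy(pitch_vec)
    e_code = gamma ** 2 * energy(code_vec)
    e_short = e_pitch + e_code          # assumes the two contributions add
    return long_term_energy, e_short / long_term_energy, e_pitch / e_short


def decode_gains(long_term_energy, correction, proportion, pitch_vec, code_vec):
    """Recover the non-negative gains for the pitch and codebook excitations."""
    e_short = long_term_energy * correction
    beta = math.sqrt(proportion * e_short / energy(pitch_vec))
    gamma = math.sqrt((1 - proportion) * e_short / energy(code_vec))
    return beta, gamma
```

    Because the long-term energy changes slowly, it can be sent less often than the correction and proportionality factors, which is the transmission saving the abstract describes.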

    Digital speech coder having improved sub-sample resolution long-term predictor
    15.
    Granted invention patent
    Legal status: lapsed

    Publication number: US5359696A

    Publication date: 1994-10-25

    Application number: US214998

    Filing date: 1994-03-21

    IPC classification: G10L19/00 G10L19/12 G10L9/18

    CPC classification: G10L19/12

    Abstract: A digital speech coder includes a long-term filter (124) having an improved sub-sample resolution long-term predictor (FIG. 5), which allows sub-sample resolution for the lag parameter L. A frame of N samples of the input speech vector s(n) is applied to an adder (510). The adder (510) produces the output vector b(n) of the long-term filter (124). The output vector b(n) is fed back to a delayed vector generator block (530) of the long-term predictor. The nominal long-term predictor lag parameter L is also input to the delayed vector generator block (530). The lag parameter L can take on non-integer values, which may be multiples of one half, one third, one fourth or any other rational fraction. The delayed vector generator (530) includes a memory that holds past samples of b(n). In addition, interpolated samples of b(n) are calculated by the delayed vector generator (530) and stored in its memory, with at least one interpolated sample calculated and stored between each pair of past samples of b(n). The delayed vector generator (530) provides the output vector q(n) to the long-term multiplier block (520), which scales the long-term predictor response by the long-term predictor coefficient β. The scaled output βq(n) is then applied to the adder (510) to complete the feedback loop of the recursive filter (124).
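    A short Python sketch of the recursive filter b(n) = s(n) + βq(n), where q(n) is b(n) delayed by a possibly fractional lag L. Assumptions: the lag is a multiple of 1/`resolution`, interpolated samples come from simple linear interpolation (the abstract does not name the interpolator; a practical coder would likely use a longer interpolation filter), and they are computed on demand instead of being stored in the delayed vector generator's memory. `ltp_filter` is a hypothetical name.

```python
def ltp_filter(s, history, lag, beta, resolution=2):
    """Recursive long-term filter b(n) = s(n) + beta * b(n - lag).

    s:        current frame of input samples.
    history:  past outputs b(n), oldest first (the delayed vector generator memory).
    lag:      delay in samples, a multiple of 1/resolution (e.g. 40.5 for resolution=2).
    Returns the filter output for the current frame.
    """
    if not 1 <= lag <= len(history):
        raise ValueError("lag must lie between 1 and len(history)")
    b = list(history)
    start = len(b)
    step = round(lag * resolution)             # lag on the 1/resolution grid
    for n, x in enumerate(s):
        pos = (start + n) * resolution - step  # delayed position on the fine grid
        i, frac = divmod(pos, resolution)
        if frac == 0:
            q = b[i]                           # lag falls on a stored sample
        else:
            w = frac / resolution              # linear interpolation (assumed)
            q = (1 - w) * b[i] + w * b[i + 1]
        b.append(x + beta * q)                 # feed the output back into memory
    return b[start:]
```

    Appending each new output before predicting the next sample lets the sketch handle lags shorter than the frame length, where q(n) depends on outputs of the current frame.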

    Decoder for convolutionally encoded information
    16.
    Granted invention patent
    Legal status: lapsed

    Publication number: US5229767A

    Publication date: 1993-07-20

    Application number: US755266

    Filing date: 1991-09-05

    IPC classification: H03M13/41

    CPC classification: H03M13/41

    Abstract: In a Viterbi Algorithm decoder (204) used to decode convolutionally encoded information, reliability information is developed for the various path discard decisions made within the Viterbi Algorithm. These decisions are made for discard opportunities that impact one or more error detection windows (601). Based upon these metrics, a reliability factor sequence can be provided and compared against a fixed (or varying) threshold. When unreliability appears, appropriate action can be taken. For example, all of the information, or only certain portions of it, can be discarded, as appropriate to the particular application.
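    A compact Python sketch of a hard-decision Viterbi decoder that records, for every survivor decision, the metric margin between the kept path and the discarded path, then flags error-detection windows whose smallest margin falls below a threshold. Assumptions: a rate-1/2, constraint-length-3 code with generators 7 and 5 (octal), Hamming branch metrics, fixed non-overlapping windows, and "minimum discard margin along the decoded path" as the reliability factor. The patent's own reliability computation may differ, and all names are hypothetical.

```python
K = 3                       # assumed constraint length
G = (0b111, 0b101)          # assumed generators (7, 5 octal), rate 1/2
NSTATES = 1 << (K - 1)


def branch(state, bit):
    """Return (next_state, output_bits) for one encoder step."""
    reg = (bit << (K - 1)) | state
    return reg >> 1, [bin(reg & g).count("1") % 2 for g in G]


def encode(bits):
    state, out = 0, []
    for b in bits:
        state, o = branch(state, b)
        out.extend(o)
    return out


def viterbi_with_reliability(rx):
    """Decode hard bits; return (decoded_bits, discard_margin_per_step)."""
    inf = float("inf")
    metric = [0] + [inf] * (NSTATES - 1)
    history = []
    for t in range(0, len(rx), 2):
        sym = rx[t:t + 2]
        cands = [[] for _ in range(NSTATES)]
        for s in range(NSTATES):
            if metric[s] == inf:
                continue
            for b in (0, 1):
                ns, out = branch(s, b)
                m = metric[s] + sum(o != r for o, r in zip(out, sym))
                cands[ns].append((m, s, b))
        metric, step = [inf] * NSTATES, [None] * NSTATES
        for ns, c in enumerate(cands):
            if not c:
                continue
            c.sort()
            # Margin between the surviving and the discarded path (inf if no rival).
            margin = c[1][0] - c[0][0] if len(c) > 1 else inf
            metric[ns] = c[0][0]
            step[ns] = (c[0][1], c[0][2], margin)
        history.append(step)
    s = min(range(NSTATES), key=lambda x: metric[x])
    bits, margins = [], []
    for step in reversed(history):      # trace back along the best path
        prev, b, margin = step[s]
        bits.append(b)
        margins.append(margin)
        s = prev
    return bits[::-1], margins[::-1]


def unreliable_windows(margins, window, threshold):
    """Start indices of windows whose smallest discard margin is below threshold."""
    return [w for w in range(0, len(margins), window)
            if min(margins[w:w + window]) < threshold]
```

    A flagged window could then be discarded or concealed, in line with the "appropriate action" the abstract leaves to the application.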

    Continuous speech recognition system
    17.
    Granted invention patent
    Legal status: lapsed

    Publication number: US5040127A

    Publication date: 1991-08-13

    Application number: US357071

    Filing date: 1989-03-29

    Applicant: Ira A. Gerson

    Inventor: Ira A. Gerson

    CPC classification: G10L15/193 G10L15/14

    Abstract: A continuous speech recognition system employs a grammar tree of alternative, potentially recognized word paths. A technique of tracing back through the grammar tree is used to determine which partial word path is common to all potential word paths. The common partial word path is deleted, and the words corresponding to the deleted partial word path are output as recognized words.

    PCT No. PCT/US86/01224; Sec. 371 date: 1988-09-30; Sec. 102(e) date: 1988-09-30; PCT filed: 1986-06-02; PCT publication No. WO87/07749, published 1987-12-17.
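    The traceback idea can be sketched in Python as follows, assuming each active hypothesis is a leaf `Node` whose parent pointers lead back to the current root of the grammar tree; `Node` and `emit_common_prefix` are hypothetical names. Nodes shared by every active path form the common partial word path: their words are output, and the last shared node becomes the new root so the older part of the tree can be discarded.

```python
class Node:
    """One word on a hypothesis path; parent points toward the root."""

    def __init__(self, word, parent=None):
        self.word = word
        self.parent = parent


def emit_common_prefix(root, active):
    """Return (recognized_words, new_root) for a list of active leaf nodes."""
    chains = []
    for leaf in active:
        chain, node = [], leaf
        while node is not root:          # trace back to the current root
            chain.append(node)
            node = node.parent
        chains.append(chain[::-1])       # root-to-leaf order
    common = []
    for nodes in zip(*chains):
        # Shared means the same tree node, not merely the same word.
        if all(n is nodes[0] for n in nodes):
            common.append(nodes[0])
        else:
            break
    if not common:
        return [], root
    new_root = common[-1]
    new_root.parent = None               # detach the emitted part of the tree
    return [n.word for n in common], new_root
```

    Comparing node identity rather than word text matters: two hypotheses may contain the same word at the same depth while still disagreeing about earlier words.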

    Digital speech coder having improved vector excitation source
    18.
    Granted invention patent
    Legal status: lapsed

    Publication number: US4896361A

    Publication date: 1990-01-23

    Application number: US294098

    Filing date: 1989-01-06

    Applicant: Ira A. Gerson

    Inventor: Ira A. Gerson

    IPC classification: G10L19/00 G10L19/12

    CPC classification: G10L19/135 G10L25/06

    Abstract: An improved excitation vector generation and search technique (FIG. 1) is described for a code-excited linear prediction (CELP) speech coder (100) using a codebook memory of excitation code vectors. A set of M basis vectors v_m(n) is used along with the excitation signal codewords (i) to generate the codebook of excitation vectors u_i(n) according to a "vector sum" technique (120): stored selector codewords are converted into a plurality of interim data signals, the M basis vectors are multiplied by the interim data signals, and the resultant vectors are summed to produce the set of 2^M codebook vectors. Only the M basis vectors need to be stored in memory (114), as opposed to all 2^M code vectors.
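    The vector-sum construction can be written out directly. In this Python sketch (function names hypothetical), bit m of codeword i selects the interim sign theta_im = +1 if the bit is 1 and -1 otherwise, the convention used in VSELP; u_i(n) is then the sum of the M basis vectors weighted by those signs. Only the basis is stored, and any of the 2^M code vectors is rebuilt on demand.

```python
def vector_sum(basis, i):
    """Build code vector u_i(n) = sum over m of theta_im * v_m(n).

    theta_im = +1 if bit m of codeword i is set, otherwise -1.
    """
    n_samples = len(basis[0])
    u = [0] * n_samples
    for m, v in enumerate(basis):
        theta = 1 if (i >> m) & 1 else -1
        for n in range(n_samples):
            u[n] += theta * v[n]
    return u


def full_codebook(basis):
    """All 2**M code vectors; shown only to illustrate what need not be stored."""
    return [vector_sum(basis, i) for i in range(2 ** len(basis))]
```

    One consequence of this construction is that complementary codewords (i and 2^M - 1 - i) produce negated vectors, so a search can treat each such pair together.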

    Method for entering digit sequences by voice command
    19.
    Granted invention patent
    Legal status: lapsed

    Publication number: US4870686A

    Publication date: 1989-09-26

    Application number: US110144

    Filing date: 1987-10-19

    CPC classification: H04M1/271 G10L15/22

    Abstract: A user-interactive speech recognition control system is disclosed for recognizing a complete sequence of keywords (e.g., a telephone number such as 123-4567) by entering, verifying, and editing variable-length utterance strings (e.g., 1-2-3; 4-5; 6-7) separated by pauses placed by the user. The device controller (120) uses timers (124) to monitor the pause time between partial-sequence digit strings recognized by the speech recognizer (110). When a string of digits is followed by a predetermined pause interval, the recognized digits are read back via the speech synthesizer (130). An additional string of digits can then be entered, and only that new string is read back after the next pause. Furthermore, the user can correct either only the last digit string entered or the entire sequence. Hence, if only one digit is in error, the erroneous digit string can be corrected without re-entering the entire digit sequence. The invention is well suited to a hands-free voice command dialing system for a mobile radiotelephone, where vehicular background noise may affect recognition accuracy.
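    A small Python sketch of the pause-based grouping and the two editing options. Assumptions: the recognizer delivers time-stamped digits, the pause threshold is a hypothetical 1 second, and returning a string stands in for reading it back via the speech synthesizer. `split_on_pauses` and `DigitEntry` are illustrative names.

```python
def split_on_pauses(events, pause=1.0):
    """Group (time_s, digit) events into strings separated by pauses >= pause."""
    strings, last_time = [], None
    for t, digit in events:
        if last_time is None or t - last_time >= pause:
            strings.append("")                 # a pause closes the previous string
        strings[-1] += digit
        last_time = t
    return strings


class DigitEntry:
    """Accumulates digit strings; each entered string is echoed for verification."""

    def __init__(self):
        self.strings = []

    def enter(self, digits):
        self.strings.append(digits)
        return digits                          # only the new string is read back

    def correct_last(self, digits):
        self.strings[-1] = digits              # re-enter just the last string
        return digits

    def clear(self):
        self.strings = []                      # start the whole sequence over

    def number(self):
        return "".join(self.strings)
```

    For example, if "4-5" is misrecognized as "46", only that string is re-spoken with `correct_last("45")`; the strings "123" and "67" are kept.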

    Method and apparatus for processing an input speech signal during presentation of an output audio signal
    20.
    Granted invention patent
    Legal status: in force

    Publication number: US06937977B2

    Publication date: 2005-08-30

    Application number: US09412202

    Filing date: 1999-10-05

    Applicant: Ira A. Gerson

    Inventor: Ira A. Gerson

    Abstract: The start of an input speech signal is detected during presentation of an output audio signal, and an input start time relative to the output audio signal is determined. The input start time is then provided for use in responding to the input speech signal. In another embodiment, the output audio signal has a corresponding identification. When the input speech signal is detected during presentation of the output audio signal, the identification of the output audio signal is provided for use in responding to the input speech signal. Information signals comprising data and/or control signals are provided in response to at least the contextual information provided, i.e., the input start time and/or the identification of the output audio signal. In this manner, the present invention accurately establishes the context of an input speech signal relative to an output audio signal regardless of the delay characteristics of the underlying communication system.
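    A minimal Python sketch of how the contextual information might be formed and used. Assumptions: the device measures both the prompt's playback start and the detected speech start on its own clock (so network delay does not affect the offset), and the server knows when each item of a prompt begins. `Prompt`, `input_context` and `item_being_played` are hypothetical names.

```python
import bisect
from dataclasses import dataclass


@dataclass
class Prompt:
    prompt_id: str      # identification of the output audio signal
    start: float        # playback start time on the local (client) clock, seconds
    duration: float     # prompt length, seconds


def input_context(prompt, speech_start):
    """Contextual info for speech detected at speech_start (same local clock).

    Returns None if the speech did not begin while the prompt was playing.
    """
    offset = speech_start - prompt.start
    if 0.0 <= offset <= prompt.duration:
        return {"prompt_id": prompt.prompt_id, "input_start": offset}
    return None


def item_being_played(input_start, item_starts):
    """Index of the prompt item playing at input_start (item_starts sorted)."""
    idx = bisect.bisect_right(item_starts, input_start) - 1
    return idx if idx >= 0 else None
```

    For example, speech detected 3.5 s into a menu prompt whose items start at 0, 2 and 4 s maps to the second item, however long the context message takes to reach the server.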
