Abstract:
A start of an input speech signal is detected during presentation of an output audio signal and an input start time, relative to the output audio signal, is determined. The input start time is then provided for use in responding to the input speech signal. In another embodiment, the output audio signal has a corresponding identification. When the input speech signal is detected during presentation of the output audio signal, the identification of the output audio signal is provided for use in responding to the input speech signal. Information signals comprising data and/or control signals are provided in response to at least the contextual information provided, i.e., the input start time and/or the identification of the output audio signal. In this manner, the present invention accurately establishes a context of an input speech signal relative to an output audio signal regardless of the delay characteristics of the underlying communication system.
Abstract:
A wireless system comprises at least one subscriber unit in wireless communication with an infrastructure. Each subscriber unit implements a speech recognition client, and the infrastructure comprises a speech recognition server. A given subscriber unit takes as input an unencoded speech signal that is subsequently parameterized by the speech recognition client. The parameterized speech is then provided to the speech recognition server that, in turn, performs speech recognition analysis on the parameterized speech. Information signals, based in part upon any recognized utterances identified by the speech recognition analysis, are subsequently provided to the subscriber unit. The information signals may be used to control the subscriber unit itself or one or more devices coupled to the subscriber unit, or may be operated upon by the subscriber unit or devices coupled thereto.
Abstract:
An Rth-order filter models the frequency response of multiple filters, to provide a filter which offers the control of multiple filters without the complexity of multiple filters. The Rth-order filter can be used as a spectral noise weighting filter or a combination of a short-term predictor filter and a spectral noise weighting filter, referred to as the spectrally noise weighted synthesis filter, depending on which embodiment is employed. In general, the method models the frequency response of L Pth-order filters by a single Rth-order filter, where the order R
Abstract:
An adaptive spectral postfilter in a synthesized speech platform has a denominator characteristic that corresponds to a preceding LPC filter stage, and a numerator characteristic that is developed as a function of the denominator characteristic through application of spectral smoothing techniques. This allows the numerator to track the denominator without the introduction of spectral distortion that would otherwise affect the processing in an adverse way.
Abstract:
An improved excitation vector generation and search technique (FIG. 1) is described for a code-excited linear prediction (CELP) speech coder (100) using a codebook of excitation code vectors. A set of M basis vectors V_m(n) is used along with the excitation signal codewords (i) to generate the codebook of excitation vectors u_i(n) according to a "vector sum" technique (120) of converting the selector codewords into a plurality of interim data signals, multiplying the set of M basis vectors by the interim data signals, and summing the resultant vectors to produce the set of 2^M codebook vectors. The entire codebook of 2^M possible excitation vectors is efficiently searched by using the vector sum generation technique with the M basis vectors, without ever having to generate and evaluate each of the 2^M code vectors themselves. Furthermore, only M basis vectors need to be stored in memory (114), as opposed to all 2^M code vectors.
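The vector sum construction can be illustrated with a minimal Python sketch. It is not taken from the patent: it assumes that each bit of codeword i maps to an interim signal of +1 or -1, and that the search maximizes squared correlation over energy, a common CELP matching criterion. The function names are hypothetical.

```python
import numpy as np

def codeword_signs(i, M):
    # Assumed mapping: bit m of codeword i gives +1 (bit set) or -1 (bit clear).
    return np.array([1.0 if (i >> m) & 1 else -1.0 for m in range(M)])

def vector_sum_codebook(basis):
    # Explicitly build all 2**M vectors u_i(n) = sum_m theta[i, m] * V_m(n).
    # Shown for illustration only; the search below never calls it.
    M = basis.shape[0]
    theta = np.array([codeword_signs(i, M) for i in range(2 ** M)])
    return theta @ basis

def vector_sum_search(basis, target):
    # Precompute M correlations and an M x M Gram matrix once, then score
    # every codeword from those small terms without forming any u_i(n).
    M = basis.shape[0]
    corr_m = basis @ target
    gram = basis @ basis.T
    best_i, best_score = -1, -np.inf
    for i in range(2 ** M):
        theta = codeword_signs(i, M)
        energy = theta @ gram @ theta
        if energy <= 0.0:
            continue  # degenerate (zero-energy) code vector
        score = (theta @ corr_m) ** 2 / energy
        if score > best_score:
            best_i, best_score = i, score
    return best_i, best_score
```

Under these assumptions, only the M rows of `basis` need to be stored, and the per-codeword work depends on M rather than on the vector length.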
Abstract:
Described herein is an arrangement and method for processing speech information in a speech recognition system (300). The invention is best employed in a system where the speech information is depicted as words, each word representing a sequence of frames (510), and where the recognition system has means (120) for comparing present input speech to a word template that is stored in template memory and derived from one or more previous input words. The invention describes combining contiguous acoustically similar frames (512) derived from the previous input word or words into representative frames to form a corresponding reduced word template, storing the reduced word template in template memory in an efficient manner, and comparing frames of the present input speech to the representative frames of the reduced word template according to the number of frames combined in the representative frames of the reduced word template. In doing so, a measure of similarity between the present input speech and the word template is generated.
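The frame-combining step can be sketched as follows. The abstract does not say how acoustic similarity is measured or how a representative frame is formed, so this sketch assumes Euclidean distance to the running group mean, a hypothetical `threshold` parameter, and the mean as the representative frame.

```python
import numpy as np

def reduce_template(frames, threshold):
    # Merge runs of contiguous frames that stay within `threshold` of the
    # running group mean. Returns the representative frames and, for each,
    # the number of original frames it stands for.
    reps, counts = [], []
    group = [frames[0]]
    for frame in frames[1:]:
        centroid = np.mean(group, axis=0)
        if np.linalg.norm(frame - centroid) <= threshold:
            group.append(frame)
        else:
            reps.append(centroid)
            counts.append(len(group))
            group = [frame]
    reps.append(np.mean(group, axis=0))
    counts.append(len(group))
    return np.array(reps), counts
```

The returned counts are what a matcher could use to weight each comparison by the number of frames combined into a representative frame.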
Abstract:
Disclosed is a method for generating word templates for a speech recognition system. It is used where speech is represented by data in frames of equal time intervals. The method includes generating an interim template, generating a time alignment path between the interim template and a token, mapping frames from the interim template and the token along the time alignment path onto an averaged time axis, and combining data associated with the mapped frames to produce composite frames representative of the final word template. The method realizes advantages of reduced memory usage and a realistic data average from each contributing averaged word.
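As a rough illustration of the alignment-and-averaging idea, the sketch below uses standard dynamic time warping (DTW) to find the time alignment path. This is a swapped-in textbook technique, not necessarily the patent's procedure. It also assumes that an aligned pair (i, j) lands at position (i + j) / 2 on the averaged time axis, and that frames landing in the same position are averaged. All names are hypothetical.

```python
import numpy as np

def dtw_path(a, b):
    # Standard DTW with Euclidean frame distance; returns aligned (i, j) pairs.
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1, j - 1], i - 1, j - 1),
                 (cost[i - 1, j], i - 1, j),
                 (cost[i, j - 1], i, j - 1)]
        _, i, j = min(steps)
    return path[::-1]

def average_template(template, token):
    # Map each aligned pair onto the averaged time axis and average the
    # frame data that falls into each composite frame.
    n, m = len(template), len(token)
    length = (n + m + 1) // 2  # average length, rounded up
    sums = np.zeros((length, template.shape[1]))
    counts = np.zeros(length)
    for i, j in dtw_path(template, token):
        k = int((i + j) / 2 + 0.5)  # nearest slot on the averaged axis
        sums[k] += (template[i] + token[j]) / 2
        counts[k] += 1
    return sums / counts[:, None]
```

Only the composite template needs to be kept after averaging, which is consistent with the reduced memory usage the abstract describes.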
Abstract:
An automatic gain selector is disclosed for use with a noise suppression system which performs speech quality enhancement upon a noisy speech signal available at the input to generate a noise-suppressed speech signal at the output by spectral gain modification. The channel gain controller (240) of the present invention produces a modification signal (245), comprised of individual channel gain values, for application to a channel gain modifier (250). A particular gain table set is automatically selected from one of a plurality of gain tables (450) by a selector switch (470) and a noise level quantizer (440) in response to a multi-channel noise parameter, such as the overall average background noise level of the input signal. Then the individual channel gain values (455) are obtained from the particular gain table set in response to the individual channel signal-to-noise ratio estimate (235). Hence, each individual channel gain value is selected as a function of (a) the channel number, (b) the current channel SNR estimate, and (c) the overall average background noise level. The automatic gain selector further includes a gain smoothing filter (460) for smoothing these noise suppression gain factors on a per-sample basis thereby improving noise flutter performance caused by step discontinuities in frame-to-frame gain changes.
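The three-way gain lookup described above (channel number, channel SNR estimate, overall noise level) can be sketched as below. The table layout, the quantizer edges, and the one-pole smoothing filter are assumptions made for illustration; the abstract does not specify them.

```python
import bisect

def select_gains(gain_tables, noise_edges, noise_level, channel_snr_idx):
    # Quantize the overall background noise level to pick one gain table set,
    # then look up each channel's gain by its (already quantized) SNR index.
    # Assumed layout: gain_tables[noise_set][channel][snr_index].
    table_set = gain_tables[bisect.bisect_right(noise_edges, noise_level)]
    return [table_set[ch][snr] for ch, snr in enumerate(channel_snr_idx)]

def smooth_gain(prev_gain, target_gain, num_samples, alpha=0.9):
    # Assumed one-pole smoother applied per sample, so the gain approaches
    # its new target gradually instead of stepping at the frame boundary.
    out, g = [], prev_gain
    for _ in range(num_samples):
        g = alpha * g + (1.0 - alpha) * target_gain
        out.append(g)
    return out
```

For example, with a single quantizer edge there are two gain table sets, and a frame's gains come from the first set for noise levels below the edge and from the second set otherwise.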
Abstract:
An improved method and means are disclosed for determining reflection coefficients that characterize an electrical signal, thereby obtaining the characteristics of an all-zero inverse lattice filter. The reflection coefficients are obtained by filtering the signal, sampling the filtered signal, obtaining the elements of a correlation array from the samples, initializing values of arrays of forward residuals, backward residuals, and cross-correlation of residuals, combining array elements to obtain a first reflection coefficient, removing from the forward, backward, and cross-correlation arrays the effect of the first reflection coefficient, calculating from the revised arrays a second coefficient, and repeating the calculations to the desired order. In a second embodiment of the present invention, samples are selected from the digitized signal and multiplied by a windowing function. The windowed samples are used to derive values of an autocorrelation array, which eliminates the need for both forward and backward arrays as in the first embodiment of the invention.
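For orientation, the second embodiment's data flow (window, autocorrelation, reflection coefficients) can be sketched with the textbook Levinson-Durbin recursion. This is a swapped-in standard technique, not the recursion claimed in the patent. The Hamming window and the sign convention for the coefficients are also assumptions.

```python
import numpy as np

def reflection_coefficients(samples, order):
    # Window the samples, then form autocorrelation lags r[0..order].
    x = np.asarray(samples, dtype=float) * np.hamming(len(samples))
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)  # predictor polynomial, a[0] = 1
    a[0] = 1.0
    err = r[0]               # prediction error energy
    ks = []
    for m in range(1, order + 1):
        acc = r[m] + np.dot(a[1:m], r[m - 1:0:-1])
        k = -acc / err       # m-th reflection coefficient
        a_prev = a.copy()
        a[1:m] = a_prev[1:m] + k * a_prev[m - 1:0:-1]
        a[m] = k
        err *= 1.0 - k * k   # remove this stage's contribution
        ks.append(k)
    return np.array(ks)
```

With autocorrelation lags computed this way from a nonzero signal, every coefficient has magnitude below one, which is the usual stability check for a lattice filter.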
Abstract:
A digital speech coder utilizes harmonic noise weighting to overcome some limitations of low-rate CELP-type speech coders in reproducing voiced speech. In addition to a short-term correction factor, which constitutes spectral noise weighting as known in the art, a long-term pitch correction factor is utilized to provide harmonic noise weighting. The inclusion of harmonic noise weighting in a speech coder more efficiently utilizes the noise-masking properties of a speech signal, allowing synthesis of higher-quality speech at a given bit rate.