Abstract:
Disclosed is a method for drastically reducing the average error rate for signals under mismatched conditions. The method takes a signal (e.g., speech signal) and a set of stored representations (e.g., stored representations of keywords) and performs at least one transformation that results in the signal more closely emulating the stored representations. This is accomplished by using one of three techniques. First, one may transform the signal so that the signal may be better approximated by (e.g., is closer to) one of the stored representations. Second, one may transform the set of stored representations so that one of the stored representations better approximates the signal. Third, one may transform both the signal and the set of stored representations.
Abstract:
Codebook vectors may be considered critical if they give poor energy approximations and exhibit a particular shape with smaller components near the beginning and larger components toward the end of the vector. Standard deviation may be used to identify critical codevectors based on energy approximation error measured in decibels. A low-bit-rate (typically 8 kbit/s or less), low-delay digital coder and decoder based on Code Excited Linear Prediction for speech and similar signals features backward adaptive adjustment for codebook gain and short-term synthesis filter parameters and forward adaptive adjustment of long-term (pitch) synthesis filter parameters. In addition, the coder makes use of an excitation codebook and the coding is based on a set of codebook vector energies for a set of codebook vectors in the codebook. The codebook energies are calculated by identifying a set of approximations for the non-critical codebook vector energies. This achieves a significant reduction in processing time in comparison with prior art techniques.
Abstract:
A plurality of transducers are positioned in front of a speaker's mouth for detecting and responding to air flow patterns in space and time. Specific examples for the system and method for speech analysis and recognition by the detection of air flow pattern in the proximity of the mouth in space and time during an utterance are provided.
Abstract:
A signal processing system (50) performs real-time pitch shifting for applications such as karaoke, tapeless answering machines, and the like while minimizing distortion. A digital input signal is sampled and stored at successive locations in a variable-size buffer (62) at an input sample rate. Data from the variable-size buffer (62) is interpolated according to a pitch-shifting ratio. An adaptive pitch estimator (61) continually estimates the fundamental frequency of the digital input signal, and the signal processing system (50) adjusts the buffer size of the variable-size buffer (62) in response thereto. The signal processing system (50) changes the buffer size to store the digital input signal for an integral number of periods of the estimated fundamental frequency.
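The buffer-sizing rule in the abstract above can be illustrated with a minimal sketch. The function name `buffer_size_for_pitch`, the `max_size` cap, and the choice of the largest whole number of periods that fits under that cap are assumptions made for illustration; the abstract states only that the buffer holds an integral number of periods of the estimated fundamental frequency.

```python
def buffer_size_for_pitch(sample_rate_hz: float, f0_hz: float, max_size: int) -> int:
    """Return a buffer length (in samples) spanning a whole number of pitch periods.

    Assumption: we pick the largest whole number of periods that fits in
    max_size samples. The abstract does not say how the count is chosen.
    """
    if sample_rate_hz <= 0 or f0_hz <= 0:
        raise ValueError("sample rate and fundamental frequency must be positive")
    period = sample_rate_hz / f0_hz        # samples per pitch period (may be fractional)
    n_periods = int(max_size // period)    # whole periods that fit in the cap
    if n_periods < 1:
        raise ValueError("max_size is shorter than one pitch period")
    return round(n_periods * period)


# Hypothetical usage: re-size the buffer whenever the pitch estimate changes.
if __name__ == "__main__":
    for f0 in (100.0, 200.0, 250.0):
        print(f0, buffer_size_for_pitch(8000.0, f0, max_size=512))
```

Keeping the buffer to a whole number of periods means the wrap-around point of the buffer falls at the same phase of the waveform, which is one plausible reason the described system reports reduced distortion.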
Abstract:
The dictionary is broken into clusters by first grouping the dictionary according to a rule-based procedure whereby the dictionary is sorted by word length and alphabetically. After sorting, a plurality of first cluster centers is generated by selecting the dictionary entries that differ from neighboring entries by the first letter. Each of the dictionary entries is then assigned to the closest one of the first cluster centers using a dynamic time warping procedure. These newly formed clusters are then each analyzed to find the true cluster center, and the dictionary entries are then each assigned to the closest true cluster center. The clusters so formed may then be rapidly searched to locate any dictionary entry. The search is quite efficient because only the closest cluster to the desired dictionary entry needs to be searched.
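The clustering steps described above can be sketched roughly as follows. Several details are assumptions, not taken from the abstract: the dynamic time warping cost is 0 for matching characters and 1 otherwise, an entry counts as a first cluster center when its first letter differs from that of the preceding entry in sorted order, and the "true cluster center" is approximated by the medoid (the member with the smallest total distance to the other members). All function names are hypothetical, and entries are assumed to be non-empty strings.

```python
def dtw(a: str, b: str) -> float:
    """Dynamic time warping distance between two strings (0/1 character cost)."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)
    for ca in a:
        cur = [inf]
        for j, cb in enumerate(b, 1):
            cost = 0.0 if ca == cb else 1.0
            cur.append(cost + min(prev[j], cur[j - 1], prev[j - 1]))
        prev = cur
    return prev[-1]


def initial_centers(words):
    """Sort by (length, alphabetical) and pick entries whose first letter changes."""
    ordered = sorted(set(words), key=lambda w: (len(w), w))
    centers = [w for i, w in enumerate(ordered)
               if i == 0 or w[0] != ordered[i - 1][0]]
    return ordered, centers


def assign(words, centers):
    """Assign every word to its closest center under the DTW distance."""
    clusters = {c: [] for c in centers}
    for w in words:
        clusters[min(centers, key=lambda c: dtw(w, c))].append(w)
    return clusters


def medoid(members):
    """Stand-in for the 'true cluster center': member with least total distance."""
    return min(members, key=lambda m: sum(dtw(m, other) for other in members))


def cluster_dictionary(words):
    ordered, centers = initial_centers(words)
    first_pass = assign(ordered, centers)
    true_centers = [medoid(m) for m in first_pass.values() if m]
    return assign(ordered, true_centers)


if __name__ == "__main__":
    vocab = ["cat", "cot", "bat", "bit", "cart", "card", "bard"]
    for center, members in cluster_dictionary(vocab).items():
        print(center, members)
```

A lookup would then compare the query against the cluster centers only and search inside the single closest cluster, which is where the efficiency claimed in the abstract comes from.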
Abstract:
In a speech-recognition system having a plurality of classifiers, a voting window includes a sequence of outputs from each of the classifiers. For each classifier, a voting sum is generated corresponding to the voting window. A spoken sound is identified by determining which classifier corresponds to the greatest voting sum.
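The voting scheme above can be illustrated with a short sketch. It assumes that each classifier emits one numeric output per frame and that each classifier is associated with one candidate sound label; the names `identify_sound`, `window`, and `labels` are hypothetical.

```python
from typing import Sequence


def identify_sound(window: Sequence[Sequence[float]], labels: Sequence[str]) -> str:
    """Pick the label whose classifier has the greatest voting sum.

    window[t][k] is the output of classifier k at frame t of the voting window;
    labels[k] is the sound that classifier k stands for (an assumption here).
    """
    if not window:
        raise ValueError("voting window is empty")
    sums = [0.0] * len(labels)
    for frame in window:
        if len(frame) != len(labels):
            raise ValueError("each frame needs one output per classifier")
        for k, output in enumerate(frame):
            sums[k] += output              # accumulate the voting sum per classifier
    best = max(range(len(labels)), key=sums.__getitem__)
    return labels[best]


if __name__ == "__main__":
    window = [
        [0.2, 0.7, 0.1],
        [0.6, 0.3, 0.1],
        [0.1, 0.8, 0.1],
    ]
    print(identify_sound(window, ["ah", "ee", "oo"]))
```

Summing over the window lets a classifier that is consistently strong win even when another classifier has the single highest output in one frame, as in the second frame of the example.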
Abstract:
An improved text-to-speech synthesizer that employs a text-to-speech converter, a text reader control procedure, a classifier procedure, an abbreviation expansion procedure, and an acronym/initialism expanding procedure is herein described. The classifier procedure is used to generate classification values for each word in the text message with regard to syntax, punctuation and membership in predefined classes of words, the predefined classes of words including numbers, measurement units, geographic designations, and date/time values. The abbreviation expansion procedure evaluates, based on the classification values for words neighboring the identified words, which, if any, of the potential expansion values is applicable, and substitutes the potential expansion for the identified abbreviation word when the evaluation yields a success value. The acronym/initialism expanding procedure identifies words in the text message that are acronyms and initialisms, parses pronounceable syllables within the identified words and generates a substitute string that can consist of any combination of letters, numbers, pronounceable syllables or multiple letter identifiers.
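As a rough illustration of the abbreviation expansion step, the sketch below picks an expansion according to the classes of the neighboring words. Everything specific in it is an assumption: the three-way `classify` function, the `EXPANSIONS` table, and the rules for "St." and "Dr." are invented for the example and are far simpler than the classification values the abstract describes.

```python
def classify(word: str) -> str:
    """Toy classifier: a stand-in for the richer classification values."""
    if word.rstrip(".,").isdigit():
        return "number"
    if word[:1].isupper():
        return "capitalized"
    return "other"


# Hypothetical table: abbreviation -> candidate (expansion, required class of
# previous word, required class of next word). None means "no requirement".
EXPANSIONS = {
    "St.": [("Saint", None, "capitalized"), ("Street", "capitalized", None)],
    "Dr.": [("Doctor", None, "capitalized"), ("Drive", "capitalized", None)],
}


def expand_abbreviations(text: str) -> str:
    words = text.split()
    classes = [classify(w) for w in words]
    out = []
    for i, word in enumerate(words):
        prev_cls = classes[i - 1] if i > 0 else None
        next_cls = classes[i + 1] if i + 1 < len(words) else None
        replacement = word                      # keep the word if no rule succeeds
        for expansion, need_prev, need_next in EXPANSIONS.get(word, []):
            prev_ok = need_prev is None or need_prev == prev_cls
            next_ok = need_next is None or need_next == next_cls
            if prev_ok and next_ok:             # evaluation "yields a success value"
                replacement = expansion
                break
        out.append(replacement)
    return " ".join(out)


if __name__ == "__main__":
    print(expand_abbreviations("Dr. Smith lives on Main St. today"))
```

An abbreviation with no matching rule is left unchanged, mirroring the abstract's statement that a substitution happens only when the evaluation succeeds.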
Abstract:
In accordance with the invention, the procedure for recognizing a speech signal, up to output of the recognized word sequence or sentence, is split so that at first only word hypotheses are generated separately for different starting instants. From these word hypotheses, preliminary word strings are formed in conformity with a word graph, and the resulting word graph is continuously optimized by erasure of parts of word strings. Parts of word strings having the same beginning and end points are compared with one another, and the scores of words having concurrent end points are compared with a threshold value. Further steps for optimization of the word graph are also shown. For output, a particularly effective post-editing operation is disclosed in which, for each incorrect word, all further words having the same beginning are output, enabling the operator to quickly select the correct word from among them.
Abstract:
A frequency analysis method comprises using a window function to evaluate a temporal input signal present in the form of discrete sampled values. The windowed input signal is subsequently subjected to Fourier transformation for the purpose of generating a set of coefficients. In order to develop such a method so that the characteristics of the human ear are simulated not only with respect to the spectral projection in the frequency range, but also with respect to the resolution in the temporal range, a set of different window functions is used to evaluate a block of the input signal in order to generate a set of blocks, weighted with the respective window functions, of sampled values whose Fourier transforms have different bandwidths, before each of the simultaneously generated blocks of sampled values is subjected to a dedicated Fourier transformation in such a way that for each window function at least respectively one coefficient is calculated which is assigned the bandwidth of the Fourier transforms of this window function, and that the coefficients are chosen such that the frequency bands assigned to them essentially adjoin one another.
Abstract:
A preferred report generating system includes a computer (12) responsive to user-spoken inputs for selecting previously defined report material including text and graphics stored in memory respectively corresponding to the inputs, for activating other user inputs, and implementing corresponding computer commands. After receipt of preferred user-spoken inputs entered by way of a microphone (16) representing information needed for generating a report, the system compiles the report material corresponding to the user-selected inputs for generating the report.