Abstract:
A simulated speech method and apparatus afford a dual-matrix presentation (32 or 34) of simulated speech as encoded spatial patterns representing speech phonemes and the characteristic mouth formations that produce them. Spatial patterns (Fig. 3) may be presented in tactile or visual form, or both, from the output of a microcomputer speech analyzer (22) that analyzes speech in real time, from a keyboard (50) that generates phoneme- and mouth-form-representing signals, or from a memory device (44) that reproduces pre-recorded spatial patterns. The speech analyzer may be incorporated into an armband (90) with a pair of tactile stimulator matrices (36, 38) to provide an unobtrusive prosthetic device for hearing-handicapped individuals. A modified 16 mm projector (240) records spatial patterns on punched film (260) and projects them onto a display (244) to provide a visual presentation.
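The dual-matrix encoding can be pictured as a lookup from each phoneme to a pair of binary matrices, one for the phoneme and one for the mouth formation, driving a tactile or visual display. All patterns, phoneme names, and matrix sizes below are invented for illustration and are not the patent's encoding.

```python
# Hypothetical phoneme -> (phoneme pattern, mouth-form pattern) table.
# Real matrices would match the stimulator arrays (36, 38); these 2x2
# patterns are placeholders.
PHONEME_PATTERNS = {
    "AH": ([[1, 0], [0, 1]], [[0, 1], [1, 0]]),
    "S":  ([[1, 1], [0, 0]], [[0, 0], [1, 1]]),
}

def display_frames(phonemes):
    """Return the (phoneme matrix, mouth-form matrix) pair for each phoneme."""
    return [PHONEME_PATTERNS[p] for p in phonemes]

frames = display_frames(["AH", "S"])
print(len(frames))  # prints 2
```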
Abstract:
The invention relates to speech synthesis and to the production of speech by electronic means. Its object is a new method for modelling the acoustic characteristics of the human speech mechanism, i.e., for modelling speech production. The acoustic transfer function modelling the vocal tract is approximated by mathematically subdividing it into partial transfer functions of simpler spectral structure. Each partial transfer function is separately approximated by realizable rational transfer functions, and each of these is in turn realized by an equivalent electrical filter; the filters are interconnected in parallel and/or in series in the manner implied by the acoustic transfer function to be modelled. Models produced by the method may also be utilized in speech identification, in estimating the parameters of a speech signal, and in so-called vocoder apparatus. The invention is also applicable to electronic music synthesizers.
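The subdivision into simpler partial transfer functions can be sketched by factoring an all-pole transfer function H(z) = 1/A(z) into cascaded (series-connected) real second-order and first-order sections derived from its poles. The example polynomial is invented; the patent's decomposition is more general.

```python
import numpy as np

def second_order_sections(a):
    """Split denominator polynomial a (a[0] = 1) into real low-order factors."""
    poles = np.roots(a)
    sections = []
    for p in poles:
        if p.imag > 1e-9:                        # conjugate pair -> real quadratic
            sections.append([1.0, -2.0 * p.real, abs(p) ** 2])
        elif abs(p.imag) <= 1e-9:                # real pole -> linear factor
            sections.append([1.0, -p.real])
        # poles with negative imaginary part are covered by their conjugates
    return sections

# Build a 3rd-order denominator from known factors, then recover them.
a = np.convolve([1.0, -1.2, 0.81], [1.0, -0.5])
secs = second_order_sections(a)
rec = np.array([1.0])
for s in secs:                                    # series interconnection
    rec = np.convolve(rec, s)
```

Multiplying the section polynomials back together reproduces the original denominator, which is exactly the sense in which the series-connected filters realize the overall transfer function.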
Abstract:
A high-quality, real-time text-to-speech synthesizer system (Fig. 1) handles an unlimited vocabulary with a minimum of hardware by using a microcomputer-software-compatible time domain methodology which requires a minimum of memory and computational power. The system first compares text words to an exception dictionary (Fig. 2). If the word is not found therein, the system applies standard pronunciation rules to the text word. In either instance, the text word is converted to a phoneme sequence. By the use of look-up tables addressed by pointers contained in a phoneme-and-transition matrix (Fig. 3), the synthesizer translates the sequence of phonemes and the transitions between them into sequences of small speech segments capable of being expressed in terms of repetitions of variable-length portions of short digitally stored waveforms. In general, unvoiced transitions are produced by a sequence of segments which can be concatenated in forward or reverse order to generate different transitions out of the same segments, while voiced transitions are produced by interpolating adjacent phonemes for additional savings. Pitch can be varied for naturalness of sound, and/or for intonation changes derived from key words and/or punctuation in the text, by truncating or extending the waveforms of individual voice periods corresponding to voiced segments.
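The two-stage text-to-phoneme step can be sketched as follows: consult an exception dictionary first, and fall back to letter-to-sound rules for words not found there. The dictionary entries and rules below are toy inventions; real systems use large dictionaries and context-sensitive rules.

```python
# Stage 1: exception dictionary for irregular words (entries invented).
EXCEPTIONS = {"one": ["W", "AH", "N"], "two": ["T", "UW"]}

# Stage 2: toy letter-to-sound rules (real rules are context-sensitive).
RULES = {"c": "K", "a": "AE", "t": "T", "d": "D", "o": "AA", "g": "G"}

def to_phonemes(word):
    """Convert a text word to a phoneme sequence, dictionary first."""
    word = word.lower()
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    return [RULES[ch] for ch in word if ch in RULES]

print(to_phonemes("one"))  # prints ['W', 'AH', 'N'] (dictionary hit)
print(to_phonemes("cat"))  # prints ['K', 'AE', 'T'] (rule-derived)
```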
Abstract:
A speech recognition method and apparatus employ speech processing circuitry (26) for repetitively deriving from a speech input (100), at a frame repetition rate, a plurality of acoustic parameters. The acoustic parameters represent the speech input signal for a frame time. A plurality of template matching and cost processing circuitries (28, 30) are connected to a system bus (24), along with the speech processing circuitry, for determining, or identifying, the speech units in the input speech by comparing the acoustic parameters with stored template patterns. The apparatus can be expanded by adding more template matching and cost processing circuitry to the bus, thereby increasing the speech recognition capacity of the apparatus. The speech processing circuitry establishes overlapping time durations for generating the acoustic parameters and further employs a sinc-Kaiser smoothing function in combination with a folding technique (113) for providing a discrete Fourier transform (112). The Fourier spectra are transformed using a principal component analysis (122) which optimizes the across-class variance. The template matching and cost processing circuitries (28, 30) provide distributed processing, on demand, of the acoustic parameters for generating the recognition decision through a dynamic programming technique. Grammar and word model syntax structures reduce the computational load. Template pattern generation is aided by using a "joker" word to specify the time boundaries of utterances spoken in isolation.
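The front end's overlapping, smoothed short-time spectra can be sketched with a Kaiser-windowed DFT over overlapping frames. This is a loose stand-in: the patent's sinc-Kaiser combination and folding technique are not reproduced, and the frame length, hop, and beta values are illustrative assumptions.

```python
import numpy as np

def frame_spectra(signal, frame_len=256, hop=128, beta=6.0):
    """Magnitude DFT of Kaiser-windowed, 50%-overlapping frames."""
    win = np.kaiser(frame_len, beta)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = signal[start:start + frame_len] * win
        frames.append(np.abs(np.fft.rfft(seg)))
    return np.array(frames)

# A 1 kHz tone sampled at 8 kHz should peak at DFT bin 1000/8000*256 = 32.
x = np.sin(2 * np.pi * 1000 * np.arange(4096) / 8000)
S = frame_spectra(x)
print(S.shape)  # prints (31, 129)
```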
Abstract:
A very small, very flexible, high-quality, linear predictive vocoder has been implemented with commercially available integrated circuits. This fully digital realization is based on a distributed signal processing architecture employing three commercial Signal Processing Interface (SPI) single-chip microcomputers. One SPI implements a linear predictive speech analyzer (18), a second implements a pitch analyzer (20), while the third implements the excitation generator and synthesizer (28).
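The linear-predictive analysis assigned to the first SPI chip can be sketched as frame autocorrelation followed by the Levinson-Durbin recursion. The analysis order and the synthetic test signal below are illustrative choices, not values from the patent.

```python
import numpy as np

def lpc(frame, order):
    """Return coefficients a (a[0] = 1) of the prediction-error filter
    minimizing E[(sum_k a[k] x[n-k])^2], via Levinson-Durbin."""
    n = len(frame)
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        a_new = a.copy()
        for j in range(1, i):
            a_new[j] = a[j] + k * a[i - j]
        a_new[i] = k
        a = a_new
        err *= 1.0 - k * k                  # residual prediction error power
    return a

# Recover a known 2nd-order predictor from a synthetic AR(2) signal:
# x[n] = 0.9 x[n-1] - 0.5 x[n-2] + e[n], so a ~ [1, -0.9, 0.5].
rng = np.random.default_rng(0)
e = rng.standard_normal(4000)
x = np.zeros(4000)
for n in range(2, 4000):
    x[n] = 0.9 * x[n - 1] - 0.5 * x[n - 2] + e[n]
a = lpc(x, 2)
```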
Abstract:
A speech analyzer for recognizing an unknown utterance as one of a set of reference words is adapted (by 118, 119, 103) to generate (105) a feature signal set for each utterance of every reference word. At least one template signal is produced for each reference word, which template signal (in 116) is representative of a group of feature signal sets. Responsive to a feature signal set formed (by 105) from the unknown utterance and each reference word template signal, a signal representative of the similarity between the unknown utterance and the template signal is generated (122). A plurality of similarity signals for each reference word is selected and a signal corresponding to the average of said selected similarity signals is formed (135). The average similarity signals are compared to identify the unknown utterance as the most similar reference word (145). Features of the invention include template formation by successive clustering, involving partitioning feature signal sets into groups of predetermined similarity by center-point clustering, and recognition by comparing the average of selected similarity measures of a time-warped unknown feature signal set with the cluster-derived reference templates for each vocabulary word.
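The decision rule can be sketched as dynamic time warping against each word's templates, averaging the best few similarity scores per word, and picking the word with the best average. The feature sequences, template sets, and the selection count k below are invented for illustration.

```python
import math

def dtw(a, b):
    """Minimum cumulative distance between two 1-D feature sequences."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def recognize(utterance, templates, k=2):
    """templates: {word: [template sequences]}. Average the k smallest
    DTW distances per word, then pick the word with the lowest average."""
    scores = {}
    for word, temps in templates.items():
        dists = sorted(dtw(utterance, t) for t in temps)
        scores[word] = sum(dists[:k]) / min(k, len(dists))
    return min(scores, key=scores.get)

templates = {"yes": [[1, 2, 3], [1, 2, 2, 3]], "no": [[5, 5, 4], [6, 5, 4]]}
print(recognize([1, 2, 3, 3], templates))  # prints yes
```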
Abstract:
Apparatus for coding (Fig. 2A) an original speech signal having a waveform, including a waveform coder (14), operative at a low bit rate, for waveform coding the original speech signal to produce a coded signal having distortion, and an adaptive spectral shaping filter (80, 84) for filtering the distortion in the speech signal. The waveform coder has waveform coding data and the filter has filter coefficient data that are used by a decoding apparatus (Fig. 2B) to reconstruct the original speech signal. Also disclosed are various embodiments of speech analyzers and speech synthesizers which are implemented based on the coding and decoding principles of the coding/decoding apparatus.
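As a loose illustration of waveform coding at low rate while spectrally shaping the resulting distortion (not the patent's coder): a uniform quantizer whose quantization error is fed back through a one-tap shaping filter, tilting the distortion spectrum. The step size is invented, and the shaping coefficient is fixed here rather than adaptive.

```python
import numpy as np

def shaped_quantize(x, step=0.1, b=0.9):
    """Uniform quantizer with first-order error feedback (noise shaping)."""
    y = np.empty_like(x)
    e_prev = 0.0
    for n, s in enumerate(x):
        v = s + b * e_prev                  # add filtered previous error
        y[n] = step * np.round(v / step)    # coarse waveform coding
        e_prev = v - y[n]                   # quantization error to be shaped
    return y

x = 0.3 * np.sin(2 * np.pi * np.arange(800) / 40)
y = shaped_quantize(x)
```

The per-sample reconstruction error stays bounded by roughly (1 + b) * step / 2 while its spectrum is pushed toward high frequencies, where it is easier to mask or filter.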
Abstract:
A method of and apparatus for processing audio signals in which a measure of amplitude of audio signals in a selected time period is obtained. The audio signals (Fig. 1) for the selected time period are delayed (18) until the measure of amplitude (16) is obtained, and then the delayed audio signals are normalized (20) using the measure of amplitude. High frequency emphasis (14) may be employed prior to obtaining the measure of amplitude. Alternatively, a multi-channel system (Fig. 3) can be employed for processing audio signals in limited frequency bands (32, 34, 36). The method and apparatus are applicable in a variety of applications including hearing aids, audio storage media, broadcast and public address systems, and voice communications such as telephone systems.
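The delay-then-normalize idea can be sketched block-wise: compute the amplitude measure for a block first, then scale the (delayed) samples of that same block by it. The block length, target level, and RMS amplitude measure are assumed values for illustration, not the patent's.

```python
import numpy as np

def normalize_blocks(signal, block=160, target_rms=0.1, eps=1e-8):
    """Scale each block to a target RMS, using the block's own amplitude
    measure (the delay lets the measure be known before scaling)."""
    out = np.empty(len(signal), dtype=float)
    for start in range(0, len(signal), block):
        seg = signal[start:start + block].astype(float)
        rms = np.sqrt(np.mean(seg ** 2)) + eps   # measure of amplitude
        out[start:start + len(seg)] = seg * (target_rms / rms)
    return out

x = 0.5 * np.sin(2 * np.pi * np.arange(1600) / 40)
y = normalize_blocks(x)
```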
Abstract:
A method and apparatus for determining concordance between an analysis signal and at least one reference signal. Smoothed signals or envelopes are formed from the signal to be examined and from at least one reference signal. A pulse train is generated for each of the smoothed signals or envelopes, comprising pulses present during the time the smoothed signal or envelope has a predetermined polarity relative to a threshold signal. The pulse trains are compared with each other at regular intervals; simultaneous coincidence of pulses in both trains at a predetermined number of consecutive comparisons constitutes the criterion that the examined signal is in concordance with the reference signal. For carrying out this signal examination there are means (2, 4) adapted for forming said smoothed signals or envelopes of the signals. The outputs from said means are connected to one input of a comparator (6) for each signal. A threshold signal (URef1, URef2) is applied to the second input of the comparators for generating an output pulse train comprising pulses generated at a predetermined polarity of said smoothed signals or envelopes relative to the threshold signals. Means (12, 14, 16, 18) are further adapted for comparing the pulse trains and registering simultaneous coincidence of pulses in the two trains for a predetermined number of consecutive comparisons as the criterion that the examined signal and a given reference signal are in concordance. The invention is primarily intended for use in automatic debiting of utilized video service in a hotel or the like. The invention may also be used for monitoring processes or machines which exhibit specific sound spectra when faults occur.
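The scheme can be sketched as: form an envelope, threshold it into a pulse train, and declare concordance after a run of consecutive coincident comparisons. The envelope window, threshold, and run length below are assumed values, not from the patent.

```python
import numpy as np

def pulse_train(signal, window=8, threshold=0.5):
    """Moving-average envelope thresholded into a boolean pulse train."""
    env = np.convolve(np.abs(signal), np.ones(window) / window, mode="same")
    return env > threshold

def concordant(pulses_a, pulses_b, run=16):
    """Concordance = `run` consecutive comparisons with coincident pulses."""
    both = pulses_a & pulses_b
    count = 0
    for hit in both:
        count = count + 1 if hit else 0
        if count >= run:
            return True
    return False

t = np.arange(256)
a = np.sin(2 * np.pi * t / 16)
print(concordant(pulse_train(a), pulse_train(a)))        # prints True
print(concordant(pulse_train(a), pulse_train(0.1 * a)))  # prints False
```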
Abstract:
A speech synthesizer for assembling and synthesizing sound-element segments extracted from an analog speech waveform. It converts an analog speech signal into a digital signal, relatively shifts the data near the trailing end of the preceding sound-element segment and the data near the beginning of the following sound-element segment by means of an arithmetic control unit that computes their degree of similarity, and reads the data of the following sound-element segment out of memory with a timing such that the following segment joins the preceding segment as continuously as possible. As a result, abrupt waveform variation at the junction between the preceding and following segments, that is, high-frequency noise due to waveform discontinuity, degradation of the signal-to-noise ratio of the synthesized sound, and degradation of articulation, can be practically eliminated, and a synthesized sound free of waveform discontinuity and of pitch variation at the junction can be obtained.
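The joining step can be sketched as follows: slide the head of the next segment against the tail of the previous one, pick the shift with the greatest similarity (here a plain dot product), and splice there. The comparison window and search range are invented illustration values.

```python
import numpy as np

def best_shift(prev_seg, next_seg, n=32, max_shift=16):
    """Shift s at which next_seg[s:s+n] best matches prev_seg's last n samples."""
    tail = prev_seg[-n:]
    scores = [float(np.dot(tail, next_seg[s:s + n])) for s in range(max_shift)]
    return int(np.argmax(scores))

def splice(prev_seg, next_seg, n=32, max_shift=16):
    """Concatenate, dropping the part of next_seg that overlaps the tail."""
    s = best_shift(prev_seg, next_seg, n, max_shift)
    return np.concatenate([prev_seg, next_seg[s + n:]])

# Two pieces of the same sine: the splice should restore a continuous wave.
prev_seg = np.sin(2 * np.pi * np.arange(200) / 32)
next_seg = np.sin(2 * np.pi * np.arange(100, 200) / 32)
out = splice(prev_seg, next_seg)
```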