Abstract:
In a formant emphasis method of emphasizing the formant as the spectral peak of an input speech signal and attenuating the spectral valley of the input speech signal, a spectrum emphasis filter performs processing for emphasizing the formant of the input speech signal and attenuating the valley of the input speech signal. A first-order variable characteristic filter, whose characteristic adaptively changes in accordance with the characteristic of the input speech signal, and a first-order fixed characteristic filter compensate for a spectral tilt included in an output signal from the spectrum emphasis filter.
Abstract:
A musical sound synthesizer generates a predetermined singing sound based on performance data. A compression device determines whether each of a plurality of phonemes forming the predetermined singing sound is a first phoneme to be sounded in accordance with a note-on signal indicative of a note-on of the performance data, and compresses a rise time of the first phoneme when the first phoneme is sounded in accordance with occurrence of the note-on signal of the performance data.
Abstract:
A speech recognition system capable of recognizing a word or a plurality of words based on a continuous spelling of the word(s) by a user. The system includes a speech recognition engine with a decoder running in forward mode such that the recognition engine continuously outputs an updated string of hypothesized letters based on the letters uttered by the user. The system further includes a spelling engine for comparing each string of hypothesized letters to a vocabulary list of words. The spelling engine returns a best match for the string of hypothesized letters. The system may also include an early identification unit for presenting the user with the best matching word(s) possibly before the user has completed spelling the desired word(s).
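The comparison step performed by the spelling engine can be illustrated with a minimal sketch. The abstract does not say how the hypothesized letters are compared with the vocabulary, so the edit-distance measure, the prefix comparison (to allow a match before spelling is complete), and the names `edit_distance` and `best_match` are all assumptions made for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def best_match(hypothesis, vocabulary):
    """Return the vocabulary word whose prefix is closest to the letters so far.

    Assumption: only the first len(hypothesis) letters of each word are
    compared, so a word can be proposed before the user finishes spelling.
    Ties go to the word listed first in the vocabulary.
    """
    hypothesis = hypothesis.lower()
    return min(vocabulary,
               key=lambda w: edit_distance(hypothesis, w[:len(hypothesis)]))


vocabulary = ["recognize", "record", "reckon"]
early = best_match("recor", vocabulary)    # partial spelling
noisy = best_match("rekord", vocabulary)   # one misrecognized letter
```

Calling `best_match` each time the decoder emits an updated letter string mirrors the continuous updating described in the abstract; a real system would more likely weight letters by their acoustic confusability than use a plain edit distance.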
Abstract:
Synthetic speech is generated by production of a digital waveform from a text in phonemes. A linked database is used which comprises an extended text in phonemes and its equivalent in the form of a digital waveform. The two portions of the database are linked by a parameter which establishes equivalent points in both the phoneme text and the digital waveform. The input text (in phonemes) is analyzed to locate a matching portion in the phoneme portion of the database. This matching utilizes exact equivalence of phonemes where this is possible; otherwise, a relation between phonemes is utilized. The selection process identifies input phonemes in context, whereby improved conversions are obtained. Once the input text has been analyzed into matching strings in the input form of the database, beginning and ending parameters for the sections are established. The output is produced by abutting sections of the digital waveform defined by the beginning and ending parameters.
Abstract:
A signal extraction system for extracting one or more signal components from an input signal including a plurality of signal components. This system is equipped with a neural network arithmetic section designed to process information through the use of a recurrent neural network. The neural network arithmetic section extracts one or more signal components, for example, a speech signal component and a noise signal component, from an input signal including a plurality of signal components such as speech and noise, and outputs the extracted signal components. Owing to the presence of this neural network arithmetic section, signal extraction becomes possible with high accuracy.
Abstract:
A voice recording apparatus includes an encoding unit capable of encoding an input voice signal at different encoding bit rates. A system controller records the input voice signal encoded by the encoding unit on a memory and acquires information on at least one of a used recordable capacity and remaining recordable capacity of the memory for any of the encoding bit rates. A display unit displays the information in a single way or a plurality of different ways of representation.
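The capacity information described above can be derived with simple arithmetic, as in the following sketch. It assumes a constant bit rate per encoding mode and ignores file-system overhead; the function names, the byte counts, and the two example bit rates are hypothetical and do not come from the abstract.

```python
def remaining_seconds(free_bytes, bit_rate_bps):
    """Whole seconds of audio that fit in free_bytes at a constant bit rate."""
    return (free_bytes * 8) // bit_rate_bps


def capacity_report(total_bytes, used_bytes, bit_rates_bps):
    """Summarize used capacity and the remaining recording time per bit rate."""
    free_bytes = total_bytes - used_bytes
    return {
        # One way of representing the information: percentage of memory used.
        "used_percent": 100 * used_bytes // total_bytes,
        # Another way: remaining time for each selectable encoding bit rate.
        "remaining_seconds": {
            rate: remaining_seconds(free_bytes, rate) for rate in bit_rates_bps
        },
    }


report = capacity_report(total_bytes=4_000_000,
                         used_bytes=1_000_000,
                         bit_rates_bps=[8_000, 16_000])
```

Reporting both a percentage and a per-rate time corresponds to the "plurality of different ways of representation" mentioned in the abstract.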
Abstract:
The present invention relates to a system and method of word syllabification. The present invention receives a word to be syllabified and determines therefrom all possible substrings capable of forming part of the word. Sequences matching at least part of or the whole of the word are determined from the substrings, together with respective probabilities of occurrence, and the sequence having the greatest probability of occurrence is selected as being the most probable syllabification of the word. The most probable sequence can be determined in many different ways. For example, the sequence can be determined by commencing with the substring having the greatest probability of forming the beginning of a given word and subsequently traversing, in a step-by-step manner, a table comprising all possible substrings of the word, at each step selecting the next substring of the sequence according to which of the possible next substrings has the highest probability of occurrence. A further method of determining the most probable sequence would be to adopt the above step-by-step approach for all possible substrings capable of forming the beginning of the given word. Alternatively, all possible sequences of substrings capable of constituting the word can be determined together with their respective probabilities of occurrence, and the sequence having the highest probability of occurrence is selected as being the most probable syllabification of the given word.
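The last alternative, choosing the most probable of all sequences that constitute the word, can be sketched with dynamic programming. The abstract does not state how substring probabilities combine into a sequence probability; the sketch assumes they are multiplied, and the function name `syllabify` and the probability table are hypothetical.

```python
def syllabify(word, probs):
    """Return the most probable split of word into substrings found in probs.

    probs maps candidate substrings to probabilities of occurrence.
    Assumption: a sequence's probability is the product of the probabilities
    of its substrings. Returns None if no complete split exists.
    """
    # best[i] holds (probability, split) for the best split of word[:i].
    best = [(1.0, [])] + [(0.0, None)] * len(word)
    for end in range(1, len(word) + 1):
        for start in range(end):
            piece = word[start:end]
            prev_prob, prev_split = best[start]
            if prev_split is None or piece not in probs:
                continue
            prob = prev_prob * probs[piece]
            if prob > best[end][0]:
                best[end] = (prob, prev_split + [piece])
    return best[-1][1]


# Hypothetical probabilities, for illustration only.
table = {"win": 0.4, "dow": 0.3, "wi": 0.2, "ndow": 0.05}
split = syllabify("window", table)
```

With this table the candidate "win" + "dow" scores 0.4 × 0.3 = 0.12 and "wi" + "ndow" scores 0.2 × 0.05 = 0.01, so the first is selected. The step-by-step approach described earlier in the abstract would instead commit to the most probable substring at each step without comparing complete sequences.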
Abstract:
A speech recognition method that combines HMMs and vector quantization to model the speech signal and adds spectral derivative information to the speech parameters. Each state of an HMM is modeled by two different VQ codebooks. One is trained using the spectral parameters, and the other is trained using the spectral derivative parameters.
Abstract:
The present invention pertains to a concatenative speech synthesis system and method that produces more natural-sounding speech. The system provides for multiple instances of each acoustic unit, which can be used to generate a speech waveform representing a linguistic expression. The multiple instances are formed during an analysis or training phase of the synthesis process and are limited to a robust representation of the highest probability instances. The provision of multiple instances enables the synthesizer to select the instance which closely resembles the desired instance, thereby eliminating the need to alter the stored instance to match the desired instance. This in essence minimizes the spectral distortion between the boundaries of adjacent instances, thereby producing more natural-sounding speech.
Abstract:
Methods and apparatus for a language model and language recognition systems are disclosed. The method utilizes a plurality of probabilistic finite state machines having the ability to recognize a pair of sequences, one sequence scanned leftwards, the other scanned rightwards. Each word in the lexicon of the language model is associated with one or more such machines, which model the semantic relations between the word and other words. Machine transitions create phrases from a set of word string hypotheses and incrementally calculate costs related to the probability that such phrases represent the language to be recognized. The cascading lexical head machines utilized in the methods and apparatus capture the structural associations implicit in the hierarchical organization of a sentence, resulting in a language model and language recognition systems that combine the lexical sensitivity of N-gram models with the structural properties of dependency grammar.