Abstract:
A vocoder-based voice recognizer recognizes a spoken word using linear-prediction-coding-based vocoder data without completely reconstructing the voice data. The recognizer generates at least one energy estimate per frame of the vocoder data (60) and searches for word boundaries in the vocoder data (64) using the associated energy estimates. If a word is found (66), the linear prediction coding word parameters are extracted (68) from the vocoder data associated with the word, and recognition features are calculated (70) from the extracted linear prediction coding word parameters. Finally, the recognition features are matched with previously stored recognition features of other words (40), thereby recognizing the spoken word.
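The word-boundary search described above can be illustrated with a minimal sketch. It assumes one energy estimate per frame is already available and treats a word as a run of consecutive frames above a fixed threshold; the function name `find_word`, the threshold, and the `min_frames` parameter are hypothetical and are not taken from the abstract.

```python
from typing import List, Optional, Tuple


def find_word(
    energies: List[float], threshold: float, min_frames: int = 3
) -> Optional[Tuple[int, int]]:
    """Return (start, end) frame indices, end exclusive, of the first run of
    at least `min_frames` consecutive frames whose energy exceeds `threshold`.
    Returns None when no such run exists."""
    start: Optional[int] = None
    for i, energy in enumerate(energies):
        if energy > threshold:
            if start is None:
                start = i  # candidate word start
        else:
            if start is not None and i - start >= min_frames:
                return (start, i)  # run was long enough to count as a word
            start = None  # run too short, or no run in progress
    # Handle a run that extends to the last frame.
    if start is not None and len(energies) - start >= min_frames:
        return (start, len(energies))
    return None


# Illustrative per-frame energy estimates (assumed values).
frame_energies = [0.1, 0.2, 3.0, 4.5, 5.0, 4.0, 0.3, 0.1]
word = find_word(frame_energies, threshold=1.0)
```

Only the frames inside the returned range would then be passed on for parameter extraction and feature calculation.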
Abstract:
A method and apparatus for a parameter sharing speech recognition system are provided. A model device (410), coupled to receive the output of a signal segmenter, hosts a shared hidden Markov model produced by generating a number of phoneme models (600-603), some of which are shared. The phoneme models (600-603) are generated by retaining as a separate phoneme model any triphone model whose number of available trained frames exceeds a pre-specified threshold. The generated phoneme models are trained, and shared phoneme model states (604-609) are generated that are shared among the phoneme models (600-603). Shared probability distribution functions (610-616) are generated that are shared among the phoneme model states (604-609). Shared probability sub-distribution functions (617-627) are generated that are shared among the phoneme model probability distribution functions (610-616). The shared phoneme model hierarchy is reevaluated for further sharing in response to the shared probability sub-distribution functions.
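The retention rule for triphone models can be sketched as a simple filter. This is an illustration only: the function name `select_separate_models`, the triphone labels, the frame counts, and the threshold value are all assumed, and the abstract does not say how the remaining triphones are merged.

```python
from typing import Dict, Set


def select_separate_models(frame_counts: Dict[str, int], threshold: int) -> Set[str]:
    """Return the triphones that keep their own phoneme model, that is,
    those whose count of trained frames strictly exceeds `threshold`."""
    return {name for name, count in frame_counts.items() if count > threshold}


# Hypothetical trained-frame counts per triphone.
counts = {"k-ae+t": 1200, "b-ae+t": 40, "s-ih+t": 800}
kept = select_separate_models(counts, threshold=500)
```

Triphones that fall at or below the threshold would be candidates for sharing with other models.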
Abstract:
Acoustic features (109, 111) are extracted from input speech (107) and are compared (113) against pre-stored models (117). The result is used to make a judgement of the user's pronunciation (115).
Abstract:
After a voice signal is segmented into individual speech units, the units representing the same speech sound block are assembled into a group. The multiple speech units in a group together characterize that sound block well. Different selection criteria are provided to evaluate the usability of individual speech units. One advantage of combining the selection criteria is that different criteria can be taken into account when selecting a representative speech unit. Each selection criterion includes a membership function which indicates the 'usability' of an individual speech unit as a representative of the group. Preferably, the speech unit that attains the maximum of the membership function among the speech units of the group is selected as the representative of the corresponding sound block.
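The selection of a representative unit can be sketched as follows. The abstract does not state how several criteria are combined, so this sketch assumes the minimum over all membership values (a common fuzzy conjunction); the two membership functions, their parameters, and the unit fields `duration` and `energy` are hypothetical.

```python
from typing import Callable, Dict, Sequence

Unit = Dict[str, float]
Membership = Callable[[Unit], float]


def select_representative(units: Sequence[Unit], criteria: Sequence[Membership]) -> Unit:
    """Return the unit with the highest combined usability.
    Assumption: criteria are combined by taking their minimum."""

    def usability(unit: Unit) -> float:
        return min(criterion(unit) for criterion in criteria)

    return max(units, key=usability)


def duration_membership(unit: Unit) -> float:
    # Triangular membership peaking at 0.10 s, zero at 0.05 s deviation or more.
    return max(0.0, 1.0 - abs(unit["duration"] - 0.10) / 0.05)


def energy_membership(unit: Unit) -> float:
    # Normalized energy clipped to the range [0, 1].
    return min(1.0, max(0.0, unit["energy"]))


# One group of speech units for a single sound block (assumed values).
group = [
    {"id": 0, "duration": 0.10, "energy": 0.2},
    {"id": 1, "duration": 0.11, "energy": 0.9},
    {"id": 2, "duration": 0.20, "energy": 1.0},
]
best = select_representative(group, [duration_membership, energy_membership])
```

With the minimum as the combination rule, a unit must score reasonably on every criterion to be chosen, which is why the unit with ideal duration but low energy is not selected here.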
Abstract:
A system and method for speech-to-speech conversion provide spoken responses to speech inputs in at least two natural languages, wherein the speech inputs are recognised and interpreted in said at least two languages. The recognised speech inputs are evaluated to determine the language of the speech inputs, and a dialogue is undertaken with a database containing speech information data, in said at least two natural languages, to obtain data for the formulation of spoken responses to the speech inputs. The speech information data obtained from the database is then converted into spoken responses which exhibit the language characteristics of the respective speech inputs.
Abstract:
The distance between the first two pitch marks of a voiced portion of speech data to be processed is calculated. The difference between the adjacent inter-pitch-mark distances is calculated. The respective calculation results are stored and managed in a file.
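The two calculations can be illustrated with a short sketch that works on pitch-mark positions given as sample indices. The function names and the sample positions are assumed, and the decoding step is an addition meant only to show that the first distance plus the successive differences is enough to rebuild the marks once the first position is known; file storage is left out.

```python
from typing import List, Tuple


def encode_pitch_marks(marks: List[int]) -> Tuple[int, List[int]]:
    """Return the distance between the first two pitch marks and the
    differences between adjacent inter-pitch-mark distances."""
    if len(marks) < 2:
        raise ValueError("at least two pitch marks are required")
    distances = [b - a for a, b in zip(marks, marks[1:])]
    deltas = [d2 - d1 for d1, d2 in zip(distances, distances[1:])]
    return distances[0], deltas


def decode_pitch_marks(start: int, first: int, deltas: List[int]) -> List[int]:
    """Rebuild pitch-mark positions from the first position, the first
    distance, and the distance differences."""
    marks = [start, start + first]
    distance = first
    for delta in deltas:
        distance += delta
        marks.append(marks[-1] + distance)
    return marks


# Hypothetical pitch-mark positions (sample indices) in one voiced portion.
pitch_marks = [100, 180, 262, 345, 425]
first_distance, distance_deltas = encode_pitch_marks(pitch_marks)
```

Because pitch changes slowly within a voiced portion, the differences tend to be small numbers, which is a plausible reason for storing them instead of absolute positions.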
Abstract:
A method (500, 600), device (201 and 206) and system (203) provide, in response to text/linguistic information, efficient generation of a parametric representation of speech. A coder parameter generating system provides a principal set and a supplementary set of speech parameters, the principal set of speech parameters being the parametric representation of speech. Then feedback is provided to the coder parameter generating system using the supplementary set of speech parameters to modify the principal set of speech parameters.
Abstract:
Prosodic databases hold fundamental frequency templates for use in a speech synthesis system. Prosodic database templates may hold fundamental frequency values for syllables in a given sentence. These fundamental frequency values may be applied in synthesizing a sentence of speech. The templates are indexed by tonal pattern markings. A predicted tonal marking pattern is generated for each sentence of text that is to be synthesized, and this predicted pattern of tonal markings is used to locate a best matching template. The templates are derived by calculating fundamental frequencies on a per-syllable basis for sentences that are spoken by a human trainer for a given unlabeled corpus.
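The template lookup can be sketched as a nearest-pattern search. The abstract does not define what "best matching" means, so this sketch assumes a simple score: the number of positions where the tonal markings agree, minus a penalty for length mismatch. The marking labels `H` and `L`, the frequency values, and the function name `best_template` are hypothetical.

```python
from typing import List, Sequence, Tuple

# A template pairs a tonal marking pattern with per-syllable F0 values in Hz.
Template = Tuple[Tuple[str, ...], List[float]]


def best_template(predicted: Sequence[str], templates: Sequence[Template]) -> Template:
    """Return the template whose tonal pattern best matches `predicted`.
    Assumption: score = matching positions minus the length difference."""

    def score(template: Template) -> int:
        pattern = template[0]
        matches = sum(p == q for p, q in zip(predicted, pattern))
        return matches - abs(len(predicted) - len(pattern))

    return max(templates, key=score)


# Hypothetical prosodic database with one F0 value per syllable.
database: List[Template] = [
    (("H", "L", "L"), [220.0, 180.0, 170.0]),
    (("L", "H", "L"), [170.0, 230.0, 175.0]),
    (("H", "H", "L"), [225.0, 228.0, 172.0]),
]
pattern, f0_values = best_template(("L", "H", "H"), database)
```

The per-syllable values of the chosen template would then be applied to the corresponding syllables of the sentence being synthesized.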