Abstract:
A method for encoding syllables of a language, particularly the Japanese language, and for facilitating the extraction of sound codes from the input syllables for voice recognition or voice synthesis, includes the step of providing a syllable classifying table in which each syllable is represented by an upper byte code indicating the consonant part of the syllable and a lower byte code indicating the non-consonant part of the syllable. The consonants constitute a first category of data classified by phonetic features, while the non-consonants constitute a second category of data classified by phonetic features, so that consonant or non-consonant sounds can be extracted by searching only the first or only the second category. Diphthongs are encoded in such a manner that those containing the same vowel have the same remainder, corresponding to the code of this vowel, when their codes are divided by the number of vowels contained in the second category, so that a vowel can be extracted from a diphthong by a simple mathematical division.
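The two-byte layout and the remainder rule can be illustrated with a minimal sketch. The code values, the table contents, and the function names below are all hypothetical, since the abstract does not disclose the actual table; the only properties taken from the text are the upper/lower byte split and the requirement that a diphthong's code modulo the number of vowels equals the code of the vowel it contains.

```python
# Hypothetical tables: the abstract does not give the real code values.
VOWELS = ["a", "i", "u", "e", "o"]   # plain vowels, assumed codes 0..4
NUM_VOWELS = len(VOWELS)

# Second category (non-consonant parts). Diphthong-like parts are placed so that
# code % NUM_VOWELS equals the code of the vowel they contain.
NON_CONSONANT = {v: i for i, v in enumerate(VOWELS)}
NON_CONSONANT.update({"ya": 5 + 0, "yu": 5 + 2, "yo": 5 + 4})

# First category (consonant parts); "" stands for "no consonant".
CONSONANT = {"": 0x00, "k": 0x01, "s": 0x02, "t": 0x03, "n": 0x04}


def encode(consonant, non_consonant):
    """Pack the consonant code into the upper byte and the rest into the lower byte."""
    return (CONSONANT[consonant] << 8) | NON_CONSONANT[non_consonant]


def consonant_code(code):
    """Extract the consonant part by looking only at the upper byte."""
    return code >> 8


def vowel_of(code):
    """Extract the vowel from the lower byte with a single remainder operation."""
    return VOWELS[(code & 0xFF) % NUM_VOWELS]


kya = encode("k", "ya")
print(hex(kya), consonant_code(kya), vowel_of(kya))
```

With this layout, a search for all syllables sharing a consonant touches only the upper byte, and a search for all syllables sharing a vowel needs only the lower byte and one remainder.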
Abstract:
A speech recognition method and apparatus in which, in a first stage, a speech section is sliced by the unit of a word by spotting and candidate words are selected. Next, in a second stage, matching is conducted by the unit of a phoneme. Consequently, selection of the candidate words and slicing of the speech section can be performed concurrently, and narrowing of the candidate words is facilitated. Furthermore, since reference phoneme patterns under a plurality of environments are prepared, input speech can be recognized under a larger number of conditions using a smaller amount of data than when reference word patterns under a plurality of environments are prepared.
Abstract:
A voice recognizing method and apparatus in which an input voice is recognized by comparing the input voice with voice standard patterns to obtain a similar pattern. Voice standard patterns are stored in a memory. A voice is input. Voice duration lengths and distances are calculated by performing matching processes between the input voice and the standard patterns. The distance is corrected in accordance with the voice duration length, either so that the voice duration length having the best matching result is used as a reference, or so that the distance becomes smaller as the voice duration length becomes longer. A recognition result is determined in accordance with the corrected distance. The matching is executed by a word spotting method. The input voice to be matched and the voice standard patterns are expressed by voice characteristic parameters.
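A minimal sketch of a duration-based correction follows. The abstract states only the goal (use the duration of the best match as a reference, or make the distance smaller for longer durations); the specific scaling rule `distance * reference_duration / duration`, the tuple layout, and the function name are assumptions made for illustration.

```python
def correct_distances(candidates):
    """Scale each matching distance by reference_duration / duration.

    candidates: list of (label, distance, duration) tuples from word spotting.
    The reference duration is taken from the candidate with the smallest
    uncorrected distance. The scaling rule itself is an assumed example,
    not the formula used in the patent.
    """
    reference_duration = min(candidates, key=lambda c: c[1])[2]
    return [
        (label, distance * reference_duration / duration)
        for label, distance, duration in candidates
    ]


# Hypothetical spotting results: a short word that matches slightly better
# and a longer word that contains it.
results = [("short", 1.0, 10), ("longer", 1.5, 20)]
corrected = correct_distances(results)
best = min(corrected, key=lambda c: c[1])[0]
print(corrected, best)
```

In this example the longer candidate wins after correction even though its raw distance is larger, which is the kind of behavior the correction is meant to allow.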
Abstract:
Speech recognition is achieved using a normalized cumulative distance. A normalized Dynamic Programming (DP) value is calculated by dividing a cumulative path distance by an optimal integral path length. The path length is calculated iteratively by adding 2 if the warping path is diagonal or by adding 3 if the warping path is horizontal or vertical. The distance may be calculated by measuring a difference between input power and average power. The power difference is weighted by a coefficient λ between 0 and 1. A Mahalanobis distance is then weighted by (1 − λ) and added to the weighted power difference.
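The path-length rule and the weighted distance can be sketched as follows. The step increments (2 for diagonal, 3 for horizontal or vertical) and the λ weighting come from the abstract; the choice of predecessor by smallest cumulative distance, the starting length of 2, and the function names are assumptions made to keep the example self-contained.

```python
def combined_distance(power_diff, mahalanobis, lam):
    """Weight the power difference by lam and the Mahalanobis distance by (1 - lam)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie between 0 and 1")
    return lam * power_diff + (1.0 - lam) * mahalanobis


def normalized_dp(x, y, dist):
    """Return cumulative distance divided by the integer path length.

    Path length grows by 2 on a diagonal step and by 3 on a horizontal or
    vertical step, as stated in the abstract. Assumed here: the predecessor
    with the smallest cumulative distance is chosen, and the first cell
    starts with length 2.
    """
    n, m = len(x), len(y)
    D = [[0.0] * m for _ in range(n)]   # cumulative distance
    L = [[0] * m for _ in range(n)]     # integer path length
    D[0][0] = dist(x[0], y[0])
    L[0][0] = 2
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0 and j > 0:
                candidates.append((D[i - 1][j - 1], L[i - 1][j - 1] + 2))
            if i > 0:
                candidates.append((D[i - 1][j], L[i - 1][j] + 3))
            if j > 0:
                candidates.append((D[i][j - 1], L[i][j - 1] + 3))
            best_d, best_len = min(candidates)
            D[i][j] = best_d + dist(x[i], y[j])
            L[i][j] = best_len
    return D[-1][-1] / L[-1][-1]


print(normalized_dp([0, 0], [1, 1], lambda a, b: abs(a - b)))
print(combined_distance(2.0, 4.0, 0.25))
```

Dividing by the path length makes scores comparable across reference patterns of different lengths, which is the stated purpose of the normalization.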
Abstract:
The speech processing apparatus and method includes a microphone, an analyzer, a selector, and a memory. The microphone converts input speech into an electrical signal representing speech data. The analyzer converts the speech data into non-linear frequency converted speech data in accordance with a non-linear frequency conversion. The selector selects a coefficient of the non-linear frequency conversion suitable for each of the phonemes or frames of the speech. The memory stores the speech data.
Abstract:
An apparatus and method for recognizing speech includes a memory for storing data representing a reference pattern composed of the combination of a word reference pattern and a silence pattern, and a calculator for calculating the differences between data representing the reference pattern and data representing input speech. The use of such a silence pattern in the reference pattern permits a word such as "other" to be distinguished from the word "mother".
Abstract:
A method and apparatus for reading out a feature parameter and a driver sound source stored in a VCV (vowel-consonant-vowel) speech segment file, sequentially connecting the readout parameter and the readout sound source information in accordance with a predetermined rule, and supplying connected data to a speech synthesizer, thereby generating a speech output, includes a memory for storing the average power of each vowel, and a power controller for controlling the apparatus to normalize a VCV speech segment so that powers at both ends of each VCV segment coincide with the average power of each vowel.
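The power normalization can be sketched as a gain applied across the frames of one VCV segment. The abstract requires only that the powers at both ends coincide with the average power of the respective vowels; the linear interpolation of the gain between the two ends, the frame-power list representation, and the function name are assumptions for illustration.

```python
def normalize_vcv_power(powers, start_target, end_target):
    """Scale frame powers so both ends of the segment hit the target powers.

    powers:       per-frame power of one VCV segment (first and last nonzero)
    start_target: average power of the leading vowel
    end_target:   average power of the trailing vowel
    The gain is interpolated linearly between the two ends (an assumption).
    """
    n = len(powers)
    if n < 2:
        raise ValueError("a VCV segment needs at least two frames")
    gain_start = start_target / powers[0]
    gain_end = end_target / powers[-1]
    return [
        p * (gain_start + (gain_end - gain_start) * k / (n - 1))
        for k, p in enumerate(powers)
    ]


# Hypothetical segment with average vowel powers 4.0 (leading) and 2.0 (trailing).
print(normalize_vcv_power([2.0, 4.0, 4.0], 4.0, 2.0))
```

Because adjacent segments that share a vowel are normalized to the same average power at their junction, the connected segments join without a step in power.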
Abstract:
The system implements high-accuracy speech recognition while suppressing the amount of data transfer between the client and the server. For this purpose, the client compression-encodes speech parameters using a speech processing unit and sends the compression-encoded speech parameters to the server. The server receives the compression-encoded speech parameters, performs speech recognition on them using its own speech processing unit, and sends information corresponding to the speech recognition result to the client.
Abstract:
A GUI display module displays a contents image based on contents data within a display area, and a display portion switching input module inputs an instruction to change the display portion of the contents image within the display area. Based on this instruction input, a display portion switching module changes the display portion of the contents image within the display area. A synthesis text determination module determines which data in the contents data is to undergo speech synthesis on the basis of display portion information that is held by a display portion holding module and indicates the display portion. A speech synthesis module synthesizes speech from the data that is to undergo speech synthesis, and a speech output module outputs the synthesized speech.
Abstract:
A method and apparatus for recognizing speech employing a word dictionary in which the phonemes of words are stored, and for recognizing speech based on the recognition of the phonemes. The method and apparatus recognize phonemes and produce data associated with each phoneme according to different speech analyzing and recognizing methods for each kind of phoneme, normalize the produced data, and match the recognized phonemes with words in the word dictionary by means of dynamic programming based on the normalized data.
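The dictionary matching step can be illustrated with a simplified sketch. The abstract matches on normalized per-phoneme data; here that is replaced by a plain 0/1 substitution cost (Levenshtein distance), which is a deliberate simplification. The dictionary entries and function names are hypothetical.

```python
def edit_distance(a, b):
    """Dynamic-programming edit distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        cur = [i]
        for j, pb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (pa != pb),  # substitution (0 if phonemes agree)
            ))
        prev = cur
    return prev[-1]


def best_word(recognized, dictionary):
    """Return the dictionary word whose phoneme string is closest to the input."""
    return min(dictionary, key=lambda w: edit_distance(recognized, dictionary[w]))


# Hypothetical word dictionary mapping words to phoneme sequences.
dictionary = {
    "aka": ["a", "k", "a"],
    "aki": ["a", "k", "i"],
    "saka": ["s", "a", "k", "a"],
}
print(best_word(["a", "g", "i"], dictionary))
```

A fuller version would replace the 0/1 substitution cost with a cost derived from the normalized recognition data for each phoneme, so that scores produced by different recognizers remain comparable.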