Abstract:
A voice processing device is provided. The device includes: a score calculation unit configured to calculate a score indicating the compatibility of a voice signal, input on the basis of an utterance of a user, with each of plural pieces of intention information, each indicating one of a plurality of intentions; an intention selection unit configured to select, from among the plural pieces of intention information and on the basis of the score calculated by the score calculation unit, the intention information indicating the intention of the user's utterance; and an intention reliability calculation unit configured to calculate the reliability of the intention information selected by the intention selection unit on the basis of the score calculated by the score calculation unit.
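The three units described above can be sketched in a few lines. This is a minimal illustration only: the abstract does not say how the score or the reliability is computed, so the log-domain scores, the softmax-style reliability, and the names `select_intention` and `scores` are all assumptions made for this example.

```python
import math

def select_intention(scores):
    """Select the best-scoring intention and estimate how reliable the choice is.

    `scores` maps an intention label to a log-domain compatibility score
    (hypothetical output of the score calculation unit).
    """
    # Intention selection: the label with the highest score wins.
    best = max(scores, key=scores.get)
    # Reliability (assumed form): the winner's share of the total
    # exponentiated score mass, i.e. a softmax over the scores.
    top = scores[best]
    total = sum(math.exp(s - top) for s in scores.values())
    reliability = 1.0 / total
    return best, reliability

# Hypothetical scores for one utterance.
scores = {"play_music": -12.0, "set_alarm": -15.0, "check_weather": -20.0}
intention, reliability = select_intention(scores)
print(intention, round(reliability, 3))
```

A downstream component could, for instance, ask the user to confirm when the reliability falls below a threshold. That policy is also an assumption, not something the abstract states.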
Abstract:
A speech recognition device includes one or more intention-extracting language models, in each of which an intention of a specific task of interest is inherent; an absorbing language model, in which no intention of the task is inherent; a language score calculating section that calculates a language score indicating the linguistic similarity between the content of an utterance and each of the intention-extracting language models and the absorbing language model; and a decoder that estimates the intention in the content of the utterance based on the language scores of the language models calculated by the language score calculating section.
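As a rough sketch of how an absorbing model can catch utterances that carry no task intention, the example below scores an utterance against toy unigram models. The unigram form, the probability values, and the names `language_score`, `estimate_intention`, `INTENT_FLOOR`, and `ABSORB_PROB` are assumptions made for illustration. The abstract does not specify the model type.

```python
import math

INTENT_FLOOR = 1e-4  # assumed probability an intention model gives an unseen word
ABSORB_PROB = 1e-2   # assumed flat probability the absorbing model gives any word

def language_score(words, probs, floor):
    """Log-probability of a word sequence under a toy unigram model."""
    return sum(math.log(probs.get(w, floor)) for w in words)

def estimate_intention(utterance, intent_models):
    """Return the best-matching intention, or None if the absorbing model wins."""
    words = utterance.lower().split()
    scores = {
        name: language_score(words, probs, INTENT_FLOOR)
        for name, probs in intent_models.items()
    }
    # The absorbing model has no task vocabulary; every word gets ABSORB_PROB.
    scores[None] = language_score(words, {}, ABSORB_PROB)
    return max(scores, key=scores.get)

# Hypothetical intention-extracting models for a TV-control task.
intent_models = {
    "change_channel": {"change": 0.3, "the": 0.2, "channel": 0.3, "to": 0.1, "news": 0.1},
    "volume_up": {"turn": 0.3, "up": 0.3, "the": 0.2, "volume": 0.2},
}

print(estimate_intention("change the channel", intent_models))
print(estimate_intention("what a nice day today", intent_models))
```

In-task wording scores well under one intention model, while off-task wording is penalized heavily by every intention model and so falls to the absorbing model.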
Abstract:
A mapping determination method for obtaining a mapping $F$ from an $N$-dimensional metric vector space $\Omega_N$ to an $M$-dimensional metric vector space $\Omega_M$ has the following steps for obtaining the optimal mapping quickly and reliably. In the first step, $L_m$ complete, periodic basis functions $g_m(X)$ are set according to the distribution of samples classified into $Q$ categories on the $N$-dimensional metric vector space $\Omega_N$. In the second step, a function $f_m(X)$ indicating the $m$-th component of the mapping $F$ is expressed as a linear sum of the functions $g_m(X)$ with $L_m$ coefficients $c_m$. The third step provides $Q$ teacher vectors $T_q = (t_{q,1}, t_{q,2}, t_{q,3}, \ldots, t_{q,M})$ (where $q = 1, 2, \ldots, Q$) for the categories on the $M$-dimensional metric vector space $\Omega_M$, calculates the specified estimation function $J$, and obtains the coefficients $c_m$ that minimize the estimation function $J$. In the fourth step, the coefficients $c_m$ obtained in the third step are stored in memory.
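The abstract does not spell out the form of the estimation function J. As a hedged illustration only, one common choice is the mean squared error between each mapped sample and the teacher vector of its category. The second index i on the basis functions and coefficients, the sample sets S_q for each category, and the squared-error form of J are assumptions introduced here, not taken from the source.

```latex
\begin{align}
f_m(X) &= \sum_{i=1}^{L_m} c_{m,i}\, g_{m,i}(X), \qquad m = 1, \ldots, M \\
J &= \sum_{q=1}^{Q} \frac{1}{|S_q|} \sum_{X \in S_q} \sum_{m=1}^{M}
     \bigl( f_m(X) - t_{q,m} \bigr)^2 \\
\frac{\partial J}{\partial c_{m,i}} = 0 \;&\Longrightarrow\;
\sum_{j=1}^{L_m} \Bigl[ \sum_{q=1}^{Q} \frac{1}{|S_q|} \sum_{X \in S_q}
     g_{m,i}(X)\, g_{m,j}(X) \Bigr] c_{m,j}
   = \sum_{q=1}^{Q} \frac{t_{q,m}}{|S_q|} \sum_{X \in S_q} g_{m,i}(X)
\end{align}
```

Under this assumption J is quadratic in the coefficients, so for each m the minimizing coefficients follow from a linear system rather than an iterative search. That would be consistent with the abstract's aim of obtaining the mapping quickly.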
Abstract:
A navigation apparatus and navigation method for an automobile in which a map is visually displayed and a desired destination can be set by speaking its name. A voice recognition section recognizes the destination and marks it on the displayed map, and the best route to that destination is then shown on the map for the driver of the automobile to follow.
Abstract:
A preliminary word-selecting section selects one or more words to follow the words already obtained in a word string serving as a candidate for a result of speech recognition, and a matching section calculates acoustic or linguistic scores for the selected words and forms a word string serving as a candidate for a result of speech recognition according to the scores. A control section generates word-connection relationships between the words in the candidate word string and stores them in a word-connection-information storage section. A re-evaluation section corrects the word-connection relationships stored in the word-connection-information storage section, and the control section determines a word string serving as the result of speech recognition according to the corrected word-connection relationships.
Abstract:
An apparatus, method, and program for performing a speech recognition process that uses contextual information and includes estimating the intention of a user's utterance. The recognition process includes calculating a pre-score based on observed contextual information according to intention models, which correspond to a plurality of types of intention information, and combining the pre-scoring results with acoustic and linguistic scores to improve recognition or comprehension of the intent of the user's utterance.
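The combination of a context-based pre-score with acoustic and linguistic scores can be sketched as a weighted sum of log-domain scores. The additive form, the weights, and the names `combined_score`, `recognize`, `hypotheses`, and `pre_scores` are assumptions made for this example. The abstract does not state how the scores are combined.

```python
import math

def combined_score(acoustic, linguistic, pre_score, lm_weight=1.0, pre_weight=1.0):
    """Assumed combination rule: weighted sum of log-domain scores."""
    return acoustic + lm_weight * linguistic + pre_weight * pre_score

def recognize(hypotheses, pre_scores):
    """Return the intention of the hypothesis with the highest combined score."""
    best = max(
        hypotheses,
        key=lambda h: combined_score(h[1], h[2], pre_scores[h[0]]),
    )
    return best[0]

# Hypothetical hypotheses: (intention, acoustic log score, linguistic log score).
hypotheses = [
    ("play_music", -40.0, -6.0),
    ("pay_bill", -39.5, -6.2),
]

# Hypothetical pre-scores, log P(intention | observed context),
# e.g. the user is currently in a media player screen.
pre_scores = {"play_music": math.log(0.8), "pay_bill": math.log(0.2)}

print(recognize(hypotheses, pre_scores))
```

With a flat pre-score the acoustically stronger hypothesis wins. With the context-based pre-score the ranking of these two toy hypotheses flips, which is the effect the abstract describes.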
Abstract:
In order to prevent degradation of speech recognition accuracy due to an unknown word, a dictionary database stores a word dictionary that holds, in addition to the words that are the objects of speech recognition, suffixes, which are sound elements and sound element sequences that form the unknown word, for classifying the unknown word by its part of speech. Based on this word dictionary, a matching section connects the acoustic models of a sound model database and calculates a score from the series of features output by a feature extraction section on the basis of the connected acoustic models. The matching section then selects, on the basis of the score, the series of words that represents the speech recognition result.
Abstract:
An extended-word selecting section calculates a score for a phoneme string formed of one or more phonemes, corresponding to a user's speech, and searches a large-vocabulary dictionary for a word having one or more phonemes equal or similar to those of a phoneme string whose score is equal to or higher than a predetermined value. A matching section calculates scores for the word found by the extended-word selecting section in addition to a word selected by a preliminary word-selecting section. A control section determines a word string as the result of recognition of the speech uttered by the user.
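The dictionary search for words with equal or similar phonemes can be sketched with an edit distance over phoneme sequences. Using Levenshtein distance as the similarity measure is an assumption, as are the toy dictionary, the score values, and the names `phoneme_distance` and `extend_candidates`. The abstract does not say how similarity is measured.

```python
def phoneme_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def extend_candidates(phoneme_hyps, dictionary, min_score, max_distance=1):
    """Collect dictionary words close to any sufficiently well-scored phoneme string."""
    found = set()
    for phonemes, score in phoneme_hyps:
        if score < min_score:
            continue  # only strings at or above the threshold trigger a search
        for word, pronunciation in dictionary.items():
            if phoneme_distance(phonemes, pronunciation) <= max_distance:
                found.add(word)
    return sorted(found)

# Hypothetical large-vocabulary dictionary: word -> phoneme sequence.
dictionary = {
    "cat": ["k", "ae", "t"],
    "cap": ["k", "ae", "p"],
    "dog": ["d", "ao", "g"],
    "cattle": ["k", "ae", "t", "ah", "l"],
}

# Hypothetical phoneme-string hypotheses with log scores.
hyps = [(["k", "ae", "t"], -3.0), (["d", "ao", "g"], -9.0)]
candidates = extend_candidates(hyps, dictionary, min_score=-5.0)
print(candidates)
```

The words returned here would then be scored by the matching section alongside the words from the preliminary selection.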
Abstract:
A book database stores, as document data, at least phonetic signal information including phoneme information and rhythm information. A central system transmits the phonetic signal information stored in the book database to a terminal, and the terminal receives it. Speech synthesis based on the phonetic signal information is then carried out at the terminal, and the document is recited via synthesized sounds.
Abstract:
Voice processing for recognizing a predetermined voice input, such as a place name, is performed by a voice processing section 14 on an audio signal input from a microphone 11 on the basis of an operation of a talk switch 18. When a map display based on the recognized place name is performed, incorrect readings and commonly mistaken place names can also be recognized. Accordingly, high-grade operation of a navigation apparatus can be performed simply without hindering the operator while driving a car.