Abstract:
The present specification relates to a speech recognition apparatus and method capable of accurately recognizing a user's speech easily and conveniently, without the user having to operate a speech recognition start button or the like. The speech recognition apparatus according to embodiments of the present specification comprises: a camera for capturing a user image; a microphone; a control unit for detecting a preset user gesture from the user image and, if a nonlexical word is detected in the speech signal input through the microphone from the point in time at which the user gesture was detected, determining the speech signal detected after that nonlexical word to be an effective speech signal; and a speech recognition unit for recognizing the effective speech signal.
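The control-unit logic described above can be sketched as follows. This is a minimal illustration, not the apparatus's actual implementation: it assumes the speech signal has already been split into time-stamped tokens, and the `Token` type, the `NONLEXICAL` filler list, and the `effective_speech` function are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical filler vocabulary; the abstract does not list specific nonlexical words.
NONLEXICAL = {"um", "uh", "er", "hmm"}


@dataclass
class Token:
    text: str
    start: float  # seconds from the start of the recording
    end: float


def effective_speech(tokens: List[Token], gesture_time: Optional[float]) -> List[Token]:
    """Return the tokens that follow the first nonlexical word heard after the gesture.

    Assumes `tokens` is sorted by start time. Returns an empty list when no
    gesture was detected or no nonlexical word follows it.
    """
    if gesture_time is None:
        return []
    for i, tok in enumerate(tokens):
        if tok.start >= gesture_time and tok.text.lower() in NONLEXICAL:
            return tokens[i + 1:]
    return []


tokens = [
    Token("hello", 0.0, 0.5),
    Token("um", 1.2, 1.4),
    Token("call", 1.5, 1.9),
    Token("home", 2.0, 2.4),
]
print([t.text for t in effective_speech(tokens, gesture_time=1.0)])
```

Only the tokens after the filler word would be passed on to the speech recognition unit in this sketch.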
Abstract:
This method comprises the steps of: a) for each point of interest of each image, computing a local gradient descriptor and a local motion descriptor; b) forming microstructures of n points of interest, each defined by a tuple of order n ≥ 1; c) determining, for each tuple, a vector of structured visual features (d0 ... d3 ...) from the local descriptors; d) for each tuple, mapping this vector by means of a classification algorithm that selects a single codeword from a set of codewords forming a codebook (CB); e) generating an ordered time series of the codewords (a0 ... a3 ...) for the successive images of the video sequence; and f) measuring, by means of a string-kernel function, the similarity of this codeword time series to another codeword time series obtained from another speaker.
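Steps d) and f) can be illustrated with a small sketch. The abstract names neither the classification algorithm nor the particular string kernel, so this example assumes nearest-centroid quantization for the mapping and a normalized p-spectrum kernel for the similarity; all function names are hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import List, Sequence


def nearest_codeword(vec: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Map a feature vector to the index of the closest codebook entry (assumed classifier)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(vec, codebook[k])),
    )


def spectrum_counts(seq: Sequence[int], p: int) -> Counter:
    """Count every contiguous length-p subsequence of codewords."""
    return Counter(tuple(seq[i:i + p]) for i in range(len(seq) - p + 1))


def spectrum_kernel(s: Sequence[int], t: Sequence[int], p: int = 2) -> int:
    """p-spectrum string kernel: dot product of the subsequence count vectors."""
    cs, ct = spectrum_counts(s, p), spectrum_counts(t, p)
    return sum(cs[g] * ct[g] for g in cs)


def similarity(s: Sequence[int], t: Sequence[int], p: int = 2) -> float:
    """Kernel value normalized to [0, 1]; 0.0 if either series is shorter than p."""
    kss, ktt = spectrum_kernel(s, s, p), spectrum_kernel(t, t, p)
    if kss == 0 or ktt == 0:
        return 0.0
    return spectrum_kernel(s, t, p) / sqrt(kss * ktt)


codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
features: List[List[float]] = [[0.1, 0.0], [0.9, 0.1], [0.1, 0.8], [1.1, 0.0]]
series_a = [nearest_codeword(v, codebook) for v in features]
print(series_a, similarity(series_a, [0, 1, 2, 1]))
```

Other string kernels (for example, ones that tolerate gaps or mismatches) would fit the same interface; the spectrum kernel is used here only because it is short to state.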
Abstract:
This method comprises the steps of: a) forming a starting set of microstructures of n points of interest, each defined by a tuple of order n, with 1 ≤ n ≤ N; b) determining, for each tuple, the associated structured visual features from local gradient and/or motion descriptors of the points of interest; and c) iteratively searching for and selecting the most discriminating tuples. Step c) operates by: c1) applying a multiple kernel learning (MKL) algorithm to the set of tuples; c2) extracting a subset of tuples producing the highest relevance scores; c3) aggregating an additional tuple to these tuples to give a new set of higher-order tuples; c4) determining the structured visual features associated with each aggregated tuple; c5) selecting a new subset of the most discriminating tuples; and c6) repeating steps c1) to c4) up to a maximum order N.
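The iterative loop of step c) can be sketched as a greedy, beam-style search. This is a loose illustration under strong assumptions: the MKL relevance weights are replaced by a caller-supplied `score` function, tuples are modeled as sets of point indices, and aggregation adds one point (an order-1 tuple) at a time. The name `select_tuples` and its parameters are hypothetical.

```python
from typing import Callable, FrozenSet, List, Sequence

PointTuple = FrozenSet[int]


def select_tuples(
    points: Sequence[int],
    score: Callable[[PointTuple], float],  # stand-in for the MKL relevance score
    max_order: int,
    keep: int,
) -> List[PointTuple]:
    """Keep the `keep` best tuples at each order, then grow them by one point."""
    current: List[PointTuple] = [frozenset([p]) for p in points]
    selected: List[PointTuple] = []
    for order in range(1, max_order + 1):
        # c1/c2: rank the candidates and extract the highest-scoring subset.
        ranked = sorted(current, key=lambda t: (-score(t), sorted(t)))
        best = ranked[:keep]
        selected.extend(best)
        if order == max_order:
            break
        # c3: aggregate one additional point to each retained tuple.
        grown = {t | {p} for t in best for p in points if p not in t}
        current = sorted(grown, key=sorted)
    return selected


# Toy relevance: sum of hypothetical per-point weights.
weights = {0: 0.1, 1: 0.9, 2: 0.5, 3: 0.2}
result = select_tuples([0, 1, 2, 3], lambda t: sum(weights[p] for p in t), max_order=2, keep=2)
print([sorted(t) for t in result])
```

In a real system the score would come from retraining the kernel combination on the features of each candidate set; the fixed weights here only make the control flow visible.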
Abstract:
A system and method are provided that authenticate a user using hybrid biometric information, such as the user's image information and voice information. The user authentication method includes: acquiring multiple pieces of biometric information; generating multiple pieces of authentication information corresponding to the acquired biometric information; and performing an integrated user authentication based on the generated authentication information.
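One common way to combine several pieces of authentication information is score-level fusion. The abstract does not say how the combined decision is made, so the weighted average, the modality names, the weights, and the threshold below are all assumptions made for illustration.

```python
from typing import Dict


def fuse_scores(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-modality match scores, each assumed to lie in [0, 1]."""
    total = sum(weights[m] for m in scores)
    if total <= 0:
        raise ValueError("weights for the supplied modalities must sum to a positive value")
    return sum(scores[m] * weights[m] for m in scores) / total


def authenticate(scores: Dict[str, float], weights: Dict[str, float], threshold: float = 0.7) -> bool:
    """Accept the user when the fused score reaches the (hypothetical) threshold."""
    return fuse_scores(scores, weights) >= threshold


weights = {"face": 0.6, "voice": 0.4}  # hypothetical modality weights
print(authenticate({"face": 0.9, "voice": 0.5}, weights))
print(authenticate({"face": 0.9, "voice": 0.2}, weights))
```

A weighted average lets a strong match in one modality compensate for a weaker one; a stricter design could instead require every modality to pass its own threshold.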
Abstract:
Audio or visual orientation cues can be used to determine the relevance of input speech. The presence of a user's face may be identified during speech during an interval of time. One or more facial orientation characteristics associated with the user's face during the interval of time may be determined. In some cases, orientation characteristics for input sound can be determined. A relevance of the user's speech during the interval of time may be characterized based on the one or more orientation characteristics.
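As a rough illustration of characterizing relevance from facial orientation, the sketch below scores an interval by the fraction of frames in which the face is turned toward the device. The `FaceFrame` type, the angle convention (0 degrees means facing the device), and the yaw and pitch limits are assumptions, not details from the abstract.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FaceFrame:
    t: float      # timestamp in seconds
    yaw: float    # degrees; 0 means facing the device (assumed convention)
    pitch: float  # degrees; 0 means level with the device (assumed convention)


def speech_relevance(
    frames: List[FaceFrame],
    start: float,
    end: float,
    max_yaw: float = 20.0,
    max_pitch: float = 15.0,
) -> float:
    """Fraction of frames in [start, end] whose face orientation is within the limits."""
    in_interval = [f for f in frames if start <= f.t <= end]
    if not in_interval:
        return 0.0  # no face seen during the interval
    facing = sum(
        1 for f in in_interval if abs(f.yaw) <= max_yaw and abs(f.pitch) <= max_pitch
    )
    return facing / len(in_interval)


frames = [
    FaceFrame(0.0, 5.0, 2.0),
    FaceFrame(0.5, 40.0, 0.0),
    FaceFrame(1.0, -10.0, 5.0),
    FaceFrame(1.5, 0.0, 30.0),
]
print(speech_relevance(frames, 0.0, 1.5))
```

Speech captured during an interval with a low score could then be treated as less likely to be addressed to the system.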
Abstract:
A pronunciation diagnosis device according to the present invention diagnoses the pronunciation of a speaker using articulatory attribute data. The data include articulatory attribute values corresponding to an articulatory attribute of a desirable pronunciation for each phoneme in each audio language system. The articulatory attribute includes any one condition of the tongue in the oral cavity, the lips, the vocal cords, the uvula, the nasal cavity, the teeth, and the jaws, or a combination including at least one of these conditions of the articulatory organs; the way of applying force in the conditions of the articulatory organs; and a combination of breathing conditions. The device extracts an acoustic feature from an audio signal generated by the speaker, the acoustic feature being a frequency feature quantity, a sound volume, a duration time, a rate of change or change pattern thereof, or at least one combination thereof; estimates an attribute value associated with the articulatory attribute on the basis of the extracted acoustic feature; and compares the estimated attribute value with the desirable articulatory attribute data.
Abstract:
A change information recognition apparatus comprises a series information storing device for storing series information about a recognition object (a motion picture taken by an image taking device, or the like), and a basic change information storing device for preliminarily storing basic change information corresponding to changes of the series information. The series information storing device feeds the series information to a change state comparing device, and the basic change information storing device feeds the basic change information to the change state comparing device. The change state comparing device compares the change information with the basic change information thus fed, to recognize a change state of the recognition object.
Abstract:
Methods, systems, and apparatus are provided to separate and evaluate audio and video. Audio and video are captured; the video is evaluated to detect one or more speakers speaking. Visual features are associated with the speakers speaking. The audio and video are separated and corresponding portions of the audio are mapped to the visual features for purposes of isolating audio associated with each speaker and for purposes of filtering out noise associated with the audio.
Abstract:
Different types of data including voice data of a user, image data produced by picturing the mouth of the user, and ambient noise data are provided through an input unit 10. Those data are analyzed by preprocessors 20 to 23 respectively to determine characteristic parameters. In a classification data constructing unit 24, classification data is constructed from the characteristic parameters and transferred to a classification unit 25 for classification. Meanwhile, an integrated parameter constructing unit 26 constructs integrated parameters from the characteristic parameters provided by the preprocessors 20 to 23. An adaptivity determining unit 27 selects a table corresponding to the class determined by the classification unit 25. From the standard parameters saved in the table and the integrated parameter from the integrated parameter constructing unit 26, the voice emitted by a user is recognized. Accordingly, the accuracy of the voice recognition will be increased.
Abstract:
A game apparatus of the invention includes: a voice input section for inputting at least one voice set including voice uttered by an operator, for converting the voice set into a first electric signal, and for outputting the first electric signal; a voice recognition section for recognizing the voice set on the basis of the first electric signal output from the voice input section; an image input section for optically detecting a movement of the lips of the operator, for converting the detected movement of the lips into a second electric signal, and for outputting the second electric signal; a speech period detection section for receiving the second electric signal, and for obtaining a period in which the voice is uttered by the operator on the basis of the received second electric signal; an overall judgment section for extracting the voice uttered by the operator from the input voice set, on the basis of the voice set recognized by the voice recognition section and the period obtained by the speech period detection section; and a control section for controlling an object on the basis of the voice extracted by the overall judgment section.
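The interplay between the speech period detection and the overall judgment can be sketched as follows. This is a simplified illustration, not the apparatus's circuitry: it assumes lip movement is available as one magnitude value per video frame, that recognized words carry start and end times, and that a fixed threshold separates moving from still lips. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Period = Tuple[float, float]      # (start, end) in seconds
Word = Tuple[str, float, float]   # (text, start, end) in seconds


def speech_periods(lip_motion: Sequence[float], fps: float, threshold: float) -> List[Period]:
    """Return the time spans in which per-frame lip motion stays at or above the threshold."""
    periods: List[Period] = []
    start = None
    for i, m in enumerate(lip_motion):
        if m >= threshold:
            if start is None:
                start = i  # lips started moving at this frame
        elif start is not None:
            periods.append((start / fps, i / fps))
            start = None
    if start is not None:  # lips still moving at the last frame
        periods.append((start / fps, len(lip_motion) / fps))
    return periods


def extract_operator_voice(words: Sequence[Word], periods: Sequence[Period]) -> List[str]:
    """Keep only recognized words that overlap a period in which the lips moved."""
    return [
        text
        for text, ws, we in words
        if any(ws < pe and we > ps for ps, pe in periods)
    ]


motion = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]
periods = speech_periods(motion, fps=10, threshold=0.5)
words = [("jump", 0.25, 0.45), ("noise", 0.55, 0.65), ("left", 0.7, 0.85)]
print(periods, extract_operator_voice(words, periods))
```

Words recognized while the operator's lips were still, such as background voices, are dropped before the command reaches the game logic.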