SPEECH RECOGNITION APPARATUS AND METHOD
    51.
    Invention publication
    Status: under examination (published)

    Publication number: EP2871640A1

    Publication date: 2015-05-13

    Application number: EP13817278.8

    Filing date: 2013-07-05

    Inventor: JUNG, Dukyung

    IPC classification: G10L15/28 G10L15/24

    Abstract: The present specification relates to a speech recognition apparatus and method capable of accurately recognizing the speech of a user in an easy and convenient manner, without the user having to operate a speech recognition start button or the like. The speech recognition apparatus according to embodiments of the present specification comprises: a camera for capturing a user image; a microphone; a control unit for detecting a preset user gesture from the user image and, if a nonlexical word is detected in the speech signal input through the microphone from the point in time at which the user gesture was detected, determining the speech signal detected after the nonlexical word to be an effective speech signal; and a speech recognition unit for recognizing the effective speech signal.
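
    Sketch: the trigger logic described in the abstract above can be illustrated roughly as follows. This is a minimal sketch, not the patented implementation. The `Token` type, the `NONLEXICAL` filler list, and the `effective_speech` function are hypothetical names, and the input is assumed to be already-transcribed words with start times.

```python
from dataclasses import dataclass

# Hypothetical filler vocabulary; the abstract does not list specific nonlexical words.
NONLEXICAL = {"um", "uh", "er", "hmm"}


@dataclass
class Token:
    text: str
    start: float  # start time in seconds


def effective_speech(tokens, gesture_time):
    """Return the tokens that follow the first nonlexical word after the gesture."""
    # Only speech from the moment the gesture was detected is considered.
    armed = [t for t in tokens if t.start >= gesture_time]
    for i, tok in enumerate(armed):
        if tok.text.lower() in NONLEXICAL:
            return armed[i + 1:]
    # No nonlexical word after the gesture: nothing is treated as effective speech.
    return []
```

    For example, with the words "hello", "um", "call", "home" and a gesture detected between "hello" and "um", only "call" and "home" would be passed to the recognizer.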

    Procédé de reconnaissance vocale visuelle par suivi des déformations locales d'un ensemble de points d'intérêt de la bouche du locuteur
    52.
    Invention publication
    Status: under examination (published)
    Title (English translation): Method of visual speech recognition by tracking the local deformations of a set of points of interest of the speaker's mouth

    Publication number: EP2804175A1

    Publication date: 2014-11-19

    Application number: EP14167786.4

    Filing date: 2014-05-09

    Applicant: Parrot

    IPC classification: G10L15/25 G06K9/00

    Abstract: This method comprises the steps of: a) for each point of interest in each image, computing a local gradient descriptor and a local motion descriptor; b) forming microstructures of n points of interest, each defined by a tuple of order n ≥ 1; c) determining, for each tuple, a vector of structured visual features (d0 ... d3 ...) from the local descriptors; d) for each tuple, mapping this vector with a classification algorithm that selects a single codeword from a set of codewords forming a codebook (CB); e) generating an ordered time series of the codewords (a0 ... a3 ...) for the successive images of the video sequence; and f) measuring, by means of a string-kernel function, the similarity of the time series of codewords with another time series of codewords from another speaker.
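
    Sketch: step f) of the abstract above compares two codeword time series with a string-kernel function. The abstract does not say which string kernel is used; the example below assumes a p-spectrum kernel (counting shared contiguous runs of p codewords) purely for illustration. The function names and the integer codeword IDs are hypothetical.

```python
from collections import Counter
from math import sqrt


def spectrum(seq, p):
    """Count every contiguous length-p run (p-gram) of codeword IDs in seq."""
    return Counter(tuple(seq[i:i + p]) for i in range(len(seq) - p + 1))


def spectrum_kernel(a, b, p=2):
    """Unnormalised p-spectrum kernel: dot product of the two p-gram count vectors."""
    sa, sb = spectrum(a, p), spectrum(b, p)
    return sum(count * sb[gram] for gram, count in sa.items())


def similarity(a, b, p=2):
    """Cosine-normalised kernel value in [0, 1]; 0.0 if either series is shorter than p."""
    denom = sqrt(spectrum_kernel(a, a, p) * spectrum_kernel(b, b, p))
    return spectrum_kernel(a, b, p) / denom if denom else 0.0
```

    Two identical codeword series score 1.0, and series with no shared p-gram score 0.0. Normalising keeps the score comparable across video sequences of different lengths.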

    Procédé de reconnaissance vocale visuelle avec sélection de groupes de points d'intérêts les plus pertinents
    53.
    Invention publication
    Status: under examination (published)
    Title (English translation): Method of visual speech recognition with selection of the most relevant groups of points of interest

    Publication number: EP2804129A1

    Publication date: 2014-11-19

    Application number: EP14167791.4

    Filing date: 2014-05-09

    Applicant: Parrot

    IPC classification: G06K9/00 G10L15/25

    Abstract: This method comprises the steps of: a) forming a starting set of microstructures of n points of interest, each defined by a tuple of order n, with 1 ≤ n ≤ N; b) determining, for each tuple, the associated structured visual features from local gradient and/or motion descriptors of the points of interest; and c) iteratively searching for and selecting the most discriminating tuples. Step c) operates by: c1) applying a multiple kernel learning (MKL) algorithm to the set of tuples; c2) extracting a subset of tuples producing the highest relevance scores; c3) aggregating an additional tuple to these tuples to give a new set of higher-order tuples; c4) determining the structured visual features associated with each aggregated tuple; c5) selecting a new subset of the most discriminating tuples; and c6) repeating steps c1) to c4) until a maximum order N is reached.

    Apparatus and method for determining relevance of input speech
    55.
    Invention publication
    Status: granted
    German title: Vorrichtung und Verfahren zur Bestimmung der Relevanz der Spracheingabe

    Publication number: EP2509070A1

    Publication date: 2012-10-10

    Application number: EP12162896.0

    Filing date: 2012-04-02

    Inventor: Kalinli, Ozlem

    Abstract: Audio or visual orientation cues can be used to determine the relevance of input speech. The presence of a user's face may be identified while the user speaks during an interval of time. One or more facial orientation characteristics associated with the user's face during the interval of time may be determined. In some cases, orientation characteristics for input sound can be determined. A relevance of the user's speech during the interval of time may be characterized based on the one or more orientation characteristics.
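
    Sketch: one simple way to turn facial orientation into a relevance score, as outlined in the abstract above, is to measure how often the user faces the device while speaking. This is a minimal sketch under stated assumptions, not the patented method. The function name, the per-frame (yaw, pitch) input, and the angle thresholds are all hypothetical.

```python
def speech_relevance(frames, max_yaw=20.0, max_pitch=15.0):
    """Score the relevance of speech from per-frame face orientation.

    frames: one entry per video frame in the speech interval, either a
    (yaw_deg, pitch_deg) tuple or None when no face was found.
    Returns the fraction of frames with a face roughly facing the device.
    """
    if not frames:
        return 0.0
    facing = sum(
        1
        for f in frames
        if f is not None and abs(f[0]) <= max_yaw and abs(f[1]) <= max_pitch
    )
    return facing / len(frames)
```

    A caller could then accept the speech as a command only when the score exceeds a chosen cut-off, for example 0.5.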

    PRONUNCIATION DIAGNOSIS DEVICE, PRONUNCIATION DIAGNOSIS METHOD, RECORDING MEDIUM, AND PRONUNCIATION DIAGNOSIS PROGRAM
    56.
    Invention publication
    Status: under examination (published)
    German title: AUSSPRACHEDIAGNOSEEINRICHTUNG, AUSSPRACHEDIAGNOSEVERFAHREN, AUFZEICHNUNGSMEDIUM UND AUSSPRACHEDIAGNOSEPROGRAMM

    Publication number: EP1947643A1

    Publication date: 2008-07-23

    Application number: EP06810834.9

    Filing date: 2006-09-29

    Abstract: A pronunciation diagnosis device according to the present invention diagnoses the pronunciation of a speaker using articulatory attribute data, which includes articulatory attribute values corresponding to an articulatory attribute of a desirable pronunciation for each phoneme in each audio language system. The articulatory attribute includes any one condition of the articulatory organs (the tongue in the oral cavity, the lips, the vocal cords, the uvula, the nasal cavity, the teeth, and the jaws) or a combination including at least one of those conditions; the way of applying force in the conditions of the articulatory organs; and a combination of breathing conditions. The device extracts an acoustic feature from an audio signal generated by the speaker, the acoustic feature being a frequency feature quantity, a sound volume, and a duration time, a rate of change or change pattern thereof, and at least one combination thereof; estimates an attribute value associated with the articulatory attribute on the basis of the extracted acoustic feature; and compares the estimated attribute value with the desirable articulatory attribute data.

    Change information recognition apparatus and change information recognition method
    57.
    Invention publication
    Status: granted
    German title: Vorrichtung für Erkennung von Änderungsinformation

    Publication number: EP1881484A1

    Publication date: 2008-01-23

    Application number: EP07021669.2

    Filing date: 2004-04-09

    Inventor: Funayama, Ryuji

    CPC classification: G10L15/25 G06K9/00335

    Abstract: A change information recognition apparatus comprises a series information storing device for storing series information about a recognition object (a motion picture taken by an image taking device, or the like), and a basic change information storing device for preliminarily storing basic change information corresponding to changes of the series information. The series information storing device feeds the series information to a change state comparing device, and the basic change information storing device feeds the basic change information to the change state comparing device. The change state comparing device compares the change information with the basic change information thus fed, to recognize a change state of the recognition object.

    TECHNIQUES FOR SEPARATING AND EVALUATING AUDIO AND VIDEO SOURCE DATA
    58.
    Invention publication
    Status: under examination (published)

    Publication number: EP1730667A1

    Publication date: 2006-12-13

    Application number: EP05731257.1

    Filing date: 2005-03-25

    Applicant: Intel Corporation

    IPC classification: G06K9/00 G10L15/24

    CPC classification: G10L15/25

    Abstract: Methods, systems, and apparatus are provided to separate and evaluate audio and video. Audio and video are captured; the video is evaluated to detect one or more speakers speaking. Visual features are associated with the speakers speaking. The audio and video are separated and corresponding portions of the audio are mapped to the visual features for purposes of isolating audio associated with each speaker and for purposes of filtering out noise associated with the audio.

    RECOGNITION APPARATUS, RECOGNITION METHOD, LEARNING APPARATUS AND LEARNING METHOD
    59.
    Invention publication
    Status: lapsed
    German title: VORRICHTUNG UND VERFAHREN ZUR MUSTERERKENNUNG UND ZUR ADAPTION

    Publication number: EP0896319A1

    Publication date: 1999-02-10

    Application number: EP97949208.9

    Filing date: 1997-12-22

    Applicant: SONY CORPORATION

    IPC classification: G10L3/00

    Abstract: Different types of data, including voice data of a user, image data produced by picturing the mouth of the user, and ambient noise data, are provided through an input unit 10. These data are analyzed by preprocessors 20 to 23 respectively to determine characteristic parameters. In a classification data constructing unit 24, classification data is constructed from the characteristic parameters and transferred to a classification unit 25 for classification. Meanwhile, an integrated parameter constructing unit 26 constructs integrated parameters from the characteristic parameters provided by the preprocessors 20 to 23. An adaptivity determining unit 27 selects a table corresponding to the class determined by the classification unit 25. From the standard parameters saved in the table and the integrated parameter from the integrated parameter constructing unit 26, the voice emitted by the user is recognized. Accordingly, the accuracy of the voice recognition is increased.

    Voice operated game apparatus
    60.
    Invention publication
    Status: lapsed
    German title: Sprachgesteuerte Spielvorrichtung

    Publication number: EP0683481A2

    Publication date: 1995-11-22

    Application number: EP95107008.5

    Filing date: 1995-05-09

    IPC classification: G10L3/00 G10L9/20

    CPC classification: G10L15/25 G10L25/87

    Abstract: A game apparatus of the invention includes: a voice input section for inputting at least one voice set including voice uttered by an operator, for converting the voice set into a first electric signal, and for outputting the first electric signal; a voice recognition section for recognizing the voice set on the basis of the first electric signal output from the voice input means; an image input section for optically detecting a movement of the lips of the operator, for converting the detected movement of lips into a second electric signal, and for outputting the second electric signal; a speech period detection section for receiving the second electric signal, and for obtaining a period in which the voice is uttered by the operator on the basis of the received second electric signal; an overall judgment section for extracting the voice uttered by the operator from the input voice set, on the basis of the voice set recognized by the voice recognition means and the period obtained by the speech period detection means; and a control means for controlling an object on the basis of the voice extracted by the overall judgment means.
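
    Sketch: the speech period detection and overall judgment steps in the abstract above can be illustrated roughly as follows. This is a minimal sketch, not the patented implementation. The function names, the per-frame lip-motion magnitudes, the threshold, and the frame rate are hypothetical, and recognized words are assumed to arrive as (word, start_s, end_s) tuples.

```python
def speech_periods(lip_motion, threshold=0.5, fps=10.0):
    """Return (start_s, end_s) intervals where per-frame lip motion exceeds threshold."""
    periods, start = [], None
    for i, m in enumerate(lip_motion):
        if m > threshold and start is None:
            start = i  # lips started moving
        elif m <= threshold and start is not None:
            periods.append((start / fps, i / fps))  # lips stopped moving
            start = None
    if start is not None:
        # Lips were still moving at the end of the clip.
        periods.append((start / fps, len(lip_motion) / fps))
    return periods


def operator_commands(recognized, periods):
    """Keep recognized words whose time span overlaps a lip-motion period."""
    return [
        word
        for word, w_start, w_end in recognized
        if any(w_start < p_end and p_start < w_end for p_start, p_end in periods)
    ]
```

    Words recognized while the operator's lips are still (for example, a bystander's shout) are dropped, and only words that coincide with lip movement are forwarded to the game control.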
