On-the-fly speech learning and computer model generation using audio-visual synchronization
    Granted invention patent (legal status: in force)

    Publication (announcement) number: US09548048B1

    Publication (announcement) date: 2017-01-17

    Application number: US14745248

    Filing date: 2015-06-19

    Abstract: A speech recognition computer system uses video input as well as audio input of known speech while it is being trained to recognize unknown speech. Video of the speaker can be captured from multiple angles using multiple cameras, and the audio can be captured using multiple microphones. The video and audio can be sampled so that the timing of events in both streams is determined from the content itself, independent of the clock of any audio or video capture device. Video features, such as the speaker's moving body parts, can be extracted from the video and randomly sampled for use in the speech modeling process. Audio is modeled at the phoneme level, which allows mapping to words with little additional effort. The trained speech recognition computer system can then be used to recognize the text of unknown speech from its video and audio.
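    The abstract says event timing is recovered from the content rather than from device clocks, and that video features are randomly sampled, but it does not say how either step is done. As an illustration only, the sketch below assumes one common technique: cross-correlating an audio energy envelope with a video-derived motion signal (for example, mouth opening) that has already been resampled to the same rate, and picking the lag with the highest normalized correlation. It also shows seeded random sampling of per-frame features. The functions estimate_offset and sample_frames and the synthetic pulse signals are hypothetical and are not the patented method.

```python
import random


def _normalize(x):
    """Zero-mean, unit-norm copy of x (guards against an all-constant signal)."""
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    norm = sum(v * v for v in centered) ** 0.5 or 1.0
    return [v / norm for v in centered]


def estimate_offset(audio_env, video_sig, max_lag):
    """Hypothetical content-based sync: return the lag (in samples) that best
    aligns video_sig to audio_env. A positive lag means events in video_sig
    occur `lag` samples later than the matching events in audio_env.
    Assumes both signals are already at a common sample rate."""
    a = _normalize(audio_env)
    v = _normalize(video_sig)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Correlate only over the indices where both signals overlap.
        score = sum(a[i] * v[i + lag] for i in range(len(a)) if 0 <= i + lag < len(v))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def sample_frames(features, k, seed=0):
    """Hypothetical random sampling of k per-frame feature vectors,
    returned in their original temporal order (seeded for reproducibility)."""
    rng = random.Random(seed)
    idx = sorted(rng.sample(range(len(features)), k))
    return [features[i] for i in idx]


if __name__ == "__main__":
    # Synthetic audio envelope with two speech onsets (pulses).
    audio = [0.0] * 50
    audio[10], audio[30], audio[31] = 1.0, 1.0, 0.5
    # The same events seen in the video stream, 3 samples later.
    video = [0.0] * 50
    for i in range(47):
        video[i + 3] = audio[i]
    print(estimate_offset(audio, video, max_lag=5))  # → 3
    print(sample_frames(list(range(100)), k=5, seed=1))
```

    Normalizing both signals first makes the score insensitive to the differing amplitudes of audio energy and visual motion; a real system would likely use many features and a finer-grained alignment.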

