2. Word-Level Correction of Speech Input
    Patent Application (In Force)

    Publication No.: US20120022868A1

    Publication Date: 2012-01-26

    Application No.: US13249539

    Filing Date: 2011-09-30

    IPC Class: G10L15/26

    Abstract: The subject matter of this specification can be implemented in, among other things, a computer-implemented method for correcting words in transcribed text including receiving speech audio data from a microphone. The method further includes sending the speech audio data to a transcription system. The method further includes receiving a word lattice transcribed from the speech audio data by the transcription system. The method further includes presenting one or more transcribed words from the word lattice. The method further includes receiving a user selection of at least one of the presented transcribed words. The method further includes presenting one or more alternate words from the word lattice for the selected transcribed word. The method further includes receiving a user selection of at least one of the alternate words. The method further includes replacing the selected transcribed word in the presented transcribed words with the selected alternate word.
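
    The abstract walks through a concrete correction flow, so a minimal sketch may help make it tangible. The sketch below simplifies the word lattice to a ranked list of alternates per word position; the WordLattice class, its methods, and the example phrase are hypothetical stand-ins, not the structures used in the patent.

```python
# Hypothetical sketch of lattice-based word correction, not the patented implementation.
# A real word lattice carries arcs, scores, and timings; here it is flattened to a ranked
# list of candidate words per position.

class WordLattice:
    def __init__(self, alternatives):
        # alternatives: list of ranked candidate lists, one list per word position.
        self.alternatives = alternatives

    def best_transcription(self):
        # Transcribed words presented to the user: the top candidate at each position.
        return [alts[0] for alts in self.alternatives]

    def alternates_for(self, position):
        # Alternate words from the lattice for the selected transcribed word.
        return self.alternatives[position][1:]

    @staticmethod
    def replace(transcription, position, alternate):
        # Replace the selected transcribed word with the user-selected alternate word.
        corrected = list(transcription)
        corrected[position] = alternate
        return corrected


lattice = WordLattice([["I"], ["want"], ["eyes", "ice"], ["cream"]])
words = lattice.best_transcription()            # ['I', 'want', 'eyes', 'cream']
print(lattice.alternates_for(2))                # user taps "eyes" -> ['ice']
print(WordLattice.replace(words, 2, "ice"))     # ['I', 'want', 'ice', 'cream']
```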


3. Speech and Noise Models for Speech Recognition

    Publication No.: US20120022860A1

    Publication Date: 2012-01-26

    Application No.: US13250777

    Filing Date: 2011-09-30

    IPC Class: G10L21/02

    CPC Class: G10L15/20 G10L21/0208

    Abstract: An audio signal generated by a device based on audio input from a user may be received. The audio signal may include at least a user audio portion that corresponds to one or more user utterances recorded by the device. A user speech model associated with the user may be accessed, and a determination may be made whether background audio in the audio signal is below a defined threshold. In response to determining that the background audio in the audio signal is below the defined threshold, the accessed user speech model may be adapted based on the audio signal to generate an adapted user speech model that models speech characteristics of the user. Noise compensation may be performed on the received audio signal using the adapted user speech model to generate a filtered audio signal with reduced background audio compared to the received audio signal.
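
    Items 3, 6, and 7 in this listing share the same gate-then-adapt-then-compensate abstract, so a single rough sketch is given here. Everything in it is an assumption made for illustration: the energy-percentile background estimator, the averaged-spectrum "user speech model", the fixed threshold, and the crude spectral gain are stand-ins, not the patented models or algorithms.

```python
import numpy as np

# Illustrative sketch only: adapt a per-user speech "model" when background audio is low,
# then use it for a crude noise-compensation pass. Frames are assumed to have a fixed
# length; all thresholds and model choices here are invented.

FRAME_LEN = 4096
BACKGROUND_THRESHOLD_DB = -50.0   # stand-in for the "defined threshold" in the abstract

def background_level_db(frame):
    # Very rough background estimate: energy of the quietest quarter of sub-blocks.
    blocks = frame.reshape(-1, 256)
    energies = np.sort(np.mean(blocks ** 2, axis=1) + 1e-12)
    return 10 * np.log10(np.mean(energies[: len(energies) // 4]))

def adapt_user_model(user_model, frame, rate=0.1):
    # "Adapt" the user speech model as a running average magnitude spectrum.
    spectrum = np.abs(np.fft.rfft(frame))
    return spectrum if user_model is None else (1 - rate) * user_model + rate * spectrum

def noise_compensate(frame, user_model):
    # Attenuate spectral components that the adapted user model does not support.
    spectrum = np.fft.rfft(frame)
    gain = np.clip(user_model / (np.abs(spectrum) + 1e-12), 0.0, 1.0)
    return np.fft.irfft(gain * spectrum, n=len(frame))

def process_frame(frame, user_model):
    if background_level_db(frame) < BACKGROUND_THRESHOLD_DB:
        user_model = adapt_user_model(user_model, frame)   # quiet enough: safe to adapt
    filtered = frame if user_model is None else noise_compensate(frame, user_model)
    return filtered, user_model
```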

4. SPEECH DETECTION AND ENHANCEMENT USING AUDIO/VIDEO FUSION
    Patent Application (In Force)

    Publication No.: US20080059174A1

    Publication Date: 2008-03-06

    Application No.: US11852961

    Filing Date: 2007-09-10

    IPC Class: G10L15/00

    Abstract: A system and method facilitating speech detection and/or enhancement utilizing audio/video fusion is provided. The present invention fuses audio and video in a probabilistic generative model that implements cross-modal, self-supervised learning, enabling rapid adaptation to audio-visual data. The system can learn to detect and enhance speech in noise given only a short (e.g., 30 second) sequence of audio-visual data. In addition, it automatically learns to track the lips as they move around in the video.
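
    The abstract stays at the level of a probabilistic generative model, so the following is only a loose illustration of the fusion idea, not the patented model: an audio energy score and a lip-motion score are combined per frame under a naive independence assumption. The noise floor, motion scale, and threshold are invented for the example.

```python
import numpy as np

# Loose illustration of audio/video fusion for per-frame speech detection (not the
# patented generative model). Each modality produces a score treated as log-odds for
# "speech vs. non-speech"; assuming independence, the fused score is their sum.

def audio_score(frame_energy, noise_floor=1e-4):
    # Short-time energy relative to an assumed noise floor, on a log scale.
    return np.log((frame_energy + 1e-12) / noise_floor)

def video_score(lip_motion, motion_scale=0.5):
    # Lip-region motion (e.g., mean absolute pixel difference), as a crude linear score.
    return (lip_motion - motion_scale) / motion_scale

def detect_speech(frame_energies, lip_motions, threshold=0.0):
    # Fuse the two scores per frame and threshold the result.
    return [audio_score(e) + video_score(m) > threshold
            for e, m in zip(frame_energies, lip_motions)]

print(detect_speech([1e-3, 1e-5, 2e-3], [0.9, 0.1, 1.2]))   # [True, False, True]
```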


5. Speech recognition using repeated utterances
    Granted Patent (In Force)

    Publication No.: US09123339B1

    Publication Date: 2015-09-01

    Application No.: US12953344

    Filing Date: 2010-11-23

    IPC Class: G10L15/22

    Abstract: Subject matter described in this specification can be embodied in methods, computer program products and systems relating to speech-to-text conversion. A first spoken input is received from a user of an electronic device (an “original utterance”). Based on the original utterance, a first set of character string candidates is determined, each of which represents the original utterance converted to textual characters, and a selection of one or more of the character string candidates is provided in a format for display to the user. A second spoken input is received from the user and a determination is made that the second spoken input is a repeat utterance of the original utterance. Based on this determination and using the original utterance and the repeat utterance, a second set of character string candidates is determined.
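
    As a hedged sketch of how a repeat utterance could be used once it has been detected (the patent's actual detection and combination logic is not given here), the snippet below simply sums recognition scores across the two passes so candidates supported by both utterances rise to the top; the scores and phrases are made up.

```python
# Hypothetical sketch of rescoring candidates with a repeat utterance, not the patented
# method. Each recognition pass returns candidate strings with scores; candidates that
# score well in both the original and the repeated utterance are promoted.

def combine_candidates(original_candidates, repeat_candidates):
    # original_candidates / repeat_candidates: dict of {candidate_text: score}
    combined = {}
    for text in set(original_candidates) | set(repeat_candidates):
        combined[text] = original_candidates.get(text, 0.0) + repeat_candidates.get(text, 0.0)
    return sorted(combined, key=combined.get, reverse=True)

original = {"call bob": 0.6, "all bob": 0.3, "call rob": 0.1}
repeat   = {"call rob": 0.5, "call bob": 0.45, "tall bob": 0.05}
print(combine_candidates(original, repeat))   # "call bob" ends up on top
```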


6. Speech and noise models for speech recognition

    Publication No.: US08249868B2

    Publication Date: 2012-08-21

    Application No.: US13250777

    Filing Date: 2011-09-30

    IPC Class: G10L15/20

    CPC Class: G10L15/20 G10L21/0208

    Abstract: An audio signal generated by a device based on audio input from a user may be received. The audio signal may include at least a user audio portion that corresponds to one or more user utterances recorded by the device. A user speech model associated with the user may be accessed, and a determination may be made whether background audio in the audio signal is below a defined threshold. In response to determining that the background audio in the audio signal is below the defined threshold, the accessed user speech model may be adapted based on the audio signal to generate an adapted user speech model that models speech characteristics of the user. Noise compensation may be performed on the received audio signal using the adapted user speech model to generate a filtered audio signal with reduced background audio compared to the received audio signal.

7. Speech and noise models for speech recognition
    Granted Patent (In Force)

    Publication No.: US08234111B2

    Publication Date: 2012-07-31

    Application No.: US12814665

    Filing Date: 2010-06-14

    IPC Class: G10L15/20

    CPC Class: G10L15/20 G10L21/0208

    Abstract: An audio signal generated by a device based on audio input from a user may be received. The audio signal may include at least a user audio portion that corresponds to one or more user utterances recorded by the device. A user speech model associated with the user may be accessed, and a determination may be made whether background audio in the audio signal is below a defined threshold. In response to determining that the background audio in the audio signal is below the defined threshold, the accessed user speech model may be adapted based on the audio signal to generate an adapted user speech model that models speech characteristics of the user. Noise compensation may be performed on the received audio signal using the adapted user speech model to generate a filtered audio signal with reduced background audio compared to the received audio signal.


8. GEOTAGGED ENVIRONMENTAL AUDIO FOR ENHANCED SPEECH RECOGNITION ACCURACY
    Patent Application (In Force)

    Publication No.: US20120022870A1

    Publication Date: 2012-01-26

    Application No.: US13250843

    Filing Date: 2011-09-30

    IPC Class: H04W64/00 G10L15/00

    CPC Class: G10L21/0208 G10L15/20

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for enhancing speech recognition accuracy. In one aspect, a method includes receiving geotagged audio signals that correspond to environmental audio recorded by multiple mobile devices in multiple geographic locations, receiving an audio signal that corresponds to an utterance recorded by a particular mobile device, determining a particular geographic location associated with the particular mobile device, generating a noise model for the particular geographic location using a subset of the geotagged audio signals, where noise compensation is performed on the audio signal that corresponds to the utterance using the noise model that has been generated for the particular geographic location.
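
    A rough sketch of the location-specific noise model idea, with made-up details: the radius, the equirectangular distance, the averaged magnitude spectrum, and the spectral-subtraction step are illustrative assumptions, not the noise model or compensation the application claims. Clips and the utterance are assumed to share a fixed length.

```python
import math
import numpy as np

# Illustrative only: pool geotagged environmental clips recorded near the device's
# location into an average noise spectrum, then subtract it from the utterance spectrum.

def distance_km(a, b):
    # Rough equirectangular distance between (lat, lon) pairs given in degrees.
    dlat = math.radians(a[0] - b[0])
    dlon = math.radians(a[1] - b[1]) * math.cos(math.radians((a[0] + b[0]) / 2))
    return 6371.0 * math.hypot(dlat, dlon)

def build_noise_model(geotagged_clips, device_location, radius_km=1.0):
    # geotagged_clips: iterable of ((lat, lon), samples) recorded by many mobile devices.
    nearby = [clip for loc, clip in geotagged_clips
              if distance_km(loc, device_location) <= radius_km]
    spectra = [np.abs(np.fft.rfft(clip)) for clip in nearby]
    return np.mean(spectra, axis=0) if spectra else None

def compensate(utterance, noise_model):
    # Basic magnitude spectral subtraction, keeping the utterance's phase.
    spectrum = np.fft.rfft(utterance)
    magnitude = np.maximum(np.abs(spectrum) - noise_model, 0.0)
    return np.fft.irfft(magnitude * np.exp(1j * np.angle(spectrum)), n=len(utterance))
```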


9. ACOUSTIC MODEL ADAPTATION USING GEOGRAPHIC INFORMATION
    Patent Application (In Force)

    Publication No.: US20110295590A1

    Publication Date: 2011-12-01

    Application No.: US12787568

    Filing Date: 2010-05-26

    IPC Class: G06F17/20

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for enhancing speech recognition accuracy. In one aspect, a method includes receiving an audio signal that corresponds to an utterance recorded by a mobile device, determining a geographic location associated with the mobile device, adapting one or more acoustic models for the geographic location, and performing speech recognition on the audio signal using the one or more acoustic models that are adapted for the geographic location.
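
    For illustration only, one simple way to key acoustic models by location is to bucket coordinates into coarse grid cells and fall back to a general model when no location-specific model exists; the cell size, model identifiers, and lookup scheme below are assumptions, not the adaptation procedure the application describes.

```python
# Illustration only: select an acoustic model by coarse geographic cell, falling back to
# a general model. The cell size and the model identifiers are invented for the example.

def grid_cell(lat, lon, cell_deg=1.0):
    # Bucket a (lat, lon) in degrees into a coarse grid cell used as a model key.
    return (round(lat / cell_deg), round(lon / cell_deg))

def select_acoustic_model(models_by_cell, general_model, lat, lon):
    # Prefer a model adapted for the device's cell; otherwise use the general model.
    return models_by_cell.get(grid_cell(lat, lon), general_model)

models_by_cell = {grid_cell(40.7, -74.0): "am_new_york", grid_cell(51.5, -0.1): "am_london"}
print(select_acoustic_model(models_by_cell, "am_general", 40.71, -74.01))   # am_new_york
print(select_acoustic_model(models_by_cell, "am_general", 35.68, 139.69))   # am_general
```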


10. Method and apparatus for scene learning and three-dimensional tracking using stereo video cameras
    Granted Patent (In Force)

    Publication No.: US07486815B2

    Publication Date: 2009-02-03

    Application No.: US10783709

    Filing Date: 2004-02-20

    IPC Class: G06K9/00 H04N13/02 H04N5/225

    CPC Class: G06K9/32 G06T7/285

    Abstract: A method and apparatus are provided for learning a model for the appearance of an object while tracking the position of the object in three dimensions. Under embodiments of the present invention, this is achieved by combining a particle filtering technique for tracking the object's position with an expectation-maximization technique for learning the appearance of the object. Two stereo cameras are used to generate data for the learning and tracking.
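
    The abstract combines particle filtering for 3D position with EM-style appearance learning, so a skeletal single-step sketch follows. It is an assumption-laden illustration, not the patented algorithm: extract_patch is a hypothetical stand-in for projecting a 3D hypothesis into the stereo image pair and sampling a patch feature vector, and the random-walk motion model, squared-error likelihood, and running-average template update are invented for the example.

```python
import numpy as np

# Skeletal sketch: one update step of a particle filter over 3D position, coupled with an
# online (EM-flavored) appearance-template update. Not the patented algorithm.

rng = np.random.default_rng(0)

def track_step(particles, template, extract_patch, learning_rate=0.05):
    # particles: (N, 3) array of 3D position hypotheses; template: (D,) appearance vector.
    # 1. Propagate each particle with a simple random-walk motion model.
    particles = particles + rng.normal(scale=0.01, size=particles.shape)
    # 2. Weight particles by how well the observed patch matches the learned appearance.
    patches = np.array([extract_patch(p) for p in particles])          # (N, D)
    errors = np.sum((patches - template) ** 2, axis=1)
    weights = np.exp(-(errors - errors.min()))   # shift by the minimum for stability
    weights /= weights.sum()
    # 3. Refine the appearance template toward the weighted mean patch ("M-step"-like move).
    template = (1 - learning_rate) * template + learning_rate * np.average(
        patches, axis=0, weights=weights)
    # 4. Resample particles in proportion to their weights.
    particles = particles[rng.choice(len(particles), size=len(particles), p=weights)]
    return particles, template
```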
