Speech detection
    1.
    Granted patent
    Status: in force

    Publication No.: US08131543B1

    Publication date: 2012-03-06

    Application No.: US12102611

    Filing date: 2008-04-14

    IPC classification: G10L15/00

    CPC classification: G10L25/78

    Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving an audio signal, determining an energy-independent component of a portion of the audio signal associated with a spectral shape of the portion, and determining an energy-dependent component of the portion associated with a gain level of the portion. The method also comprises comparing the energy-independent and energy-dependent components to a speech model, comparing the energy-independent and energy-dependent components to a noise model, and outputting an indication of whether the portion of the audio signal more closely corresponds to the speech model or to the noise model based on the comparisons.
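
The comparison described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes a frame is given as a magnitude spectrum, takes the unit-normalized spectrum as the energy-independent (spectral shape) component and the log energy as the energy-dependent (gain) component, and scores both against diagonal Gaussian models. The function names and the toy model values are hypothetical.

```python
import math


def split_components(spectrum):
    """Split a magnitude spectrum into (shape, gain).

    shape: spectrum scaled to unit energy (energy-independent).
    gain:  natural log of the total energy (energy-dependent).
    """
    energy = sum(x * x for x in spectrum)
    if energy <= 0.0:
        raise ValueError("spectrum must contain non-zero energy")
    norm = math.sqrt(energy)
    return [x / norm for x in spectrum], math.log(energy)


def log_likelihood(features, means, variances):
    """Log density of a diagonal-covariance Gaussian (assumed model form)."""
    total = 0.0
    for x, m, v in zip(features, means, variances):
        total += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return total


def classify(spectrum, speech_model, noise_model):
    """Return "speech" or "noise", whichever model scores the frame higher."""
    shape, gain = split_components(spectrum)
    features = shape + [gain]  # shape dimensions first, gain last
    speech_score = log_likelihood(features, *speech_model)
    noise_score = log_likelihood(features, *noise_model)
    return "speech" if speech_score > noise_score else "noise"


# Toy (means, variances) pairs, made up for illustration only:
# four shape dimensions followed by one gain dimension.
SPEECH_MODEL = ([0.8, 0.5, 0.2, 0.1, 2.0], [0.05, 0.05, 0.05, 0.05, 1.0])
NOISE_MODEL = ([0.5, 0.5, 0.5, 0.5, -2.0], [0.05, 0.05, 0.05, 0.05, 1.0])

label = classify([4.0, 2.5, 1.0, 0.5], SPEECH_MODEL, NOISE_MODEL)
```

Separating shape from gain means a loud and a quiet copy of the same frame share the same shape features and differ only in the gain term.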


    Word-Level Correction of Speech Input
    3.
    Patent application
    Status: in force

    Publication No.: US20120022868A1

    Publication date: 2012-01-26

    Application No.: US13249539

    Filing date: 2011-09-30

    IPC classification: G10L15/26

    Abstract: The subject matter of this specification can be implemented in, among other things, a computer-implemented method for correcting words in transcribed text including receiving speech audio data from a microphone. The method further includes sending the speech audio data to a transcription system. The method further includes receiving a word lattice transcribed from the speech audio data by the transcription system. The method further includes presenting one or more transcribed words from the word lattice. The method further includes receiving a user selection of at least one of the presented transcribed words. The method further includes presenting one or more alternate words from the word lattice for the selected transcribed word. The method further includes receiving a user selection of at least one of the alternate words. The method further includes replacing the selected transcribed word in the presented transcribed words with the selected alternate word.
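
The present, select, and replace steps in this abstract can be sketched as follows. This is a simplified illustration rather than the patented method: it assumes the word lattice has already been flattened into one ranked list of candidates per word position (best hypothesis first), and all function names are hypothetical.

```python
def best_words(lattice):
    """Return the top-ranked word at every position (the words to present)."""
    return [slot[0] for slot in lattice]


def alternates(lattice, position):
    """Return the alternate words the lattice holds for one position."""
    return lattice[position][1:]


def correct(words, lattice, position, choice):
    """Replace the word at `position` with a user-selected alternate.

    Returns a new list; the presented `words` list is left untouched.
    """
    if choice not in alternates(lattice, position):
        raise ValueError(f"{choice!r} is not an alternate for position {position}")
    corrected = list(words)
    corrected[position] = choice
    return corrected


# Hypothetical lattice for one utterance, one slot per word position.
lattice = [
    ["we're", "we are", "deer"],
    ["coming", "hunting"],
    ["about", "at"],
    ["11:30", "7:30"],
]
presented = best_words(lattice)                     # shown to the user
fixed = correct(presented, lattice, 0, "we are")    # user picks an alternate
```

Restricting the replacement to words already in the lattice keeps the correction a selection task, so the user does not have to retype or re-speak the word.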


    Speech and Noise Models for Speech Recognition

    Publication No.: US20120022860A1

    Publication date: 2012-01-26

    Application No.: US13250777

    Filing date: 2011-09-30

    IPC classification: G10L21/02

    CPC classification: G10L15/20 G10L21/0208

    Abstract: An audio signal generated by a device based on audio input from a user may be received. The audio signal may include at least a user audio portion that corresponds to one or more user utterances recorded by the device. A user speech model associated with the user may be accessed and a determination may be made that background audio in the audio signal is below a defined threshold. In response to determining that the background audio in the audio signal is below the defined threshold, the accessed user speech model may be adapted based on the audio signal to generate an adapted user speech model that models speech characteristics of the user. Noise compensation may be performed on the received audio signal using the adapted user speech model to generate a filtered audio signal with reduced background audio compared to the received audio signal.
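
The gating step in this abstract, adapting the user speech model only when background audio is below a threshold, can be sketched as below. The abstract does not say how the model is adapted; the linear interpolation toward the new features, the `rate` parameter, and the function name are assumptions made purely for illustration.

```python
def maybe_adapt(user_model, utterance_features, background_level, threshold, rate=0.1):
    """Adapt a user speech model only when the recording is clean enough.

    user_model and utterance_features are equal-length lists of feature means.
    The interpolation update is an assumed stand-in for the real adaptation.
    """
    if background_level >= threshold:
        # Too much background audio: return the model unchanged.
        return list(user_model)
    return [(1.0 - rate) * m + rate * x
            for m, x in zip(user_model, utterance_features)]


# Quiet recording: the model moves toward the new utterance features.
adapted = maybe_adapt([0.0, 10.0], [1.0, 0.0], background_level=0.1,
                      threshold=0.5, rate=0.5)
# Noisy recording: the model is left as it was.
unchanged = maybe_adapt([0.0, 10.0], [1.0, 0.0], background_level=0.9,
                        threshold=0.5, rate=0.5)
```

Skipping adaptation on noisy input keeps background audio from leaking into the model that is later used for noise compensation.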

    SPEECH DETECTION AND ENHANCEMENT USING AUDIO/VIDEO FUSION
    5.
    Patent application
    Status: in force

    Publication No.: US20080059174A1

    Publication date: 2008-03-06

    Application No.: US11852961

    Filing date: 2007-09-10

    IPC classification: G10L15/00

    Abstract: A system and method facilitating speech detection and/or enhancement utilizing audio/video fusion is provided. The present invention fuses audio and video in a probabilistic generative model that implements cross-model, self-supervised learning, enabling rapid adaptation to audio-visual data. The system can learn to detect and enhance speech in noise given only a short (e.g., 30 second) sequence of audio-visual data. In addition, it automatically learns to track the lips as they move around in the video.


    Speech recognition using repeated utterances
    6.
    Granted patent
    Status: in force

    Publication No.: US09123339B1

    Publication date: 2015-09-01

    Application No.: US12953344

    Filing date: 2010-11-23

    IPC classification: G10L15/22

    Abstract: Subject matter described in this specification can be embodied in methods, computer program products and systems relating to speech-to-text conversion. A first spoken input is received from a user of an electronic device (an “original utterance”). Based on the original utterance, a first set of character string candidates is determined, each representing the original utterance converted to textual characters, and a selection of one or more of the character string candidates is provided in a format for display to the user. A second spoken input is received from the user and a determination is made that the second spoken input is a repeat utterance of the original utterance. Based on this determination and using the original utterance and the repeat utterance, a second set of character string candidates is determined.


    Speech and noise models for speech recognition

    Publication No.: US08249868B2

    Publication date: 2012-08-21

    Application No.: US13250777

    Filing date: 2011-09-30

    IPC classification: G10L15/20

    CPC classification: G10L15/20 G10L21/0208

    Abstract: An audio signal generated by a device based on audio input from a user may be received. The audio signal may include at least a user audio portion that corresponds to one or more user utterances recorded by the device. A user speech model associated with the user may be accessed and a determination may be made that background audio in the audio signal is below a defined threshold. In response to determining that the background audio in the audio signal is below the defined threshold, the accessed user speech model may be adapted based on the audio signal to generate an adapted user speech model that models speech characteristics of the user. Noise compensation may be performed on the received audio signal using the adapted user speech model to generate a filtered audio signal with reduced background audio compared to the received audio signal.

    Speech and noise models for speech recognition
    8.
    Granted patent
    Status: in force

    Publication No.: US08234111B2

    Publication date: 2012-07-31

    Application No.: US12814665

    Filing date: 2010-06-14

    IPC classification: G10L15/20

    CPC classification: G10L15/20 G10L21/0208

    Abstract: An audio signal generated by a device based on audio input from a user may be received. The audio signal may include at least a user audio portion that corresponds to one or more user utterances recorded by the device. A user speech model associated with the user may be accessed and a determination may be made that background audio in the audio signal is below a defined threshold. In response to determining that the background audio in the audio signal is below the defined threshold, the accessed user speech model may be adapted based on the audio signal to generate an adapted user speech model that models speech characteristics of the user. Noise compensation may be performed on the received audio signal using the adapted user speech model to generate a filtered audio signal with reduced background audio compared to the received audio signal.


    GEOTAGGED ENVIRONMENTAL AUDIO FOR ENHANCED SPEECH RECOGNITION ACCURACY
    9.
    Patent application
    Status: in force

    Publication No.: US20120022870A1

    Publication date: 2012-01-26

    Application No.: US13250843

    Filing date: 2011-09-30

    IPC classification: H04W64/00 G10L15/00

    CPC classification: G10L21/0208 G10L15/20

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for enhancing speech recognition accuracy. In one aspect, a method includes receiving geotagged audio signals that correspond to environmental audio recorded by multiple mobile devices in multiple geographic locations, receiving an audio signal that corresponds to an utterance recorded by a particular mobile device, determining a particular geographic location associated with the particular mobile device, generating a noise model for the particular geographic location using a subset of the geotagged audio signals, where noise compensation is performed on the audio signal that corresponds to the utterance using the noise model that has been generated for the particular geographic location.
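
A rough sketch of the flow in this abstract: pick the subset of geotagged recordings near the device, build a noise model from them, and compensate the utterance. The abstract does not specify how the subset is chosen or how compensation works, so the fixed radius, the averaged noise spectrum, the spectral subtraction step, and all names below are assumptions for illustration.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def build_noise_model(geotagged, location, radius_km=1.0):
    """Average the noise spectra recorded within radius_km of `location`.

    geotagged: list of ((lat, lon), spectrum) pairs of environmental audio.
    """
    nearby = [spec for loc, spec in geotagged
              if haversine_km(loc, location) <= radius_km]
    if not nearby:
        raise ValueError("no geotagged audio near this location")
    return [sum(column) / len(nearby) for column in zip(*nearby)]


def compensate(utterance_spectrum, noise_model):
    """Spectral subtraction (assumed technique), floored at zero."""
    return [max(u - n, 0.0) for u, n in zip(utterance_spectrum, noise_model)]


# Hypothetical data: two recordings close to the device, one far away.
geotagged = [
    ((37.0, -122.0), [1.0, 2.0]),
    ((37.001, -122.0), [3.0, 4.0]),
    ((40.0, -74.0), [100.0, 100.0]),
]
model = build_noise_model(geotagged, (37.0, -122.0))
cleaned = compensate([5.0, 2.5], model)
```

The distant recording is excluded by the radius filter, so only local environmental audio shapes the noise model.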


    ACOUSTIC MODEL ADAPTATION USING GEOGRAPHIC INFORMATION
    10.
    Patent application
    Status: in force

    Publication No.: US20110295590A1

    Publication date: 2011-12-01

    Application No.: US12787568

    Filing date: 2010-05-26

    IPC classification: G06F17/20

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for enhancing speech recognition accuracy. In one aspect, a method includes receiving an audio signal that corresponds to an utterance recorded by a mobile device, determining a geographic location associated with the mobile device, adapting one or more acoustic models for the geographic location, and performing speech recognition on the audio signal using the one or more acoustic models that are adapted for the geographic location.
