Language model biasing modulation

    Publication number: US09886946B2

    Publication date: 2018-02-06

    Application number: US15263714

    Application date: 2016-09-13

    Applicant: Google Inc.

    CPC classification number: G10L15/07 G10L15/183 G10L15/197 G10L15/24

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for modulating language model biasing. In some implementations, context data is received. A likely context associated with a user is determined based on at least a portion of the context data. One or more language model biasing parameters are selected based at least on the likely context associated with the user. A context confidence score associated with the likely context is determined based on at least a portion of the context data. One or more of the language model biasing parameters are adjusted based at least on the context confidence score. A baseline language model is biased based at least on one or more of the adjusted language model biasing parameters. The baseline language model is provided for use by an automated speech recognizer (ASR).
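
    The steps in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the boost table `CONTEXT_BOOSTS`, the function `modulate_bias`, and the choice to scale an additive log-probability boost by the confidence score are all assumptions made for illustration, and the result is not renormalized.

```python
from typing import Dict

# Hypothetical biasing parameters: per-context additive log-probability boosts.
CONTEXT_BOOSTS: Dict[str, Dict[str, float]] = {
    "navigation": {"directions": 2.0, "route": 1.5},
    "music": {"play": 2.0, "album": 1.0},
}


def modulate_bias(context_scores: Dict[str, float],
                  baseline: Dict[str, float]) -> Dict[str, float]:
    """Return a biased copy of `baseline` (n-gram -> log probability).

    `context_scores` maps each candidate context to a score in [0, 1].
    """
    # Likely context: the highest-scoring one; its score is the confidence.
    likely = max(context_scores, key=lambda name: context_scores[name])
    confidence = context_scores[likely]

    biased = dict(baseline)  # leave the baseline model untouched
    for ngram, boost in CONTEXT_BOOSTS.get(likely, {}).items():
        if ngram in biased:
            # Adjust the selected parameter by the context confidence score.
            biased[ngram] += confidence * boost
    return biased
```

    With a confidence of 0.5 for "navigation", the entry for "directions" receives half of its configured boost, while entries outside that context are unchanged.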

    ENHANCED SPEECH ENDPOINTING
    2.
    Invention application

    Publication number: US20180012591A1

    Publication date: 2018-01-11

    Application number: US15711260

    Application date: 2017-09-21

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving audio data including an utterance, obtaining context data that indicates one or more expected speech recognition results, determining an expected speech recognition result based on the context data, receiving an intermediate speech recognition result generated by a speech recognition engine, comparing the intermediate speech recognition result to the expected speech recognition result for the audio data based on the context data, determining whether the intermediate speech recognition result corresponds to the expected speech recognition result for the audio data based on the context data, and setting an end of speech condition and providing a final speech recognition result in response to determining the intermediate speech recognition result matches the expected speech recognition result, the final speech recognition result including the one or more expected speech recognition results indicated by the context data.
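
    A minimal sketch of the comparison step follows, assuming intermediate results arrive as plain strings and that a case-insensitive exact match counts as "corresponding". The function name `endpoint_on_expected` and the matching rule are illustrative assumptions, not taken from the application.

```python
from typing import Iterable, Optional, Set, Tuple


def endpoint_on_expected(intermediate_results: Iterable[str],
                         expected: Set[str]) -> Tuple[Optional[str], bool]:
    """Return (final_result, ended_early).

    Stops consuming intermediate results as soon as one of them matches
    an expected result indicated by the context data.
    """
    normalized = {e.strip().lower() for e in expected}
    last: Optional[str] = None
    for partial in intermediate_results:
        last = partial
        if partial.strip().lower() in normalized:
            # Match found: set the end-of-speech condition and finalize.
            return partial, True
    # No match: fall back to whatever the recognizer produced last.
    return last, False
```

    For a yes/no prompt, `expected` would be `{"yes", "no"}`, so the recognizer could stop listening as soon as "yes" appears instead of waiting for a silence timeout.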

    ENHANCED SPEECH ENDPOINTING
    3.
    Invention application
    Status: Under examination (published)

    Publication number: US20170069309A1

    Publication date: 2017-03-09

    Application number: US15192431

    Application date: 2016-06-24

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving audio data including an utterance, obtaining context data that indicates one or more expected speech recognition results, determining an expected speech recognition result based on the context data, receiving an intermediate speech recognition result generated by a speech recognition engine, comparing the intermediate speech recognition result to the expected speech recognition result for the audio data based on the context data, determining whether the intermediate speech recognition result corresponds to the expected speech recognition result for the audio data based on the context data, and setting an end of speech condition and providing a final speech recognition result in response to determining the intermediate speech recognition result matches the expected speech recognition result, the final speech recognition result including the one or more expected speech recognition results indicated by the context data.

    NEGATIVE N-GRAM BIASING
    4.
    Invention application
    Status: Granted

    Publication number: US20160365092A1

    Publication date: 2016-12-15

    Application number: US14739287

    Application date: 2015-06-15

    Applicant: Google Inc.

    CPC classification number: G10L15/197 G10L15/01 G10L2015/228

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing dynamic, stroke-based alignment of touch displays. In one aspect, a method includes obtaining a candidate transcription that an automated speech recognizer generates for an utterance, determining a particular context associated with the utterance, determining that a particular n-gram that is included in the candidate transcription is included among a set of undesirable n-grams that is associated with the context, adjusting a speech recognition confidence score associated with the transcription based on determining that the particular n-gram that is included in the candidate transcription is included among the set of undesirable n-grams that is associated with the context, and determining whether to provide the candidate transcription for output based at least on the adjusted speech recognition confidence score.
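
    The method in this abstract can be sketched as follows. The multiplicative `penalty`, the output `threshold`, and the function names are assumptions chosen for illustration; the abstract does not specify how the confidence score is adjusted.

```python
from typing import Dict, List, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(tokens: List[str], n: int) -> Set[NGram]:
    """All contiguous n-grams of `tokens` as tuples."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def adjust_confidence(transcription: str, confidence: float, context: str,
                      undesirable_by_context: Dict[str, Set[NGram]],
                      penalty: float = 0.5) -> float:
    """Lower the score if the candidate contains an undesirable n-gram."""
    undesirable = undesirable_by_context.get(context, set())
    tokens = transcription.lower().split()
    for n in {len(g) for g in undesirable}:
        if ngrams(tokens, n) & undesirable:
            return confidence * penalty  # hypothetical adjustment rule
    return confidence


def should_output(transcription: str, confidence: float, context: str,
                  undesirable_by_context: Dict[str, Set[NGram]],
                  threshold: float = 0.5) -> bool:
    """Decide whether to provide the candidate transcription for output."""
    adjusted = adjust_confidence(transcription, confidence, context,
                                 undesirable_by_context)
    return adjusted >= threshold
```

    The same candidate can pass in one context and be suppressed in another, because the set of undesirable n-grams is looked up per context.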

    Dynamically biasing language models
    5.
    Granted invention patent
    Status: Granted

    Publication number: US09502032B2

    Publication date: 2016-11-22

    Application number: US14525826

    Application date: 2014-10-28

    Applicant: Google Inc.

    CPC classification number: G10L15/22 G10L15/1815 G10L15/26 G10L15/32 G10L19/00

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition. In one aspect, a method comprises receiving audio data encoding one or more utterances; performing a first speech recognition on the audio data; identifying a context based on the first speech recognition; performing a second speech recognition on the audio data that is biased towards the context; and providing an output of the second speech recognition.
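
    The two-pass idea can be sketched with a simplification: instead of running a second recognition on the audio, the example below re-ranks a fixed list of scored hypotheses. The function `two_pass_rescore`, the keyword-overlap rule for identifying the context, and the per-word `boost` are assumptions for illustration only. `context_terms` is assumed to be non-empty.

```python
from typing import Dict, Set


def two_pass_rescore(candidates: Dict[str, float],
                     context_terms: Dict[str, Set[str]],
                     boost: float = 1.0) -> str:
    """Pick a hypothesis in two passes over `candidates` (text -> score)."""
    # First pass: take the unbiased best hypothesis.
    first = max(candidates, key=lambda text: candidates[text])
    first_words = set(first.lower().split())

    # Identify the context whose terms overlap the first-pass output most.
    context = max(context_terms,
                  key=lambda name: len(context_terms[name] & first_words))
    terms = context_terms[context]

    # Second pass: rank again with a bonus for each in-context word.
    def biased(text: str) -> float:
        hits = sum(1 for word in text.lower().split() if word in terms)
        return candidates[text] + boost * hits

    return max(candidates, key=biased)
```

    A first pass that yields "play the beetles song" points to a music context, and the biased second pass can then prefer "play the beatles song" even though its unbiased score was lower.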

    VIDEO ANALYSIS BASED LANGUAGE MODEL ADAPTATION
    6.
    Invention application
    Status: Under examination (published)

    Publication number: US20140379346A1

    Publication date: 2014-12-25

    Application number: US13923545

    Application date: 2013-06-21

    Applicant: Google Inc.

    CPC classification number: G10L15/25 G06K9/00335 G06K9/726 G10L15/183 G10L15/24

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving audio data obtained by a microphone of a wearable computing device, wherein the audio data encodes a user utterance, receiving image data obtained by a camera of the wearable computing device, identifying one or more image features based on the image data, identifying one or more concepts based on the one or more image features, selecting one or more terms associated with a language model used by a speech recognizer to generate transcriptions, adjusting one or more probabilities associated with the language model that correspond to one or more of the selected terms based on the relevance of one or more of the selected terms to the one or more concepts, and obtaining a transcription of the user utterance using the speech recognizer.
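
    The probability adjustment can be sketched as below, assuming the image analysis has already produced a list of concepts and that a relevance score in [0, 1] is available for each (term, concept) pair. The scaling rule, the renormalization, and the name `adapt_to_concepts` are illustrative assumptions rather than details from the application.

```python
from typing import Dict, List, Tuple


def adapt_to_concepts(term_probs: Dict[str, float],
                      relevance: Dict[Tuple[str, str], float],
                      concepts: List[str],
                      strength: float = 1.0) -> Dict[str, float]:
    """Scale term probabilities by relevance to the concepts, then renormalize."""
    adjusted: Dict[str, float] = {}
    for term, prob in term_probs.items():
        # Use the strongest relevance of this term to any identified concept.
        best = max((relevance.get((term, c), 0.0) for c in concepts),
                   default=0.0)
        adjusted[term] = prob * (1.0 + strength * best)

    total = sum(adjusted.values())
    if total == 0:
        return dict(term_probs)  # nothing to normalize
    return {term: value / total for term, value in adjusted.items()}
```

    If the camera shows a dog, a term such as "bark" gains probability mass relative to an acoustically similar term such as "park".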

    LANGUAGE MODEL BIASING SYSTEM
    7.
    Invention application

    Publication number: US20180233131A1

    Publication date: 2018-08-16

    Application number: US15432620

    Application date: 2017-02-14

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus for receiving audio data corresponding to a user utterance and context data, identifying an initial set of one or more n-grams from the context data, generating an expanded set of one or more n-grams based on the initial set of n-grams, adjusting a language model based at least on the expanded set of n-grams, determining one or more speech recognition candidates for at least a portion of the user utterance using the adjusted language model, adjusting a score for a particular speech recognition candidate determined to be included in the expanded set of n-grams, determining a transcription of the user utterance that includes at least one of the one or more speech recognition candidates, and providing the transcription of the user utterance for output.
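
    The expansion and score-adjustment steps can be sketched as follows. The lookup table `RELATED`, the fixed `bonus`, and both function names are hypothetical; the abstract does not say how the expanded set is generated or how scores are adjusted.

```python
from typing import Dict, List, Set

# Hypothetical expansion table: context n-gram -> related n-grams.
RELATED: Dict[str, List[str]] = {
    "italian restaurant": ["pizza", "pasta"],
}


def expand_ngrams(initial: Set[str]) -> Set[str]:
    """Return the initial n-grams plus any related n-grams."""
    expanded = set(initial)
    for ngram in initial:
        expanded.update(RELATED.get(ngram, []))
    return expanded


def rescore(candidates: Dict[str, float], expanded: Set[str],
            bonus: float = 1.0) -> Dict[str, float]:
    """Raise the score of candidates that appear in the expanded set."""
    return {text: score + (bonus if text in expanded else 0.0)
            for text, score in candidates.items()}
```

    Starting from the context n-gram "italian restaurant", the candidate "pizza" is boosted even though it never appeared in the context data itself.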

    NEGATIVE N-GRAM BIASING
    8.
    Invention application

    Publication number: US20170270918A1

    Publication date: 2017-09-21

    Application number: US15605475

    Application date: 2017-05-25

    Applicant: Google Inc.

    CPC classification number: G10L15/197 G10L15/01 G10L2015/228

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing dynamic, stroke-based alignment of touch displays. In one aspect, a method includes obtaining a candidate transcription that an automated speech recognizer generates for an utterance, determining a particular context associated with the utterance, determining that a particular n-gram that is included in the candidate transcription is included among a set of undesirable n-grams that is associated with the context, adjusting a speech recognition confidence score associated with the transcription based on determining that the particular n-gram that is included in the candidate transcription is included among the set of undesirable n-grams that is associated with the context, and determining whether to provide the candidate transcription for output based at least on the adjusted speech recognition confidence score.

    VOICE RECOGNITION SYSTEM
    9.
    Invention application

    Publication number: US20170193999A1

    Publication date: 2017-07-06

    Application number: US14989642

    Application date: 2016-01-06

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for voice recognition. In one aspect, a method includes the actions of receiving a voice input; determining a transcription for the voice input, wherein determining the transcription for the voice input includes, for a plurality of segments of the voice input: obtaining a first candidate transcription for a first segment of the voice input; determining one or more contexts associated with the first candidate transcription; adjusting a respective weight for each of the one or more contexts; and determining a second candidate transcription for a second segment of the voice input based in part on the adjusted weights; and providing the transcription of the plurality of segments of the voice input for output.
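
    A minimal sketch of the segment-by-segment loop is given below. It assumes each segment comes with scored candidates and that each candidate maps to at most one context; the additive weight update `step` and the name `transcribe_segments` are illustrative assumptions.

```python
from typing import Dict, List


def transcribe_segments(segment_candidates: List[Dict[str, float]],
                        context_of: Dict[str, str],
                        step: float = 0.5) -> List[str]:
    """Choose one candidate per segment, carrying context weights forward."""
    weights: Dict[str, float] = {}
    chosen: List[str] = []
    for candidates in segment_candidates:
        # Score = recognizer score + current weight of the candidate's context.
        best = max(
            candidates,
            key=lambda text: candidates[text]
            + weights.get(context_of.get(text, ""), 0.0),
        )
        chosen.append(best)

        # Adjust the weight of the context associated with the chosen text.
        context = context_of.get(best)
        if context is not None:
            weights[context] = weights.get(context, 0.0) + step
    return chosen
```

    After "tennis" is chosen for the first segment, the raised weight of a sports context can tip the second segment toward "players" over a slightly higher-scoring but unrelated candidate.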

    Speech Recognition With Selective Use Of Dynamic Language Models

    Publication number: US20170186432A1

    Publication date: 2017-06-29

    Application number: US14982567

    Application date: 2015-12-29

    Applicant: Google Inc.

    Abstract: This document describes, among other things, a computer-implemented method for transcribing an utterance. The method can include receiving, at a computing system, speech data that characterizes an utterance of a user. A first set of candidate transcriptions of the utterance can be generated using a static class-based language model that includes a plurality of classes that are each populated with class-based terms selected independently of the utterance or the user. The computing system can then determine whether the first set of candidate transcriptions includes class-based terms. Based on whether the first set of candidate transcriptions includes class-based terms, the computing system can determine whether to generate a dynamic class-based language model that includes at least one class that is populated with class-based terms selected based on a context associated with at least one of the utterance and the user.
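
    The decision described here can be sketched as below, assuming candidates are plain strings and classes are sets of single-word terms. The class name `$CONTACT`, the function names, and the representation of a "dynamic model" as a dictionary of class populations are all hypothetical.

```python
from typing import Dict, List, Set


def classes_in(candidates: List[str],
               static_classes: Dict[str, Set[str]]) -> Set[str]:
    """Names of static classes whose terms occur in any candidate."""
    found: Set[str] = set()
    for text in candidates:
        words = set(text.lower().split())
        for name, terms in static_classes.items():
            if words & terms:
                found.add(name)
    return found


def dynamic_classes(candidates: List[str],
                    static_classes: Dict[str, Set[str]],
                    user_terms: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return context-based class populations, or {} if none are needed."""
    return {name: user_terms.get(name, set())
            for name in classes_in(candidates, static_classes)}
```

    An empty result means the first-pass candidates contained no class-based terms, so the cost of building a dynamic model can be skipped.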
