Speech recognition using loosely coupled components

    Publication No.: US09208786B2

    Publication date: 2015-12-08

    Application No.: US14636774

    Filing date: 2015-03-03

    Applicant: MModal IP LLC

    IPC classification: G10L15/00 G10L15/30 G10L15/22

    Abstract: An automatic speech recognition system includes an audio capture component, a speech recognition processing component, and a result processing component which are distributed among two or more logical devices and/or two or more physical devices. In particular, the audio capture component may be located on a different logical device and/or physical device from the result processing component. For example, the audio capture component may be on a computer connected to a microphone into which a user speaks, while the result processing component may be on a terminal server which receives speech recognition results from a speech recognition processing server.

    5. Content-based audio playback emphasis
    Granted patent (status: in force)

    Publication No.: US09135917B2

    Publication date: 2015-09-15

    Application No.: US14317873

    Filing date: 2014-06-27

    Applicant: MModal IP LLC

    Abstract: Techniques are disclosed for facilitating the process of proofreading draft transcripts of spoken audio streams. In general, proofreading of a draft transcript is facilitated by playing back the corresponding spoken audio stream with an emphasis on those regions in the audio stream that are highly relevant or likely to have been transcribed incorrectly. Regions may be emphasized by, for example, playing them back more slowly than regions that are of low relevance and likely to have been transcribed correctly. Emphasizing those regions of the audio stream that are most important to transcribe correctly and those regions that are most likely to have been transcribed incorrectly increases the likelihood that the proofreader will accurately correct any errors in those regions, thereby improving the overall accuracy of the transcript.
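
The emphasis idea in this abstract can be illustrated with a minimal sketch. It is not the patented method: the `Region` fields, the 0-to-1 `relevance` and `confidence` scores, the `max()` combination rule, and the rate limits are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Region:
    start: float       # region start time in seconds
    end: float         # region end time in seconds
    relevance: float   # hypothetical score in [0, 1]; 1 = highly relevant
    confidence: float  # hypothetical recognizer confidence in [0, 1]


def playback_rate(region, min_rate=0.5, max_rate=1.0):
    """Return a playback speed multiplier for one region.

    Assumed rule: a region is emphasized (played slower) when it is
    highly relevant OR likely to be wrong (low confidence).
    """
    emphasis = max(region.relevance, 1.0 - region.confidence)
    return max_rate - emphasis * (max_rate - min_rate)


def emphasized_duration(regions, min_rate=0.5, max_rate=1.0):
    """Total listening time once each region is played at its own rate."""
    return sum(
        (r.end - r.start) / playback_rate(r, min_rate, max_rate)
        for r in regions
    )
```

With these assumptions, a low-relevance, high-confidence region plays at normal speed, while a region flagged as important or doubtful is slowed toward `min_rate`, so the proofreader spends proportionally more time on it.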

    6. Speech Recognition Using Loosely Coupled Components
    Patent application (status: in force)

    Publication No.: US20150179172A1

    Publication date: 2015-06-25

    Application No.: US14636774

    Filing date: 2015-03-03

    Applicant: MModal IP LLC

    IPC classification: G10L15/22

    Abstract: An automatic speech recognition system includes an audio capture component, a speech recognition processing component, and a result processing component which are distributed among two or more logical devices and/or two or more physical devices. In particular, the audio capture component may be located on a different logical device and/or physical device from the result processing component. For example, the audio capture component may be on a computer connected to a microphone into which a user speaks, while the result processing component may be on a terminal server which receives speech recognition results from a speech recognition processing server.
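
The three-component split described in this abstract can be sketched as stages that share nothing except message queues. This is only an illustration under stated assumptions: the function names are hypothetical, the `decode` callable is a stand-in for a real recognizer, and in-process `queue.Queue` objects stand in for whatever network transport would connect separate devices.

```python
import queue


def capture_audio(chunks, audio_out):
    """Audio capture component: forwards raw chunks, then an end marker."""
    for chunk in chunks:
        audio_out.put(chunk)
    audio_out.put(None)  # None marks the end of the stream


def recognize(audio_in, results_out, decode):
    """Recognition component: turns each chunk into text via `decode`."""
    while (chunk := audio_in.get()) is not None:
        results_out.put(decode(chunk))
    results_out.put(None)


def process_results(results_in):
    """Result processing component: collects recognized text."""
    texts = []
    while (text := results_in.get()) is not None:
        texts.append(text)
    return " ".join(texts)


def run_pipeline(chunks, decode):
    """Wire the three components together through two queues."""
    audio_q, result_q = queue.Queue(), queue.Queue()
    capture_audio(chunks, audio_q)
    recognize(audio_q, result_q, decode)
    return process_results(result_q)
```

Because each stage only reads from or writes to a queue, any stage could in principle be moved to another logical or physical device by replacing the queue with a network channel, without changing the other two.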

    7. Verification of Extracted Data
    Patent application (status: in force)

    Publication No.: US20130346074A1

    Publication date: 2013-12-26

    Application No.: US13975833

    Filing date: 2013-08-26

    Applicant: MModal IP LLC

    IPC classification: G10L19/00

    Abstract: Facts are extracted from speech and recorded in a document using codings. Each coding represents an extracted fact and includes a code and a datum. The code may represent a type of the extracted fact and the datum may represent a value of the extracted fact. The datum in a coding is rendered based on a specified feature of the coding. For example, the datum may be rendered as boldface text to indicate that the coding has been designated as an “allergy.” In this way, the specified feature of the coding (e.g., “allergy”-ness) is used to modify the manner in which the datum is rendered. A user inspects the rendering and provides, based on the rendering, an indication of whether the coding was accurately designated as having the specified feature. A record of the user's indication may be stored, such as within the coding itself.
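
A coding with a code, a datum, a feature-dependent rendering, and a stored user verdict might look like the sketch below. The class layout, the `**...**` boldface markup, and the `"ALLERGEN"` code value are hypothetical; the abstract does not specify any of them.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Coding:
    code: str    # type of the extracted fact (hypothetical value below)
    datum: str   # value of the extracted fact
    features: Dict[str, bool] = field(default_factory=dict)
    # The user's verdict per feature is stored inside the coding itself.
    verified: Dict[str, bool] = field(default_factory=dict)


def render(coding, feature="allergy"):
    """Render the datum in boldface when the coding carries the feature."""
    if coding.features.get(feature, False):
        return f"**{coding.datum}**"
    return coding.datum


def record_verification(coding, feature, user_says_correct):
    """Store whether the user agreed that the feature was set correctly."""
    coding.verified[feature] = user_says_correct
```

For example, a reviewer who sees "penicillin" in boldface and confirms it is indeed an allergy would trigger `record_verification(coding, "allergy", True)`.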

    9. Speech Recognition Using an Operating System Hooking Component for Context-Aware Recognition Models
    Patent application (status: under examination, published)

    Publication No.: US20170047061A1

    Publication date: 2017-02-16

    Application No.: US15334523

    Filing date: 2016-10-26

    Applicant: MModal IP LLC

    Abstract: Inputs provided into user interface elements of an application are observed. Records are made of the inputs and the state(s) the application was in while the inputs were provided. For each state, a corresponding language model is trained based on the input(s) provided to the application while the application was in that state. When the application is next observed to be in a previously-observed state, a language model associated with the application's current state is applied to recognize speech input provided by a user and thereby to generate speech recognition output that is provided to the application. An application's state at a particular time may include the user interface element(s) that are displayed and/or in focus at that time, and is determined by an operating system hooking component embedded in the automatic speech recognition system.
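
The per-state language model idea can be sketched with a deliberately simple stand-in: one add-one-smoothed unigram model per application state, used to choose between competing recognition hypotheses. The unigram model, the class name, and the representation of a state as a `(window, focused field)` tuple are assumptions for illustration; the abstract does not say which kind of language model is used, and a real system would obtain the state from an operating system hook rather than from the caller.

```python
import math
from collections import Counter


class StateLanguageModels:
    """One add-one-smoothed unigram model per observed application state."""

    def __init__(self):
        self._counts = {}  # state -> Counter of words entered in that state

    def observe(self, state, text):
        """Record input that was typed while the application was in `state`."""
        self._counts.setdefault(state, Counter()).update(text.lower().split())

    def log_score(self, state, candidate):
        """Log-probability of a candidate transcript under the state's model."""
        counts = self._counts.get(state, Counter())
        total = sum(counts.values())
        vocab = len(counts) + 1  # +1 reserves probability mass for unseen words
        return sum(
            math.log((counts[w] + 1) / (total + vocab))
            for w in candidate.lower().split()
        )

    def pick(self, state, candidates):
        """Choose the hypothesis that best fits the current state's model."""
        return max(candidates, key=lambda c: self.log_score(state, c))
```

With this sketch, the same pair of acoustically similar hypotheses can resolve differently depending on which field has focus, which is the effect the abstract describes.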

    10. Document transcription system training
    Granted patent (status: in force)

    Publication No.: US09552809B2

    Publication date: 2017-01-24

    Application No.: US15066677

    Filing date: 2016-03-10

    Applicant: MModal IP LLC

    Abstract: A system is provided for training an acoustic model for use in speech recognition. In particular, such a system may be used to perform training based on a spoken audio stream and a non-literal transcript of the spoken audio stream. Such a system may identify text in the non-literal transcript which represents concepts having multiple spoken forms. The system may attempt to identify the actual spoken form in the audio stream which produced the corresponding text in the non-literal transcript, and thereby produce a revised transcript which more accurately represents the spoken audio stream. The revised, and more accurate, transcript may be used to train the acoustic model, thereby producing a better acoustic model than that which would be produced using conventional techniques, which perform training based directly on the original non-literal transcript.
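
The transcript-revision step can be illustrated with a toy sketch. It swaps in a much simpler technique than anything the abstract implies: instead of aligning against audio, it compares each candidate spoken form with a rough word-level hypothesis using `difflib.SequenceMatcher`. The `SPOKEN_FORMS` table, the function names, and the idea of a per-token rough hypothesis are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical table: written concept -> possible spoken forms.
SPOKEN_FORMS = {"10/1": ["october first", "ten one", "ten slash one"]}


def best_spoken_form(token, heard_words, spoken_forms=SPOKEN_FORMS):
    """Pick the spoken form closest to what a rough recognizer heard.

    Tokens with no alternative spoken forms are returned unchanged.
    """
    forms = spoken_forms.get(token)
    if not forms:
        return token

    def similarity(form):
        # Word-level similarity between the candidate form and the hypothesis.
        return SequenceMatcher(None, form.split(), heard_words).ratio()

    return max(forms, key=similarity)


def revise_transcript(tokens, heard_by_token, spoken_forms=SPOKEN_FORMS):
    """Replace multi-form concepts with the form most likely spoken.

    `heard_by_token` maps a token index to the rough hypothesis (a list
    of words) for the audio span aligned with that token.
    """
    return " ".join(
        best_spoken_form(tok, heard_by_token.get(i, []), spoken_forms)
        for i, tok in enumerate(tokens)
    )
```

The revised word sequence, rather than the original non-literal text, would then serve as the training target for the acoustic model.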
