Abstract:
Predicting and learning users' intended actions on an electronic device based on free-form speech input. Users' actions can be monitored to develop a list of carrier phrases, each having one or more corresponding actions. A user can speak a command into a device to initiate an action. The spoken command can be parsed and compared to the list of carrier phrases. If the spoken command matches one of the known carrier phrases, the corresponding action(s) can be presented to the user for selection. If the spoken command does not match one of the known carrier phrases, search results (e.g., Internet search results) corresponding to the spoken command can be presented to the user. The actions of the user in response to the presented action(s) and/or the search results can be monitored to update the list of carrier phrases.
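A minimal sketch of the lookup-and-learn loop this abstract describes, assuming a simple prefix match for carrier phrases; the phrase table, action names, and the record_user_choice helper are hypothetical illustrations, not taken from the patent.

```python
# Sketch of carrier-phrase matching with a search fallback and usage learning.
# The phrase table and action names below are illustrative assumptions.

CARRIER_PHRASES = {
    "call": ["open_dialer"],
    "navigate to": ["open_maps"],
    "play": ["open_music_player", "open_video_player"],
}

def handle_spoken_command(transcript: str) -> list[str]:
    """Return candidate actions for a transcript, or a search fallback."""
    text = transcript.lower().strip()
    for phrase, actions in CARRIER_PHRASES.items():
        if text.startswith(phrase):
            return actions                 # known carrier phrase: offer its action(s)
    return [f"web_search:{text}"]          # unknown phrase: fall back to search

def record_user_choice(transcript: str, chosen_action: str) -> None:
    """Learn from the user's selection by updating the carrier-phrase table."""
    # A real system would extract the carrier portion more carefully; this
    # sketch simply maps the first two words of the transcript to the action.
    carrier = " ".join(transcript.lower().split()[:2])
    CARRIER_PHRASES.setdefault(carrier, [])
    if chosen_action not in CARRIER_PHRASES[carrier]:
        CARRIER_PHRASES[carrier].append(chosen_action)
```

Under these assumptions, handle_spoken_command("play some jazz") offers both player actions, while an unmatched command falls through to search; whichever result the user then picks feeds back into the table via record_user_choice.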
Abstract:
A novel system for automatic reading tutoring provides effective error detection and reduced false alarms, combined with a low processing burden and response times short enough to maintain a natural, engaging flow of interaction. According to one illustrative embodiment, an automatic reading tutoring method includes displaying a text output and receiving an acoustic input. The acoustic input is modeled with a domain-specific target language model specific to the text output, and with a general-domain garbage language model, both of which may be efficiently constructed as context-free grammars. The domain-specific target language model may be built dynamically or "on-the-fly" based on the currently displayed text (e.g., the story to be read by the user), while the general-domain garbage language model is shared among all different text outputs. User-perceptible tutoring feedback is provided based on the target language model and the garbage language model.
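A toy illustration of the two-model comparison: each read word is scored against a text-specific target model and a shared garbage model, and feedback follows whichever scores higher. A real tutor scores acoustics with an ASR decoder over context-free grammars, so the per-word probabilities below are placeholder assumptions.

```python
import math

# Toy comparison of a domain-specific target model against a shared
# general-domain garbage model. All probabilities are illustrative.

def target_score(word: str, expected: str) -> float:
    """High log-probability when the read word matches the displayed text."""
    return math.log(0.9) if word == expected else math.log(0.05)

def garbage_score(word: str) -> float:
    """Flat, text-independent score shared across all displayed stories."""
    return math.log(0.2)

def tutor_feedback(read_words: list[str], story_words: list[str]) -> list[str]:
    feedback = []
    for read, expected in zip(read_words, story_words):
        if target_score(read, expected) >= garbage_score(read):
            feedback.append(f"ok: {expected}")
        else:
            feedback.append(f"try again: {expected}")   # garbage model won
    return feedback

print(tutor_feedback(["the", "cat", "sat"], ["the", "cat", "sang"]))
```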
Abstract:
A global speech user interface (GSUI) (100) comprises an input system (110) to receive a user's spoken command, a feedback system along with a set of feedback overlays to give the user information on the progress of his spoken requests, a set of visual cues on the television screen (140) to help the user understand what he can say, a help system, and a model for navigation among applications. The interface is extensible to make it easy to add new applications.
Abstract:
A correction device (4) for a speech recognition device (2) is provided which makes the replacement of incorrectly recognized words (FETI) in the recognized text (ETI) especially simple. The correction device (4) exploits the observation that the phoneme sequences of incorrectly recognized words and of the spoken words actually to be recognized are very similar, and automatically marks those words in the recognized text (ETI) whose phoneme sequence is similar to that of a correction word (KWI) entered manually by the user.
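A rough sketch of marking phonemically similar words, assuming edit-distance-style string similarity stands in for the phoneme comparison; the phonemize placeholder and the 0.5 threshold are illustrative assumptions, not the patent's actual method.

```python
from difflib import SequenceMatcher

# Sketch of flagging candidate replacement positions by phoneme similarity.

def phonemize(word: str) -> str:
    """Placeholder G2P: strip vowels to approximate a consonant skeleton."""
    return "".join(c for c in word.lower() if c not in "aeiou")

def mark_similar_words(recognized_text: list[str], correction_word: str,
                       threshold: float = 0.5) -> list[int]:
    """Return indices of recognized words phonemically close to the correction."""
    target = phonemize(correction_word)
    marked = []
    for i, word in enumerate(recognized_text):
        ratio = SequenceMatcher(None, phonemize(word), target).ratio()
        if ratio >= threshold:
            marked.append(i)
    return marked

text = ["please", "recognize", "beach", "tomorrow"]
print([text[i] for i in mark_similar_words(text, "speech")])   # ['beach']
```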
Abstract:
The invention relates to a system and a method for operating and monitoring, in particular, an automation system and/or a production machine or machine tool, whereby the visual field (9) of a user (1) is recorded on at least one display means (2), speech information (8) from the user (1) is at least intermittently evaluated, and a visual feedback signal is generated indicating the processing status of the recognized speech information (8). Improved speech interaction is thus obtained, in particular in the field of augmented-reality applications and in complex technical plants.
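One way the status-dependent visual feedback might be sketched, assuming a simple four-state indicator; the states and messages below are invented for illustration.

```python
from enum import Enum, auto

# Minimal sketch of status-driven visual feedback for spoken commands in a
# plant-monitoring UI. States and cues are illustrative assumptions.

class SpeechStatus(Enum):
    LISTENING = auto()
    PROCESSING = auto()
    RECOGNIZED = auto()
    REJECTED = auto()

FEEDBACK = {
    SpeechStatus.LISTENING:  "mic icon: listening...",
    SpeechStatus.PROCESSING: "spinner: evaluating command...",
    SpeechStatus.RECOGNIZED: "green check: command accepted",
    SpeechStatus.REJECTED:   "red cross: command not understood",
}

def render_feedback(status: SpeechStatus) -> str:
    """Return the visual cue overlaid on the user's field of view."""
    return FEEDBACK[status]
```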
Abstract:
Voice input is received from a user. An ASR system generates in memory a set of words it has identified in the voice input, and updates the set each time it identifies a new word in the voice input. A condition indicative of speech inactivity in the voice input is detected. In response to the detection of the speech inactivity condition, a response for outputting to the user is generated based on the set of identified words. The generated response is outputted to the user after an interval of time, commencing with the detection of the speech inactivity condition, has ended, and only if the ASR system has identified no further words in the voice input during that interval.
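A compact sketch of that delayed-output rule, assuming the ASR system reports words and silence through callbacks; the ResponseGate class and its 0.5 s hold interval are hypothetical names and values.

```python
import time

# Sketch of the delayed-response rule: after silence is detected, wait out an
# interval and emit the response only if no further words arrived meanwhile.

class ResponseGate:
    def __init__(self, hold_seconds: float = 0.5):
        self.hold_seconds = hold_seconds
        self.words: list[str] = []
        self.silence_started: float | None = None

    def on_word(self, word: str) -> None:
        self.words.append(word)        # a new word cancels any pending response
        self.silence_started = None

    def on_silence(self) -> None:
        if self.silence_started is None:
            self.silence_started = time.monotonic()

    def poll(self) -> str | None:
        """Return the response once the hold interval elapses word-free."""
        if self.silence_started is None:
            return None
        if time.monotonic() - self.silence_started >= self.hold_seconds:
            return f"response to: {' '.join(self.words)}"
        return None
```

The design point is that the response is prepared as soon as silence is detected but held back, so a resumed utterance cheaply discards it rather than interrupting the user.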
Abstract:
Method and apparatus for providing visual feedback on an electronic device in a client/server speech recognition system comprising the electronic device and a network device remotely located from the electronic device. The method comprises processing, by an embedded speech recognizer of the electronic device, at least a portion of input audio comprising speech to produce local recognized speech, sending at least a portion of the input audio to the network device for remote speech recognition, and displaying, on a user interface of the electronic device, visual feedback based on at least a portion of the local recognized speech prior to receiving streaming recognition results from the network device.
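A minimal sketch of the local-first display behavior, assuming both recognizers can be modeled as coroutines; the recognizer stubs, their delays, and their outputs are placeholders, not a real client/server API.

```python
import asyncio

# Sketch: show the embedded (on-device) result immediately while the slower
# network recognizer is still working, then refine the display.

async def embedded_recognizer(audio: bytes) -> str:
    await asyncio.sleep(0.1)                 # fast, on-device
    return "call john"

async def network_recognizer(audio: bytes) -> str:
    await asyncio.sleep(1.0)                 # slower, higher accuracy
    return "call John Smith on mobile"

async def recognize_with_feedback(audio: bytes) -> str:
    remote = asyncio.create_task(network_recognizer(audio))
    local = await embedded_recognizer(audio)
    print(f"[display] {local}")              # immediate visual feedback
    final = await remote                     # refine once the server replies
    print(f"[display] {final}")
    return final

asyncio.run(recognize_with_feedback(b"..."))
```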
Abstract:
Examples of the present disclosure describe the generation of a multi-arc confusion network to improve, for example, the ability to return alternatives to generated output. A confusion network comprising token representations of lexicalized hypotheses and normalized hypotheses is generated. Each arc of the confusion network represents a token of a lexicalized hypothesis or a normalized hypothesis. The confusion network is transformed into a multi-arc confusion network, wherein the transforming comprises realigning at least one token of the confusion network to span multiple arcs of the confusion network. Other examples are also described.
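A rough sketch of such a network, under the assumption that spanning multiple arcs can be modeled as one arc covering several consecutive slot positions; the Arc structure, scores, and the "twenty five" vs. "25" example are illustrative assumptions.

```python
from dataclasses import dataclass

# Sketch of a confusion network whose arcs can span multiple slots, so a
# normalized token ("25") aligns against two lexicalized tokens
# ("twenty", "five").

@dataclass
class Arc:
    token: str
    start: int      # first slot index covered
    end: int        # last slot index covered (inclusive)
    score: float

def build_multi_arc_network() -> list[Arc]:
    # Lexicalized hypothesis: one arc per slot.
    arcs = [
        Arc("twenty", 0, 0, 0.6),
        Arc("five", 1, 1, 0.6),
    ]
    # Normalized hypothesis realigned to span both slots with a single arc.
    arcs.append(Arc("25", 0, 1, 0.4))
    return arcs

def alternatives_for_span(arcs: list[Arc], start: int, end: int) -> list[str]:
    """List tokens whose arcs cover exactly the requested slot span."""
    return [a.token for a in arcs if a.start == start and a.end == end]

net = build_multi_arc_network()
print(alternatives_for_span(net, 0, 1))   # ['25']
```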