Patent search: ap:("Xuedong D. Huang" OR "Zicheng Liu" OR "Zhengyou Zhang" OR "Michael J. Sinclair" OR "Alejandro Acero") AND inv:"Xuedong D. Huang", page 1

1.

Granted patent
Multi-sensory speech detection system (Lapsed)

Publication No.: US07383181B2

Publication date: 2008-06-03

Application No.: US10629278

Filing date: 2003-07-29

Applicants: Xuedong D. Huang , Zicheng Liu , Zhengyou Zhang , Michael J. Sinclair , Alejandro Acero

Inventors: Xuedong D. Huang , Zicheng Liu , Zhengyou Zhang , Michael J. Sinclair , Alejandro Acero

IPC classification: G10L15/00

CPC classification: H04R1/10 , G10L15/20 , G10L15/24 , G10L25/78 , H04R1/14 , H04R25/606

Abstract: The present invention combines a conventional audio microphone with an additional speech sensor that provides a speech sensor signal based on an input. The speech sensor signal is generated based on an action undertaken by a speaker during speech, such as facial movement, bone vibration, throat vibration, throat impedance changes, etc. A speech detector component receives an input from the speech sensor and outputs a speech detection signal indicative of whether a user is speaking. The speech detector generates the speech detection signal based on the microphone signal and the speech sensor signal.
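The abstract of record 1 says only that the detection signal is based on both the microphone signal and the speech sensor signal; it does not say how the two are combined. As a minimal sketch, assuming a simple per-frame energy test on each channel (the function names, thresholds, and the AND rule are all illustrative assumptions, not the patented method), the idea might look like this:

```python
from typing import Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame of samples (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return sum(x * x for x in frame) / len(frame)


def detect_speech(
    mic_frame: Sequence[float],
    sensor_frame: Sequence[float],
    mic_threshold: float = 0.01,      # assumed value, chosen for illustration
    sensor_threshold: float = 0.005,  # assumed value, chosen for illustration
) -> bool:
    """Report speech only when BOTH channels show energy above their threshold.

    The AND rule is an assumption: ambient noise can raise the microphone
    energy, but a bone or throat sensor should stay quiet unless the wearer
    is actually speaking.
    """
    mic_active = frame_energy(mic_frame) > mic_threshold
    sensor_active = frame_energy(sensor_frame) > sensor_threshold
    return mic_active and sensor_active
```

With this rule, a loud microphone frame paired with a silent sensor frame is classified as non-speech, which is the kind of case a second sensor is meant to handle.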

2.

Granted patent
Method and apparatus for multi-sensory speech enhancement (In force)

Publication No.: US07447630B2

Publication date: 2008-11-04

Application No.: US10724008

Filing date: 2003-11-26

Applicants: Zicheng Liu , Michael J. Sinclair , Alejandro Acero , Xuedong D. Huang , James G. Droppo , Li Deng , Zhengyou Zhang , Yanli Zheng

Inventors: Zicheng Liu , Michael J. Sinclair , Alejandro Acero , Xuedong D. Huang , James G. Droppo , Li Deng , Zhengyou Zhang , Yanli Zheng

IPC classification: G10L21/02

CPC classification: G10L21/0208 , G10L2021/02165

Abstract: A method and system use an alternative sensor signal received from a sensor other than an air conduction microphone to estimate a clean speech value. The estimation uses either the alternative sensor signal alone, or in conjunction with the air conduction microphone signal. The clean speech value is estimated without using a model trained from noisy training data collected from an air conduction microphone. Under one embodiment, correction vectors are added to a vector formed from the alternative sensor signal in order to form a filter, which is applied to the air conductive microphone signal to produce the clean speech estimate. In other embodiments, the pitch of a speech signal is determined from the alternative sensor signal and is used to decompose an air conduction microphone signal. The decomposed signal is then used to determine a clean signal estimate.

3.

Granted patent
Method and system of runtime acoustic unit selection for speech synthesis (Lapsed)

Publication No.: US5913193A

Publication date: 1999-06-15

Application No.: US648808

Filing date: 1996-04-30

Applicants: Xuedong D. Huang , Michael D. Plumpe , Alejandro Acero , James L. Adcock

Inventors: Xuedong D. Huang , Michael D. Plumpe , Alejandro Acero , James L. Adcock

IPC classification: G06F3/16 , G10L13/06 , G10L13/08 , G10L5/02 , G10L9/00

CPC classification: G10L13/07

Abstract: The present invention pertains to a concatenative speech synthesis system and method which produces a more natural sounding speech. The system provides for multiple instances of each acoustic unit which can be used to generate a speech waveform representing a linguistic expression. The multiple instances are formed during an analysis or training phase of the synthesis process and are limited to a robust representation of the highest probability instances. The provision of multiple instances enables the synthesizer to select the instance which closely resembles the desired instance thereby eliminating the need to alter the stored instance to match the desired instance. This in essence minimizes the spectral distortion between the boundaries of adjacent instances thereby producing more natural sounding speech.

4.

Granted patent
Text-to-speech using clustered context-dependent phoneme-based units (Lapsed)

Publication No.: US6163769A

Publication date: 2000-12-19

Application No.: US949138

Filing date: 1997-10-02

Applicants: Alejandro Acero , Hsiao-Wuen Hon , Xuedong D. Huang

Inventors: Alejandro Acero , Hsiao-Wuen Hon , Xuedong D. Huang

IPC classification: G10L13/06 , G10L13/00

CPC classification: G10L13/07

Abstract: A text-to-speech system includes a storage device for storing a clustered set of context-dependent phoneme-based units of a target speaker. In one embodiment, decision trees are used wherein each decision tree based context-dependent phoneme-based unit is arranged based on context of at least one immediately preceding and succeeding phoneme. At least one of the context-dependent phoneme-based units represents other non-stored context-dependent phoneme units of similar sound due to similar contexts. A text analyzer obtains a string of phonetic symbols representative of text to be converted to speech. A concatenation module selects stored decision tree based context-dependent phoneme-based units from the set decision tree based context-dependent phoneme-based units based on the context of the phonetic symbols and synthesizes the selected phoneme-based units to generate speech corresponding to the text.

5.

Granted patent
Architecture for user- and context-specific prefetching and caching of information on portable devices (In force)

Publication No.: US08626136B2

Publication date: 2014-01-07

Application No.: US11427755

Filing date: 2006-06-29

Applicants: Raymond E. Ozzie , Eric J. Horvitz , William H. Gates, III , Joshua T. Goodman , Susan T. Dumais , Gary W. Flake , Trenholme J. Griffin , Xuedong D. Huang , Oliver Hurst-Hiller , Christopher A Meek

Inventors: Raymond E. Ozzie , Eric J. Horvitz , William H. Gates, III , Joshua T. Goodman , Susan T. Dumais , Gary W. Flake , Trenholme J. Griffin , Xuedong D. Huang , Oliver Hurst-Hiller , Christopher A Meek

IPC classification: H04M3/42 , H04L29/06 , H04W74/00 , G06F15/16 , G06F3/033

CPC classification: G06F17/30867 , H04W4/02 , H04W4/18

Abstract: Content management architecture for a portable wireless device. Caching and fetching techniques are provided to improve content handling for portable devices such as cellular telephones and portable computers. A search component automatically performs searches as a background process, and potentially desired content is received and cached by a content storing component to be available in the future when and if needed, mitigating latency associated with slow download speeds, refresh rates, and other system and/or network impediments. Content from background search results can be trickled into the device as part of the background process so as not to burden system resources for other processes. As part of memory management, aged and/or low priority or low interest content can be selectively removed or archived to increase available cache or memory space, as well as to maintain relevant content within the device. A presentation component facilitates presentation of the pre-stored content.
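The memory-management step in the abstract of record 5 (removing aged or low-priority content to free cache space) can be illustrated with a small sketch. Everything here is hypothetical: the `CachedItem` fields, the `evict` function, the age cutoff, and the greedy keep-by-priority rule are assumptions made for illustration, not details taken from the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CachedItem:
    key: str           # identifier of the prefetched content
    size: int          # space the item occupies in the cache
    priority: float    # higher means more likely to be wanted
    fetched_at: float  # timestamp when the item was cached


def evict(items: List[CachedItem], capacity: int, now: float, max_age: float) -> List[CachedItem]:
    """Return the items to keep after dropping aged and low-priority content.

    Step 1 drops anything older than max_age.
    Step 2 keeps the remaining items greedily, highest priority first
    (newer first on ties), skipping any item that no longer fits.
    """
    fresh = [it for it in items if now - it.fetched_at <= max_age]
    fresh.sort(key=lambda it: (-it.priority, -it.fetched_at))

    kept: List[CachedItem] = []
    used = 0
    for it in fresh:
        if used + it.size <= capacity:
            kept.append(it)
            used += it.size
    return kept
```

A real device would presumably archive rather than discard some items, as the abstract mentions; the sketch only covers removal.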

6.

Patent application
FORCE-FEEDBACK WITHIN TELEPRESENCE (In force)

Publication No.: US20100306647A1

Publication date: 2010-12-02

Application No.: US12472579

Filing date: 2009-05-27

Applicants: Zhengyon Zhang , Xuedong D. Huang , Jin Li , Rajesh Kutpadi Hegde , Kori Marie Quinn , Michel Pahud , Jayman Dalal

Inventors: Zhengyon Zhang , Xuedong D. Huang , Jin Li , Rajesh Kutpadi Hegde , Kori Marie Quinn , Michel Pahud , Jayman Dalal

IPC classification: G06F3/01 , G06F3/048

CPC classification: G06F3/016

Abstract: The claimed subject matter provides a system and/or a method that facilitates replicating a telepresence session with a real world physical meeting. A telepresence session can be initiated within a communication framework that includes two or more virtually represented users that communicate therein. A trigger component can monitor the telepresence session in real time to identify a participant interaction with an object, wherein the object is at least one of a real world physical object or a virtually represented object within the telepresence session. A feedback component can implement a force feedback to at least one participant within the telepresence session based upon the identified participant interaction with the object, wherein the force feedback is employed via a device associated with at least one participant.

7.

Granted patent
Entity-specific search model (In force)

Publication No.: US07822762B2

Publication date: 2010-10-26

Application No.: US11427311

Filing date: 2006-06-28

Applicants: Christopher D. Payne , Eric J. Horvitz , Alexander G. Gounares , Susan T. Dumais , Kyle G. Peltonen , Gary W. Flake , Xuedong D. Huang , William H. Gates, III , John C. Platt , Oliver Hurst-Hiller , Joshua T. Goodman , Christopher A. Meek , Ramez Naam , Raymond E Ozzie , Eric D. Brill

Inventors: Christopher D. Payne , Eric J. Horvitz , Alexander G. Gounares , Susan T. Dumais , Kyle G. Peltonen , Gary W. Flake , Xuedong D. Huang , William H. Gates, III , John C. Platt , Oliver Hurst-Hiller , Joshua T. Goodman , Christopher A. Meek , Ramez Naam , Raymond E Ozzie , Eric D. Brill

IPC classification: G06F7/001

CPC classification: G06F17/30967 , G06F17/30964

Abstract: A system that employs an explicitly and/or implicitly trained model in order to return entity-specific computer-based search results is provided. The innovation can provide for a customized search model that focuses search in connection with achieving information that is meaningful with respect to goals of an entity. The model can be used to modify a search query in accordance with a goal of the entity or to generate the search query thereby returning meaningful and/or targeted results to the user. The system can automatically gather entity-related data thereafter determining or inferring a goal as well as training the model. Moreover, the system can selectively configure (e.g., order, rank, filter) and render results to a user based upon the model.

8.

Granted patent
Use of a unified language model (Lapsed)

Publication No.: US07013265B2

Publication date: 2006-03-14

Application No.: US11003121

Filing date: 2004-12-03

Applicants: Xuedong D. Huang , Milind V. Mahajan , Ye-Yi Wang , Xiaolong Mou

Inventors: Xuedong D. Huang , Milind V. Mahajan , Ye-Yi Wang , Xiaolong Mou

IPC classification: G06F17/27 , G10L15/18 , G10L11/00

CPC classification: G10L15/193 , G10L15/197

Abstract: A language processing system includes a unified language model. The unified language model comprises a plurality of context-free grammars having non-terminal tokens representing semantic or syntactic concepts and terminals, and an N-gram language model having non-terminal tokens. A language processing module capable of receiving an input signal indicative of language accesses the unified language model to recognize the language. The language processing module generates hypotheses for the received language as a function of words of the unified language model and/or provides an output signal indicative of the language and at least some of the semantic or syntactic concepts contained therein.

9.

Granted patent
Information retrieval and speech recognition based on language models (Lapsed)

Publication No.: US06418431B1

Publication date: 2002-07-09

Application No.: US09050286

Filing date: 1998-03-30

Applicants: Milind V. Mahajan , Xuedong D. Huang

Inventors: Milind V. Mahajan , Xuedong D. Huang

IPC classification: G06F17/30

CPC classification: G06F17/30687 , G10L15/183 , G10L15/197 , Y10S707/99934

Abstract: A language model is used in a speech recognition system which has access to a first, smaller data store and a second, larger data store. The language model is adapted by formulating an information retrieval query based on information contained in the first data store and querying the second data store. Information retrieved from the second data store is used in adapting the language model. Also, language models are used in retrieving information from the second data store. Language models are built based on information in the first data store, and based on information in the second data store. The perplexity of a document in the second data store is determined, given the first language model, and given the second language model. Relevancy of the document is determined based upon the first and second perplexities. Documents are retrieved which have a relevancy measure that exceeds a threshold level.
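The abstract of record 9 scores a document by its perplexity under two language models, one built from the smaller data store and one from the larger. The following is a minimal sketch of that idea, assuming add-one-smoothed unigram models and a relevancy defined as the ratio of the two perplexities. Both choices, and all function names, are assumptions for illustration; the abstract does not state the model order, the smoothing, or the exact relevancy formula.

```python
import math
from collections import Counter
from typing import Sequence


def train_unigram(tokens: Sequence[str]) -> Counter:
    """Count token occurrences; the counts act as a unigram language model."""
    return Counter(tokens)


def perplexity(counts: Counter, vocab_size: int, doc: Sequence[str]) -> float:
    """Perplexity of doc under an add-one-smoothed unigram model."""
    if not doc:
        raise ValueError("document must contain at least one token")
    total = sum(counts.values())
    log_prob = 0.0
    for tok in doc:
        # Counter returns 0 for unseen tokens, so smoothing keeps p > 0.
        p = (counts[tok] + 1) / (total + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(doc))


def relevancy(doc: Sequence[str], small_store: Counter, large_store: Counter, vocab_size: int) -> float:
    """Assumed measure: large-store perplexity divided by small-store perplexity.

    A value above 1 means the small-store model predicts the document
    better than the large-store model does.
    """
    return perplexity(large_store, vocab_size, doc) / perplexity(small_store, vocab_size, doc)
```

Documents whose `relevancy` exceeds a chosen threshold would then be retrieved, mirroring the last sentence of the abstract; the threshold value itself is left to the caller.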

10.

Granted patent
Extensible speech recognition system that provides a user with audio feedback (Lapsed)

Publication No.: US5933804A

Publication date: 1999-08-03

Application No.: US833916

Filing date: 1997-04-10

Applicants: Xuedong D. Huang , Michael J. Rozak , Li Jiang

Inventors: Xuedong D. Huang , Michael J. Rozak , Li Jiang

IPC classification: G06F3/16 , G10L13/00 , G10L15/06 , G10L15/22 , G10L15/28 , G10L5/06 , G10L5/02

CPC classification: G10L15/063 , G10L2015/0638

Abstract: A speech recognition system is extensible in that new terms may be added to a list of terms that are recognized by the speech recognition system. The speech recognition system provides audio feedback when new terms are added so that a user may hear how the system expects the word to be pronounced. The user may then accept the pronunciation or provide his own pronunciation. The user may also selectively change the pronunciation of words to avoid misrecognitions by the system. The system may provide appropriate user interface elements for enabling a user to change the pronunciation of words. The system may also include intelligence for automatically changing the pronunciation of words used in recognition based upon empirically derived information.
