-
Publication No.: US20050027523A1
Publication date: 2005-02-03
Application No.: US10631256
Filing date: 2003-07-31
Applicant: Prakairut Tarlton, Janet Cahn, Changxue Ma
Inventor: Prakairut Tarlton, Janet Cahn, Changxue Ma
CPC classification: G10L15/22
Abstract: A spoken language system (100) includes a recognition component (120) that generates (220) a recognized sequence of words from a sequence of received spoken words, and assigns (225) a confidence score to each word in the recognized sequence of words. A presentation component (140) of the spoken language system adjusts (240) nominal acoustical properties of words in a presentation (142) of the recognized sequence of words, the adjustment performed according to the confidence score of each word. The adjustments include adjustments to acoustical features and acoustical contexts of words and groups of words in the presented sequence of words. The presentation component presents (245) the adjusted sequence of words.
-
Publication No.: US09081868B2
Publication date: 2015-07-14
Application No.: US12639176
Filing date: 2009-12-16
Applicant: Fan Zhang, Yan-Ming Cheng, Changxue Ma, James R. Talley
Inventor: Fan Zhang, Yan-Ming Cheng, Changxue Ma, James R. Talley
CPC classification: G06F17/30899, G06F17/30654
Abstract: A search system will receive a voice query and use speech recognition with a predefined vocabulary to generate a textual transcription of the voice query. Queries are sent to a text search engine, retrieving multiple web page results for each of these initial text queries. The collection of the keywords is extracted from the resulting web pages and is phonetically indexed to form a voice query dependent and phonetically searchable index database. Finally, a phonetically-based voice search engine is used to search the original voice query against the voice query dependent and phonetically searchable index database to find the keywords and/or key phrases that best match what was originally spoken. The keywords and/or key phrases that best match what was originally spoken are then used as a final text query for a search engine. Search results from the final text query are then presented to the user.
-
Publication No.: US08914289B2
Publication date: 2014-12-16
Application No.: US12639067
Filing date: 2009-12-16
Applicant: Changxue Ma, Yan-Ming Cheng
Inventor: Changxue Ma, Yan-Ming Cheng
CPC classification: G10L15/22, G06F17/2765, G06F17/30637, G10L2015/223
Abstract: A method for parsing a verbal expression received from a user to determine whether or not the expression contains a multiple-goal command is described. Specifically, known techniques are applied to extract terms from the verbal expression. The extracted terms are assigned to categories. If two or more terms are found in the parsed verbal expression that are in associated categories and that do not overlap one another temporally, then the confidence levels of these terms are compared. If the confidence levels are similar, then the terms may be parallel entries in the verbal expression and may represent multiple goals. If a multiple-goal command is found, then the command is either presented to the user for review and possible editing or is executed. If the parsed multiple-goal command is presented to the user for review, then the presentation can be made via any appropriate interface including voice and text interfaces.
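Illustrative note (not part of the patent record): the multiple-goal test in the US08914289B2 abstract can be sketched roughly as below. The `Term` record, the rule that "associated categories" means equal category labels, and the `tolerance` value for "similar" confidence are all assumptions made for this sketch; the abstract does not define them.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Term:
    """Hypothetical record for one extracted term (not defined in the abstract)."""
    text: str
    category: str
    start: float       # start time in seconds
    end: float         # end time in seconds
    confidence: float  # recognizer confidence in [0, 1]


def overlaps(a: Term, b: Term) -> bool:
    """True if the two terms share any stretch of time."""
    return a.start < b.end and b.start < a.end


def parallel_goal_pairs(terms, tolerance=0.1):
    """Return pairs of terms that may be parallel goals.

    Assumption: categories are 'associated' when their labels are equal, and
    confidences are 'similar' when they differ by at most `tolerance`.
    """
    pairs = []
    for a, b in combinations(terms, 2):
        same_category = a.category == b.category
        similar = abs(a.confidence - b.confidence) <= tolerance
        if same_category and not overlaps(a, b) and similar:
            pairs.append((a, b))
    return pairs
```

A non-empty result would correspond to the case where the command is either shown to the user for review or executed; that step is left out of the sketch.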
-
Publication No.: US08442823B2
Publication date: 2013-05-14
Application No.: US12907729
Filing date: 2010-10-19
Applicant: Woojay Jeon, Yan-Ming Cheng, Changxue Ma, Dusan Macho
Inventor: Woojay Jeon, Yan-Ming Cheng, Changxue Ma, Dusan Macho
IPC classification: G10L15/00
CPC classification: G10L15/1822
Abstract: A method of performing a search of a database of speakers includes: receiving a query speech sample spoken by a query speaker; deriving a query utterance from the query speech sample; extracting query utterance statistics from the query utterance; performing Kernelized Locality-Sensitive Hashing (KLSH) using a kernel function, the KLSH using as input the query utterance statistics and utterance statistics extracted from a plurality of utterances included in a database of speakers in order to select a subset of the plurality of utterances; and comparing, using an utterance comparison equation, the query utterance statistics to the utterance statistics for each utterance in the subset to generate a list of speakers from the database of utterances having a highest similarity to the query speaker.
-
Publication No.: US20100137030A1
Publication date: 2010-06-03
Application No.: US12326475
Filing date: 2008-12-02
Applicant: Changxue Ma
Inventor: Changxue Ma
CPC classification: H04M1/72522, G10L13/00
Abstract: Disclosed is a technique for presenting audible items to a user in a manner that allows the user to easily distinguish them and to select from among them. A number of audible items are rendered simultaneously to the user. To prevent the sounds from blending together into a sonic mishmash, some of the items are “conditioned” while they are being rendered. For example, one audible item might be rendered more quietly than another, or one item can be moved up in register compared with another. Some embodiments combine audible conditioning with visual avatars portrayed on, for example, a display screen of a user device. During the rendering, each audible item is paired with an avatar, the pairing based on some suitable criterion, such as a type of conditioning applied to the audible item. Audible spatial placement is mimicked by visual placement of the avatars on the user's display screen.
-
Publication No.: US20090259469A1
Publication date: 2009-10-15
Application No.: US12102141
Filing date: 2008-04-14
Applicant: Changxue Ma, Yuan-Jun Wei
Inventor: Changxue Ma, Yuan-Jun Wei
CPC classification: G10L15/02, G10L15/142
Abstract: A method and apparatus for performing speech recognition receives an audio signal, generates a sequence of frames of the audio signal, transforms each frame of the audio signal into a set of narrow band feature vectors using a narrow passband, couples the narrow band feature vectors to a speech model, and determines whether the audio signal is a wide band signal. When the audio signal is determined to be a wide band signal, a pass band parameter of each of one or more passbands that are outside the narrow passband is generated for each frame and the one or more band energy parameters are coupled to the speech model.
-
Publication No.: US07299173B2
Publication date: 2007-11-20
Application No.: US10060511
Filing date: 2002-01-30
Applicant: Changxue Ma, Mark Randolph
Inventor: Changxue Ma, Mark Randolph
IPC classification: G10L21/02
Abstract: Speech presence is detected by first bandpass filtering (141, 143, 145) the speech to split it into banks of sub-bands. A matrix of shift registers (150) stores each sub-band of speech. A power determining circuit (259) then determines individual power measurements of the speech stored in each shift register element. A variance combining circuit (160) combines the individual power measurements to provide a variance for the individual shift registers. A comparator circuit (170) finally compares the variance with at least one threshold to indicate whether speech is detected.
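Illustrative note (not part of the patent record): a software analogue of the power-variance test in the US07299173B2 abstract might look like the sketch below. It assumes the signal has already been split into sub-bands, treats each list of samples as the contents of one shift register, and declares speech when any sub-band's power variance exceeds a single threshold. The function names, the "any band" rule, and the threshold value are assumptions, not details from the patent.

```python
from statistics import pvariance


def band_power_variance(samples):
    """Variance of the per-sample power (sample squared) in one sub-band buffer."""
    powers = [s * s for s in samples]
    return pvariance(powers)


def speech_present(sub_bands, threshold):
    """Assumed decision rule: speech if any sub-band's power variance exceeds threshold."""
    return any(band_power_variance(band) > threshold for band in sub_bands)


# A steady tone has constant power (variance 0); a burst does not.
steady = [1.0, -1.0, 1.0, -1.0]
bursty = [0.0, 0.0, 2.0, -2.0]
print(speech_present([steady], threshold=0.5))
print(speech_present([steady, bursty], threshold=0.5))
```

The intuition is that stationary noise keeps a roughly constant power within a band, while speech energy fluctuates, so a high variance of power is taken as a cue for speech.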
-
Publication No.: US20070239444A1
Publication date: 2007-10-11
Application No.: US11277793
Filing date: 2006-03-29
Applicant: Changxue Ma
Inventor: Changxue Ma
IPC classification: G10L15/20
CPC classification: G10L15/20, G10L21/0216, G10L2015/025
Abstract: A system (100) and method (200) for generating a perturbed phonetic string for use in speech recognition. The method can include generating (202) a feature vector set from a spoken utterance, applying (204) a perturbation to the feature vector set for producing a perturbed feature vector set, and phonetically decoding (206) the perturbed feature vector set for producing a perturbed phonetic string. The perturbation mimics environmental variability and speaker variability for reducing the number of spoken utterances in speech recognition applications.
-
Publication No.: US20070129945A1
Publication date: 2007-06-07
Application No.: US11294959
Filing date: 2005-12-06
Applicant: Changxue Ma, Yan Cheng, Steven Nowlan, Tenkasi Ramabadran
Inventor: Changxue Ma, Yan Cheng, Steven Nowlan, Tenkasi Ramabadran
IPC classification: G10L15/04
Abstract: A method and apparatus are provided for reproducing a speech sequence of a user through a communication device of the user. The method includes the steps of detecting a speech sequence from the user through the communication device, recognizing a phoneme sequence within the detected speech sequence and forming a confidence level of each phoneme within the recognized phoneme sequence. The method further includes the steps of audibly reproducing the recognized phoneme sequence for the user through the communication device and gradually highlighting or degrading a voice quality of at least some phonemes of the recognized phoneme sequence based upon the formed confidence level of the at least some phonemes.
-
Publication No.: US20070106506A1
Publication date: 2007-05-10
Application No.: US11268113
Filing date: 2005-11-07
Applicant: Changxue Ma, Ted Mazurkiewicz
Inventor: Changxue Ma, Ted Mazurkiewicz
IPC classification: G10L15/00
CPC classification: H04M1/271, G10L15/26, H04M1/274583, H04M2250/74
Abstract: A method and apparatus are provided for identifying an input sequence entered by a user of a communication unit. The method includes the steps of providing a database containing a plurality of partial sequences from the user of the communication unit, recognizing an identity of at least some information items of the input sequence entered by the user, comparing the recognized sequence of information items with the plurality of partial sequences within the database and selecting a partial sequence of the plurality of sequences within the database with a closest relative match to the recognized sequence as the input sequence intended by the user.
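Illustrative note (not part of the patent record): the "closest relative match" step in the US20070106506A1 abstract could be approximated as follows. The abstract does not name a similarity measure, so the use of Levenshtein edit distance here is an assumption, as are the function names and the example digit strings.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two sequences (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost)) # substitution or match
        prev = curr
    return prev[-1]


def closest_sequence(recognized: str, stored: list[str]) -> str:
    """Pick the stored sequence with the smallest edit distance to the recognized one.

    Raises ValueError if `stored` is empty.
    """
    return min(stored, key=lambda s: edit_distance(recognized, s))


# Hypothetical stored digit strings and a recognized string with one wrong digit.
print(closest_sequence("5551234", ["5551284", "8005550000", "911"]))
```

A length-normalized distance may be closer to what "relative" match means in the abstract; plain edit distance is used here only to keep the sketch short.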