METHOD AND APPARATUS FOR DETECTING AFFECTS IN SPEECH
    11.
    Type: Patent application
    Status: Pending (published)

    Publication No.: US20070192097A1

    Publication date: 2007-08-16

    Application No.: US11275350

    Filing date: 2006-02-14

    IPC classification: G10L15/00

    CPC classification: G10L25/48

    Abstract: A method and apparatus for speaker-independent, real-time affect detection includes generating (205) a sequence of audio frames from a segment of speech, generating (210) a sequence of feature sets by generating a feature set for each frame, and applying (215) the sequence of feature sets to a sequential classifier to determine the most likely affect expressed in the segment of speech.
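
    The abstract does not specify the per-frame features or the type of sequential classifier. The following minimal Python sketch shows how frames, per-frame feature sets and a sequential classifier chain together. It assumes a single log-energy feature per frame and one small Gaussian hidden Markov model (HMM) per affect label; both are illustrative choices, not taken from the application. Each affect model scores the whole feature sequence with the forward algorithm, and the highest-scoring label is returned.

```python
import math

def frame_signal(samples, frame_len=160, hop=80):
    """Split a sample sequence into overlapping fixed-length frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def log_energy(frame):
    """Illustrative one-dimensional feature set: log mean-square energy."""
    return math.log(sum(x * x for x in frame) / len(frame) + 1e-10)

def gaussian_logpdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def forward_loglik(features, model):
    """Forward algorithm for a Gaussian HMM given as (start, trans, means, variances)."""
    start, trans, means, variances = model
    states = range(len(start))
    alpha = [math.log(start[s]) + gaussian_logpdf(features[0], means[s], variances[s])
             for s in states]
    for x in features[1:]:
        # New alpha for state s: sum over predecessor states p, then emit x.
        alpha = [logsumexp([alpha[p] + math.log(trans[p][s]) for p in states])
                 + gaussian_logpdf(x, means[s], variances[s])
                 for s in states]
    return logsumexp(alpha)

def detect_affect(samples, models):
    """Return the affect label whose (hypothetical) HMM best explains the segment."""
    features = [log_energy(f) for f in frame_signal(samples)]
    return max(models, key=lambda name: forward_loglik(features, models[name]))
```

    Any models passed in must have strictly positive start and transition probabilities, because the sketch takes their logarithms.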

    Tailored speaker-independent voice recognition system
    12.
    Type: Patent application
    Status: In force

    Publication No.: US20060085186A1

    Publication date: 2006-04-20

    Application No.: US10967957

    Filing date: 2004-10-19

    Applicants: Changxue Ma, Yan Cheng

    Inventors: Changxue Ma, Yan Cheng

    IPC classification: G10L15/08

    CPC classification: G10L15/063 G10L2015/0631

    Abstract: A tailored speaker-independent voice recognition system has a speech recognition dictionary (360) containing at least one word (371). That word (371) has at least two transcriptions (373), each transcription (373) having a probability factor (375) and an indicator (377) of whether the transcription is active. When a speech utterance is received (510), the voice recognition system determines (520, 530) the word signified by the utterance, evaluates (540) the utterance against the transcriptions of the correct word, updates (550) the probability factor of each transcription, and inactivates (570) any transcription whose updated probability factor is less than a threshold.
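
    The abstract leaves the update rule for the probability factors open. The following Python is an illustrative sketch only. It assumes each active transcription receives a non-negative match score for the utterance and normalizes those scores into a posterior. Each probability factor then moves toward that posterior through an exponential moving average (the `rate` parameter). Any transcription whose factor falls below `threshold` is inactivated. The names, the smoothing rule and the default values are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Transcription:
    phonemes: str          # pronunciation, e.g. "t ah m ey t ow"
    probability: float     # the transcription's probability factor
    active: bool = True    # whether the transcription is still used

def update_transcriptions(transcriptions, match_scores, rate=0.2, threshold=0.15):
    """Update probability factors after one utterance of a known word.

    match_scores[i] is a hypothetical non-negative acoustic match score of the
    utterance against transcriptions[i]; inactive entries are ignored.
    """
    active = [i for i, t in enumerate(transcriptions) if t.active]
    total = sum(match_scores[i] for i in active)
    if total <= 0:
        return  # nothing to learn from this utterance
    for i in active:
        t = transcriptions[i]
        posterior = match_scores[i] / total
        # Exponential moving average toward this utterance's posterior.
        t.probability = (1 - rate) * t.probability + rate * posterior
        if t.probability < threshold:
            t.active = False  # inactivate the unlikely transcription
```

    With two transcriptions starting at 0.5 and repeated scores of 0.9 and 0.1, the second factor follows 0.1 + 0.4 * 0.8^k. It is therefore inactivated on the tenth update, when it first drops below 0.15.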

    Method and apparatus to facilitate correlating symbols to sounds
    13.
    Type: Granted patent
    Status: In force

    Publication No.: US06999918B2

    Publication date: 2006-02-14

    Application No.: US10251354

    Filing date: 2002-09-20

    IPC classification: G10L13/00 G06F17/21

    CPC classification: G10L13/08

    Abstract: A dictionary consists of a dendroid (tree-like) hierarchy of branches and nodes, wherein each node represents no more than one symbol (the symbol to be converted to a corresponding sound) and wherein each symbol represented at a given node has only one corresponding sound associated with it at that node. In addition, many of the branches include a plurality of nodes representing a string of the symbols in a particular sequence. The dictionary is used to translate an input comprising a given integral sequence of the symbols into a corresponding integral sequence of sounds. This permits both method and apparatus to convert, for example, text to representative phonemes. Such phonemes can be used, among other purposes, to support synthesized speech production.
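
    One way to read the dendroid dictionary is as a trie in which a child node is keyed by a (symbol, sound) pair. Under that reading, the same letter can appear at two sibling nodes when it is pronounced differently (the "c" in "cat" versus "cell"), while each node still carries exactly one sound. The Python sketch below follows that reading. The class names, the empty-string convention for silent letters and the depth-first lookup are assumptions, not details from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class Node:
    symbol: str = ""
    sound: str = ""          # the single sound this symbol has at this node
    is_word_end: bool = False
    children: Dict[Tuple[str, str], "Node"] = field(default_factory=dict)

class DendroidDictionary:
    def __init__(self) -> None:
        self.root = Node()

    def add(self, word: str, sounds: List[str]) -> None:
        """Store a word with exactly one sound per symbol ("" for silent letters)."""
        if len(word) != len(sounds):
            raise ValueError("need exactly one sound per symbol")
        node = self.root
        for symbol, sound in zip(word, sounds):
            key = (symbol, sound)  # same letter, different sound -> sibling branch
            if key not in node.children:
                node.children[key] = Node(symbol, sound)
            node = node.children[key]
        node.is_word_end = True

    def translate(self, word: str) -> Optional[List[str]]:
        """Depth-first search for a branch that spells the whole word."""
        def search(node: Node, i: int) -> Optional[List[str]]:
            if i == len(word):
                return [] if node.is_word_end else None
            for child in node.children.values():
                if child.symbol == word[i]:
                    rest = search(child, i + 1)
                    if rest is not None:
                        return [child.sound] + rest
            return None
        return search(self.root, 0)
```

    For example, after `d.add("cat", ["k", "ae", "t"])` and `d.add("cell", ["s", "eh", "l", ""])`, `d.translate("cell")` returns `["s", "eh", "l", ""]`. A word that is not stored as a complete branch, such as "ca", returns `None`.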

    Method and apparatus for voice searching for stored content using uniterm discovery
    14.
    Type: Granted patent
    Status: In force

    Publication No.: US08015005B2

    Publication date: 2011-09-06

    Application No.: US12032258

    Filing date: 2008-02-15

    Applicant: Changxue Ma

    Inventor: Changxue Ma

    IPC classification: G10L15/00

    Abstract: A method, system and communication device for enabling voice-to-voice searching and ordered content retrieval via audio tags assigned to individual content items, which tags generate uniterms that are matched against components of a voice query. The method includes storing content and tagging at least one content item with an audio tag. The method further includes receiving a voice query to retrieve content stored on the device. When the voice query is received, the method completes a voice-to-voice search using the uniterms of the audio tags, scored against the phoneme latent lattice model generated from the voice query, to identify matching terms within the audio tags and the corresponding stored content. The retrieved content associated with the identified audio tags, whose uniterms score within the phoneme lattice model, is output in an order corresponding to the order in which the uniterms are structured within the voice query.
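
    The abstract scores uniterms against a phoneme lattice built from the voice query. The Python sketch below simplifies that lattice to a single best phoneme string, which is an assumption. It uses a semi-global edit distance to find where each audio tag's phoneme sequence occurs in the query. Tags within a per-phoneme cost budget are returned in the order in which they appear in the query. The function names, the cost measure and the `max_cost_per_phoneme` default are illustrative.

```python
def best_span_cost(tag, query):
    """Semi-global edit distance: all of `tag` vs. any contiguous span of `query`.

    Returns (cost, end), where `end` is the query position just after the span.
    """
    prev = [0] * (len(query) + 1)  # the span may start anywhere for free
    for i in range(1, len(tag) + 1):
        cur = [i] + [0] * len(query)
        for j in range(1, len(query) + 1):
            cur[j] = min(prev[j - 1] + (tag[i - 1] != query[j - 1]),  # match / substitute
                         prev[j] + 1,       # tag phoneme missing from the query
                         cur[j - 1] + 1)    # extra phoneme in the query
        prev = cur
    end = min(range(len(prev)), key=prev.__getitem__)  # the span may end anywhere
    return prev[end], end

def voice_to_voice_search(query_phonemes, tagged_content, max_cost_per_phoneme=0.25):
    """Return content whose audio-tag phonemes occur in the query, in query order."""
    hits = []
    for content, tag in tagged_content.items():
        cost, end = best_span_cost(tag, query_phonemes)
        if cost / len(tag) <= max_cost_per_phoneme:
            hits.append((end, content))
    return [content for _, content in sorted(hits)]
```

    Sorting hits by the end position of their best-matching span reproduces the "order in which the uniterms are structured within the voice query" requirement under this simplified single-string model.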

    METHOD AND APPARATUS FOR BEST MATCHING AN AUDIBLE QUERY TO A SET OF AUDIBLE TARGETS
    15.
    Type: Patent application
    Status: In force

    Publication No.: US20110154977A1

    Publication date: 2011-06-30

    Application No.: US12649458

    Filing date: 2009-12-30

    IPC classification: G10H7/00

    Abstract: During operation, a “coarse search” stage applies variable-scale windowing to the query pitch contours to compare them with fixed-length segments of target pitch contours, finding matching candidates while efficiently scanning over variable tempo differences and target locations. Because the target segments are of fixed length, the storage space required is drastically reduced compared with a prior-art method. Furthermore, by breaking the query contours into parts, rhythmic inconsistencies can be handled more flexibly. Normalization is also applied to the contours so that comparisons are independent of differences in musical key. In a “fine search” stage, a “segmental” dynamic time warping (DTW) method is applied that calculates a more accurate similarity score between the query and each candidate target, with more explicit consideration of rhythmic inconsistencies.
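
    Key normalization and dynamic time warping are the two ingredients that are easiest to show in isolation. The Python sketch below assumes pitch contours in semitones (for example MIDI note numbers). It removes the key by subtracting the contour mean and computes a standard DTW distance. It is plain DTW, not the "segmental" variant the abstract describes, and it omits the coarse variable-scale windowing stage.

```python
from typing import List, Sequence

def normalize_key(contour: Sequence[float]) -> List[float]:
    """Subtract the mean pitch (in semitones) so a transposed melody compares equal."""
    mean = sum(contour) / len(contour)
    return [p - mean for p in contour]

def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW with absolute pitch difference as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Steps along either axis let one contour stretch against the other (tempo changes).
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]
```

    With this normalization, a contour transposed up five semitones and played at half tempo (every note doubled) has a DTW distance of zero, up to floating-point rounding, from the original.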

    VOICE WEB SEARCH
    16.
    Type: Patent application
    Status: In force

    Publication No.: US20110145214A1

    Publication date: 2011-06-16

    Application No.: US12639176

    Filing date: 2009-12-16

    IPC classification: G06F17/30

    CPC classification: G06F17/30899 G06F17/30654

    Abstract: A search system receives a voice query and uses speech recognition with a predefined vocabulary to generate a textual transcription of the voice query. The resulting text queries are sent to a text search engine, retrieving multiple web page results for each of these initial text queries. A collection of keywords is extracted from the resulting web pages and phonetically indexed to form a voice-query-dependent, phonetically searchable index database. Finally, a phonetically based voice search engine searches the original voice query against this index database to find the keywords and/or key phrases that best match what was originally spoken. Those keywords and/or key phrases are then used as a final text query for a search engine, and the search results from the final text query are presented to the user.
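
    The pipeline in the abstract can be sketched as a single Python function whose stages are passed in as callables. Every stage here (recognizer, text search, keyword extraction, pronunciation lookup and phonetic scoring) is a hypothetical placeholder. The voice query is represented as a phoneme sequence rather than audio purely for illustration.

```python
from typing import Callable, Dict, Iterable, List, Sequence

def voice_web_search(
    voice_query: Sequence[str],
    recognize: Callable[[Sequence[str]], Iterable[str]],
    text_search: Callable[[str], List[str]],
    extract_keywords: Callable[[str], Iterable[str]],
    to_phonemes: Callable[[str], Sequence[str]],
    phonetic_score: Callable[[Sequence[str], Sequence[str]], float],
) -> List[str]:
    # 1. Fixed-vocabulary recognition yields one or more initial text queries.
    initial_queries = recognize(voice_query)
    # 2-3. Keywords from the returned pages form a query-dependent phonetic index.
    index: Dict[str, Sequence[str]] = {}
    for query in initial_queries:
        for page in text_search(query):
            for keyword in extract_keywords(page):
                if keyword not in index:
                    index[keyword] = to_phonemes(keyword)
    if not index:
        return []
    # 4. Phonetic search of the original voice query against that index.
    best = max(index, key=lambda kw: phonetic_score(voice_query, index[kw]))
    # 5. The best-matching keyword becomes the final text query.
    return text_search(best)
```

    Passing the stages as arguments keeps the sketch independent of any particular recognizer or search engine and makes each stage easy to replace with a stub.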

    ANALYZING AND PROCESSING A VERBAL EXPRESSION CONTAINING MULTIPLE GOALS
    17.
    Type: Patent application
    Status: In force

    Publication No.: US20110144996A1

    Publication date: 2011-06-16

    Application No.: US12639067

    Filing date: 2009-12-16

    IPC classification: G10L15/00

    Abstract: Disclosed is a method for parsing a verbal expression received from a user to determine whether the expression contains a multiple-goal command. Specifically, known techniques are applied to extract terms from the verbal expression, and the extracted terms are assigned to categories. If two or more terms in the parsed verbal expression are in associated categories and do not overlap one another temporally, the confidence levels of these terms are compared. If the confidence levels are similar, the terms may be parallel entries in the verbal expression and may represent multiple goals. If a multiple-goal command is found, the command is either executed or presented to the user for review and possible editing. If it is presented for review, the presentation can be made via any appropriate interface, including voice and text interfaces.
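
    The test for parallel entries can be sketched in Python as follows. The sketch assumes each extracted term carries a category, a start and end time, and a recognition confidence. It treats two categories as "associated" when they are identical or listed together in `associated`. It treats confidences as "similar" when they differ by at most `tolerance`. Those criteria and the 0.1 default are assumptions, since the abstract does not quantify them.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Tuple

@dataclass
class Term:
    text: str
    category: str
    start: float       # seconds from the start of the expression
    end: float
    confidence: float  # recognizer confidence for this term

def overlaps(a: Term, b: Term) -> bool:
    """True if the two terms share any stretch of time."""
    return a.start < b.end and b.start < a.end

def parallel_goal_pairs(terms: List[Term], associated: Set[FrozenSet[str]],
                        tolerance: float = 0.1) -> List[Tuple[str, str]]:
    """Pairs of terms that may be parallel entries of a multiple-goal command."""
    pairs = []
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            related = (a.category == b.category
                       or frozenset((a.category, b.category)) in associated)
            if related and not overlaps(a, b) and abs(a.confidence - b.confidence) <= tolerance:
                pairs.append((a.text, b.text))
    return pairs
```

    For "call Alice and Bob", the two contact terms form the only candidate pair. If Bob's confidence were much lower than Alice's, the pair would be rejected.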

    PROGRESSIVELY REFINING A SPEECH-BASED SEARCH
    18.
    Type: Patent application
    Status: Pending (published)

    Publication No.: US20100153112A1

    Publication date: 2010-06-17

    Application No.: US12335840

    Filing date: 2008-12-16

    IPC classification: G06F17/30 G10L15/18 G10L15/00

    Abstract: Disclosed are editing methods, added to speech-based searching, that allow users to better understand the textual queries submitted to a search engine and to easily edit their speech queries. According to some embodiments, the user begins to speak. The user's speech is translated into a textual query and submitted to a search engine, and the search results are presented to the user. As the user continues to speak, the speech query is refined based on the user's further speech. The refined speech query is converted to a textual query, which is again submitted to the search engine, and the refined results are presented to the user. This process continues as long as the user continues to refine the query. Some embodiments present the textual query to the user and allow the user to edit it with both speech-based and non-speech-based tools.
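
    The refinement loop amounts to re-submitting the query whenever the transcript of the ongoing speech changes. Below is a minimal Python sketch. It assumes the recognizer emits a stream of growing partial transcripts and that `search` is any caller-supplied (hypothetical) search function.

```python
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

R = TypeVar("R")

def progressive_search(partial_transcripts: Iterable[str],
                       search: Callable[[str], R]) -> Iterator[Tuple[str, R]]:
    """Yield (textual query, results) each time the refined query changes."""
    last = None
    for text in partial_transcripts:
        query = " ".join(text.split())  # normalize whitespace
        if query and query != last:     # only re-query when the text changed
            last = query
            yield query, search(query)
```

    For the partial transcripts "pizza", "pizza ", "pizza near" and "pizza near me", the generator issues three searches. It skips the second transcript because it normalizes to the same query.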

    Method and apparatus for generating a voice tag
    19.
    Type: Patent application
    Status: Pending (published)

    Publication No.: US20060287867A1

    Publication date: 2006-12-21

    Application No.: US11155944

    Filing date: 2005-06-17

    Applicants: Yan Cheng, Changxue Ma

    Inventors: Yan Cheng, Changxue Ma

    IPC classification: G10L21/00

    Abstract: A method and apparatus for generating a voice tag (140) includes a means (110) for combining (205) a plurality of utterances (106, 107, 108) into a combined utterance (111) and a means (120) for extracting (210) the voice tag, using a set of stored phonemes (115) and the combined utterance, as a sequence of phonemes having a high likelihood of representing the combined utterance.
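
    The abstract does not say how the utterances are combined. The Python sketch below makes several simplifying assumptions:

    - the utterances are already time-aligned to the same number of frames;
    - each frame holds log-likelihoods over a shared phoneme inventory;
    - utterances are combined by summing those scores frame by frame;
    - the tag is the best phoneme per frame, with repeats collapsed and silence dropped.

    A Viterbi search over phoneme models would be a more conventional decoder; the greedy per-frame choice is a simplification.

```python
def combine_utterances(utterances):
    """Sum frame-level phoneme log-likelihoods across repeated utterances.

    Each utterance is a list of frames; each frame is a list of log-likelihoods,
    one per entry of a shared phoneme inventory.
    """
    n_frames = len(utterances[0])
    if any(len(u) != n_frames for u in utterances):
        raise ValueError("this sketch assumes time-aligned utterances")
    return [[sum(scores) for scores in zip(*(u[t] for u in utterances))]
            for t in range(n_frames)]

def extract_voice_tag(combined, phonemes, silence="sil"):
    """Best phoneme per frame, collapsing consecutive repeats and dropping silence."""
    tag, prev = [], None
    for frame in combined:
        best = phonemes[max(range(len(frame)), key=frame.__getitem__)]
        if best != prev and best != silence:
            tag.append(best)
        prev = best
    return tag
```

    Combining several utterances lets errors in a single utterance be outvoted. For example, one utterance whose first frame of speech looks like "ae" instead of "k" no longer changes the extracted tag.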

    Method and system for interpreting verbal inputs in multimodal dialog system
    20.
    Type: Patent application
    Status: In force

    Publication No.: US20060229862A1

    Publication date: 2006-10-12

    Application No.: US11100185

    Filing date: 2005-04-06

    IPC classification: G06F17/28

    Abstract: A method, a system and a computer program product for interpreting a verbal input in a multimodal dialog system are provided. The method includes assigning (302) a confidence value to at least one word generated by a verbal recognition component. The method further includes generating (304) a semantic unit confidence score for the verbal input, based on the confidence value of the at least one word and at least one semantic confidence operator.
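
    The abstract names a "semantic confidence operator" without defining one. The Python sketch below assumes the operator reduces the confidence values of the words in one semantic unit (for example the words of a city name) to a single score. It offers minimum, mean and product as illustrative operators; these choices are assumptions.

```python
import math
from statistics import mean
from typing import Callable, Dict, Sequence

# Hypothetical semantic confidence operators: each maps the confidence values
# of the words in one semantic unit to a single score.
OPERATORS: Dict[str, Callable[[Sequence[float]], float]] = {
    "min": min,            # the unit is only as reliable as its weakest word
    "mean": mean,          # average word confidence
    "product": math.prod,  # treats word confidences as independent probabilities
}

def semantic_unit_confidence(word_confidences: Sequence[float], operator: str = "min") -> float:
    """Combine word confidence values into one semantic unit confidence score."""
    if not word_confidences:
        raise ValueError("a semantic unit needs at least one word")
    return OPERATORS[operator](word_confidences)
```

    For word confidences 0.9, 0.6 and 0.8, the "min" operator gives 0.6 and "product" gives about 0.432.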