Method for goal-oriented speech translation in hand-held devices using meaning extraction and dialogue
    31.
    Type: granted invention patent
    Legal status: in force

    Publication number: US06233561B1

    Publication date: 2001-05-15

    Application number: US09290628

    Filing date: 1999-04-12

    IPC classification: G10L15/22

    CPC classification: G10L15/1822 G10L15/1815

    Abstract: A computer-implemented method and apparatus are provided for processing a spoken request from a user. A speech recognizer converts the spoken request into a digital format. A frame data structure associates semantic components of the digitized spoken request with predetermined slots. The slots are indicative of data used to achieve a predetermined goal. A speech understanding module, connected to the speech recognizer and to the frame data structure, determines the semantic components of the spoken request. The slots are populated based upon the determined semantic components. A dialog manager, connected to the speech understanding module, may determine at least one slot that is unpopulated based upon the determined semantic components and, in a preferred embodiment, may provide confirmation of the populated slots. A computer-generated request is formulated so that the user provides data related to the unpopulated slot. The method and apparatus are well suited (but not limited) to use in a hand-held speech translation device.
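
The slot-filling loop in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class `Frame`, the function `next_prompt`, the goal name, and the slot names are all hypothetical, and the semantic components are passed in as a ready-made dictionary rather than produced by a real speech understanding module.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Frame:
    """Goal frame: each slot holds one piece of data needed to reach the goal."""
    goal: str
    slots: Dict[str, Optional[str]] = field(default_factory=dict)

    def populate(self, components: Dict[str, str]) -> None:
        # Fill only the slots this frame declares; ignore unrelated components.
        for name, value in components.items():
            if name in self.slots:
                self.slots[name] = value

    def unpopulated(self) -> List[str]:
        # Slots still set to None have not been filled yet.
        return [name for name, value in self.slots.items() if value is None]


def next_prompt(frame: Frame) -> Optional[str]:
    """Return a request for the first missing slot, or None if the frame is complete."""
    missing = frame.unpopulated()
    if not missing:
        return None
    return f"Please provide the {missing[0].replace('_', ' ')}."


# Hypothetical goal with two slots; only the destination was understood so far.
frame = Frame(goal="buy_train_ticket",
              slots={"destination": None, "departure_time": None})
frame.populate({"destination": "Paris"})
print(next_prompt(frame))
```

In this sketch the dialog manager role is reduced to `next_prompt`; a confirmation step for populated slots, which the abstract mentions for a preferred embodiment, is left out.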


    Method for generating spelling-to-pronunciation decision tree
    32.
    Type: granted invention patent
    Legal status: lapsed

    Publication number: US06230131B1

    Publication date: 2001-05-08

    Application number: US09069308

    Filing date: 1998-04-29

    IPC classification: G10L13/08

    CPC classification: G10L13/08

    Abstract: Decision trees are used to store a series of yes-no questions that can be used to convert spelled-word letter sequences into pronunciations. Letter-only trees, whose internal nodes are populated with questions about letters in the input sequence, generate one or more pronunciations based on probability data stored in the leaf nodes of the tree. The pronunciations may then be improved by processing them with mixed trees, which are populated with questions about letters in the sequence and also with questions about phonemes associated with those letters. The mixed tree screens out pronunciations that would not occur in natural speech, thereby greatly improving the results of the letter-to-pronunciation transformation.
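
As a rough illustration of a letter-only tree, the sketch below walks yes-no questions about neighboring letters until it reaches a leaf holding phoneme probabilities. The node classes, the single hand-written question for the letter "c", and the probability values are all invented for illustration; the patent's trees are generated from training data, and the mixed-tree rescoring stage is not shown.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    probs: Dict[str, float]   # phoneme -> probability (made-up values below)


@dataclass
class Question:
    offset: int               # position relative to the letter being converted
    letters: str              # answer is "yes" if the letter there is in this set
    yes: "Node"
    no: "Node"


Node = Union[Leaf, Question]


def classify(tree: Node, word: str, index: int) -> Dict[str, float]:
    """Descend from the root to a leaf by answering each yes-no question."""
    node = tree
    while isinstance(node, Question):
        pos = index + node.offset
        # "#" stands for a position outside the word.
        letter = word[pos] if 0 <= pos < len(word) else "#"
        node = node.yes if letter in node.letters else node.no
    return node.probs


# Hypothetical tree for the letter "c": is the next letter one of e, i, y?
tree_c = Question(offset=1, letters="eiy",
                  yes=Leaf({"s": 0.9, "k": 0.1}),
                  no=Leaf({"k": 0.95, "s": 0.05}))

for word in ("city", "cat"):
    probs = classify(tree_c, word, 0)
    print(word, max(probs, key=probs.get))
```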


    Speech recognition system employing multiple grammar networks
    33.
    Type: granted invention patent
    Legal status: lapsed

    Publication number: US5991720A

    Publication date: 1999-11-23

    Application number: US834358

    Filing date: 1997-04-16

    Abstract: The input speech is segmented using plural grammar networks, including a network with a filler model designed to represent noise or extraneous speech. Recognition processing results in plural lists of candidates, each list containing the N-best candidates generated. The lists are then separately aligned with the dictionary of valid names to generate two lists of valid names. The final recognition pass combines these two lists of names into a dynamic grammar, which may be used to find the best candidate name using Viterbi recognition. A telephone call routing application based on the recognition system selects the best candidate name corresponding to the name spelled by the user, whether or not the user pronounces the name before spelling it.
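
The alignment and merging steps can be sketched as follows. This is only an approximation of the idea: plain Levenshtein distance stands in for whatever alignment the patent actually uses, the function names, name dictionary, and candidate spellings are hypothetical, and the final Viterbi pass over the dynamic grammar is not modeled.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def align_to_dictionary(candidates: List[str], dictionary: List[str],
                        keep: int = 2) -> List[str]:
    """Keep the valid names closest to any candidate in one N-best list."""
    score = {name: min(edit_distance(c, name) for c in candidates)
             for name in dictionary}
    return sorted(dictionary, key=lambda n: (score[n], n))[:keep]


def dynamic_grammar(list_a: List[str], list_b: List[str]) -> List[str]:
    """Merge the two lists of valid names, dropping duplicates but keeping order."""
    return list(dict.fromkeys(list_a + list_b))


names = ["HANSON", "JOHNSON", "JANSEN", "MILLER"]   # hypothetical dictionary
plain_nbest = ["JONSON", "JOHNSEN"]                 # from the plain network
filler_nbest = ["HANSEN", "JANSON"]                 # from the filler-model network

valid_a = align_to_dictionary(plain_nbest, names)
valid_b = align_to_dictionary(filler_nbest, names)
print(dynamic_grammar(valid_a, valid_b))
```

Restricting the last pass to the merged list is what keeps the final search small; here the grammar is just a list of names that a recognizer would be constrained to.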


    Voice dialing server for branch exchange telephone systems
    34.
    Type: granted invention patent
    Legal status: lapsed

    Publication number: US5930336A

    Publication date: 1999-07-27

    Application number: US723914

    Filing date: 1996-09-30

    Abstract: The voice dialing server plugs into one or more unused extensions of a branch exchange system to provide each of the users on the system with voice dialing services. To use the system, a user simply dials the extension to which the server is attached. The server then prompts the user to supply the name of a party to be called. The name is then looked up in a telephone number dictionary unique to that user. The system then places the telephone call by sending commands to the branch exchange system that simulate the operations a user would perform to connect to an outside line or inside extension and then place the call. The server incorporates a speech processing module having a multistage word recognizer that represents speech in terms of high phoneme similarity values. This representation is highly compact, allowing the word recognizer to perform the recognizer and fine match stages with far less processor overhead than frame-by-frame speech recognizers.


    Multilingual text-to-speech system with limited resources
    35.
    Type: granted invention patent
    Legal status: in force

    Publication number: US07596499B2

    Publication date: 2009-09-29

    Application number: US10771256

    Filing date: 2004-02-02

    IPC classification: G10L21/00

    CPC classification: G10L13/08

    Abstract: A multilingual text-to-speech system includes a source datastore of primary source parameters providing information about a speaker of a primary language. A plurality of primary filter parameters provides information about sounds in the primary language. A plurality of secondary filter parameters provides information about sounds in a secondary language. One or more secondary filter parameters are normalized to the primary filter parameters and mapped to a primary source parameter.


    Speaker authentication by fusion of voiceprint match attempt results with additional information
    36.
    Type: granted invention patent
    Legal status: in force

    Publication number: US07240007B2

    Publication date: 2007-07-03

    Application number: US10392549

    Filing date: 2003-03-20

    IPC classification: G10L17/00

    CPC classification: G10L15/24

    Abstract: A speaker authentication system includes a data fuser operable to fuse voiceprint match attempt results with additional information to assist in authenticating a speaker providing audio input. In other aspects, the system includes a data store of speaker voiceprints and a voiceprint matching module adapted to receive an audio input and operable to attempt to assist in authenticating a speaker by matching the audio input to at least one of the speaker voiceprints. The voiceprint matching module adjusts a confidence of voiceprint match attempt results by at least one of: (a) a number of utterance repetitions upon which a matching speaker voiceprint has been trained; or (b) a passage of time since a training occurrence associated with a matching speaker voiceprint.
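
One way to picture the confidence adjustment is the sketch below. The abstract only states that confidence is adjusted by training repetitions and by elapsed time; the direction of each adjustment, the saturation count, the exponential half-life, and the function name `adjusted_confidence` are all assumptions made for illustration.

```python
def adjusted_confidence(raw_score: float,
                        repetitions: int,
                        days_since_training: float,
                        saturation: int = 5,
                        half_life_days: float = 180.0) -> float:
    """Scale a raw voiceprint match score by training amount and voiceprint age.

    Assumed behavior: voiceprints trained on fewer repetitions are trusted less
    (linear up to `saturation`), and trust halves every `half_life_days`.
    """
    repetition_factor = min(repetitions, saturation) / saturation
    age_factor = 0.5 ** (days_since_training / half_life_days)
    return raw_score * repetition_factor * age_factor


# A fresh, well-trained voiceprint keeps its raw score; an old one is discounted.
print(adjusted_confidence(0.8, repetitions=5, days_since_training=0))
print(adjusted_confidence(0.8, repetitions=5, days_since_training=180))
print(adjusted_confidence(0.8, repetitions=2, days_since_training=0))
```

The adjusted value could then be one input to the data fuser alongside the additional information the abstract mentions; that fusion step is not sketched here.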


    Assistive call center interface
    37.
    Type: granted invention patent
    Legal status: lapsed

    Publication number: US07103553B2

    Publication date: 2006-09-05

    Application number: US10454716

    Filing date: 2003-06-04

    IPC classification: G10L15/00

    CPC classification: G10L15/1822

    Abstract: Unstructured voice information from an incoming caller is processed by an automatic speech recognition and semantic categorization system to convert the information into structured data that may then be used to access one or more databases to retrieve associated supplemental data. The structured data and associated supplemental data are then made available through a presentation system that provides information to the call center agent and, optionally, to the incoming caller. The system thus allows a call center information processing system to handle unstructured voice input for use by the live agent in handling the incoming call and for storage and retrieval at a later time. The semantic analysis system may be implemented by a global parser or by an information retrieval technique, such as latent semantic analysis. Co-occurrence of keywords may be used to associate prior calls with an incoming call to assist in understanding the purpose of the incoming call.
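
The keyword co-occurrence idea in the last sentence can be sketched in a few lines. Everything here is an assumption for illustration: the stopword list, the call transcripts, the call identifiers, and the rule that two shared keywords are enough to link calls. A global parser or latent semantic analysis, which the abstract also names, is not shown.

```python
import re
from typing import Dict, List, Set

# Hypothetical stopword list; a real system would use a much larger one.
STOPWORDS = {"the", "my", "is", "its", "a", "about", "after", "again", "keeps"}


def keywords(text: str) -> Set[str]:
    """Lowercase the transcript, split it into words, and drop stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def related_calls(incoming: str, prior: Dict[str, str],
                  min_shared: int = 2) -> List[str]:
    """Return ids of prior calls sharing at least `min_shared` keywords."""
    target = keywords(incoming)
    shared = {cid: len(target & keywords(text)) for cid, text in prior.items()}
    hits = [cid for cid, n in shared.items() if n >= min_shared]
    # Most shared keywords first; ties broken by call id.
    return sorted(hits, key=lambda cid: (-shared[cid], cid))


prior_calls = {
    "call-1": "My router keeps dropping the wireless connection",
    "call-2": "Question about my monthly bill",
    "call-3": "Wireless connection is slow after router update",
}
print(related_calls("The router lost its wireless connection again", prior_calls))
```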


    Multilingual text-to-speech system with limited resources
    38.
    Type: invention patent application
    Legal status: in force

    Publication number: US20050182630A1

    Publication date: 2005-08-18

    Application number: US10771256

    Filing date: 2004-02-02

    IPC classification: G10L13/00 G10L13/08

    CPC classification: G10L13/08

    Abstract: A multilingual text-to-speech system includes a source datastore of primary source parameters providing information about a speaker of a primary language. A plurality of primary filter parameters provides information about sounds in the primary language. A plurality of secondary filter parameters provides information about sounds in a secondary language. One or more secondary filter parameters are normalized to the primary filter parameters and mapped to a primary source parameter.


    Text selection and recording by feedback and adaptation for development of personalized text-to-speech systems
    39.
    Type: granted invention patent
    Legal status: in force

    Publication number: US06792407B2

    Publication date: 2004-09-14

    Application number: US09821973

    Filing date: 2001-03-30

    IPC classification: G10L13/06

    CPC classification: G10L13/047 G10L13/04

    Abstract: A new speaker provides speech from which comparison snippets are extracted. The comparison snippets are compared with initial snippets stored in a recorded snippet database that is associated with a concatenative synthesizer. The comparison of the snippets to the initial snippets produces required sound units. A greedy selection algorithm is performed with the required sound units for identifying the smallest subset of the input text that contains all of the text for the new speaker to read. The new speaker then reads the optimally selected text, and sound units are extracted from the human speech such that the recorded snippet database is modified and the synthesized speech adopts the voice quality and characteristics of the new speaker.
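
The greedy selection step resembles the classic greedy set-cover heuristic, sketched below under that assumption. The function name, sentence ids, and sound-unit labels are hypothetical. Greedy selection gives a small covering subset but does not guarantee the smallest one in general.

```python
from typing import Dict, List, Set


def greedy_select(sentences: Dict[str, Set[str]], required: Set[str]) -> List[str]:
    """Repeatedly pick the sentence covering the most still-missing sound units."""
    remaining = set(required)
    chosen: List[str] = []
    while remaining:
        # Iterating over sorted ids makes ties resolve to the smallest id.
        best = max(sorted(sentences), key=lambda s: len(sentences[s] & remaining))
        gain = sentences[best] & remaining
        if not gain:
            break  # the remaining units do not occur in any sentence
        chosen.append(best)
        remaining -= gain
    return chosen


# Hypothetical candidate sentences and the sound units each one contains.
candidates = {
    "s1": {"a-b", "b-c"},
    "s2": {"b-c", "c-d", "d-e"},
    "s3": {"a-b"},
    "s4": {"e-f"},
}
needed = {"a-b", "b-c", "c-d", "d-e", "e-f"}
print(greedy_select(candidates, needed))
```

The chosen sentences are the text the new speaker would be asked to read; extracting units from the recording and updating the snippet database are outside this sketch.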


    Technique for developing discriminative sound units for speech recognition and allophone modeling
    40.
    Type: granted invention patent
    Legal status: in force

    Publication number: US06711541B1

    Publication date: 2004-03-23

    Application number: US09390434

    Filing date: 1999-09-07

    IPC classification: G10L15/04

    CPC classification: G10L15/063 G10L2015/025

    Abstract: A set of models is developed to represent sound units, and these models are then used with the incorrect sound units to determine which models generate high likelihood scores. The models generating high likelihood scores for the incorrect sound units represent those that are more likely to be confused. The resulting confusability data may then be used in generating more discriminative speech models and in subsequent pruning of the acoustic decision tree. The confusability data may also be used to develop confusability predictors used for rejection during search and in developing continuous speech recognition models that are optimized to minimize confusability.
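
A toy version of collecting confusability data might look like this. The abstract only says that models scoring highly on incorrect sound units are the confusable ones; the log-likelihood table, the margin rule relative to the model's score on its own unit, and the function name are assumptions, and the scores are invented rather than produced by real acoustic models.

```python
from typing import Dict, List, Tuple


def confusable_pairs(scores: Dict[str, Dict[str, float]],
                     margin: float = 2.0) -> List[Tuple[str, str]]:
    """List (model, incorrect_unit) pairs where the model scores the wrong unit
    within `margin` of its log-likelihood on its own unit."""
    pairs = []
    for model, row in scores.items():
        own = row[model]
        for unit, log_likelihood in row.items():
            if unit != model and log_likelihood >= own - margin:
                pairs.append((model, unit))
    return sorted(pairs)


# scores[model][unit]: made-up log-likelihood of data for `unit` under `model`.
scores = {
    "b": {"b": -10.0, "p": -11.0, "s": -25.0},
    "p": {"b": -11.5, "p": -9.0, "s": -24.0},
    "s": {"b": -30.0, "p": -28.0, "s": -8.0},
}
print(confusable_pairs(scores))
```

Pairs found this way are the kind of data the abstract describes feeding into discriminative training, decision-tree pruning, or rejection during search.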
