Voice tagging, voice annotation, and speech recognition for portable devices with optional post processing
    1.
    Type: Patent application
    Status: Active

    Publication number: US20050075881A1

    Publication date: 2005-04-07

    Application number: US10677174

    Filing date: 2003-10-02

    IPC classification: G10L15/26 G10L21/00

    CPC classification: G06F17/30796 G10L15/26

    Abstract: A media capture device has an audio input receptive of user speech relating to a media capture activity in close temporal relation to the media capture activity. A plurality of focused speech recognition lexica respectively relating to media capture activities are stored on the device, and a speech recognizer recognizes the user speech based on a selected one of the focused speech recognition lexica. A media tagger tags captured media with generated speech recognition text, and a media annotator annotates the captured media with a sample of the user speech that is suitable for input to a speech recognizer. Tagging and annotating are based on close temporal relation between receipt of the user speech and capture of the captured media. Annotations may be converted to tags during post processing, employed to edit a lexicon using letter-to-sound rules and spelled word input, or matched directly to speech to retrieve captured media.
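
    The tagging rule in this abstract (attach recognized text to media captured close in time to the utterance) can be illustrated with a minimal sketch. The names `CapturedMedia`, `tag_by_time`, and the 5-second `window_s` threshold are hypothetical; the abstract gives no data structures and no numeric definition of "close temporal relation".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CapturedMedia:
    name: str
    captured_at: float                      # capture time, in seconds
    tags: List[str] = field(default_factory=list)


def tag_by_time(media: List[CapturedMedia], text: str, spoken_at: float,
                window_s: float = 5.0) -> Optional[CapturedMedia]:
    """Attach recognized text to the item captured closest to the utterance.

    window_s is an assumed stand-in for "close temporal relation".
    Returns the tagged item, or None when nothing falls inside the window.
    """
    if not media:
        return None
    nearest = min(media, key=lambda m: abs(m.captured_at - spoken_at))
    if abs(nearest.captured_at - spoken_at) > window_s:
        return None
    nearest.tags.append(text)
    return nearest
```

    For example, with photos captured at 10 s and 60 s, an utterance at 58.5 s would tag the second photo, while an utterance at 30 s would tag neither.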

    Speaker verification and speaker identification based on a priori knowledge
    2.
    Type: Granted patent
    Status: Active

    Publication number: US06697778B1

    Publication date: 2004-02-24

    Application number: US09610495

    Filing date: 2000-07-05

    IPC classification: G10L15/06

    CPC classification: G10L17/02

    Abstract: Client speaker locations in a speaker space are used to generate speech models for comparison with test speaker data or test speaker speech models. The speaker space can be constructed using training speakers that are entirely separate from the population of client speakers, or from client speakers, or from a mix of training and client speakers. Reestimation of the speaker space based on client environment information is also provided to improve the likelihood that the client data will fall within the speaker space. During enrollment of the clients into the speaker space, additional client speech can be obtained when predetermined conditions are met. The speaker distribution can also be used in the client enrollment step.

    Speaker verification and speaker identification based on eigenvoices
    3.
    Type: Granted patent
    Status: Expired

    Publication number: US6141644A

    Publication date: 2000-10-31

    Application number: US148911

    Filing date: 1998-09-04

    CPC classification: G10L17/02

    Abstract: Speech models are constructed and trained upon the speech of known client speakers (and also impostor speakers, in the case of speaker verification). Parameters from these models are concatenated to define supervectors and a linear transformation upon these supervectors results in a dimensionality reduction yielding a low-dimensional space called eigenspace. The training speakers are then represented as points or distributions in eigenspace. Thereafter, new speech data from the test speaker is placed into eigenspace through a similar linear transformation and the proximity in eigenspace of the test speaker to the training speakers serves to authenticate or identify the test speaker.
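
    The pipeline in this abstract (supervectors, a dimensionality-reducing linear transformation, then proximity in eigenspace) might be sketched as follows. The abstract does not name the transformation; this sketch assumes principal component analysis, computed by power iteration so that only the standard library is needed. Representing speakers as single points, using Euclidean distance, and every function name are likewise assumptions made for illustration.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def _unit(v: Vector) -> Vector:
    norm = math.sqrt(_dot(v, v))
    return [x / norm for x in v] if norm > 0 else v


def principal_directions(centred: List[Vector], k: int, iters: int = 100) -> List[Vector]:
    """Top-k principal directions of mean-centred rows (power iteration with deflation)."""
    rng = random.Random(0)
    basis: List[Vector] = []
    for _ in range(k):
        v = _unit([rng.gauss(0.0, 1.0) for _ in centred[0]])
        for _ in range(iters):
            for b in basis:                      # stay orthogonal to directions already found
                c = _dot(v, b)
                v = [x - c * y for x, y in zip(v, b)]
            w = [0.0] * len(v)
            for row in centred:                  # w = sum over rows of (row . v) * row
                c = _dot(row, v)
                w = [wi + c * ri for wi, ri in zip(w, row)]
            if _dot(w, w) == 0.0:
                break
            v = _unit(w)
        basis.append(v)
    return basis


def build_eigenspace(supervectors: List[Vector], k: int) -> Tuple[Vector, List[Vector]]:
    """Return the mean supervector and k basis vectors spanning the eigenspace."""
    mean = [sum(col) / len(supervectors) for col in zip(*supervectors)]
    centred = [[x - m for x, m in zip(sv, mean)] for sv in supervectors]
    return mean, principal_directions(centred, k)


def project(sv: Vector, mean: Vector, basis: List[Vector]) -> Vector:
    """Place a supervector into eigenspace."""
    centred = [x - m for x, m in zip(sv, mean)]
    return [_dot(centred, b) for b in basis]


def identify(test_sv: Vector, clients: Dict[str, Vector],
             mean: Vector, basis: List[Vector]) -> str:
    """Return the client whose eigenspace point lies closest to the test speaker."""
    p = project(test_sv, mean, basis)
    return min(clients, key=lambda name: math.dist(p, project(clients[name], mean, basis)))
```

    Verification, as opposed to identification, would compare the same distance against an acceptance threshold for the claimed identity; the abstract does not say how such a threshold is chosen.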

    Media production system using time alignment to scripts
    4.
    Type: Patent application
    Status: Pending (published)

    Publication number: US20050228663A1

    Publication date: 2005-10-13

    Application number: US10814960

    Filing date: 2004-03-31

    IPC classification: G10L15/26

    CPC classification: G10L15/26

    Abstract: A media production system includes a textual alignment module aligning multiple speech recordings to textual lines of a script based on speech recognition results. A navigation module responds to user navigation selections respective of the textual lines of the script by communicating to the user corresponding, line-specific portions of the multiple speech recordings. An editing module responds to user associations of multiple speech recordings with textual lines by accumulating line-specific portions of the multiple speech recordings in a combination recording based on at least one of relationships of textual lines in the script to the combination recording, and temporal alignments between the multiple speech recordings and the combination recording.
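
    The textual alignment step could look roughly like the sketch below, which matches each recognized segment to the script line it most resembles. Character-level similarity via `difflib.SequenceMatcher`, the `min_ratio` cutoff, and the segment identifiers are assumptions for illustration; the abstract only states that alignment is based on speech recognition results.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def align_to_script(script_lines: List[str],
                    recognized: List[Tuple[str, str]],
                    min_ratio: float = 0.6) -> Dict[int, List[str]]:
    """Map each script line index to the recording segments that match it.

    recognized holds (segment_id, recognized_text) pairs from any number
    of recordings; min_ratio is a hypothetical similarity cutoff.
    """
    takes: Dict[int, List[str]] = {i: [] for i in range(len(script_lines))}
    for seg_id, text in recognized:
        scores = [SequenceMatcher(None, line.lower(), text.lower()).ratio()
                  for line in script_lines]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= min_ratio:
            takes[best].append(seg_id)       # segment is a take of this script line
    return takes
```

    A navigation module in the sense of the abstract could then look up `takes[line_index]` to offer every recorded take of a selected script line.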

    Voice tagging, voice annotation, and speech recognition for portable devices with optional post processing
    5.
    Type: Granted patent
    Status: Active

    Publication number: US07324943B2

    Publication date: 2008-01-29

    Application number: US10677174

    Filing date: 2003-10-02

    IPC classification: G10L21/00 H04N5/76

    CPC classification: G06F17/30796 G10L15/26

    Abstract: A media capture device has an audio input receptive of user speech relating to a media capture activity in close temporal relation to the media capture activity. A plurality of focused speech recognition lexica respectively relating to media capture activities are stored on the device, and a speech recognizer recognizes the user speech based on a selected one of the focused speech recognition lexica. A media tagger tags captured media with generated speech recognition text, and a media annotator annotates the captured media with a sample of the user speech that is suitable for input to a speech recognizer. Tagging and annotating are based on close temporal relation between receipt of the user speech and capture of the captured media. Annotations may be converted to tags during post processing, employed to edit a lexicon using letter-to-sound rules and spelled word input, or matched directly to speech to retrieve captured media.

    Speech data mining for call center management
    6.
    Type: Patent application
    Status: Pending (published)

    Publication number: US20050010411A1

    Publication date: 2005-01-13

    Application number: US10616006

    Filing date: 2003-07-09

    IPC classification: G10L15/26 G10L17/00 G10L15/00

    CPC classification: G10L15/26 G10L17/00

    Abstract: A speech data mining system for use in generating a rich transcription having utility in call center management includes a speech differentiation module differentiating between speech of interacting speakers, and a speech recognition module improving automatic recognition of speech of one speaker based on interaction with another speaker employed as a reference speaker. A transcript generation module generates a rich transcript based on recognized speech of the speakers. Focused, interactive language models improve recognition of a customer on a low quality channel using context extracted from speech of a call center operator on a high quality channel with a speech model adapted to the operator. Mined speech data includes number of interaction turns, customer frustration phrases, operator politeness, interruptions, and/or contexts extracted from speech recognition results, such as topics, complaints, solutions, and resolutions. Mined speech data is useful in call center and/or product or service quality management.
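
    Two of the mined quantities listed in this abstract, the number of interaction turns and customer frustration phrases, can be illustrated on an already-recognized, speaker-labelled transcript. The phrase list, the speaker labels, and the function name below are hypothetical; the abstract does not describe how these quantities are computed.

```python
from typing import Dict, List, Tuple

# Hypothetical phrase list; a deployed system would supply its own.
FRUSTRATION_PHRASES = (
    "this is ridiculous",
    "i already told you",
    "speak to a manager",
)


def mine_call(turns: List[Tuple[str, str]]) -> Dict[str, int]:
    """Summarize a call given (speaker, recognized_text) pairs in time order.

    Consecutive utterances by the same speaker count as one interaction turn.
    """
    changes = sum(1 for prev, cur in zip(turns, turns[1:]) if prev[0] != cur[0])
    frustration = sum(
        text.lower().count(phrase)
        for speaker, text in turns if speaker == "customer"
        for phrase in FRUSTRATION_PHRASES
    )
    return {
        "interaction_turns": changes + 1 if turns else 0,
        "customer_frustration_phrases": frustration,
    }
```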

    Method for additive and convolutional noise adaptation in automatic speech recognition using transformed matrices
    7.
    Type: Granted patent
    Status: Active

    Publication number: US06691091B1

    Publication date: 2004-02-10

    Application number: US09628376

    Filing date: 2000-07-31

    IPC classification: G10L15/06

    Abstract: A noise adaptation system and method provide for noise adaptation in a speech recognition system. The method includes the steps of generating a reference model based on a training speech signal, and compensating the reference model for additive noise in the cepstral domain. The reference model is also compensated for convolutional noise in the cepstral domain. In one embodiment, the convolutional noise is compensated for by estimating a convolutional bias between the reference model and a target speech signal. The estimated convolutional bias is transformed with a channel adaptation matrix, and the transformed convolutional bias is added to the reference model in the cepstral domain.
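
    The convolutional-noise embodiment (estimate a bias, transform it with a channel adaptation matrix, add it to the reference model) might be sketched as below. Estimating the bias as the difference between the average target cepstrum and the unweighted average of the model means is a simplifying assumption, and the channel adaptation matrix is taken as a caller-supplied input because the abstract does not say how it is built.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def mean_vector(rows: List[Vector]) -> Vector:
    return [sum(col) / len(rows) for col in zip(*rows)]


def mat_vec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def adapt_for_channel(model_means: List[Vector],
                      target_cepstra: List[Vector],
                      channel_matrix: Matrix) -> List[Vector]:
    """Shift every cepstral model mean by the transformed convolutional bias."""
    # Assumed bias estimate: average target cepstrum minus average model mean.
    bias = [t - r for t, r in zip(mean_vector(target_cepstra), mean_vector(model_means))]
    shift = mat_vec(channel_matrix, bias)    # transform with the channel adaptation matrix
    return [[m + s for m, s in zip(mean, shift)] for mean in model_means]
```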

    Method for noise adaptation in automatic speech recognition using transformed matrices
    8.
    Type: Granted patent
    Status: Active

    Publication number: US06529872B1

    Publication date: 2003-03-04

    Application number: US09551001

    Filing date: 2000-04-18

    IPC classification: G10L15/06

    Abstract: The improved noise adaptation technique employs a linear or non-linear transformation to the set of Jacobian matrices corresponding to an initial noise condition. An α-adaptation parameter or artificial intelligence operation is employed in a linear or non-linear way to increase the adaptation bias added to the speech models. This corrects shortcomings of conventional Jacobian adaptation, which tends to underestimate the effect of noise. The improved adaptation technique is further enhanced by a reduced dimensionality, principal component analysis technique that reduces the computational burden, making the adaptation technique beneficial in embedded recognition systems.
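
    As a rough illustration of how an α parameter can enlarge the adaptation bias, the sketch below builds a Jacobian of the form J = F · diag(αN / (S + αN)) · Fᵀ, where F is an orthonormal DCT matrix and S and N are linear-domain speech and noise spectra. With α = 1 this is the usual Jacobian of noisy-speech cepstra with respect to noise cepstra. This particular placement of α is an assumption made for the sketch, not a formula taken from the abstract, and all function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dct_matrix(n: int) -> Matrix:
    """Orthonormal DCT-II matrix F (rows are basis vectors, so F times F-transpose is I)."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)])
    return rows


def alpha_jacobian(speech_spec: Vector, noise_spec: Vector, alpha: float = 1.0) -> Matrix:
    """J = F diag(alpha*N / (S + alpha*N)) F^T; alpha = 1 is plain Jacobian adaptation."""
    n = len(speech_spec)
    f = dct_matrix(n)
    gain = [alpha * nz / (s + alpha * nz) for s, nz in zip(speech_spec, noise_spec)]
    return [[sum(f[r][i] * gain[i] * f[c][i] for i in range(n)) for c in range(n)]
            for r in range(n)]


def adaptation_bias(jac: Matrix, ref_noise_cep: Vector, new_noise_cep: Vector) -> Vector:
    """Bias added to a cepstral model mean when noise moves from ref to new."""
    delta = [b - a for a, b in zip(ref_noise_cep, new_noise_cep)]
    return [sum(j * d for j, d in zip(row, delta)) for row in jac]
```

    Because every diagonal gain grows with α, choosing α > 1 yields a larger bias for the same change in noise, which is the direction of correction the abstract describes.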

    Personalized agent for portable devices and cellular phone
    9.
    Type: Granted patent
    Status: Active

    Publication number: US06895257B2

    Publication date: 2005-05-17

    Application number: US10077904

    Filing date: 2002-02-18

    Abstract: Personalized agent services are provided in a personal messaging device, such as a cellular telephone or personal digital assistant, through services of a speech recognizer that converts speech into text and a text-to-speech synthesizer that converts text to speech. Both recognizer and synthesizer may be server-based or locally deployed within the device. The user dictates an e-mail message which is converted to text and stored. The stored text is sent back to the user as text or as synthesized speech, to allow the user to edit the message and correct transcription errors before sending as e-mail. The system includes a summarization module that prepares short summaries of incoming e-mail and voice mail. The user may access these summaries, and retrieve and organize e-mail and voice mail using speech commands.

    Automatic search of audio channels by matching viewer-spoken words against closed-caption/audio content for interactive television
    10.
    Type: Granted patent
    Status: Active

    Publication number: US06480819B1

    Publication date: 2002-11-12

    Application number: US09258115

    Filing date: 1999-02-25

    IPC classification: G06F17/27

    CPC classification: G10L15/26 G10L15/1815

    Abstract: A method and apparatus are provided to enable a user watching and/or listening to a program to search for new information in a stream of telecommunications data. The apparatus includes a voice recognition system that recognizes the user's request and causes a search to be performed in the long stream of data of at least one other telecommunication channel. The system includes a storage device for storing and processing the request. Upon recognition of the request, the incoming signal or signals are scanned for matches with the request. Upon finding the match between the request and the incoming signal, information related to the data is brought to the viewer's attention. This can be accomplished either by changing the viewer's station or by bringing a split-screen display forward into the display.
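
    The scanning step (match a recognized spoken request against incoming closed-caption text on other channels) can be sketched as a simple phrase search. Exact word-sequence matching, the tokenizer, and the channel-to-caption dictionary are assumptions for illustration; the abstract does not specify the matching method.

```python
import re
from typing import Dict, List


def _tokens(text: str) -> List[str]:
    """Lowercase word tokens with punctuation dropped."""
    return re.findall(r"[a-z0-9']+", text.lower())


def find_channels(request: str, captions_by_channel: Dict[str, str]) -> List[str]:
    """Return the channels whose caption text contains the spoken request as a phrase."""
    words = _tokens(request)
    if not words:
        return []
    hits = []
    for channel, caption in captions_by_channel.items():
        toks = _tokens(caption)
        # Slide a window of len(words) tokens across the caption text.
        if any(toks[i:i + len(words)] == words
               for i in range(len(toks) - len(words) + 1)):
            hits.append(channel)
    return hits
```

    On a hit, a receiver in the sense of the abstract would either switch to the matching channel or bring it forward in a split-screen display.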
