Rapidly training a speech recognizer to a subsequent speaker given
training data of a reference speaker
    11.
    Document type: Granted invention patent
    Legal status: Expired

    Publication No.: US4817156A

    Publication date: 1989-03-28

    Application No.: US84712

    Filing date: 1987-08-10

    CPC classification number: G10L15/14

    Abstract: Apparatus and method for training the statistics of a Markov Model speech recognizer to a subsequent speaker who utters part of a training text after the recognizer has been trained for the statistics of a reference speaker who utters a full training text. Where labels generated by an acoustic processor in response to uttered speech serve as outputs for Markov models, the present apparatus and method determine label output probabilities at transitions in the Markov models corresponding to the subsequent speaker where there is sparse training data. Specifically, label output probabilities for the subsequent speaker are re-parameterized based on confusion matrix entries having values indicative of the similarity between an lth label output of the subsequent speaker and a kth label output for the reference speaker. The label output probabilities based on re-parameterized data are combined with initialized label output probabilities to form "smoothed" label output probabilities which feature smoothed probability distributions. Based on label outputs generated when the subsequent speaker utters the shortened training text, "basic" label output probabilities computed by conventional methodology are linearly averaged against the smoothed label output probabilities to produce improved label output probabilities.
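The smoothing pipeline described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the patented procedure: it assumes `confusion[k][l]` can be read as the probability that the subsequent speaker produces label `l` where the reference speaker produced label `k`, and the function names and the two interpolation weights (`w_smooth`, `w_basic`) are hypothetical choices made for the example.

```python
def reparameterize(ref_probs, confusion):
    """Map reference-speaker label probabilities onto subsequent-speaker labels.

    Assumption: confusion[k][l] ~ P(subsequent label l | reference label k).
    """
    n_sub = len(confusion[0])
    return [
        sum(ref_probs[k] * confusion[k][l] for k in range(len(ref_probs)))
        for l in range(n_sub)
    ]


def interpolate(p, q, weight):
    """Linear average: weight * p + (1 - weight) * q, element by element."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(p, q)]


def improved_label_probs(ref_probs, confusion, init_probs, basic_probs,
                         w_smooth=0.5, w_basic=0.5):
    """Combine re-parameterized, initialized, and basic label probabilities."""
    reparam = reparameterize(ref_probs, confusion)
    # "Smoothed" probabilities: re-parameterized estimate mixed with the initial one.
    smoothed = interpolate(reparam, init_probs, w_smooth)
    # Final estimate: sparse-data "basic" probabilities averaged against the smoothed ones.
    return interpolate(basic_probs, smoothed, w_basic)


# Toy example with two labels at a single transition.
result = improved_label_probs(
    ref_probs=[0.7, 0.3],
    confusion=[[0.9, 0.1], [0.2, 0.8]],
    init_probs=[0.5, 0.5],
    basic_probs=[1.0, 0.0],
)
```

Because every input is a probability distribution and each confusion-matrix row sums to one, the result is again a distribution. In this toy case the zero that the sparse "basic" estimate assigns to the second label is replaced by a small positive value.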

    Voice transformation with encoded information
    12.
    Document type: Granted invention patent
    Legal status: Active

    Publication No.: US08930182B2

    Publication date: 2015-01-06

    Application No.: US13049924

    Filing date: 2011-03-17

    CPC classification number: G10L21/003 G10L19/018

    Abstract: Method, system, and computer program product for voice transformation are provided. The method includes transforming a source speech using transformation parameters, and encoding information on the transformation parameters in an output speech using steganography, wherein the source speech can be reconstructed using the output speech and the information on the transformation parameters. A method for reconstructing voice transformation is also provided including: receiving an output speech of a voice transformation system wherein the output speech is transformed speech which has encoded information on the transformation parameters using steganography; extracting the information on the transformation parameters; and carrying out an inverse transformation of the output speech to obtain an approximation of an original source speech.
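As a toy illustration of the idea in this abstract, the sketch below applies a transformation, hides its parameter in the output, and later recovers an approximation of the source. Everything specific here is an assumption made for the example: the "voice transformation" is reduced to an integer gain between 1 and 255, the steganographic channel is plain least-significant-bit embedding in the first eight samples, and all function names are hypothetical. A real system would use a perceptual transformation and a more robust hiding scheme.

```python
def transform(samples, gain):
    """Stand-in for a voice transformation: scale integer samples by a gain."""
    return [s * gain for s in samples]


def embed_byte(samples, value):
    """Hide an 8-bit value in the least significant bits of the first 8 samples."""
    out = list(samples)
    for i in range(8):
        bit = (value >> i) & 1
        out[i] = (out[i] & ~1) | bit
    return out


def extract_byte(samples):
    """Read the hidden 8-bit value back from the first 8 samples."""
    return sum((samples[i] & 1) << i for i in range(8))


def reconstruct(output):
    """Recover the gain from the output itself, then invert the transformation."""
    gain = extract_byte(output)
    return [round(s / gain) for s in output]


source = [100, -50, 25, 0, 75, -120, 60, 10, 5]
gain = 3  # transformation parameter, assumed to fit in one byte and be nonzero
output = embed_byte(transform(source, gain), gain)
recovered = reconstruct(output)
```

The embedding perturbs some samples by one unit, so in general the recovered signal is only an approximation of the source, which matches the wording of the abstract.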

    Automatically updating meeting information
    13.
    Document type: Granted invention patent
    Legal status: Active

    Publication No.: US08867707B2

    Publication date: 2014-10-21

    Application No.: US13069591

    Filing date: 2011-03-23

    CPC classification number: G06Q10/109 H04L12/1895 H04L51/02

    Abstract: Techniques for automatically providing updated meeting information are provided. The techniques include facilitating receipt of a message pertaining to a meeting, automatically interpreting the message to determine if the message requires that meeting information be changed, automatically updating the meeting information if a change is required from the message, and automatically sending a message to each meeting participant informing each participant of the updated meeting information.

    Directional optimization via EBW
    14.
    Document type: Granted invention patent
    Legal status: Active

    Publication No.: US08527566B2

    Publication date: 2013-09-03

    Application No.: US12777768

    Filing date: 2010-05-11

    CPC classification number: G06F17/11

    Abstract: An optimization system and method includes determining a best gradient as a sparse direction in a function having a plurality of parameters. The sparse direction includes a direction that maximizes change of the function. This maximum change of the function is determined by performing an optimization process that gives maximum growth subject to a sparsity regularized constraint. An extended Baum Welch (EBW) method can be used to identify the sparse direction. A best step size is determined along the sparse direction by finding magnitudes of entries of the direction that maximize the function restricted to the sparse direction. A solution is recursively refined for the function optimization using a processor and storage media.

    Processing user input in accordance with input types accepted by an application
    15.
    Document type: Granted invention patent
    Legal status: Active

    Publication No.: US08370163B2

    Publication date: 2013-02-05

    Application No.: US13242874

    Filing date: 2011-09-23

    CPC classification number: G10L15/24 G06F3/167 G10L15/22

    Abstract: In a voice processing system, a multimodal request is received from a plurality of modality input devices, and the requested application is run to provide a user with the feedback of the multimodal request. In the voice processing system, a multimodal aggregating unit is provided which receives a multimodal input from a plurality of modality input devices, and provides an aggregated result to an application control based on the interpretation of the interaction ergonomics of the multimodal input within the temporal constraints of the multimodal input. Thus, the multimodal input from the user is recognized within a temporal window. Interpretation of the interaction ergonomics of the multimodal input includes interpretation of interaction biometrics and interaction mechani-metrics, wherein the interaction input of at least one modality may be used to bring meaning to at least one other input of another modality.

    SPARSE REPRESENTATION FEATURES FOR SPEECH RECOGNITION
    16.
    Document type: Invention patent application
    Legal status: Active

    Publication No.: US20120078621A1

    Publication date: 2012-03-29

    Application No.: US12889845

    Filing date: 2010-09-24

    CPC classification number: G10L15/02

    Abstract: Techniques are disclosed for generating and using sparse representation features to improve speech recognition performance. In particular, principles of the invention provide sparse representation exemplar-based recognition techniques. For example, a method comprises the following steps. A test vector and a training data set associated with a speech recognition system are obtained. A subset of the training data set is selected. The test vector is mapped with the selected subset of the training data set as a linear combination that is weighted by a sparseness constraint such that a new test feature set is formed wherein the training data set is moved more closely to the test vector subject to the sparseness constraint. An acoustic model is trained on the new test feature set. The acoustic model trained on the new test feature set may be used to decode user speech input to the speech recognition system.
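The mapping step in this abstract, expressing a test vector as a sparse linear combination of selected training exemplars and using the reconstruction as the new feature, can be sketched as below. This is an illustration under stated assumptions rather than the method claimed in the application: the sparseness constraint is modeled as an L1 penalty and solved with plain iterative soft-thresholding (ISTA), and the names `sparse_code`, `sparse_feature`, `lam`, and `iters` are hypothetical.

```python
def soft_threshold(x, t):
    """Shrink x toward zero by t (the proximal operator of the L1 penalty)."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def sparse_code(H, y, lam=0.1, iters=200):
    """Approximately minimize 0.5 * ||y - H b||^2 + lam * ||b||_1 with ISTA.

    H is a list of rows (feature_dim x n_exemplars); each column is one
    training exemplar from the selected subset.
    """
    dim, n = len(H), len(H[0])
    # 1 / ||H||_F^2 is a safe step size, since ||H||_2^2 <= ||H||_F^2.
    step = 1.0 / sum(v * v for row in H for v in row)
    beta = [0.0] * n
    for _ in range(iters):
        residual = [sum(H[i][j] * beta[j] for j in range(n)) - y[i]
                    for i in range(dim)]
        grad = [sum(H[i][j] * residual[i] for i in range(dim))
                for j in range(n)]
        beta = [soft_threshold(beta[j] - step * grad[j], step * lam)
                for j in range(n)]
    return beta


def sparse_feature(H, beta):
    """New feature vector: the test vector re-expressed through the exemplars."""
    return [sum(H[i][j] * beta[j] for j in range(len(beta)))
            for i in range(len(H))]


# Two exemplars (columns) in a 2-dimensional feature space.
H = [[1.0, 0.0],
     [0.0, 1.0]]
y = [2.0, 0.05]
beta = sparse_code(H, y)
feature = sparse_feature(H, beta)
```

In this toy case the weak second component of `y` is suppressed entirely, so the new feature depends only on the first exemplar. An acoustic model would then be trained on such features instead of the raw test vectors.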

    DISTANCE METRICS FOR UNIVERSAL PATTERN PROCESSING TASKS
    20.
    Document type: Invention patent application
    Legal status: Active

    Publication No.: US20090259471A1

    Publication date: 2009-10-15

    Application No.: US12101791

    Filing date: 2008-04-11

    CPC classification number: G10L17/04 G10L15/063

    Abstract: A universal pattern processing system receives input data and produces output patterns that are best associated with said data. The system uses input means receiving and processing input data, a universal pattern decoder means transforming models using the input data and associating output patterns with original models that are changed least during transforming, and output means outputting best associated patterns chosen by a pattern decoder means.
