Reduction of search space in speech recognition using phone boundaries
and phone ranking
    63.
    Granted invention patent
    Status: expired

    Publication No.: US5729656A

    Publication date: 1998-03-17

    Application No.: US347013

    Filing date: 1994-11-30

    Abstract: A method for estimating the probability of phone boundaries and the accuracy of the acoustic modelling in order to reduce the search space in a speech recognition system. The accuracy of the acoustic modelling is quantified by the rank of the correct phone. The system includes a microphone for converting an utterance into an electrical signal, which is processed by an acoustic processor and a label match that finds the best-matched acoustic label prototype. A probability distribution on phone boundaries is produced for every time frame using a first decision tree. These probabilities are compared to a threshold, and some time frames are identified as boundaries between phones. An acoustic score is computed for all phones between every given pair of hypothesized boundaries, and the phones are ranked on the basis of this score. A second decision tree is traversed for every time frame to obtain the worst-case rank of the correct phone at that time, and a short list of allowed phones is made for every time frame. A fast acoustic word match processor matches the label string from the acoustic processor to produce an utterance signal which includes at least one word. From recognition candidates produced by the fast acoustic match and the language model, the detailed acoustic match matches the label string from the acoustic processor against acoustic word models and outputs a word string corresponding to an utterance.
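The thresholding and short-list steps in this abstract can be illustrated with a minimal Python sketch. The function names, the toy probabilities and scores, and the choice of "greater than" for the threshold test are assumptions made for illustration; in the patent the boundary probabilities and the worst-case rank come from two decision trees that are not modelled here.

```python
from typing import Dict, List


def find_boundaries(boundary_probs: List[float], threshold: float) -> List[int]:
    """Return the frame indices whose boundary probability exceeds the threshold."""
    return [t for t, p in enumerate(boundary_probs) if p > threshold]


def rank_phones(scores: Dict[str, float]) -> List[str]:
    """Order phones from best to worst acoustic score (higher is better)."""
    return sorted(scores, key=lambda phone: scores[phone], reverse=True)


def shortlist(ranked: List[str], worst_case_rank: int) -> List[str]:
    """Keep every phone ranked at or above the predicted worst-case rank (1-based)."""
    return ranked[:worst_case_rank]


# Hypothetical per-frame boundary probabilities (stand-in for the first decision tree).
boundary_probs = [0.05, 0.10, 0.80, 0.20, 0.05, 0.90, 0.10]
boundaries = find_boundaries(boundary_probs, threshold=0.5)

# Hypothetical acoustic log scores for one segment between two boundaries.
segment_scores = {"AA": -12.0, "IY": -3.5, "S": -7.1, "T": -20.4}
ranked = rank_phones(segment_scores)

# Hypothetical worst-case rank (stand-in for the second decision tree).
allowed = shortlist(ranked, worst_case_rank=2)
print(boundaries, allowed)
```

Only the phones in `allowed` would then be passed on to the later matching stages, which is how the search space shrinks.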


    Continuous parameter hidden Markov model approach to automatic
handwriting recognition
    64.
    Granted invention patent
    Status: expired

    Publication No.: US5544257A

    Publication date: 1996-08-06

    Application No.: US818193

    Filing date: 1992-01-08

    CPC classification: G06K9/6297

    Abstract: A computer-based system and method for recognizing handwriting. The present invention includes a preprocessor, a front end, and a modeling component. The present invention operates as follows. First, the present invention identifies the lexemes for all characters of interest. Second, the present invention performs a training phase in order to generate a hidden Markov model for each of the lexemes. Third, the present invention performs a decoding phase to recognize handwritten text. Hidden Markov models for lexemes are produced during the training phase. The present invention performs the decoding phase as follows. The present invention receives test characters to be decoded (that is, to be recognized). The present invention generates sequences of feature vectors for the test characters by mapping in chirographic space. For each of the test characters, the present invention computes probabilities that the test character can be generated by the hidden Markov models. The present invention decodes the test character as the recognized character associated with the hidden Markov model having the greatest probability.
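The decoding rule in this abstract (score the test character against every hidden Markov model and pick the most probable one) can be sketched as follows. This is an illustrative toy, not the patented system: it assumes scalar features with one Gaussian per state, two hand-written models named "a" and "b", and strictly positive start and transition probabilities so that the logarithms are defined.

```python
import math
from typing import Dict, List, Sequence


def log_gauss(x: float, mean: float, var: float) -> float:
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log of a sum of exponentials."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(obs: Sequence[float], model: Dict[str, list]) -> float:
    """Forward algorithm in the log domain for one continuous-emission HMM."""
    n = len(model["means"])
    alpha = [
        math.log(model["start"][s]) + log_gauss(obs[0], model["means"][s], model["vars"][s])
        for s in range(n)
    ]
    for x in obs[1:]:
        alpha = [
            logsumexp([alpha[r] + math.log(model["trans"][r][s]) for r in range(n)])
            + log_gauss(x, model["means"][s], model["vars"][s])
            for s in range(n)
        ]
    return logsumexp(alpha)


def decode(obs: Sequence[float], models: Dict[str, Dict[str, list]]) -> str:
    """Return the name of the model with the greatest likelihood for the observations."""
    return max(models, key=lambda name: log_likelihood(obs, models[name]))


# Hypothetical two-state models; all probabilities are non-zero by assumption.
shared = {"start": [0.9, 0.1], "trans": [[0.7, 0.3], [0.1, 0.9]], "vars": [0.1, 0.1]}
models = {
    "a": {**shared, "means": [0.0, 1.0]},
    "b": {**shared, "means": [1.0, 0.0]},
}
observations = [0.1, 0.0, 0.9, 1.1]
print(decode(observations, models))
```

The feature sequence rises from about 0 to about 1, which matches the state order of model "a", so that model scores higher.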


    Speech coding apparatus having speaker dependent prototypes generated
from nonuser reference data
    65.
    Granted invention patent
    Status: expired

    Publication No.: US5278942A

    Publication date: 1994-01-11

    Application No.: US802678

    Filing date: 1991-12-05

    CPC classification: G10L15/063 G10L15/02

    Abstract: A speech coding apparatus and method for use in a speech recognition apparatus and method. The value of at least one feature of an utterance is measured during each of a series of successive time intervals to produce a series of feature vector signals representing the feature values. A plurality of prototype vector signals, each having at least one parameter value and a unique identification value, are stored. The closeness of the feature vector signal is compared to the parameter values of the prototype vector signals to obtain prototype match scores for the feature value signal and each prototype vector signal. The identification value of the prototype vector signal having the best prototype match score is output as a coded representation signal of the feature vector signal. Speaker-dependent prototype vector signals are generated from both synthesized training vector signals and measured training vector signals. The synthesized training vector signals are transformed reference feature vector signals representing the values of features of one or more utterances of one or more speakers in a reference set of speakers. The measured training feature vector signals represent the values of features of one or more utterances of a new speaker/user not in the reference set.
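The coding step described here (output the identifier of the best-matching prototype for each feature vector) amounts to nearest-prototype labelling. The sketch below assumes squared Euclidean distance as the match score and uses made-up prototype identifiers and values; the abstract does not specify either, and the generation of speaker-dependent prototypes is not modelled.

```python
from typing import Dict, List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance, used here as the (assumed) prototype match score."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def label_vector(feature: Sequence[float], prototypes: Dict[str, Sequence[float]]) -> str:
    """Return the identifier of the prototype closest to the feature vector."""
    return min(prototypes, key=lambda ident: squared_distance(feature, prototypes[ident]))


def encode(features: List[Sequence[float]], prototypes: Dict[str, Sequence[float]]) -> List[str]:
    """Code a series of feature vectors as a series of prototype identifiers."""
    return [label_vector(f, prototypes) for f in features]


# Hypothetical prototypes and measured feature vectors.
prototypes = {"P1": (0.0, 0.0), "P2": (1.0, 1.0), "P3": (0.0, 2.0)}
features = [(0.1, -0.1), (0.9, 1.2), (0.2, 1.8)]
labels = encode(features, prototypes)
print(labels)
```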


    Normalization of speech by adaptive labelling
    66.
    Granted invention patent
    Status: expired

    Publication No.: US4926488A

    Publication date: 1990-05-15

    Application No.: US71687

    Filing date: 1987-07-09

    CPC classification: G10L15/07 G10L15/20

    Abstract: In a speech processor system in which prototype vectors of speech are generated by an acoustic processor under reference noise and known ambient conditions and in which feature vectors of speech are generated during varying noise and other ambient and recording conditions, normalized vectors are generated to reflect the form the feature vectors would have if generated under the reference conditions. The normalized vectors are generated by: (a) applying an operator function $A_i$ to a set of feature vectors $x$ occurring at or before time interval $i$ to yield a normalized vector $y_i = A_i(x)$; (b) determining a distance error vector $E_i$ by which the normalized vector is projectively moved toward the closest prototype vector to the normalized vector $y_i$; (c) updating the operator function for the next time interval to correspond to the most recently determined distance error vector; and (d) incrementing $i$ to the next time interval and repeating steps (a) through (d), wherein the feature vector corresponding to the incremented $i$ value has the most recently updated operator function applied thereto. With successive time intervals, successive normalized vectors are generated based on a successively updated operator function. For each normalized vector, the closest prototype thereto is associated therewith. The string of normalized vectors or the string of associated prototypes (or respective label identifiers thereof) or both provide output from the acoustic processor.
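Steps (a) through (d) form an adaptive loop, which the following sketch illustrates under strong simplifying assumptions: the operator function is taken to be a plain additive offset, the update moves that offset by a fixed fraction (`rate`) of the error vector, and the data are one-dimensional toy values. The abstract does not state the form of the operator or of the update, so none of this should be read as the patented method.

```python
from typing import List, Sequence, Tuple


def nearest(y: Sequence[float], prototypes: List[Tuple[float, ...]]) -> Tuple[float, ...]:
    """Return the prototype closest to y in squared Euclidean distance."""
    return min(prototypes, key=lambda p: sum((a - b) ** 2 for a, b in zip(y, p)))


def normalize_stream(features, prototypes, rate=0.5):
    """Normalize feature vectors one time interval at a time, adapting as we go."""
    offset = [0.0] * len(prototypes[0])  # assumed operator: y = x + offset
    normalized, labels = [], []
    for x in features:
        # (a) apply the current operator to the incoming feature vector
        y = [xi + oi for xi, oi in zip(x, offset)]
        # (b) error vector pointing from y toward its closest prototype
        proto = nearest(y, prototypes)
        error = [pi - yi for pi, yi in zip(proto, y)]
        # (c) update the operator for the next interval (assumed update rule)
        offset = [oi + rate * ei for oi, ei in zip(offset, error)]
        normalized.append(y)
        labels.append(prototypes.index(proto))
        # (d) the loop advances to the next time interval
    return normalized, labels


# Hypothetical prototypes at 0 and 4; the incoming features carry a constant +1 bias.
prototypes = [(0.0,), (4.0,)]
features = [(1.0,), (5.0,), (1.0,), (5.0,)]
normalized, labels = normalize_stream(features, prototypes)
print(normalized, labels)
```

In this toy run the normalized vectors drift toward the prototypes as the offset absorbs the bias, and the label string alternates between the two prototypes.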

    Voice transformation with encoded information
    67.
    Granted invention patent
    Status: in force

    Publication No.: US08930182B2

    Publication date: 2015-01-06

    Application No.: US13049924

    Filing date: 2011-03-17

    CPC classification: G10L21/003 G10L19/018

    Abstract: A method, system, and computer program product for voice transformation are provided. The method includes transforming a source speech using transformation parameters, and encoding information on the transformation parameters in an output speech using steganography, wherein the source speech can be reconstructed using the output speech and the information on the transformation parameters. A method for reconstructing voice transformation is also provided, including: receiving an output speech of a voice transformation system, wherein the output speech is transformed speech which has encoded information on the transformation parameters using steganography; extracting the information on the transformation parameters; and carrying out an inverse transformation of the output speech to obtain an approximation of an original source speech.
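The round trip described here (transform, hide the parameters in the output, later extract them and invert) can be illustrated with a deliberately simple stand-in. The sketch assumes integer PCM samples, a single transformation parameter (a gain in percent that fits in 8 bits), and least-significant-bit embedding as the steganographic channel. The abstract names neither the transformation nor the steganographic scheme, so both are hypothetical choices.

```python
from typing import List


def transform(samples: List[int], gain_percent: int) -> List[int]:
    """Stand-in voice transformation: scale every sample by gain_percent / 100."""
    return [s * gain_percent // 100 for s in samples]


def embed(samples: List[int], value: int, bits: int = 8) -> List[int]:
    """Hide `value` in the least significant bits of the first `bits` samples."""
    out = list(samples)
    for i in range(bits):
        out[i] = (out[i] & ~1) | ((value >> i) & 1)
    return out


def extract(samples: List[int], bits: int = 8) -> int:
    """Read the hidden value back from the least significant bits."""
    return sum((samples[i] & 1) << i for i in range(bits))


def reconstruct(output: List[int]) -> List[int]:
    """Extract the gain and apply the inverse scaling to approximate the source."""
    gain_percent = extract(output)
    return [round(s * 100 / gain_percent) for s in output]


source = [1000, -2000, 3000, -4000, 500, -600, 700, -800, 900]
output = embed(transform(source, gain_percent=150), value=150)
approx = reconstruct(output)
print(extract(output), approx)
```

The reconstruction is only approximate because the embedded bits perturb the samples, which is consistent with the abstract's wording "an approximation of an original source speech".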


    Automatically updating meeting information
    68.
    Granted invention patent
    Status: in force

    Publication No.: US08867707B2

    Publication date: 2014-10-21

    Application No.: US13069591

    Filing date: 2011-03-23

    Abstract: Techniques for automatically providing updated meeting information are provided. The techniques include facilitating receipt of a message pertaining to a meeting, automatically interpreting the message to determine if the message requires that meeting information be changed, automatically updating the meeting information if a change is required from the message, and automatically sending a message to each meeting participant informing each participant of the updated meeting information.


    Directional optimization via EBW
    69.
    Granted invention patent
    Status: in force

    Publication No.: US08527566B2

    Publication date: 2013-09-03

    Application No.: US12777768

    Filing date: 2010-05-11

    IPC classification: G06F7/00

    CPC classification: G06F17/11

    Abstract: An optimization system and method includes determining a best gradient as a sparse direction in a function having a plurality of parameters. The sparse direction includes a direction that maximizes change of the function. This maximum change of the function is determined by performing an optimization process that gives maximum growth subject to a sparsity regularized constraint. An extended Baum-Welch (EBW) method can be used to identify the sparse direction. A best step size is determined along the sparse direction by finding the magnitudes of the direction's entries that maximize the function restricted to the sparse direction. A solution is recursively refined for the function optimization using a processor and storage media.
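The overall loop (pick a sparse direction, choose a step size along it, repeat) can be sketched as below. This sketch does not implement the extended Baum-Welch method: as a simpler stand-in it keeps the `k` largest-magnitude gradient entries as the sparse direction and chooses the step from a small fixed grid. The objective, its gradient, and all names are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def sparse_direction(grad: Sequence[float], k: int) -> Vector:
    """Keep the k gradient entries of largest magnitude and zero out the rest."""
    keep = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    return [g if i in keep else 0.0 for i, g in enumerate(grad)]


def best_step(f: Callable[[Vector], float], x: Vector, direction: Vector,
              steps: Sequence[float]) -> float:
    """Pick the candidate step size that maximizes f along the sparse direction."""
    return max(steps, key=lambda s: f([xi + s * di for xi, di in zip(x, direction)]))


def maximize(f, grad_f, x: Vector, k: int, iterations: int,
             steps: Sequence[float] = (0.0, 0.01, 0.1, 0.5, 1.0)) -> Vector:
    """Repeatedly refine x by moving along a sparse direction with the best step."""
    for _ in range(iterations):
        direction = sparse_direction(grad_f(x), k)
        step = best_step(f, x, direction, steps)
        x = [xi + step * di for xi, di in zip(x, direction)]
    return x


# Hypothetical concave objective with its analytic gradient.
def f(x: Vector) -> float:
    return -((x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2 + 0.01 * x[2] ** 2)


def grad_f(x: Vector) -> Vector:
    return [-2.0 * (x[0] - 1.0), -2.0 * (x[1] + 2.0), -0.02 * x[2]]


start = [0.0, 0.0, 0.1]
result = maximize(f, grad_f, start, k=1, iterations=2)
print(result)
```

Because a step of 0.0 is always among the candidates, the objective never decreases from one iteration to the next in this sketch.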


    Processing user input in accordance with input types accepted by an application
    70.
    Granted invention patent
    Status: in force

    Publication No.: US08370163B2

    Publication date: 2013-02-05

    Application No.: US13242874

    Filing date: 2011-09-23

    IPC classification: G10L21/00 G10L15/00 G10L15/04

    CPC classification: G10L15/24 G06F3/167 G10L15/22

    Abstract: In a voice processing system, a multimodal request is received from a plurality of modality input devices, and the requested application is run to provide a user with the feedback of the multimodal request. In the voice processing system, a multimodal aggregating unit is provided which receives a multimodal input from a plurality of modality input devices, and provides an aggregated result to an application control based on the interpretation of the interaction ergonomics of the multimodal input within the temporal constraints of the multimodal input. Thus, the multimodal input from the user is recognized within a temporal window. Interpretation of the interaction ergonomics of the multimodal input includes interpretation of interaction biometrics and interaction mechani-metrics, wherein the interaction input of at least one modality may be used to bring meaning to at least one other input of another modality.
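The idea of recognizing multimodal input within a temporal window can be illustrated by grouping input events whose timestamps fall close together. The sketch assumes a simple rule (an event joins the current group if it arrives within `window` seconds of that group's first event) and uses a hypothetical `InputEvent` type; the interpretation of biometrics and mechani-metrics mentioned in the abstract is not modelled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InputEvent:
    modality: str     # e.g. "speech" or "touch"
    payload: str      # recognized content of the input
    timestamp: float  # arrival time in seconds


def aggregate(events: List[InputEvent], window: float) -> List[List[InputEvent]]:
    """Group events that arrive within `window` seconds of the first event in a group."""
    groups: List[List[InputEvent]] = []
    for event in sorted(events, key=lambda e: e.timestamp):
        if groups and event.timestamp - groups[-1][0].timestamp <= window:
            groups[-1].append(event)
        else:
            groups.append([event])
    return groups


# Hypothetical inputs: a spoken command resolved by a nearly simultaneous touch.
events = [
    InputEvent("speech", "delete this", 0.0),
    InputEvent("touch", "icon:report.txt", 0.4),
    InputEvent("speech", "open mail", 5.0),
]
groups = aggregate(events, window=1.0)
print([[e.modality for e in g] for g in groups])
```

In the first group the touch event supplies the referent for "this" in the spoken command, which is one way an input of one modality can give meaning to an input of another.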
