Telephone messaging and editing system
    11.
    Granted invention patent (status: in force)

    Publication No.: US06219638B1

    Publication date: 2001-04-17

    Application No.: US09185332

    Filing date: 1998-11-03

    IPC classification: G10L1508

    Abstract: A messaging system for receiving speech over a telephone and converting the speech to text includes a first server for receiving speech input by a user; a speech recognition system for converting the speech to text; a speech synthesizer for converting the text back to speech, which is played back to the user for correction; and a correction mechanism that enables the user to correct the speech, so that the corrected speech is provided as text for transmittal over a communication system.

    Lattice-based unsupervised maximum likelihood linear regression for speaker adaptation
    12.
    Granted invention patent (status: in force)

    Publication No.: US07216077B1

    Publication date: 2007-05-08

    Application No.: US09670251

    Filing date: 2000-09-26

    IPC classification: G10L15/06 G10L15/14

    CPC classification: G10L15/065

    Abstract: Methods and arrangements using lattice-based information for unsupervised speaker adaptation. By performing adaptation against a word lattice, correct models are more likely to be used in estimating a transform. Further, a particular type of lattice proposed herein enables the use of a natural confidence measure given by the posterior occupancy probability of a state; that is, the statistics of a particular state are updated with the current frame only if the a posteriori probability of the state at that particular time is greater than a predetermined threshold.
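
The confidence gating described in this abstract can be illustrated with a minimal sketch. It is not the patented procedure: the function name `accumulate_gated_stats`, the dictionary layout of the posteriors, the default threshold, and the choice to weight each frame by its posterior are all assumptions made for illustration.

```python
from collections import defaultdict


def accumulate_gated_stats(frames, posteriors, threshold=0.9):
    """Accumulate per-state statistics only from confident frames.

    frames:     list of feature vectors (lists of floats), one per time frame.
    posteriors: list of dicts mapping state id -> posterior occupancy
                probability for the corresponding frame (hypothetical layout).
    threshold:  a state's statistics are updated with a frame only if its
                posterior at that frame is strictly greater than this value.
    """
    occupancy = defaultdict(float)   # sum of posteriors per state
    first_order = {}                 # posterior-weighted sum of features per state
    for x, post in zip(frames, posteriors):
        for state, gamma in post.items():
            if gamma <= threshold:
                continue             # not confident enough: skip this frame
            occupancy[state] += gamma
            acc = first_order.setdefault(state, [0.0] * len(x))
            for d, value in enumerate(x):
                acc[d] += gamma * value
    return dict(occupancy), first_order
```

With a threshold of 0.9, a frame whose best state has posterior 0.5 contributes nothing, so ambiguous regions of the lattice do not influence the transform estimate.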

    Audio/video archive system and method for automatic indexing and searching
    13.
    Granted invention patent (status: lapsed)

    Publication No.: US06603921B1

    Publication date: 2003-08-05

    Application No.: US09108544

    Filing date: 1998-07-01

    IPC classification: H04N704

    Abstract: An archive system for records with an audio component, which uses automated speech recognition to create a multi-layered archive pyramid. The archive pyramid includes successive layers of data stored at varying data rates, such as original video data, compressed video data, original audio, compressed audio data, recognized word lattices, recognized word bags and a global word index. The disclosed system uses automatic speech recognition to transcribe from audio to searchable index layers. During a search operation, automatic and semi-automatic techniques are used to search the archive pyramid from the smallest, narrowest layers to the largest, widest layers, to identify a moderate subset of records. This subset is further refined by a manual survey of regenerated compressed audio. Finally, the selected records are retrieved from the original audio archive layer.
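
The coarse-to-fine search over the two smallest layers can be sketched as follows. This is an illustrative assumption, not the disclosed system: the function `search_pyramid`, the data layouts of `global_index` and `word_bags`, and the any-word and all-words matching rules are hypothetical.

```python
def search_pyramid(query_words, global_index, word_bags):
    """Narrow a record set by consulting the smallest layers first.

    global_index: dict mapping word -> set of record ids (smallest layer).
    word_bags:    dict mapping record id -> dict of word -> count
                  (next layer up; larger, but more detailed).
    Returns the sorted record ids that survive both layers.
    """
    query = [w.lower() for w in query_words]
    if not query:
        return []

    # Layer 1: the global word index yields a cheap, coarse candidate set
    # (any record that contains at least one query word).
    candidates = set()
    for word in query:
        candidates |= global_index.get(word, set())

    # Layer 2: per-record word bags refine the candidates
    # (keep records that contain every query word).
    return sorted(
        rec for rec in candidates
        if all(word in word_bags.get(rec, {}) for word in query)
    )
```

Later layers (word lattices, compressed audio, original audio) would be consulted only for the records that survive, which is what keeps the expensive layers from being scanned in full.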

    Apparatus and method for performing model estimation utilizing a discriminant measure
    14.
    Granted invention patent (status: lapsed)

    Publication No.: US5970239A

    Publication date: 1999-10-19

    Application No.: US908120

    Filing date: 1997-08-11

    IPC classification: G06F9/455

    Abstract: A method for performing acoustic model estimation to optimize classification accuracy on speaker-derived feature vectors with respect to a plurality of classes corresponding to phones, to which a plurality of acoustic models respectively correspond, comprises: (a) initializing an acoustic model for each phone; (b) evaluating the merit of the acoustic model initialized for each phone utilizing an objective function having a two-component discriminant measure capable of characterizing each phone, whereby a first component is defined as a probability that the model for the phone assigns to feature vectors from the phone and a second component is defined as a probability that the model for the phone assigns to feature vectors from other phones; (c) adapting the model for selected phones so as to increase the first component for the phone or decrease the second component for the phone, the adapting step yielding a new model for each selected phone; (d) evaluating the merit of the new models for each phone adapted in step (c) utilizing the two-component measure; (e) comparing results of the evaluation of step (b) with results of the evaluation of step (d) for each phone, and if the first component has increased or the second component has decreased, the new model is kept for that phone, else the model originally initialized is kept; (f) estimating parameters associated with each model kept for each phone in order to optimize the function; and (g) evaluating a termination criterion to determine if the parameters of the models are optimized.
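
Steps (b), (d) and (e) of this abstract can be sketched for a single phone. The sketch rests on assumptions not stated in the listing: each phone model is a one-dimensional Gaussian, and each component of the measure is taken as an average likelihood. The names `Gaussian1D`, `discriminant_components` and `select_model` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian1D:
    """Toy stand-in for an acoustic model of one phone (assumption)."""
    mean: float
    var: float

    def pdf(self, x):
        norm = math.sqrt(2.0 * math.pi * self.var)
        return math.exp(-((x - self.mean) ** 2) / (2.0 * self.var)) / norm


def discriminant_components(model, own_vectors, other_vectors):
    """Return (first, second): the average likelihood the model assigns to
    vectors from its own phone and to vectors from other phones."""
    first = sum(model.pdf(x) for x in own_vectors) / len(own_vectors)
    second = sum(model.pdf(x) for x in other_vectors) / len(other_vectors)
    return first, second


def select_model(old, new, own_vectors, other_vectors):
    """Step (e): keep the adapted model if the first component increased
    or the second component decreased; otherwise keep the original."""
    old_first, old_second = discriminant_components(old, own_vectors, other_vectors)
    new_first, new_second = discriminant_components(new, own_vectors, other_vectors)
    if new_first > old_first or new_second < old_second:
        return new
    return old
```

A full estimation loop would repeat adaptation, selection and parameter re-estimation until the termination criterion of step (g) is met; that loop is omitted here.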

    Reduction of search space in speech recognition using phone boundaries and phone ranking
    15.
    Granted invention patent (status: lapsed)

    Publication No.: US5729656A

    Publication date: 1998-03-17

    Application No.: US347013

    Filing date: 1994-11-30

    Abstract: A method for estimating the probability of phone boundaries and the accuracy of the acoustic modelling in reducing a search space in a speech recognition system. The accuracy of the acoustic modelling is quantified by the rank of the correct phone. The system includes a microphone for converting an utterance into an electrical signal, which is processed by an acoustic processor and label match which finds the best-matched acoustic label prototype. A probability distribution on phone boundaries is produced for every time frame using a first decision tree. These probabilities are compared to a threshold and some time frames are identified as boundaries between phones. An acoustic score is computed for all phones between every given pair of hypothesized boundaries, and the phones are ranked on the basis of this score. A second decision tree is traversed for every time frame to obtain the worst-case rank of the correct phone at that time, and a short list of allowed phones is made for every time frame. A fast acoustic word match processor matches the label string from the acoustic processor to produce an utterance signal which includes at least one word. From recognition candidates produced by the fast acoustic match and the language model, the detailed acoustic match matches the label string from the acoustic processor against acoustic word models and outputs a word string corresponding to an utterance.
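
The two pruning steps in this abstract, thresholding boundary probabilities and cutting the ranked phone list at a worst-case rank, can be sketched as follows. The decision trees themselves are not modelled: their outputs (`boundary_probs` and `worst_rank`) are taken as given inputs, and the function names and example scores are hypothetical.

```python
def boundary_frames(boundary_probs, threshold):
    """Return the indices of time frames whose phone-boundary probability
    exceeds the threshold; these are the hypothesized phone boundaries."""
    return [t for t, p in enumerate(boundary_probs) if p > threshold]


def shortlist(segment_scores, worst_rank):
    """Rank phones by acoustic score (higher is better) and keep only
    those ranked at or above the predicted worst-case rank.

    segment_scores: dict mapping phone label -> acoustic score for the
                    segment between two hypothesized boundaries.
    worst_rank:     1-based rank predicted for the correct phone.
    """
    ranked = sorted(segment_scores, key=segment_scores.get, reverse=True)
    return ranked[:worst_rank]
```

For example, `shortlist({"AA": -3.0, "IY": -1.0, "S": -7.5}, 2)` keeps `IY` and `AA` and drops `S`, so later matching stages never consider `S` for that segment.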

    Method and apparatus for processing information signals based on content
    16.
    Granted invention patent (status: in force)

    Publication No.: US07092496B1

    Publication date: 2006-08-15

    Application No.: US09664300

    Filing date: 2000-09-18

    IPC classification: H04M1/652

    Abstract: Methods and apparatus are provided for processing an information signal containing content presented in accordance with at least one modality. In one aspect of the present invention, a method of processing an information signal containing content presented in accordance with at least one modality comprises the steps of: (i) obtaining the information signal; (ii) performing content detection on the information signal to detect whether the information signal includes particular content presented in accordance with the at least one modality; and (iii) generating a control signal, when the particular content is detected, for use in controlling a rendering property of the particular content and/or implementation of a specific action relating to the particular content. Various illustrative embodiments in the context of speech signal processing for use in voicemail and/or cellular phone applications are provided, as well as illustrative embodiments associated with the processing of multi-modal or multimedia information signals. Also, the present invention provides for storing selectively marked information, even in the absence of content detection, such that the information may be rendered and/or used at a later time. The invention also extends to processing of text-based and markup language-based signals, e.g., XML documents.

    Determination and use of spectral peak information and incremental information in pattern recognition
    17.
    Granted invention patent (status: lapsed)

    Publication No.: US06920424B2

    Publication date: 2005-07-19

    Application No.: US09785605

    Filing date: 2001-02-16

    IPC classification: G10L15/02 G10L25/90 G10L15/00

    CPC classification: G10L25/90 G10L15/02 G10L25/15

    Abstract: Generally, the present invention determines and uses spectral peak information, which preferably augments feature vectors and creates augmented feature vectors. The augmented feature vectors decrease errors in pattern recognition, increase noise immunity for wide-band noise, and reduce reliance on noisy formant features. Illustratively, one way of determining spectral peak information is to split pattern data into a number of frequency ranges and determine spectral peak information for each of the frequency ranges. This allows single peak selection. All of the spectral peak information is then used to augment a feature vector. Another way of determining spectral peak information is to use an adaptive Infinite Impulse Response filter to provide this information. Additionally, the present invention can determine and use incremental information. The incremental information is relatively easy to calculate and helps to determine if additional or changed features are worthwhile. The incremental information is preferably determined by determining a difference between mutual information (between the feature vector and the classes to be disambiguated) for new or changed feature vectors and mutual information for old feature vectors.
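
The first approach in this abstract, one peak per frequency range appended to the feature vector, can be sketched as below. The representation is an assumption: the spectrum is a list of magnitudes per frequency bin, bands are equal-width splits, and each peak is reported as a bin index rather than a frequency in hertz. The names `band_peaks` and `augment` are hypothetical.

```python
def band_peaks(magnitudes, num_bands):
    """Split a magnitude spectrum into contiguous bands and return the
    index of the single largest bin in each band."""
    n = len(magnitudes)
    if num_bands <= 0 or num_bands > n:
        raise ValueError("num_bands must be between 1 and len(magnitudes)")
    peaks = []
    for b in range(num_bands):
        start = b * n // num_bands          # first bin of this band
        end = (b + 1) * n // num_bands      # one past the last bin
        peaks.append(max(range(start, end), key=magnitudes.__getitem__))
    return peaks


def augment(feature_vector, magnitudes, num_bands):
    """Append one spectral-peak position per band to a feature vector."""
    return list(feature_vector) + [float(p) for p in band_peaks(magnitudes, num_bands)]
```

Restricting each band to a single peak is what the abstract calls single peak selection; the adaptive IIR filter alternative and the mutual-information comparison are not shown.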

    Error corrective mechanisms for consensus decoding of speech
    18.
    Granted invention patent (status: in force)

    Publication No.: US06859774B2

    Publication date: 2005-02-22

    Application No.: US09847139

    Filing date: 2001-05-02

    IPC classification: G10L15/08 G10L15/04

    CPC classification: G10L15/08

    Abstract: Techniques are described for decreasing the number of errors when consensus decoding is used during speech recognition. A number of corrective rules are applied to confusion sets that are extracted during real-time speech recognition. The corrective rules are determined during training of the speech recognition system, which entails using many training confusion sets. A learning process is used that generates a number of possible rules, called template rules, that can be applied to the training confusion sets. The learning process also determines the corrective rules from the template rules. The corrective rules operate on the real-time confusion sets to select hypothesis words from the confusion sets, where the hypothesis words are not necessarily the words having the highest score.

    Information extraction from documents with regular expression matching
    19.
    Granted invention patent (status: in force)

    Publication No.: US06842796B2

    Publication date: 2005-01-11

    Application No.: US09898289

    Filing date: 2001-07-03

    IPC classification: G06F17/27 G10L15/00 G06F3/00

    Abstract: Techniques are provided for enumerating regularly identifiable or stereotypical phrases that people commonly use to convey particular information, and where exactly in these phrases the particular information is to be found. In one embodiment, such phrases are referred to as “regular expressions.” Using such enumerated phrases, the invention is able to automatically identify them in an input data stream and then identify and extract the particular information associated with the phrase that is being sought, e.g., important or relevant information.
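
The idea of enumerating stereotypical phrases, and marking where in each phrase the wanted information sits, maps naturally onto named capture groups. The sketch below uses Python's standard `re` module; the two patterns, the labels, and the `extract` helper are hypothetical examples, not phrases taken from the patent.

```python
import re

# Each entry pairs a label with a stereotypical phrase. The named group
# "info" marks where in the phrase the sought information is found.
PATTERNS = {
    "callback_number": re.compile(
        r"(?:call me (?:back )?at|my number is)\s+(?P<info>\d{3}[-\s]\d{4})",
        re.IGNORECASE,
    ),
    "caller_name": re.compile(
        r"(?:this is|my name is)\s+(?P<info>[A-Z][a-z]+)"
    ),
}


def extract(text):
    """Scan the text for each enumerated phrase and return the
    information captured from the first match of each pattern."""
    found = {}
    for label, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            found[label] = match.group("info")
    return found
```

Applied to a transcribed voicemail such as "Hi, this is Maria. Please call me back at 555-0199.", this returns the caller name and the callback number while ignoring the rest of the message.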

    Methods and apparatus for performing heteroscedastic discriminant analysis in pattern recognition systems
    20.
    Granted invention patent (status: lapsed)

    Publication No.: US06609093B1

    Publication date: 2003-08-19

    Application No.: US09584871

    Filing date: 2000-06-01

    IPC classification: G10L1508

    CPC classification: G06K9/6234 G10L15/02

    Abstract: The present invention provides a new approach to heteroscedastic linear discriminant analysis (HDA) by defining an objective function which maximizes the class discrimination in the projected subspace while ignoring the rejected dimensions. Moreover, we present a link between discrimination and the likelihood of the projected samples and show that HDA can be viewed as a constrained maximum likelihood (ML) projection for a full covariance Gaussian model, the constraint being given by the maximization of the projected between-class scatter volume. The present invention also provides that, under diagonal covariance Gaussian modeling constraints, applying a diagonalizing linear transformation (e.g., MLLT, maximum likelihood linear transformation) to the HDA space results in an increased classification accuracy. In another embodiment, the heteroscedastic discriminant objective function assumes that models associated with the function have diagonal covariances, thereby resulting in a diagonal heteroscedastic discriminant objective function.
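
The abstract does not reproduce the objective function itself. For orientation, one form commonly given for the HDA objective in the speech recognition literature is shown below; whether the patent uses exactly this expression is an assumption, and all symbols are introduced here for illustration. Here θ is the p × n projection matrix, Σ_j and N_j are the covariance and sample count of class j, N is the total sample count, and B is the between-class scatter matrix.

```latex
% Full-covariance HDA objective (assumed form, maximized over \theta)
\[
H(\theta) \;=\; \sum_{j=1}^{J} -N_j \,\log\bigl|\theta\,\Sigma_j\,\theta^{T}\bigr|
\;+\; N \,\log\bigl|\theta\,B\,\theta^{T}\bigr|
\]

% Diagonal variant (assumed form): class covariances are diagonal
% in the projected space
\[
G(\theta) \;=\; \sum_{j=1}^{J} -N_j \,\log\bigl|\operatorname{diag}\bigl(\theta\,\Sigma_j\,\theta^{T}\bigr)\bigr|
\;+\; N \,\log\bigl|\theta\,B\,\theta^{T}\bigr|
\]
```

In both forms the first term rewards compact projected classes and the second rewards a large projected between-class scatter volume. Only the p retained dimensions appear, which matches the abstract's statement that the rejected dimensions are ignored.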