Apparatus and method for building domain-specific language models
    1.
    Granted invention patent
    Apparatus and method for building domain-specific language models (in force)

    Publication No.: US06188976B1

    Publication date: 2001-02-13

    Application No.: US09178026

    Filing date: 1998-10-23

    IPC classification: G06F17/20

    CPC classification: G10L15/183

    Abstract: Disclosed is a method and apparatus for building a domain-specific language model for use in language processing applications, e.g., speech recognition. A reference language model is generated based on a relatively small seed corpus containing linguistic units relevant to the domain. An external corpus containing a large number of linguistic units is accessed. Using the reference language model, linguistic units which have a sufficient degree of relevance to the domain are extracted from the external corpus. The reference language model is then updated based on the seed corpus and the extracted linguistic units. The process may be repeated iteratively until the language model is of satisfactory quality. The language building technique may be further enhanced by combining it with mixture modeling or class-based modeling.
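
    The seed-extract-update loop described in this abstract can be sketched roughly as follows. This is an illustrative toy, not the patented implementation: the add-one smoothed unigram model, the average log-probability relevance score, the fixed `threshold`, and all function names are assumptions introduced here.

```python
import math
from collections import Counter

def train_unigram(sentences):
    """Return a log-probability function for an add-one smoothed unigram model."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves probability mass for unseen words

    def logprob(word):
        return math.log((counts.get(word, 0) + 1) / (total + vocab))

    return logprob

def avg_logprob(sentence, logprob):
    """Assumed relevance measure: mean per-word log-probability."""
    words = sentence.split()
    return sum(logprob(w) for w in words) / len(words)

def build_domain_model(seed, external, threshold, max_iters=5):
    """Iteratively extract relevant sentences and retrain on seed + extracted."""
    selected = []
    model = train_unigram(seed)  # reference model from the seed corpus
    for _ in range(max_iters):
        extracted = [s for s in external
                     if s.split() and avg_logprob(s, model) >= threshold]
        if extracted == selected:  # nothing changed, so stop iterating
            break
        selected = extracted
        model = train_unigram(seed + selected)  # update the reference model
    return model, selected
```

    With a travel-domain seed such as "book a flight to boston", a call like `build_domain_model(seed, external, threshold=-2.5)` would keep external sentences about flights and discard unrelated ones; the threshold value is arbitrary and would need tuning in practice.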

    Speech recognizer having a speech coder for an acoustic match based on
context-dependent speech-transition acoustic models
    2.
    Granted invention patent
    Speech recognizer having a speech coder for an acoustic match based on context-dependent speech-transition acoustic models (expired)

    Publication No.: US5333236A

    Publication date: 1994-07-26

    Application No.: US942862

    Filing date: 1992-09-10

    CPC classification: G10L19/06

    Abstract: A speech coding apparatus compares the closeness of the feature value of a feature vector signal of an utterance to the parameter values of prototype vector signals to obtain prototype match scores for the feature vector signal and each prototype vector signal. The speech coding apparatus stores a plurality of speech transition models representing speech transitions. At least one speech transition is represented by a plurality of different models. Each speech transition model has a plurality of model outputs, each comprising a prototype match score for a prototype vector signal. Each model output has an output probability. A model match score for a first feature vector signal and each speech transition model comprises the output probability for at least one prototype match score for the first feature vector signal and a prototype vector signal. A speech transition match score for the first feature vector signal and each speech transition comprises the best model match score for the first feature vector signal and all speech transition models representing the speech transition. The identification value of each speech transition and the speech transition match score for the first feature vector signal and each speech transition are output as a coded utterance representation signal of the first feature vector signal.
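
    A simplified reading of the scoring scheme in this abstract: each speech transition may have several context-dependent models, and the transition's score is the best score among them. The sketch below assumes the prototype match score has already been reduced to a single best prototype label, and that each model is a plain mapping from label to output probability; both simplifications and all names are assumptions for illustration.

```python
def transition_match_scores(label, models_by_transition):
    """Score every speech transition for one coded feature vector.

    label: identifier of the best-matching prototype (assumed simplification).
    models_by_transition: dict mapping a transition id to a list of models,
        each model being a dict {prototype label: output probability}.
    """
    scores = {}
    for transition, models in models_by_transition.items():
        # Model match score: the model's output probability for this label.
        # Transition match score: the best model match score for the transition.
        scores[transition] = max(model.get(label, 0.0) for model in models)
    return scores
```

    The returned dictionary pairs each transition identifier with its match score, which corresponds to the coded representation the abstract describes for one feature vector.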

    Speech coding apparatus having speaker dependent prototypes generated
from nonuser reference data
    3.
    Granted invention patent
    Speech coding apparatus having speaker dependent prototypes generated from nonuser reference data (expired)

    Publication No.: US5278942A

    Publication date: 1994-01-11

    Application No.: US802678

    Filing date: 1991-12-05

    CPC classification: G10L15/063 G10L15/02

    Abstract: A speech coding apparatus and method for use in a speech recognition apparatus and method. The value of at least one feature of an utterance is measured during each of a series of successive time intervals to produce a series of feature vector signals representing the feature values. A plurality of prototype vector signals, each having at least one parameter value and a unique identification value are stored. The closeness of the feature vector signal is compared to the parameter values of the prototype vector signals to obtain prototype match scores for the feature value signal and each prototype vector signal. The identification value of the prototype vector signal having the best prototype match score is output as a coded representation signal of the feature vector signal. Speaker-dependent prototype vector signals are generated from both synthesized training vector signals and measured training vector signals. The synthesized training vector signals are transformed reference feature vector signals representing the values of features of one or more utterances of one or more speakers in a reference set of speakers. The measured training feature vector signals represent the values of features of one or more utterances of a new speaker/user not in the reference set.
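
    The coding step in this abstract (output the identifier of the best-matching prototype) resembles nearest-prototype vector quantization. The sketch below covers only that step and assumes Euclidean distance as the match score; the abstract does not name a distance measure, and the function and variable names are illustrative.

```python
import math

def encode(feature_vectors, prototypes):
    """Map each feature vector to the id of its closest prototype.

    prototypes: dict {identification value: parameter vector}.
    Closeness is assumed to be Euclidean distance (smaller is better).
    """
    labels = []
    for fv in feature_vectors:
        best_id = min(prototypes, key=lambda pid: math.dist(fv, prototypes[pid]))
        labels.append(best_id)
    return labels
```

    The generation of speaker-dependent prototypes from transformed reference data is not modeled here; the prototypes are simply taken as given.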

    Speech coding apparatus and method for generating acoustic feature
vector component values by combining values of the same features for
multiple time intervals
    4.
    Granted invention patent
    Speech coding apparatus and method for generating acoustic feature vector component values by combining values of the same features for multiple time intervals (expired)

    Publication No.: US5544277A

    Publication date: 1996-08-06

    Application No.: US98682

    Filing date: 1993-07-28

    CPC classification: G10L15/02 G10L15/20

    Abstract: A speech coding apparatus and method measures the values of at least first and second different features of an utterance during each of a series of successive time intervals. For each time interval, a feature vector signal has a first component value equal to a first weighted combination of the values of only one feature of the utterance for at least two time intervals. The feature vector signal has a second component value equal to a second weighted combination, different from the first weighted combination, of the values of only one feature of the utterance for at least two time intervals. The resulting feature vector signals for a series of successive time intervals form a coded representation of the utterance. In one embodiment, a first weighted mixture signal has a value equal to a first weighted mixture of the values of the features of the utterance during a single time interval. A second weighted mixture signal has a value equal to a second weighted mixture, different from the first weighted mixture, of the values of the features of the utterance during a single time interval. The first component value of each feature vector signal is equal to a first weighted combination of the values of only the first weighted mixture signals for at least two time intervals, and the second component value of each feature vector signal is equal to a second weighted combination, different from the first weighted combination, of the values of only the second weighted mixture for at least two time intervals.
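
    The central idea here, building each feature-vector component from a weighted combination of one feature's values over several time intervals, can be illustrated with a small sliding-window sketch. The window weights (a two-frame average and a two-frame difference in the test below) and the function names are assumptions chosen for illustration; the abstract does not specify them.

```python
def combine_over_time(track, weights):
    """Weighted combination of one feature's values over len(weights) frames.

    track: values of a single feature, one per time interval.
    weights: one weight per frame in the sliding window.
    """
    n = len(weights)
    return [
        sum(w * track[t + k] for k, w in enumerate(weights))
        for t in range(len(track) - n + 1)
    ]

def code_utterance(track, first_weights, second_weights):
    """Pair two different weighted combinations into per-frame feature vectors.

    Both weight lists are assumed to have the same length, so that the two
    component sequences line up frame by frame.
    """
    first = combine_over_time(track, first_weights)
    second = combine_over_time(track, second_weights)
    return list(zip(first, second))
```

    For example, weights of `[0.5, 0.5]` give a smoothed value and `[-1.0, 1.0]` give a frame-to-frame change, two components derived from the same feature but with different weightings.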

    System and method for providing network coordinated conversational services
    5.
    Granted invention patent
    System and method for providing network coordinated conversational services (in force)

    Publication No.: US08868425B2

    Publication date: 2014-10-21

    Application No.: US13610221

    Filing date: 2012-09-11

    Abstract: A system and method for providing automatic and coordinated sharing of conversational resources, e.g., functions and arguments, between network-connected servers and devices and their corresponding applications. In one aspect, a system for providing automatic and coordinated sharing of conversational resources includes a network having a first and second network device, the first and second network device each comprising a set of conversational resources, a dialog manager for managing a conversation and executing calls requesting a conversational service, and a communication stack for communicating messages over the network using conversational protocols, wherein the conversational protocols establish coordinated network communication between the dialog managers of the first and second network device to automatically share the set of conversational resources of the first and second network device, when necessary, to perform their respective requested conversational service.

    Method and apparatus for estimating phone class probabilities
a-posteriori using a decision tree
    6.
    Granted invention patent
    Method and apparatus for estimating phone class probabilities a-posteriori using a decision tree (expired)

    Publication No.: US5680509A

    Publication date: 1997-10-21

    Application No.: US312584

    Filing date: 1994-09-27

    IPC classification: G10L15/06 G10L15/08 G10L5/06

    CPC classification: G10L15/063 G10L15/08

    Abstract: A method and apparatus for estimating the probability of phones, a-posteriori, in the context of not only the acoustic feature at that time, but also the acoustic features in the vicinity of the current time, and its use in cutting down the search-space in a speech recognition system. The method constructs and uses a decision tree, with the predictors of the decision tree being the vector-quantized acoustic feature vectors at the current time, and in the vicinity of the current time. The process starts with an enumeration of all (predictor, class) events in the training data at the root node, and successively partitions the data at a node according to the most informative split at that node. An iterative algorithm is used to design the binary partitioning. After the construction of the tree is completed, the probability distribution of the predicted class is stored at all of its terminal leaves. The decision tree is used during the decoding process by tracing a path down to one of its leaves, based on the answers to binary questions about the vector-quantized acoustic feature vector at the current time and its vicinity.
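
    The decoding-time use of the tree (trace a path to a leaf by answering binary questions about vector-quantized labels at and near the current time, then read off the stored distribution) might look like the sketch below. Tree construction is not shown. The node layout, the form of a question as a `(time offset, set of labels)` pair, and the treatment of out-of-range frames as a "no" answer are all assumptions made for illustration.

```python
def phone_posteriors(tree, labels, t):
    """Trace a path from the root to a leaf and return its class distribution.

    tree: nested dicts; an internal node has "question", "yes" and "no",
        a leaf has "dist" ({phone class: probability}).
    labels: vector-quantized labels, one per time frame.
    t: index of the current frame.
    """
    node = tree
    while "dist" not in node:
        offset, label_set = node["question"]
        idx = t + offset
        # Frames outside the utterance answer "no" (assumed convention).
        answer = 0 <= idx < len(labels) and labels[idx] in label_set
        node = node["yes"] if answer else node["no"]
    return node["dist"]
```

    A recognizer could then prune phone candidates whose probability at the reached leaf falls below some cutoff, which is one way to read the abstract's remark about cutting down the search space.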

    Apparatus and method of grouping utterances of a phoneme into
context-dependent categories based on sound-similarity for automatic
speech recognition
    7.
    Granted invention patent
    Apparatus and method of grouping utterances of a phoneme into context-dependent categories based on sound-similarity for automatic speech recognition (expired)

    Publication No.: US5195167A

    Publication date: 1993-03-16

    Application No.: US871600

    Filing date: 1992-04-17

    CPC classification: G10L15/063

    Abstract: Symbol feature values and contextual feature values of each event in a training set of events are measured. At least two pairs of complementary subsets of observed events are selected. In each pair of complementary subsets of observed events, one subset has contextual features with values in a set C_n, and the other set has contextual features with values in the complementary set C̄_n, where C_n and C̄_n are complementary sets of contextual feature values. For each subset of observed events, the similarity values of the symbol features of the observed events in the subsets are calculated. For each pair of complementary sets of observed events, a "goodness of fit" is the sum of the symbol feature value similarity of the subsets. The sets of contextual feature values associated with the subsets of observed events having the best "goodness of fit" are identified and form context-dependent bases for grouping the observed events into two output sets.
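
    The selection procedure in this abstract (try several complementary context splits, score each by the summed similarity of the two resulting subsets, keep the best) can be sketched as below. The similarity measure used here, the log-likelihood of a subset's symbols under its own empirical distribution, is an assumption standing in for whatever measure the patent defines; events are modeled as `(symbol, context)` pairs and all names are illustrative.

```python
import math
from collections import Counter

def similarity(symbols):
    """Log-likelihood of the symbols under their own empirical distribution.

    Equals 0 for an empty or perfectly uniform-valued subset and becomes more
    negative as the subset grows more mixed (assumed similarity measure).
    """
    counts = Counter(symbols)
    n = len(symbols)
    return sum(c * math.log(c / n) for c in counts.values())

def best_split(events, candidate_sets):
    """Pick the context set whose split has the best goodness of fit.

    events: list of (symbol, context) pairs.
    candidate_sets: iterable of sets of context values; each defines a split
        into events whose context is in the set and events whose context is not.
    """
    def goodness(context_set):
        inside = [s for s, c in events if c in context_set]
        outside = [s for s, c in events if c not in context_set]
        return similarity(inside) + similarity(outside)

    return max(candidate_sets, key=goodness)
```

    The chosen set and its complement then define the two output groups of events.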

    Methods and apparatus for restricting access of a user using random partial biometrics
    9.
    Granted invention patent
    Methods and apparatus for restricting access of a user using random partial biometrics (in force)

    Publication No.: US06735695B1

    Publication date: 2004-05-11

    Application No.: US09467581

    Filing date: 1999-12-20

    IPC classification: H04L9/32

    Abstract: A biometrics security method and apparatus are disclosed that restrict the ability of a user to access a device or facility using a portion of biometric data to validate the user's identity. Upon a user request to access a secure device or facility, the central biometric security system initially sends a first request for a specific sample of a portion of the user's biometric information. The specific sample may be identified, for example, using a set of image coordinates. A second request is also sent to retrieve the biometric prototype from a database of registered users. The central biometric security system then compares the user biometrics portion with the corresponding biometrics prototype portions. The user receives access to the requested device if the user biometrics portion(s) matches the corresponding biometrics prototype portions. In one variation, the biometric security system transmits a security agent to the user's computing device upon a user request to access a remote device. The security agent serves to extract user biometric portions in accordance with the sampling request from the central biometric security system. In another variation, a local recognition is performed before a remote recognition to reduce the risk of a failed server side recognition due to a poor biometrics feature.
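
    The challenge-and-compare flow in this abstract can be illustrated with a toy in which a biometric sample is a small 2-D grid of values, the server picks a random square region by image coordinates, and access is granted when the submitted portion matches the same region of the stored prototype. The grid representation, the exact-match comparison with a mismatch tolerance, and every function name are assumptions; real biometric matching is far more involved.

```python
import random

def pick_region(height, width, size, rng):
    """Choose a random size x size region, identified by image coordinates."""
    top = rng.randrange(height - size + 1)
    left = rng.randrange(width - size + 1)
    return top, left, size

def crop(image, region):
    """Extract the requested portion from a 2-D grid (list of rows)."""
    top, left, size = region
    return [row[left:left + size] for row in image[top:top + size]]

def verify(user_portion, prototype, region, max_mismatches=0):
    """Compare the user's portion with the same region of the stored prototype."""
    expected = crop(prototype, region)
    mismatches = sum(
        a != b
        for user_row, expected_row in zip(user_portion, expected)
        for a, b in zip(user_row, expected_row)
    )
    return mismatches <= max_mismatches
```

    Because the region changes with each request, a captured portion from one session would generally not satisfy a later challenge, which appears to be the motivation for sampling randomly.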

    Context-dependent speech recognizer using estimated next word context
    10.
    Granted invention patent
    Context-dependent speech recognizer using estimated next word context (expired)

    Publication No.: US5233681A

    Publication date: 1993-08-03

    Application No.: US874271

    Filing date: 1992-04-24

    IPC classification: G10L15/10 G10L15/18 G10L15/28

    CPC classification: G10L15/19 G10L15/193

    Abstract: A speech recognition apparatus and method estimates the next word context for each current candidate word in a speech hypothesis. An initial model of each speech hypothesis comprises a model of a partial hypothesis of zero or more words followed by a model of a candidate word. An initial hypothesis score for each speech hypothesis comprises an estimate of the closeness of a match between the initial model of the speech hypothesis and a sequence of coded representations of the utterance. The speech hypotheses having the best initial hypothesis scores form an initial subset. For each speech hypothesis in the initial subset, the word which is most likely to follow the speech hypothesis is estimated. A revised model of each speech hypothesis in the initial subset comprises a model of the partial hypothesis followed by a revised model of the candidate word. The revised candidate word model is dependent at least on the word which is estimated to be most likely to follow the speech hypothesis. A revised hypothesis score for each speech hypothesis in the initial subset comprises an estimate of the closeness of a match between the revised model of the speech hypothesis and the sequence of coded representations of the utterance. The speech hypotheses from the initial subset which have the best revised match scores are stored as a reduced subset. At least one word of one or more of the speech hypotheses in the reduced subset is output as a speech recognition result.