Speech recognition
    2.
    Type: granted invention patent
    Status: no longer in force

    Publication number: US4956865A

    Publication date: 1990-09-11

    Application number: US191824

    Filing date: 1988-05-02

    IPC classification: G10L11/02 G10L15/02

    CPC classification: G10L25/87 G10L15/02

    Abstract: In a speech recognizer for recognizing unknown utterances in isolated-word speech or continuous speech, improved recognition accuracy is obtained by augmenting the usual spectral representation of the unknown utterance with a dynamic component. A corresponding dynamic component is provided in the templates with which the spectral representation of the utterance is compared. In preferred embodiments, the representation is mel-based cepstral and the dynamic components comprise vector differences between pairs of primary cepstra. Preferably the time interval between each pair is about 50 milliseconds. It is also preferable to compute a dynamic perceptual loudness component along with the dynamic parameters.
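
The dynamic component described in this abstract can be illustrated with a minimal sketch. It assumes a hypothetical frame step of 10 ms, so that a pair of frames separated by `ahead + behind = 5` frames is about 50 milliseconds apart. The function name, the parameter names, the frame step, and the clamping at the utterance edges are illustrative assumptions, not details taken from the patent.

```python
from typing import List

Vector = List[float]


def add_dynamic_components(
    cepstra: List[Vector], ahead: int = 3, behind: int = 2
) -> List[Vector]:
    """Append to each frame the difference c[t + ahead] - c[t - behind].

    Indices that fall outside the utterance are clamped to the first or
    last frame (an assumption made for this sketch).
    """
    last = len(cepstra) - 1
    augmented = []
    for t, frame in enumerate(cepstra):
        later = cepstra[min(t + ahead, last)]
        earlier = cepstra[max(t - behind, 0)]
        # Vector difference between the pair of primary cepstra.
        delta = [a - b for a, b in zip(later, earlier)]
        # Augmented frame: primary coefficients followed by dynamic ones.
        augmented.append(list(frame) + delta)
    return augmented
```

Templates would need the same augmentation so that the utterance and the templates are compared in the same feature space.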

    Prosody based endpoint detection
    3.
    Type: granted invention patent
    Status: no longer in force

    Publication number: US06873953B1

    Publication date: 2005-03-29

    Application number: US09576116

    Filing date: 2000-05-22

    Applicant: Matthew Lennig

    Inventor: Matthew Lennig

    IPC classification: G10L11/02 G10L15/04

    CPC classification: G10L25/87

    Abstract: A method and apparatus are provided for performing prosody based endpoint detection of speech in a speech recognition system. Input speech represents an utterance, which has an intonation pattern. An end-of-utterance condition is identified based on prosodic parameters of the utterance, such as the intonation pattern and the duration of the final syllable of the utterance, as well as non-prosodic parameters, such as the log energy of the speech.

    System architecture for and method of voice processing
    4.
    Type: granted invention patent
    Status: no longer in force

    Publication number: US06119087A

    Publication date: 2000-09-12

    Application number: US39203

    Filing date: 1998-03-13

    IPC classification: G10L15/22 G10L15/30 G10L15/14

    CPC classification: G10L15/30 G10L15/22

    Abstract: A system and method for efficiently distributing voice call data received from speech recognition servers over a telephone network having a shared processing resource is disclosed. Incoming calls are received from phone lines and assigned grammar types by speech recognition servers. A request for processing the voice call data is sent to a resource manager, which monitors the shared processing resource and identifies a preferred processor within the shared resource. The resource manager sends an instruction to the speech recognition server to send the voice call data to the preferred processor for processing. The preferred processor is determined by known processor efficiencies for voice call data having the assigned grammar type of the incoming voice call data and a measure of processor loads. While the system is operating, the resource manager develops and updates a history of each processor. The histories include processing efficiency values for all grammar types received. The processing efficiencies are stored, tabulated and assigned usage number values for each processor. When incoming voice call data is received, the resource manager evaluates the total sum of the usage numbers for processing requests assigned to each processor and the usage number for the grammar type of the incoming data as applied to each processor. The incoming data is distributed to the processor with the lowest sum of the total of usage numbers for assigned requests and the usage number assigned to the incoming data for that processor.
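
The selection rule in the last two sentences of this abstract can be sketched as follows. The data layout (a per-processor table of usage numbers keyed by grammar type, and a list of grammar types currently assigned to each processor) and all names are hypothetical; the abstract does not say how the usage numbers are derived from the efficiency histories, so they are taken as given here.

```python
from typing import Dict, List


def choose_processor(
    assigned: Dict[str, List[str]],
    usage: Dict[str, Dict[str, float]],
    grammar: str,
) -> str:
    """Pick the processor with the lowest total usage for the new request.

    assigned: processor name -> grammar types of requests already assigned.
    usage:    processor name -> usage number for each grammar type.
    grammar:  grammar type of the incoming voice call data.
    """

    def total(proc: str) -> float:
        # Sum of usage numbers for requests already assigned to proc.
        current = sum(usage[proc][g] for g in assigned[proc])
        # Plus the usage number of the incoming request on that processor.
        return current + usage[proc][grammar]

    # Sorting the names makes ties resolve deterministically (an assumption).
    return min(sorted(assigned), key=total)
```

For example, with `usage = {"p1": {"digits": 1.0, "names": 4.0}, "p2": {"digits": 2.0, "names": 2.0}}` and `assigned = {"p1": ["digits", "digits"], "p2": ["names"]}`, an incoming `"names"` request totals 6.0 on `p1` and 4.0 on `p2`, so it goes to `p2`.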

    Rejection method for speech recognition
    5.
    Type: granted invention patent
    Status: no longer in force

    Publication number: US5097509A

    Publication date: 1992-03-17

    Application number: US501993

    Filing date: 1990-03-28

    Applicant: Matthew Lennig

    Inventor: Matthew Lennig

    IPC classification: G10L15/00 G10L15/08

    CPC classification: G10L15/08

    Abstract: A speech recognizer for recognizing unknown utterances in isolated-word, small-vocabulary speech has improved rejection of out-of-vocabulary utterances. Both a usual spectral representation including a dynamic component and an equalized representation are used to match unknown utterances to templates for in-vocabulary words. In a preferred embodiment, the representations are mel-based cepstral, with the dynamic components being signed vector differences between pairs of primary cepstra. The equalized representation is the signed difference of each cepstral coefficient less an average value of the coefficients. Factors are generated from the ordered lists of templates to determine the probability of the top choice being a correct acceptance, with different methods being applied when the usual and equalized representations yield a different match. For additional enhancement, the rejection method may use templates corresponding to non-vocabulary utterances or decoys. If the top choice corresponds to a decoy, the input is rejected.
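
Two pieces of this abstract lend themselves to a short sketch: the equalized representation and the decoy rule. The abstract does not say over what the "average value of the coefficients" is taken; the sketch below assumes it is the average of each coefficient over the frames of the utterance. The function names are hypothetical, and the probability factors derived from the ordered template lists are not modeled.

```python
from typing import List, Optional, Set

Vector = List[float]


def equalize(cepstra: List[Vector]) -> List[Vector]:
    """Subtract from each coefficient its average over all frames.

    Averaging over the frames of the utterance is an assumption of this
    sketch; the abstract only says "an average value of the coefficients".
    """
    if not cepstra:
        return []
    n = len(cepstra)
    means = [sum(column) / n for column in zip(*cepstra)]
    return [[c - m for c, m in zip(frame, means)] for frame in cepstra]


def accept_top_choice(ranked: List[str], decoys: Set[str]) -> Optional[str]:
    """Return the top-ranked template label, or None if the input is rejected.

    The input is rejected when there is no candidate or when the top
    choice corresponds to a decoy template.
    """
    if not ranked or ranked[0] in decoys:
        return None
    return ranked[0]
```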

    Spoken language proficiency assessment by computer
    6.
    Type: invention patent application
    Status: under examination (published)

    Publication number: US20070033017A1

    Publication date: 2007-02-08

    Application number: US11490290

    Filing date: 2006-07-20

    IPC classification: G10L19/00

    Abstract: A system and method for spoken language proficiency assessment by a computer is described. A user provides a spoken response to a constructed response question. A speech recognition system processes the spoken response into a sequence of linguistic units. At training time, features matching a linguistic template are extracted by identifying matches between a training sequence of linguistic units and pre-selected templates. Additionally, a generalized count of the extracted features is computed. At runtime, linguistic features are detected by comparing a runtime sequence of linguistic units to the feature set extracted at training time. This comparison results in a generalized count of linguistic features. The generalized count is then used to compute a score.

    Distributed voice web architecture and associated components and methods
    7.
    Type: granted invention patent
    Status: in force

    Publication number: US06785653B1

    Publication date: 2004-08-31

    Application number: US09561680

    Filing date: 2000-05-01

    IPC classification: G10L21/00

    Abstract: A speech-enabled distributed processing system forming a Voice Web includes a gateway, one or more voice content sites coupled to the gateway over a wide area network, and a browser coupled to the gateway over a network, which may or may not be the wide area network. The gateway receives telephone calls from one or more users over telephony connections and performs endpointing of speech of each user. The browser provides the gateway with information enabling the gateway to selectively direct the endpointed speech to a voice content site via the wide area network. The gateway outputs the endpointed speech in the form of application protocol requests onto the wide area network to the appropriate site, as specified by the browser, or to the browser. The gateway receives prompts in the form of application protocol responses from the browser or a voice content site and plays the prompts to the appropriate user over the telephony connection. While accessing a selected voice content site, the gateway reroutes the endpointed speech to the browser if the endpointing result represents a hotword candidate.

    Speech recognition method using a two-pass search
    8.
    Type: granted invention patent
    Status: no longer in force

    Publication number: US5515475A

    Publication date: 1996-05-07

    Application number: US80543

    Filing date: 1993-06-24

    CPC classification: G10L15/08 G10L15/142

    Abstract: A method of recognizing speech comprises searching a vocabulary of words for a match to an unknown utterance. Words in the vocabulary are represented by concatenated allophone models and the vocabulary is represented as a network. On a first pass of the search, a one-state duration constrained model is used to search the vocabulary network. The one-state model has as its transition probability the maximum observed transitional probability (model distance) of the unknown utterance for the corresponding allophone model. Words having top scores are chosen from the first pass search and, in a second pass of the search, rescored using a full Viterbi trellis with the complete allophone models and model distances. The rescores are sorted to provide a few top choices. Using a second set of speech parameters, these few top choices are again rescored. Comparison of the scores using each set of speech parameters determines a recognition choice. Post-processing is also possible to further enhance recognition accuracy. Test results indicate that the two-pass search provides approximately the same recognition accuracy as a full Viterbi search of the vocabulary network.
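
The overall shape of the two-pass search (a cheap first pass over the whole vocabulary, then a costly rescoring of only the top-scoring words) can be sketched as below. The scoring functions are placeholders supplied by the caller: they stand in for the one-state model and the full Viterbi trellis, which are not implemented here. The names, the shortlist size, and the convention that higher scores are better are assumptions of this sketch.

```python
from typing import Callable, Iterable, List, Tuple


def two_pass_search(
    vocabulary: Iterable[str],
    fast_score: Callable[[str], float],
    full_score: Callable[[str], float],
    shortlist_size: int = 3,
) -> List[Tuple[str, float]]:
    """Rank words by a cheap score, then rescore only the shortlist.

    Returns (word, full score) pairs sorted from best to worst.
    Higher scores are assumed to be better.
    """
    # First pass: cheap approximate score for every word in the vocabulary.
    shortlist = sorted(vocabulary, key=fast_score, reverse=True)[:shortlist_size]
    # Second pass: expensive score for the shortlisted words only.
    rescored = [(word, full_score(word)) for word in shortlist]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored
```

The saving comes from calling `full_score` only `shortlist_size` times instead of once per vocabulary word; a word that the first pass leaves outside the shortlist cannot be recovered by the second pass.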

Method and apparatus for automation of directory assistance using speech recognition
    10.
    Type: granted invention patent
    Status: no longer in force

    Publication number: US5479488A

    Publication date: 1995-12-26

    Application number: US193522

    Filing date: 1994-02-08

    IPC classification: H04M3/493 H04M3/64 G10L9/06

    CPC classification: H04M3/4931 H04M2201/40

    Abstract: In a telecommunications system, automatic directory assistance uses a voice processing unit comprising a lexicon of lexemes and data representing a predetermined relationship between each lexeme and calling numbers in a locality served by the automated directory assistance apparatus. The voice processing unit issues messages to a caller making a directory assistance call to prompt the caller to utter a required one of said lexemes. The unit detects the calling number originating a directory assistance call and, responsive to the calling number and the relationship data, computes a probability index representing the likelihood of a lexeme being the subject of the directory assistance call. The unit employs a speech recognizer to recognize, on the basis of the acoustics of the caller's utterance and the probability index, a lexeme corresponding to that uttered by the caller.
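
One plausible way to recognize "on the basis of the acoustics of the caller's utterance and the probability index" is to add the log of the probability index to an acoustic log-likelihood and pick the best total. The abstract does not state how the two are combined, so this combination rule, the function name, and the data layout are all assumptions of the sketch.

```python
import math
from typing import Dict


def recognize_with_prior(
    acoustic_loglik: Dict[str, float], prior: Dict[str, float]
) -> str:
    """Return the lexeme maximizing acoustic log-likelihood + log(prior).

    acoustic_loglik: lexeme -> log-likelihood of the utterance (hypothetical
                     output of the speech recognizer).
    prior:           lexeme -> probability index computed from the calling
                     number; a missing or zero entry rules the lexeme out.
    """

    def combined(lexeme: str) -> float:
        p = prior.get(lexeme, 0.0)
        if p <= 0.0:
            return float("-inf")
        return acoustic_loglik[lexeme] + math.log(p)

    # Sorting the names makes ties resolve deterministically (an assumption).
    return max(sorted(acoustic_loglik), key=combined)
```

With acoustic scores `{"Ottawa": -10.0, "Oshawa": -9.5}`, a uniform prior selects `"Oshawa"`, while a prior of 0.8 for `"Ottawa"` and 0.2 for `"Oshawa"` tips the decision to `"Ottawa"`.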
