System and method for speech recognition
    1.
    Patent application
    Status: Under examination (published)

    Publication No.: US20040260546A1

    Publication date: 2004-12-23

    Application No.: US10830458

    Application date: 2004-04-23

    IPC classification: G10L015/00

    CPC classification: G10L15/20

    Abstract: A system and method include an initial noise model produced from pre-estimated noise of a service environment and an initial synthesized model of a voice containing noise. At recognition time, the system and method produce two things from the service environment. The first is an utterance environment noise model, built from the background noise. The second is a sequence of feature vectors, built from noise-superimposed speech that includes an uttered voice and the background noise. The system and method also produce an adaptive model by adapting the initial synthesized model using the utterance environment noise model, the initial noise model, and a compensation model. The adaptive model is then checked against the sequence of feature vectors to perform speech recognition. When speech recognition is performed, a compensation model is created. This model reflects the signal-to-noise ratio between the uttered voice and the background noise present at the time of utterance.
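
    The abstract does not give the adaptation formulas. The sketch below illustrates the general idea on per-channel mean power spectra, under stated assumptions:
    - The pre-estimated noise is swapped for the noise measured in the utterance environment.
    - The speech part is rescaled so that it matches the measured SNR.
    The function names, the linear power domain and the SNR-based gain are all hypothetical choices, not the patent's method.

```python
import math
from typing import List, Sequence

def snr_db(speech_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in decibels from average powers."""
    return 10.0 * math.log10(speech_power / noise_power)

def adapt_means(init_synth: Sequence[float], init_noise: Sequence[float],
                utt_noise: Sequence[float], utt_snr_db: float,
                floor: float = 1e-10) -> List[float]:
    """Hypothetical adaptation of per-channel mean power spectra.

    init_synth: means of the initial synthesized (noisy-speech) model
    init_noise: pre-estimated noise of the service environment
    utt_noise:  background noise measured in the utterance environment
    utt_snr_db: SNR of the actual utterance against that background noise
    """
    # Remove the pre-estimated noise (noise is assumed additive in linear power).
    clean = [max(s - n, floor) for s, n in zip(init_synth, init_noise)]
    # Assumed compensation gain: rescale the speech part so that its total power
    # relative to the utterance-environment noise equals the measured SNR.
    gain = 10.0 ** (utt_snr_db / 10.0) * sum(utt_noise) / sum(clean)
    # Add the utterance-environment noise to obtain the adapted means.
    return [gain * c + n for c, n in zip(clean, utt_noise)]
```

    Practical systems usually keep model parameters in the cepstral or log-spectral domain. They convert to linear power only for the combination step, as in parallel model combination.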

    Method for integrating processes with a multi-faceted human centered interface
    2.
    Patent application
    Status: In force

    Publication No.: US20040249640A1

    Publication date: 2004-12-09

    Application No.: US10619204

    Application date: 2003-07-14

    IPC classification: G10L015/00

    CPC classification: G10L15/193 G10L2015/228

    Abstract: According to the present invention, a method for integrating processes with a multi-faceted human centered interface is provided. The interface implements a hands-free, voice-driven environment for controlling processes and applications. A natural language model is used to parse voice-initiated commands and data and to route those inputs to the required applications or processes. An intelligent, context-based parser allows the system to determine which processes are required to complete a task initiated in natural language. A single-window environment provides a comfortable interface by preventing distracting windows from appearing. The single window has a plurality of facets that provide distinct viewing areas. Each facet has an independent process routing its outputs to it. As other processes are activated, each facet can reshape itself to bring a new process into one of the viewing areas. All activated processes are executed simultaneously to provide true multitasking.
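
    The routing step can be illustrated with a deliberately simple keyword router. In this router, every registered process whose keywords all appear in a recognized utterance is activated, so one command may start several processes. This is a hypothetical sketch: the class name and the keyword matching are assumptions. The patent's context-based natural language parser is considerably richer.

```python
from typing import Callable, List, Tuple

Handler = Callable[[str], str]

class CommandRouter:
    """Route recognized utterances to process handlers by keyword (illustrative only)."""

    def __init__(self) -> None:
        self._routes: List[Tuple[Tuple[str, ...], Handler]] = []

    def register(self, keywords: Tuple[str, ...], handler: Handler) -> None:
        """Associate a set of trigger keywords with a process handler."""
        self._routes.append((keywords, handler))

    def route(self, utterance: str) -> List[str]:
        """Run every handler whose keywords all occur in the utterance, in registration order."""
        words = set(utterance.lower().split())
        return [handler(utterance) for keywords, handler in self._routes
                if set(keywords) <= words]
```

    For example, registering `("send", "email")` and `("play", "music")` routes "play music and send email" to both handlers. In a multi-faceted interface, each handler's output would then be routed to its own facet.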

    Detecting repeated phrases and inference of dialogue models
    3.
    Patent application
    Status: Under examination (published)

    Publication No.: US20040249637A1

    Publication date: 2004-12-09

    Application No.: US10857896

    Application date: 2004-06-02

    Applicant: Aurilab, LLC

    Inventor: James K. Baker

    IPC classification: G10L015/00

    CPC classification: G10L15/1822 G10L15/1815

    Abstract: A method of speech recognition obtains acoustic data from a plurality of conversations. A plurality of pairs of utterances are selected from the plurality of conversations. At least one portion of the first utterance of each pair is dynamically aligned with at least one portion of the second utterance of the pair, and an acoustic similarity is computed. Based on a criterion of acoustic similarity, at least one pair that includes a first portion from a first utterance and a second portion from a second utterance is chosen. A common pattern template is created from the first portion and the second portion.
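
    Dynamic alignment of two utterances is commonly done with dynamic time warping (DTW). The sketch below does three things:
    - It aligns whole utterances represented as lists of feature vectors.
    - It keeps the pairs whose length-normalized alignment cost falls below a threshold.
    - It averages aligned frames into a crude common template.

    Aligning whole utterances rather than sub-portions, the Euclidean frame distance and the threshold criterion are simplifying assumptions, not details from the patent.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]

def dtw(a: List[Vector], b: List[Vector]) -> Tuple[float, List[Tuple[int, int]]]:
    """DTW with Euclidean frame distance; returns (normalized cost, warping path)."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover which frames were aligned.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m] / (n + m), path

def common_template(a: List[Vector], b: List[Vector],
                    path: List[Tuple[int, int]]) -> List[List[float]]:
    """Average each pair of aligned frames into one template frame."""
    return [[(x + y) / 2.0 for x, y in zip(a[i], b[j])] for i, j in path]

def similar_pairs(utterances: List[List[Vector]], threshold: float) -> List[Tuple[int, int]]:
    """Index pairs of utterances whose normalized DTW cost is below the threshold."""
    pairs = []
    for p in range(len(utterances)):
        for q in range(p + 1, len(utterances)):
            score, _ = dtw(utterances[p], utterances[q])
            if score < threshold:
                pairs.append((p, q))
    return pairs
```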

    Automatic assessment of phonological processes
    4.
    Patent application
    Status: In force

    Publication No.: US20040230430A1

    Publication date: 2004-11-18

    Application No.: US10637235

    Application date: 2003-08-08

    IPC classification: G10L015/00

    CPC classification: G09B19/06 G10L15/02

    Abstract: A computer-based system generates alternative phonetic transcriptions for a target word or phrase. Each alternative corresponds to a specific phonological process that replaces individual phonemes, or clusters of two or more phonemes, with replacement phonemes. The system compares a user's speech with a list of possible transcriptions, which includes the base (i.e., correct) transcription of the test target as well as the alternative transcriptions, to identify the transcription that best matches the user's speech. In a speech therapy application, the system identifies the phonological process(es), if any, associated with the user's speech. It also generates statistics over multiple test targets that can be used to diagnose the user's specific phonological disorders. The system can also be used in other contexts, such as foreign language instruction and automated attendant applications, to cover a wide range of accents and/or phonological disorders.
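
    The candidate-generation and matching steps can be sketched as follows. Each phonological process is a rewrite rule that replaces a phoneme or phoneme cluster. Applying each rule at each matching position yields the alternative transcriptions. The recognized phoneme string is then matched to the closest candidate by edit distance.

    The rule set, the ARPAbet-style phone symbols and the use of edit distance are hypothetical choices for illustration. The patent does not state its matching method in the abstract.

```python
from collections import Counter
from typing import Dict, List, Tuple

Phones = Tuple[str, ...]

# Hypothetical rules: process name -> (phoneme pattern, replacement).
RULES: Dict[str, Tuple[Phones, Phones]] = {
    "cluster reduction": (("S", "P"), ("P",)),
    "velar fronting": (("K",), ("T",)),
    "stopping": (("S",), ("T",)),
}

def alternatives(base: Phones) -> List[Tuple[str, Phones]]:
    """The correct transcription plus one variant per (rule, matching position)."""
    out = [("correct", base)]
    for name, (pat, rep) in RULES.items():
        k = len(pat)
        for i in range(len(base) - k + 1):
            if base[i:i + k] == pat:
                out.append((name, base[:i] + rep + base[i + k:]))
    return out

def edit_distance(a: Phones, b: Phones) -> int:
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def best_match(base: Phones, heard: Phones) -> Tuple[str, Phones]:
    """Candidate transcription closest to the phones recognized from the user."""
    return min(alternatives(base), key=lambda cand: edit_distance(cand[1], heard))

def process_statistics(results: List[Tuple[str, Phones]]) -> Counter:
    """Count detected phonological processes over several test targets."""
    return Counter(name for name, _ in results if name != "correct")
```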

    Speaker recognition using local models
    5.
    Patent application
    Status: In force

    Publication No.: US20040225498A1

    Publication date: 2004-11-11

    Application No.: US10810232

    Application date: 2004-03-26

    Inventor: Ryan Rifkin

    IPC classification: G10L015/00

    CPC classification: G10L17/02 G10L17/08

    Abstract: A system and method for voice recognition is disclosed. The system enrolls speakers using enrollment voice samples and identification information. An extraction module characterizes enrollment voice samples with high-dimensional feature vectors, or speaker data points. A data structuring module organizes the data points into a high-dimensional data structure, such as a kd-tree. In this structure, similarity between data points is measured by a distance, such as a Euclidean, Minkowski, or Manhattan distance. The system recognizes a speaker from an unidentified voice sample. A data querying module searches the data structure to generate a subset of approximate nearest neighbors based on an extracted high-dimensional feature vector. A data modeling module uses Parzen windows to estimate, in real time, a probability density function. This function represents how closely the characteristics of the unidentified speaker match those of enrolled speakers, without extensive training data or parametric assumptions about the data distribution. A smoothing parameter controls the relative contributions of near and far speaker data points to the estimated density.
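
    The recognition path (nearest-neighbor search followed by a Parzen-window density per speaker) can be sketched in a few lines. This is a minimal sketch with several stated assumptions:
    - Exact brute-force neighbor search stands in for the kd-tree's approximate search.
    - A Gaussian kernel of width `h` plays the role of the smoothing parameter.
    - Each speaker's density is normalized by that speaker's number of enrolled points.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]

def nearest(enrolled: List[Tuple[str, Point]], query: Point, k: int) -> List[Tuple[str, Point]]:
    """Exact k nearest neighbours by brute force (a kd-tree would make this sublinear)."""
    return sorted(enrolled, key=lambda item: math.dist(item[1], query))[:k]

def parzen_scores(enrolled: List[Tuple[str, Point]], query: Point,
                  k: int = 5, h: float = 1.0) -> Dict[str, float]:
    """Gaussian-kernel Parzen density per speaker, using only the k nearest points.

    Smaller h makes far points contribute less. The kernel's normalizing constant
    is omitted because it is the same for every speaker when h and the dimension
    are fixed.
    """
    counts: Dict[str, int] = defaultdict(int)
    for speaker, _ in enrolled:
        counts[speaker] += 1
    density: Dict[str, float] = defaultdict(float)
    for speaker, point in nearest(enrolled, query, k):
        d2 = math.dist(point, query) ** 2
        density[speaker] += math.exp(-d2 / (2.0 * h * h)) / counts[speaker]
    return dict(density)

def identify(enrolled: List[Tuple[str, Point]], query: Point,
             k: int = 5, h: float = 1.0) -> str:
    """Enrolled speaker with the highest estimated density at the query point."""
    scores = parzen_scores(enrolled, query, k, h)
    return max(scores, key=scores.get)
```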

    Voice recognition/response system, voice recognition/response program and recording medium for same
    6.
    Patent application
    Status: Under examination (published)

    Publication No.: US20040220808A1

    Publication date: 2004-11-04

    Application No.: US10609641

    Application date: 2003-07-01

    IPC classification: G10L015/00

    CPC classification: G10L15/22 G10L2015/228

    Abstract: A voice recognition/response system comprises an utterance recognition unit, a dialog control processing unit, an utterance feature analyzing unit and a response voice generating unit. The utterance recognition unit recognizes the content of a user's utterance from the user's voice input and outputs recognition results. The dialog control processing unit controls the progress of the dialog with the user based on the recognition results, so as to determine the response content for the user. The utterance feature analyzing unit analyzes the user's utterance features to generate utterance feature information. The response voice generating unit generates a response voice for the user based on the response content and the utterance feature information.
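
    The path from utterance features to response voice can be sketched with two small data types. In this sketch the analyzer measures speaking rate and level, and the response generator adapts synthesizer settings to them. The chosen features (words per second, mean level in dB), the default rate and the clamping range are all hypothetical. The abstract does not specify which utterance features are used.

```python
from dataclasses import dataclass

@dataclass
class UtteranceFeatures:
    words_per_second: float  # speaking rate of the user
    mean_level_db: float     # average input level of the user's voice

@dataclass
class VoiceSettings:
    rate: float       # synthesizer speaking rate relative to its default (1.0)
    volume_db: float  # output level for the response voice

def analyze_features(transcript: str, duration_s: float, mean_level_db: float) -> UtteranceFeatures:
    """Utterance feature analysis: speaking rate from the recognized words and duration."""
    return UtteranceFeatures(len(transcript.split()) / duration_s, mean_level_db)

def response_voice(features: UtteranceFeatures, default_wps: float = 2.5) -> VoiceSettings:
    """Response voice generation: follow the user's tempo within a bounded range."""
    rate = min(1.5, max(0.7, features.words_per_second / default_wps))
    return VoiceSettings(rate=rate, volume_db=features.mean_level_db)
```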

    Sonic/ultrasonic authentication device
    7.
    Patent application
    Status: In force

    Publication No.: US20040220807A9

    Publication date: 2004-11-04

    Application No.: US09853017

    Application date: 2001-05-10

    IPC classification: G10L017/00 G10L015/00

    Abstract: A method for identifying users and verifying their identity by means of an authentication device. The device can transmit, receive and record audio or ultrasonic signals, convert the signals into digital data, and perform digital signal processing. Voice patterns and user information of one or more authorized users are recorded and stored on the authentication device. A user's identity is verified by inputting a vocal identification signal from the user to the authentication device. The voice pattern of that signal is compared with the recorded voice patterns of the authorized users. If a match is detected, an indication is issued that the user is identified as an authorized user.
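
    The enrollment-and-matching logic can be sketched as follows. Each authorized user's voice pattern is stored as a feature vector, and an incoming pattern is accepted for the best-scoring user above a threshold. This is a minimal sketch under stated assumptions. The cosine-similarity comparison, the threshold value and the representation of a "voice pattern" as a fixed-length vector are hypothetical. The abstract does not say how patterns are compared.

```python
import math
from typing import Dict, List, Optional, Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

class Authenticator:
    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold
        self.patterns: Dict[str, List[float]] = {}  # recorded patterns of authorized users

    def enroll(self, user: str, pattern: Sequence[float]) -> None:
        """Record and store an authorized user's voice pattern."""
        self.patterns[user] = list(pattern)

    def identify(self, pattern: Sequence[float]) -> Optional[str]:
        """Best-matching authorized user, or None if no stored pattern passes the threshold."""
        best, best_score = None, self.threshold
        for user, ref in self.patterns.items():
            score = cosine(pattern, ref)
            if score >= best_score:
                best, best_score = user, score
        return best
```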

    Method of speech recognition using variational inference with switching state space models
    8.
    Patent application
    Status: Lapsed

    Publication No.: US20040199386A1

    Publication date: 2004-10-07

    Application No.: US10405166

    Application date: 2003-04-01

    CPC classification: G10L15/14

    Abstract: A method is developed that includes two steps:
    1) defining a switching state space model for a continuous-valued hidden production-related parameter and the observed speech acoustics;
    2) approximating a posterior probability, based on a sequence of observed input values, that provides the likelihood of a sequence of the hidden production-related parameters and a sequence of speech units.

    In approximating the posterior probability, the boundaries of the speech units are not fixed but are optimally determined. In one embodiment, a mixture-of-Gaussians approximation is used. In another embodiment, an HMM posterior approximation is used.
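
    The abstract gives no equations. For orientation only, the block below shows a generic switching linear dynamical system and the variational lower bound that such posteriors are typically fitted against. The symbols are:
    - x_t, the continuous hidden production-related parameter;
    - s_t, the discrete speech unit;
    - y_t, the observed acoustics.

    The linear-Gaussian form, the symbols, and the two factorizations of q (one per-frame, one Markov) are assumptions. They reflect one plausible reading of the two embodiments and are not formulas taken from the patent. Because q(s) ranges over all state sequences, segment boundaries are inferred rather than fixed.

```latex
\begin{aligned}
p(s_t = j \mid s_{t-1} = i) &= \pi_{ij}, \\
x_t &= A_{s_t} x_{t-1} + a_{s_t} + w_t, \qquad w_t \sim \mathcal{N}(0, Q_{s_t}), \\
y_t &= C_{s_t} x_t + c_{s_t} + v_t, \qquad v_t \sim \mathcal{N}(0, R_{s_t}), \\
\log p(y_{1:T}) &\ge \mathbb{E}_{q}\!\left[\log p(y_{1:T}, x_{1:T}, s_{1:T})\right]
  - \mathbb{E}_{q}\!\left[\log q(x_{1:T}, s_{1:T})\right], \\
q(x_{1:T}, s_{1:T}) &= q(s_{1:T})\, q(x_{1:T} \mid s_{1:T}), \\
q_{\text{per-frame}}(s_{1:T}) &= \textstyle\prod_{t=1}^{T} q(s_t), \qquad
q_{\text{HMM}}(s_{1:T}) = q(s_1) \textstyle\prod_{t=2}^{T} q(s_t \mid s_{t-1}).
\end{aligned}
```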

    Automated decision making using time-varying stream reliability prediction
    9.
    Patent application
    Status: Lapsed

    Publication No.: US20040193415A1

    Publication date: 2004-09-30

    Application No.: US10397762

    Application date: 2003-03-26

    IPC classification: G10L015/00

    CPC classification: G10L17/06 G10L17/20

    Abstract: Automated decision making techniques are provided. For example, a technique for generating a decision associated with an individual or an entity includes the following steps. First, two or more data streams associated with the individual or the entity are captured. Then, at least one time-varying measure is computed in accordance with the two or more data streams. Lastly, a decision is computed based on the at least one time-varying measure. One form of the time-varying measure may be a measure of the coverage, by at least a portion of the captured data, of a model associated with previously obtained training data. Another form may be a measure of the stability of at least a portion of the captured data. Either measure may be used alone to compute a decision, but preferably both the coverage and stability measures are used. The technique may be used to authenticate a speaker.
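
    The abstract names the two time-varying measures but not their formulas. The sketch below uses invented stand-ins to show how such measures could weight each stream's match score before a fused accept/reject decision:
    - Coverage is the share of recent frames within two standard deviations of a Gaussian summary of the training data.
    - Stability is 1/(1 + variance) over the recent window.
    - Frames are scalar features.

    All of these are hypothetical choices.

```python
from statistics import pvariance
from typing import List, Sequence, Tuple

# One stream: (recent frames, training mean, training std, match score in [0, 1]).
Stream = Tuple[Sequence[float], float, float, float]

def coverage(frames: Sequence[float], train_mean: float, train_std: float) -> float:
    """Share of recent frames that fall inside the training model's typical range."""
    inside = sum(1 for x in frames if abs(x - train_mean) <= 2.0 * train_std)
    return inside / len(frames)

def stability(frames: Sequence[float]) -> float:
    """Equals 1.0 for constant features and decreases as the windowed variance grows."""
    return 1.0 / (1.0 + pvariance(frames))

def fused_decision(streams: List[Stream], threshold: float = 0.5) -> bool:
    """Reliability-weighted average of stream scores, compared with a threshold."""
    weights = [coverage(f, m, s) * stability(f) for f, m, s, _ in streams]
    total = sum(weights)
    if total == 0.0:
        return False  # no stream is currently reliable enough to decide
    score = sum(w * st[3] for w, st in zip(weights, streams)) / total
    return score >= threshold
```

    Recomputing the weights over a sliding window is what makes them time-varying. For example, a video stream that drifts away from its training conditions loses weight until it recovers.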

    Systems and methods for dynamically determining the attitude of a natural language speaker
    10.
    Patent application
    Status: In force

    Publication No.: US20040186719A1

    Publication date: 2004-09-23

    Application No.: US10387719

    Application date: 2003-03-13

    IPC classification: G06F017/27 G10L015/00

    CPC classification: G06F17/277 G10L17/26

    Abstract: Systems and methods for analyzing speech containing at least one lexical item to determine an attitude of a speaker towards an entity. The analysis comprises three steps:
    - determining at least one actual valence for the at least one lexical item by analyzing the at least one lexical item in context;
    - determining the attitude based on the at least one actual valence;
    - associating the speaker, the entity and the attitude.

    The at least one lexical item encodes attitude information about the entity.
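
    The distinction between a word's prior valence and its actual valence in context can be illustrated with a tiny lexicon-based sketch. Negators flip, and intensifiers scale, the valence of a nearby evaluative word, and the summed result becomes the attitude recorded with the speaker and the entity. The lexicons, the two-word look-back window and the labels are hypothetical; the patent's contextual analysis is not described in the abstract.

```python
from typing import List, Tuple

# Hypothetical lexicons: prior valence of evaluative words and contextual shifters.
PRIOR = {"good": 1.0, "great": 2.0, "bad": -1.0, "awful": -2.0, "like": 1.0, "hate": -2.0}
NEGATORS = {"not", "never", "no"}
INTENSIFIERS = {"very": 1.5, "really": 1.5}

def actual_valences(tokens: List[str]) -> List[float]:
    """Context-adjusted valence of each evaluative word in the token list."""
    out = []
    for i, tok in enumerate(tokens):
        if tok not in PRIOR:
            continue
        v = PRIOR[tok]
        context = tokens[max(0, i - 2):i]  # look back two words for shifters
        for word in context:
            v *= INTENSIFIERS.get(word, 1.0)
        if any(word in NEGATORS for word in context):
            v = -v
        out.append(v)
    return out

def attitude(speaker: str, entity: str, utterance: str) -> Tuple[str, str, str]:
    """Associate the speaker, the entity and the attitude derived from actual valences."""
    total = sum(actual_valences(utterance.lower().split()))
    label = "positive" if total > 0 else "negative" if total < 0 else "neutral"
    return (speaker, entity, label)
```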
