Method and apparatus for suppressing background music or noise from the
speech input of a speech recognizer
    1.
    Granted patent
    Method and apparatus for suppressing background music or noise from the speech input of a speech recognizer (Expired)

    Publication No.: US5848163A

    Publication date: 1998-12-08

    Application No.: US594679

    Filing date: 1996-02-02

    CPC classification: G10L21/0208

    Abstract: A method and apparatus for removing the effect of background music or noise from speech input to a speech recognizer, so as to improve recognition accuracy. Samples of pure music or noise related to the background music or noise that corrupts the speech input are used to reduce the effect of the background on speech recognition. The pure music and noise samples can be obtained in a variety of ways. The corrupted speech input is divided into overlapping segments and then processed in two phases: first, the best-matching pure music or noise segment is aligned with each speech segment; then a linear filter is built for each segment to remove the effect of the background from the speech input, and the overlapping segments are averaged to improve the signal-to-noise ratio. The resulting acoustic output can then be fed to a speech recognizer.

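The two phases described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the "linear filter" is reduced here to a single least-squares gain per segment, alignment is done by exhaustive normalized correlation, and the names `suppress_background`, `seg_len`, and `hop` are hypothetical. The reference is assumed to be at least `seg_len` samples long.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def suppress_background(noisy, reference, seg_len=8, hop=4):
    """Subtract the best-matching reference segment from each overlapping
    segment of `noisy`, then average the overlapping results."""
    total = [0.0] * len(noisy)
    count = [0] * len(noisy)
    for start in range(0, len(noisy) - seg_len + 1, hop):
        seg = noisy[start:start + seg_len]

        # Phase 1: align. Pick the reference offset whose segment
        # correlates best with this input segment.
        best_off, best_score = 0, -1.0
        for off in range(len(reference) - seg_len + 1):
            ref = reference[off:off + seg_len]
            energy = dot(ref, ref)
            if energy == 0:
                continue
            score = dot(seg, ref) ** 2 / energy
            if score > best_score:
                best_off, best_score = off, score

        # Phase 2: filter. Assumption: a one-tap filter, i.e. the
        # least-squares gain that best explains `seg` by `ref`.
        ref = reference[best_off:best_off + seg_len]
        energy = dot(ref, ref)
        gain = dot(seg, ref) / energy if energy else 0.0
        for i in range(seg_len):
            total[start + i] += seg[i] - gain * ref[i]
            count[start + i] += 1

    # Average the overlapping segments; samples no segment covered pass through.
    return [t / c if c else x for t, c, x in zip(total, count, noisy)]


reference = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3, -5, 8, 9, -7, 9, 3]
noisy = [0.5 * r for r in reference]  # background only, no speech
cleaned = suppress_background(noisy, reference)
```

With an input that contains only scaled background, every cleaned sample comes out at (numerically) zero. A multi-tap filter would be needed to model room or channel effects.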

    Transcription of speech data with segments from acoustically dissimilar
environments
    2.
    Granted patent
    Transcription of speech data with segments from acoustically dissimilar environments (Expired)

    Publication No.: US6067517A

    Publication date: 2000-05-23

    Application No.: US595722

    Filing date: 1996-02-02

    IPC classification: G10L15/20

    CPC classification: G10L15/20

    Abstract: A technique for improving recognition accuracy when transcribing speech data drawn from a wide range of environments. In many situations the input contains data from a variety of sources in different environments. The classes of data include: clean speech, speech corrupted by noise (e.g., music), non-speech (e.g., pure music with no speech), telephone speech, and the identity of a speaker. A technique is described whereby the different classes of data are first identified automatically, and each class is then transcribed by a system built specifically for it. The invention also describes a segmentation algorithm that builds an acoustic model characterizing the data in each class and then uses a dynamic programming algorithm (the Viterbi algorithm) to automatically identify the segments that belong to each class. The acoustic models are built in a particular feature space, and the invention also describes different feature spaces for use with different classes.

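The segmentation step in this abstract (one acoustic model per class, then Viterbi decoding to find segment boundaries) can be sketched as follows. Everything concrete here is an assumption made for illustration: each class is modeled by a single one-dimensional Gaussian over a scalar feature, class changes pay a fixed `switch_penalty`, and the function and class names are hypothetical.

```python
import math


def log_gauss(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def viterbi_segment(frames, models, switch_penalty=5.0):
    """Label each frame with a class and return (class, start, end) segments.

    `models` maps a class name to (mean, variance) of a toy scalar feature.
    """
    names = list(models)

    def step(prev, cur):
        # Log-score penalty for changing class between adjacent frames.
        return 0.0 if prev == cur else -switch_penalty

    score = {c: log_gauss(frames[0], *models[c]) for c in names}
    back = []
    for x in frames[1:]:
        new_score, pointers = {}, {}
        for c in names:
            prev = max(names, key=lambda p: score[p] + step(p, c))
            new_score[c] = score[prev] + step(prev, c) + log_gauss(x, *models[c])
            pointers[c] = prev
        back.append(pointers)
        score = new_score

    # Trace the best path backwards from the best final class.
    state = max(names, key=score.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()

    # Merge runs of identical labels into segments with end-exclusive bounds.
    segments = []
    for i, c in enumerate(path):
        if segments and segments[-1][0] == c:
            segments[-1] = (c, segments[-1][1], i + 1)
        else:
            segments.append((c, i, i + 1))
    return segments


models = {"clean_speech": (0.0, 1.0), "music": (5.0, 1.0)}
frames = [0.1, -0.2, 0.3, 5.1, 4.8, 5.2, 0.0, 0.1]
segments = viterbi_segment(frames, models)
```

Each returned segment could then be routed to the recognizer built for its class. The switch penalty keeps isolated outlier frames from fragmenting the segmentation.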

    Conversational computing via conversational virtual machine
    3.
    Granted patent
    Conversational computing via conversational virtual machine (In force)

    Publication No.: US07729916B2

    Publication date: 2010-06-01

    Application No.: US11551901

    Filing date: 2006-10-23

    IPC classification: G10L15/22 G10L15/28

    Abstract: A conversational computing system that provides a universal coordinated multi-modal conversational user interface (CUI) (10) across a plurality of conversationally aware applications (11) (i.e., applications that “speak” conversational protocols) and conventional applications (12). The conversationally aware applications (11) communicate with a conversational kernel (14) via conversational application APIs (13). The conversational kernel (14) controls the dialog across applications and devices (local and networked) on the basis of their registered conversational capabilities and requirements and provides a unified conversational user interface and conversational services and behaviors. The conversational computing system may be built on top of a conventional operating system and APIs (15) and conventional device hardware (16). The conversational kernel (14) handles all I/O processing and controls conversational engines (18). The conversational kernel (14) converts voice requests into queries and converts outputs and results into spoken messages using conversational engines (18) and conversational arguments (17). The conversational application API (13) conveys all the information the conversational kernel (14) needs to transform queries into application calls and, conversely, to convert output into speech, appropriately sorted before being provided to the user.

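As a rough illustration of the kernel's role described in this abstract (applications register conversational capabilities, and the kernel turns a request into an application call and the result into a spoken message), here is a toy sketch. It is not the API of the patent: the class `ConversationalKernel`, its `register` and `handle` methods, and the prefix-matching rule are all hypothetical, and plain text stands in for both recognized speech and synthesized output.

```python
class ConversationalKernel:
    """Toy dispatcher: routes a request to whichever application
    registered a matching command phrase."""

    def __init__(self):
        self._capabilities = {}  # command phrase -> (app name, handler)

    def register(self, app_name, commands):
        # An application declares the phrases it can handle.
        for phrase, handler in commands.items():
            self._capabilities[phrase.lower()] = (app_name, handler)

    def handle(self, utterance):
        text = utterance.lower().strip()
        # Assumption: the longest registered phrase that prefixes the
        # utterance wins; the remainder is passed as the argument.
        for phrase in sorted(self._capabilities, key=len, reverse=True):
            if text.startswith(phrase):
                app_name, handler = self._capabilities[phrase]
                argument = text[len(phrase):].strip()
                return f"{app_name} says: {handler(argument)}"
        return "Sorry, no application can handle that request."


kernel = ConversationalKernel()
kernel.register("calendar", {"what is on": lambda day: f"nothing scheduled {day}"})
kernel.register("mail", {"read": lambda who: f"no new mail from {who}"})
reply = kernel.handle("Read Alice")
```

A real system would put speech recognition in front of `handle` and speech synthesis behind it. The sketch only shows the registration and routing idea.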

    CONVERSATIONAL COMPUTING VIA CONVERSATIONAL VIRTUAL MACHINE
    4.
    Patent application
    CONVERSATIONAL COMPUTING VIA CONVERSATIONAL VIRTUAL MACHINE (In force)

    Publication No.: US20070043574A1

    Publication date: 2007-02-22

    Application No.: US11551901

    Filing date: 2006-10-23

    IPC classification: G10L21/00

    Abstract: A conversational computing system that provides a universal coordinated multi-modal conversational user interface (CUI) (10) across a plurality of conversationally aware applications (11) (i.e., applications that “speak” conversational protocols) and conventional applications (12). The conversationally aware applications (11) communicate with a conversational kernel (14) via conversational application APIs (13). The conversational kernel (14) controls the dialog across applications and devices (local and networked) on the basis of their registered conversational capabilities and requirements and provides a unified conversational user interface and conversational services and behaviors. The conversational computing system may be built on top of a conventional operating system and APIs (15) and conventional device hardware (16). The conversational kernel (14) handles all I/O processing and controls conversational engines (18). The conversational kernel (14) converts voice requests into queries and converts outputs and results into spoken messages using conversational engines (18) and conversational arguments (17). The conversational application API (13) conveys all the information the conversational kernel (14) needs to transform queries into application calls and, conversely, to convert output into speech, appropriately sorted before being provided to the user.


    Conversational computing via conversational virtual machine
    5.
    Granted patent
    Conversational computing via conversational virtual machine (Expired)

    Publication No.: US07137126B1

    Publication date: 2006-11-14

    Application No.: US09806565

    Filing date: 1999-10-01

    Abstract: A conversational computing system that provides a universal coordinated multi-modal conversational user interface (CUI) (10) across a plurality of conversationally aware applications (11) (i.e., applications that “speak” conversational protocols) and conventional applications (12). The conversationally aware applications (11) communicate with a conversational kernel (14) via conversational application APIs (13). The conversational kernel (14) controls the dialog across applications and devices (local and networked) on the basis of their registered conversational capabilities and requirements and provides a unified conversational user interface and conversational services and behaviors. The conversational computing system may be built on top of a conventional operating system and APIs (15) and conventional device hardware (16). The conversational kernel (14) handles all I/O processing and controls conversational engines (18). The conversational kernel (14) converts voice requests into queries and converts outputs and results into spoken messages using conversational engines (18) and conversational arguments (17). The conversational application API (13) conveys all the information the conversational kernel (14) needs to transform queries into application calls and, conversely, to convert output into speech, appropriately sorted before being provided to the user.


    CONVERSATIONAL COMPUTING VIA CONVERSATIONAL VIRTUAL MACHINE
    7.
    Patent application
    CONVERSATIONAL COMPUTING VIA CONVERSATIONAL VIRTUAL MACHINE (In force)

    Publication No.: US20090313026A1

    Publication date: 2009-12-17

    Application No.: US12544473

    Filing date: 2009-08-20

    IPC classification: G10L15/22

    Abstract: A conversational computing system that provides a universal coordinated multi-modal conversational user interface (CUI) (10) across a plurality of conversationally aware applications (11) (i.e., applications that “speak” conversational protocols) and conventional applications (12). The conversationally aware applications (11) communicate with a conversational kernel (14) via conversational application APIs (13). The conversational kernel (14) controls the dialog across applications and devices (local and networked) on the basis of their registered conversational capabilities and requirements and provides a unified conversational user interface and conversational services and behaviors. The conversational computing system may be built on top of a conventional operating system and APIs (15) and conventional device hardware (16). The conversational kernel (14) handles all I/O processing and controls conversational engines (18). The conversational kernel (14) converts voice requests into queries and converts outputs and results into spoken messages using conversational engines (18) and conversational arguments (17). The conversational application API (13) conveys all the information the conversational kernel (14) needs to transform queries into application calls and, conversely, to convert output into speech, appropriately sorted before being provided to the user.


    State-dependent speaker clustering for speaker adaptation
    8.
    Granted patent
    State-dependent speaker clustering for speaker adaptation (Expired)

    Publication No.: US5787394A

    Publication date: 1998-07-28

    Application No.: US572223

    Filing date: 1995-12-13

    IPC classification: G10L15/06 G10L5/06

    CPC classification: G10L15/07 G10L2015/0631

    Abstract: A system and method for adapting a speaker-independent speech recognition system for use by a particular user. The system and method gather acoustic characterization data from a test speaker and compare the data with acoustic characterization data generated for a plurality of training speakers. A match score is computed between the test speaker's acoustic characterization for a particular acoustic subspace and each training speaker's acoustic characterization for the same acoustic subspace. The training speakers are ranked for the subspace according to their scores, and a new acoustic model is generated for the test speaker based upon the test speaker's acoustic characterization data and the acoustic characterization data of the closest-matching training speakers. The process is repeated for each acoustic subspace.

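The per-subspace ranking described in this abstract can be sketched as below. The specifics are assumptions made for illustration only: each speaker's "acoustic characterization" of a subspace is a plain mean vector, the match score is squared Euclidean distance (smaller is closer), and the adapted model is the unweighted average of the test speaker and the `top_k` closest training speakers. The function name `adapt_by_subspace` is hypothetical.

```python
def adapt_by_subspace(test_stats, training_stats, top_k=2):
    """For each acoustic subspace, rank training speakers by closeness to
    the test speaker and pool the closest ones into an adapted mean.

    test_stats:     {subspace: mean vector}
    training_stats: {speaker: {subspace: mean vector}}
    Returns {subspace: (adapted mean vector, closest speakers)}.
    """
    adapted = {}
    for subspace, test_mean in test_stats.items():
        scored = []
        for speaker, stats in training_stats.items():
            if subspace in stats:
                dist = sum((a - b) ** 2 for a, b in zip(test_mean, stats[subspace]))
                scored.append((dist, speaker))
        scored.sort()  # ties are broken by speaker name
        closest = [speaker for _, speaker in scored[:top_k]]

        # Assumption: simple unweighted pooling of the mean vectors.
        pool = [test_mean] + [training_stats[s][subspace] for s in closest]
        mean = [sum(column) / len(pool) for column in zip(*pool)]
        adapted[subspace] = (mean, closest)
    return adapted


test_stats = {"vowel": [1.0, 1.0], "fricative": [4.0, 0.0]}
training_stats = {
    "A": {"vowel": [1.0, 2.0], "fricative": [9.0, 9.0]},
    "B": {"vowel": [8.0, 8.0], "fricative": [4.0, 1.0]},
    "C": {"vowel": [2.0, 1.0], "fricative": [5.0, 0.0]},
}
adapted = adapt_by_subspace(test_stats, training_stats)
```

In this toy data the closest speakers differ by subspace (A and C for the vowel subspace, B and C for the fricative one), which is the point of making the clustering state-dependent rather than choosing one global set of similar speakers.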

    RESOURCE CONFIGURATION IN MULTI-MODAL DISTRIBUTED COMPUTING SYSTEMS
    9.
    Patent application
    RESOURCE CONFIGURATION IN MULTI-MODAL DISTRIBUTED COMPUTING SYSTEMS (In force)

    Publication No.: US20090094451A1

    Publication date: 2009-04-09

    Application No.: US12272597

    Filing date: 2008-11-17

    IPC classification: G06F1/24

    Abstract: A method and system for configuring available resources in real time to automatically accommodate the needs of the system user in a multi-modal distributed computing system is disclosed. Information about the location or environment of a wireless device is used, preferably in combination with the user's personal preferences and past history, to modify the behavior of the wireless device, including selecting the most appropriate mode of interaction with the device and activating applications on it as appropriate.

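One way to picture the mode selection described in this abstract is a small rule-based chooser. The rules, thresholds, mode names, and dictionary keys below are all invented for illustration and do not come from the patent: the environment first rules out unsuitable modes, and the remaining modes are ranked by stated preference and then by past usage.

```python
def choose_mode(environment, preferences, history):
    """Pick an interaction mode for a device.

    environment: e.g. {"noise_level": 80, "driving": True, "location": "meeting"}
    preferences: e.g. {"preferred_mode": "voice"}
    history:     usage counts per mode, e.g. {"voice": 5, "display": 2}
    """
    candidates = {"voice", "display", "vibration"}

    # Hypothetical environment rules that exclude unsuitable modes.
    if environment.get("noise_level", 0) > 70:
        candidates.discard("voice")
    if environment.get("location") == "meeting":
        candidates.discard("voice")
    if environment.get("driving"):
        candidates.discard("display")

    # "vibration" is never excluded, so `candidates` is never empty.
    def rank(mode):
        preferred = mode == preferences.get("preferred_mode")
        return (preferred, history.get(mode, 0), mode)

    return max(candidates, key=rank)


mode = choose_mode({"driving": True},
                   {"preferred_mode": "display"},
                   {"voice": 5, "vibration": 1})
```

Here the user prefers the display, but driving rules it out, so the most frequently used remaining mode (`"voice"`) is selected.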

    Resource configuration in multi-modal distributed computing systems
    10.
    Granted patent
    Resource configuration in multi-modal distributed computing systems (In force)

    Publication No.: US07454608B2

    Publication date: 2008-11-18

    Application No.: US10698101

    Filing date: 2003-10-31

    IPC classification: G06F15/177 G10L15/00

    Abstract: A method and system for configuring available resources in real time to automatically accommodate the needs of the system user in a multi-modal distributed computing system is disclosed. Information about the location or environment of a wireless device is used, preferably in combination with the user's personal preferences and past history, to modify the behavior of the wireless device, including selecting the most appropriate mode of interaction with the device and activating applications on it as appropriate.
