Method and apparatus for suppressing background music or noise from the
speech input of a speech recognizer
    1.
    Granted invention patent
    Legal status: Expired

    Publication number: US5848163A

    Publication date: 1998-12-08

    Application number: US594679

    Filing date: 1996-02-02

    CPC classification: G10L21/0208

    Abstract: A method and apparatus for removing the effect of background music or noise from speech input to a speech recognizer, so as to improve recognition accuracy. Samples of pure music or noise related to the background music or noise that corrupts the speech input are used to reduce the effect of the background on speech recognition. The pure music and noise samples can be obtained in a variety of ways. The music- or noise-corrupted speech input is divided into overlapping segments and is then processed in two phases: first, the best-matching pure music or noise segment is aligned with each speech segment; then a linear filter is built for each segment to remove the effect of background music or noise from the speech input, and the overlapping segments are averaged to improve the signal-to-noise ratio. The resulting acoustic output can then be fed to a speech recognizer.
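The two-phase processing in the abstract can be illustrated with a rough sketch. This is not the patented implementation: the abstract does not say how the best match is scored or how the filter is fitted, so the correlation-based alignment, the least-squares FIR fit, and every name and parameter below (`suppress_background`, `seg_len`, `hop`, `taps`, `max_lag`) are assumptions chosen for illustration. The sketch also assumes the pure reference is at least as long as the corrupted input.

```python
import numpy as np

def suppress_background(noisy, reference, seg_len=256, hop=128, taps=8, max_lag=32):
    """Illustrative only: subtract a filtered copy of `reference` from `noisy`."""
    noisy = np.asarray(noisy, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if len(reference) < len(noisy):
        raise ValueError("this sketch assumes reference is at least as long as noisy")
    # Zero-pad the reference so every lag and filter delay stays in bounds.
    pad = max_lag + taps - 1
    ref = np.pad(reference, (pad, pad))
    out = np.zeros(len(noisy))
    count = np.zeros(len(noisy))
    for start in range(0, len(noisy) - seg_len + 1, hop):
        seg = noisy[start:start + seg_len]
        # Phase 1 (assumed criterion): choose the lag with the largest
        # normalized correlation between the segment and the reference.
        best_lag, best_score = 0, -1.0
        for lag in range(-max_lag, max_lag + 1):
            r0 = start + lag + pad
            cand = ref[r0:r0 + seg_len]
            score = abs(np.dot(seg, cand)) / (np.linalg.norm(cand) + 1e-12)
            if score > best_score:
                best_lag, best_score = lag, score
        r0 = start + best_lag + pad
        # Phase 2 (assumed method): least-squares FIR filter. Column k of X
        # holds the aligned reference delayed by k samples.
        X = np.stack([ref[r0 - k:r0 - k + seg_len] for k in range(taps)], axis=1)
        coeffs, *_ = np.linalg.lstsq(X, seg, rcond=None)
        out[start:start + seg_len] += seg - X @ coeffs
        count[start:start + seg_len] += 1
    # Average the overlapping cleaned segments; pass through uncovered samples.
    covered = count > 0
    out[covered] /= count[covered]
    out[~covered] = noisy[~covered]
    return out
```

With a 50% overlap (`hop = seg_len // 2`), most samples are estimated twice and the two estimates are averaged, which mirrors the averaging step the abstract mentions.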


    Transcription of speech data with segments from acoustically dissimilar
environments
    2.
    Granted invention patent
    Legal status: Expired

    Publication number: US6067517A

    Publication date: 2000-05-23

    Application number: US595722

    Filing date: 1996-02-02

    IPC classification: G10L15/20

    CPC classification: G10L15/20

    Abstract: A technique to improve recognition accuracy when transcribing speech data that contains data from a wide range of environments. In many situations, the input contains data from a variety of sources in different environments. Such classes include: clean speech, speech corrupted by noise (e.g., music), non-speech (e.g., pure music with no speech), telephone speech, and the identity of a speaker. A technique is described whereby the different classes of data are first automatically identified, and then each class is transcribed by a system that is built specifically for it. The invention also describes a segmentation algorithm that is based on building an acoustic model that characterizes the data in each class, and then using a dynamic programming algorithm (the Viterbi algorithm) to automatically identify segments that belong to each class. The acoustic models are built in a particular feature space, and the invention also describes different feature spaces for use with different classes.
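The segmentation step described in the abstract (one acoustic model per class, then Viterbi decoding to label segments) can be sketched as follows. This is a minimal illustration, not the patented method: the abstract does not specify the form of the acoustic models or the transition structure, so the single diagonal Gaussian per class, the fixed `switch_penalty`, and the function names are all assumptions.

```python
import math

def log_gauss(frame, mean, var):
    """Log-likelihood of one feature frame under a diagonal Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(frame, mean, var))

def viterbi_segment(frames, models, switch_penalty=10.0):
    """Return the most likely class index for each frame.

    `models` is a list of (mean, var) pairs, one per class (assumed form).
    Staying in a class is free; changing class costs `switch_penalty`.
    """
    if not frames:
        return []
    n = len(models)
    score = [log_gauss(frames[0], *models[j]) for j in range(n)]
    back = []  # back[t - 1][j]: best predecessor of class j at frame t
    for frame in frames[1:]:
        new_score, pointers = [], []
        for j in range(n):
            def arrive(i, j=j):
                return score[i] - (0.0 if i == j else switch_penalty)
            best_i = max(range(n), key=arrive)
            new_score.append(arrive(best_i) + log_gauss(frame, *models[j]))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)
    # Trace back from the best final class.
    state = max(range(n), key=lambda j: score[j])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

def to_segments(path):
    """Collapse a frame-level path into (start, end_exclusive, class) runs."""
    segments, start = [], 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[start]:
            segments.append((start, t, path[start]))
            start = t
    return segments
```

The switching penalty keeps a single atypical frame from splitting a segment. Each run returned by `to_segments` could then be routed to a recognizer built for that class, as the abstract describes.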
