Voice activity detection using a soft decision mechanism

    Publication number: US09984706B2

    Publication date: 2018-05-29

    Application number: US14449770

    Filing date: 2014-08-01

    Inventor: Ron Wein

    CPC classification number: G10L25/78

    Abstract: Voice activity detection (VAD) is an enabling technology for a variety of speech-based applications. Herein disclosed is a robust VAD algorithm that is also language independent. Rather than classifying short segments of the audio as either “speech” or “silence”, the VAD as disclosed herein employs a soft-decision mechanism. The VAD outputs a speech-presence probability, which is based on a variety of characteristics.
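
    The difference between a hard and a soft decision can be illustrated with a minimal sketch. The abstract does not say which characteristics the VAD uses or how they are combined, so everything here is an assumption: a single characteristic (frame energy), a hypothetical fixed noise floor, and a logistic mapping to a probability. It is not the patented method.

```python
import math


def frame_log_energy_db(frame):
    """Mean-square energy of one audio frame in dB, floored to avoid log(0)."""
    power = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(max(power, 1e-12))


def speech_presence_probability(frame, noise_floor_db=-50.0,
                                midpoint_db=10.0, slope=0.5):
    """Soft decision: return a value in (0, 1) instead of a speech/silence label.

    noise_floor_db, midpoint_db and slope are hypothetical tuning values,
    not parameters taken from the patent.
    """
    snr_db = frame_log_energy_db(frame) - noise_floor_db
    # Logistic squashing: 0.5 when the frame sits midpoint_db above the floor.
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - midpoint_db)))


# A silent frame and a loud 440 Hz tone, both 160 samples at 16 kHz.
silent_frame = [0.0] * 160
loud_frame = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(160)]

p_silent = speech_presence_probability(silent_frame)
p_loud = speech_presence_probability(loud_frame)
```

    A downstream component can then apply its own threshold to the probability, or carry it forward as a weight, rather than inheriting a fixed binary label.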

    SYSTEM AND METHOD OF AUTOMATED EVALUATION OF TRANSCRIPTION QUALITY

    Publication number: US20180068651A1

    Publication date: 2018-03-08

    Application number: US15676306

    Filing date: 2017-08-14

    Inventors: Oana Sidi, Ron Wein

    CPC classification number: G10L15/01 G10L15/04 G10L15/12 G10L15/26

    Abstract: Systems and methods automatedly evaluate a transcription quality. Audio data is obtained. The audio data is segmented into a plurality of utterances with a voice activity detector operating on a computer processor. The plurality of utterances are transcribed into at least one word lattice with a large vocabulary continuous speech recognition system operating on the processor. A minimum Bayes risk decoder is applied to the at least one word lattice to create at least one confusion network. At least one conformity ratio is calculated from the at least one confusion network.
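
    The abstract does not define the conformity ratio, so the sketch below is only an illustration of the kind of statistic a confusion network supports. It assumes a confusion network is a list of bins, each mapping competing words to posterior probabilities, and it uses a hypothetical definition: the fraction of bins whose best word reaches a confidence threshold. The function name, the threshold, and the sample data are all invented for this example.

```python
def conformity_ratio(confusion_network, threshold=0.8):
    """Fraction of bins whose top word posterior is at least `threshold`.

    Hypothetical definition for illustration; the patent may define the
    ratio differently. Each bin must contain at least one word.
    """
    if not confusion_network:
        return 0.0
    confident_bins = sum(
        1 for word_posteriors in confusion_network
        if max(word_posteriors.values()) >= threshold
    )
    return confident_bins / len(confusion_network)


# Made-up confusion network with four bins.
example_network = [
    {"hello": 0.9, "hollow": 0.1},
    {"world": 0.6, "word": 0.4},
    {"today": 0.85, "<eps>": 0.15},
    {"please": 0.5, "peas": 0.5},
]

ratio = conformity_ratio(example_network)
```

    Under this assumed definition, bins where the recognizer is split between alternatives lower the ratio, which is one plausible way to flag a transcription of doubtful quality.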

    Blind diarization of recorded calls with arbitrary number of speakers
    Document type: granted patent
    Legal status: in force

    Publication number: US09460722B2

    Publication date: 2016-10-04

    Application number: US14319860

    Filing date: 2014-06-30

    Inventors: Oana Sidi, Ron Wein

    Abstract: In a method of diarization of audio data, audio data is segmented into a plurality of utterances. Each utterance is represented as an utterance model representative of a plurality of feature vectors. The utterance models are clustered. A plurality of speaker models are constructed from the clustered utterance models. A hidden Markov model is constructed of the plurality of speaker models. A sequence of identified speaker models is decoded.


    Voice Activity Detection Using A Soft Decision Mechanism
    Document type: patent application
    Legal status: in force

    Publication number: US20150039304A1

    Publication date: 2015-02-05

    Application number: US14449770

    Filing date: 2014-08-01

    Inventor: Ron Wein

    CPC classification number: G10L25/78

    Abstract: Voice activity detection (VAD) is an enabling technology for a variety of speech-based applications. Herein disclosed is a robust VAD algorithm that is also language independent. Rather than classifying short segments of the audio as either “speech” or “silence”, the VAD as disclosed herein employs a soft-decision mechanism. The VAD outputs a speech-presence probability, which is based on a variety of characteristics.

