Abstract:
In a method of diarization of audio data, audio data is segmented into a plurality of utterances. Each utterance is represented as an utterance model representative of a plurality of feature vectors. The utterance models are clustered. A plurality of speaker models are constructed from the clustered utterance models. A hidden Markov model is constructed of the plurality of speaker models. A sequence of identified speaker models is decoded.
Abstract:
Voice activity detection (VAD) is an enabling technology for a variety of speech based applications. Herein disclosed is a robust VAD algorithm that is also language independent. Rather than classifying short segments of the audio as either “speech” or “silence”, the VAD as disclosed herein employees a soft-decision mechanism. The VAD outputs a speech-presence probability, which is based on a variety of characteristics.
Abstract:
Systems and methods automatedly evaluate a transcription quality. Audio data is obtained. The audio data is segmented into a plurality of utterances with a voice activity detector operating on a computer processor. The plurality of utterances are transcribed into at least one word lattice with a large vocabulary continuous speech recognition system operating on the processor. A minimum Bayes risk decoder is applied to the at least one word lattice to create at least one confusion network. At least conformity ratio is calculated from the at least one confusion network.
Abstract:
Systems and methods automatedly evaluate a transcription quality. Audio data is obtained. The audio data is segmented into a plurality of utterances with a voice activity detector operating on a computer processor. The plurality of utterances are transcribed into at least one word lattice with a large vocabulary continuous speech recognition system operating on the processor. A minimum Bayes risk decoder is applied to the at least one word lattice to create at least one confusion network. At least conformity ratio is calculated from the at least one confusion network.
Abstract:
Systems and methods of script identification in audio data obtained from audio data. The audio data is segmented into a plurality of utterances. A script model representative of a script text is obtained. The plurality of utterances are decoded with the script model. A determination is made if the script text occurred in the audio data.
Abstract:
In a method of diarization of audio data, audio data is segmented into a plurality of utterances. Each utterance is represented as an utterance model representative of a plurality of feature vectors. The utterance models are clustered. A plurality of speaker models are constructed from the clustered utterance models. A hidden Markov model is constructed of the plurality of speaker models. A sequence of identified speaker models is decoded.
Abstract:
Voice activity detection (VAD) is an enabling technology for a variety of speech based applications. Herein disclosed is a robust VAD algorithm that is also language independent. Rather than classifying short segments of the audio as either “speech” or “silence”, the VAD as disclosed herein employees a soft-decision mechanism. The VAD outputs a speech-presence probability, which is based on a variety of characteristics.
Abstract:
Systems and methods of script identification in audio data obtained from audio data. The audio data is segmented into a plurality of utterances. A script model representative of a script text is obtained. The plurality of utterances are decoded with the script model. A determination is made if the script text occurred in the audio data.
Abstract:
Systems and methods of script identification in audio data obtained from audio data. The audio data is segmented into a plurality of utterances. A script model representative of a script text is obtained. The plurality of utterances are decoded with the script model. A determination is made if the script text occurred in the audio data.
Abstract:
Systems and methods of script identification in audio data obtained from audio data. The audio data is segmented into a plurality of utterances. A script model representative of a script text is obtained. The plurality of utterances are decoded with the script model. A determination is made if the script text occurred in the audio data.