Abstract:
A method, system, and computer program product are provided for computing a confidence measure for speaker diarization. The method includes: receiving an audio session as unsegmented audio data; computing a principal component analysis (PCA) spectral ratio for sections of the received audio session, defined as the ratio between the largest eigenvalue and the second largest eigenvalue; and using the PCA spectral ratio as a confidence measure for speaker diarization processing.
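The eigenvalue ratio described in this abstract can be illustrated with a minimal sketch. It assumes that each section of the session has already been summarized as a fixed-length feature vector (one row per section); the function name `pca_spectral_ratio` and that input layout are illustrative choices, not taken from the abstract, which also does not say how the ratio is mapped to a final confidence score.

```python
import numpy as np

def pca_spectral_ratio(section_features):
    """Ratio of the largest to the second largest PCA eigenvalue.

    section_features: array of shape (n_sections, n_dims), one row per
    section of the session (an assumed representation, for illustration).
    """
    x = np.asarray(section_features, dtype=float)
    if x.ndim != 2 or x.shape[0] < 2 or x.shape[1] < 2:
        raise ValueError("need at least 2 sections and 2 feature dimensions")
    # PCA eigenvalues are the eigenvalues of the sample covariance matrix.
    cov = np.cov(x, rowvar=False)
    # eigvalsh returns eigenvalues in ascending order for symmetric input.
    eigvals = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    largest, second = eigvals[-1], eigvals[-2]
    if second == 0.0:
        return float("inf")  # all variance lies along a single direction
    return float(largest / second)

# Toy data: variance along the first axis is 4x that of the second axis.
toy = np.array([[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
ratio = pca_spectral_ratio(toy)
```

A downstream system could compare `ratio` against a tuned threshold, but that step is an assumption beyond what the abstract states.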
Abstract:
A method and system for speaker diarization are provided. Pre-trained acoustic models of individual speakers and/or groups of speakers are obtained. Speech data with multiple speakers is received and divided into frames. For a frame, an acoustic feature vector is determined and extended to include log-likelihood ratios of the pre-trained models relative to a background population model. The extended acoustic feature vector is used in segmentation and clustering algorithms.
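The feature extension described in this abstract can be sketched as follows. For brevity, each pre-trained model and the background model are stood in for by a single diagonal-covariance Gaussian given as a `(mean, variance)` pair; that simplification and the names `diag_gaussian_loglik` and `extend_frame` are assumptions for illustration, since the abstract does not specify the model family.

```python
import numpy as np

def diag_gaussian_loglik(x, mean, var):
    """Log-density of x under a diagonal-covariance Gaussian."""
    x, mean, var = (np.asarray(a, dtype=float) for a in (x, mean, var))
    return float(-0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var))

def extend_frame(x, speaker_models, background_model):
    """Append one log-likelihood ratio per pre-trained model to the frame.

    speaker_models: list of (mean, var) pairs (assumed stand-ins for the
    pre-trained acoustic models); background_model: a single (mean, var).
    """
    background = diag_gaussian_loglik(x, *background_model)
    llrs = [diag_gaussian_loglik(x, m, v) - background for m, v in speaker_models]
    return np.concatenate([np.asarray(x, dtype=float), llrs])

# One 2-D frame, one speaker model centered on the frame, and a
# background model offset by one unit along the first axis.
frame = [0.0, 0.0]
speaker = ([0.0, 0.0], [1.0, 1.0])
background = ([1.0, 0.0], [1.0, 1.0])
extended = extend_frame(frame, [speaker], background)
```

The extended vector keeps the original features and adds one dimension per pre-trained model, so any segmentation or clustering step that accepts feature vectors could consume it unchanged.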
Abstract:
Techniques for notifying at least one entity of an occurrence of an event in an audio signal are provided. At least one preference is obtained from the at least one entity. An occurrence of an event in the audio signal is determined. The event is related to at least one of at least one speaker and at least one topic. The at least one entity is notified of the occurrence of the event in the audio signal, in accordance with the at least one preference.
Abstract:
A method, system, and computer program product for compensation of intra-speaker variability in speaker diarization are provided. The method includes: dividing a speech session into segments of duration less than an average duration between speaker changes; parameterizing each segment by a time-dependent probability density function supervector, for example, using a Gaussian Mixture Model; computing a difference between successive segment supervectors; and computing a scatter measure, such as a covariance matrix of the difference, as an estimate of intra-speaker variability. The method further includes compensating the speech session for intra-speaker variability using the estimate of intra-speaker variability.
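The estimate described in this abstract can be sketched as below, assuming the per-segment supervectors have already been extracted and stacked in time order. The abstract does not say how the compensation itself is carried out; `remove_top_directions` shows one possible choice (projecting out the dominant directions of the scatter matrix), which is an assumption for illustration and not necessarily the patented method.

```python
import numpy as np

def intra_speaker_scatter(supervectors):
    """Covariance of differences between successive segment supervectors.

    supervectors: array of shape (n_segments, dim), in time order.
    """
    s = np.asarray(supervectors, dtype=float)
    if s.ndim != 2 or s.shape[0] < 3:
        raise ValueError("need at least three segment supervectors")
    deltas = np.diff(s, axis=0)  # s[t + 1] - s[t] for each successive pair
    return np.atleast_2d(np.cov(deltas, rowvar=False))

def remove_top_directions(supervectors, scatter, k=1):
    """Assumed compensation: project out the top-k scatter eigenvectors."""
    s = np.asarray(supervectors, dtype=float)
    _, vecs = np.linalg.eigh(scatter)  # eigenvalues in ascending order
    u = vecs[:, -k:]                   # the k dominant directions
    return s - (s @ u) @ u.T

# Toy session: segments vary only along the first dimension.
segments = np.array([[0.0, 5.0], [1.0, 5.0], [0.0, 5.0], [3.0, 5.0]])
scatter = intra_speaker_scatter(segments)
compensated = remove_top_directions(segments, scatter, k=1)
```

Because the segments are shorter than the typical time between speaker changes, most successive pairs come from the same speaker, which is why their differences are taken as a proxy for intra-speaker variability.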
Abstract:
An arrangement is provided for using a phoneme lattice for speech recognition and/or keyword spotting. The phoneme lattice may be constructed for an input speech signal and searched to produce a textual representation for the input speech signal and/or to determine if the input speech signal contains targeted keywords. An expectation maximization (EM) trained phoneme confusion matrix may be used when searching the phoneme lattice. The phoneme lattice may be constructed in a client and sent to a server, which may search the phoneme lattice to produce a result.
Abstract:
A novel and useful method is provided for using labeled training data and machine learning tools to train a speaker diarization system. Intra-speaker variability profiles are created from training data consisting of an audio stream labeled with where speaker changes occur (i.e., which participant is speaking at any given time). These intra-speaker variability profiles are then applied to an unlabeled audio stream to segment it into speaker-homogeneous segments and to cluster the segments according to speaker identity.