Abstract:
A single, subjective numerical rating to evaluate the performance of a telephone-based spoken dialog system is disclosed. This CE rating is provided by expert human listeners who have knowledge of the design of the dialog system. Different human raters can be trained to achieve a satisfactory level of agreement. Furthermore, a classifier trained on ratings by human experts can reproduce the human ratings with the same degree of consistency. As a result, more calls can be given a CE rating than would be possible with limited human resources, and more information can be provided about individual calls, e.g., to help decide between two disparate ratings by different human experts.
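The abstract does not say how agreement between raters, or between a rater and the classifier, is measured. As an illustration only, the sketch below uses unweighted Cohen's kappa, a common chance-corrected agreement statistic. The 1-5 rating scale, the sample ratings, and all names (`cohen_kappa`, `human_1`, `human_2`, `classifier`) are assumptions made for the example and do not come from the source.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equal-length lists of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    n = len(ratings_a)
    # Fraction of calls on which both raters gave the same rating.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label for every call
    return (observed - expected) / (1.0 - expected)


# Hypothetical CE ratings (assumed 1-5 scale) for the same six calls.
human_1 = [5, 4, 4, 2, 1, 3]
human_2 = [5, 4, 3, 2, 1, 3]
classifier = [5, 4, 4, 2, 2, 3]

kappa_human_human = cohen_kappa(human_1, human_2)
kappa_human_classifier = cohen_kappa(human_1, classifier)
```

Comparing `kappa_human_classifier` with `kappa_human_human` is one way to check whether a classifier reproduces human ratings about as consistently as a second human rater does. A weighted kappa may suit ordinal ratings better, since it penalizes a 1-versus-5 disagreement more than a 4-versus-5 one.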
Abstract:
A method and apparatus for continuously improving the performance of semantic classifiers used in spoken dialog systems are disclosed. Rule-based or statistical classifiers are replaced with better-performing rule-based or statistical classifiers, and/or certain parameters of existing classifiers are modified. The replacement classifiers or new parameters are trained and tested on a collection of transcriptions and annotations of utterances that are generated manually or in a partially automated fashion. Automated quality assurance leads to more accurate training and testing data, higher classification performance, and feedback into the design of the spoken dialog system in the form of suggested changes to improve system behavior.
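The replacement step described above can be sketched as a comparison on held-out annotated utterances: a candidate classifier takes over only if it outperforms the current one. This is a minimal sketch under assumptions. The accuracy metric, the `min_gain` threshold, the toy keyword classifiers, and the sample utterances and labels are all hypothetical and not taken from the source.

```python
def accuracy(classifier, samples):
    """Fraction of (utterance, label) pairs the classifier labels correctly."""
    correct = sum(classifier(text) == label for text, label in samples)
    return correct / len(samples)


def select_classifier(current, candidate, test_samples, min_gain=0.0):
    """Return the candidate only if it beats the current classifier by more than min_gain."""
    current_acc = accuracy(current, test_samples)
    candidate_acc = accuracy(candidate, test_samples)
    if candidate_acc - current_acc > min_gain:
        return candidate, candidate_acc
    return current, current_acc


# Hypothetical annotated test utterances: (transcription, semantic label).
test_samples = [
    ("i want to pay my bill", "billing"),
    ("my internet is down", "tech_support"),
    ("pay bill", "billing"),
    ("no connection", "tech_support"),
]


def rule_based(text):
    # Toy stand-in for the classifier currently deployed.
    return "billing" if "bill" in text else "other"


def keyword_candidate(text):
    # Toy stand-in for a retrained or extended replacement classifier.
    if "bill" in text:
        return "billing"
    if "internet" in text or "connection" in text:
        return "tech_support"
    return "other"


chosen, chosen_acc = select_classifier(rule_based, keyword_candidate, test_samples)
```

Running this comparison each time new transcriptions and annotations arrive would give the continuous loop the abstract describes. A real deployment would presumably use a larger test set and a significance check rather than a raw accuracy difference.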
Abstract:
A method for scoring non-native speech includes receiving a speech sample spoken by a non-native speaker and performing automatic speech recognition and metric extraction on the speech sample to generate a transcript of the speech sample and a speech metric associated with the speech sample. The method further includes determining whether the speech sample is scorable or non-scorable based upon the transcript and speech metric, where the determination is based on an audio quality of the speech sample, an amount of speech of the speech sample, a degree to which the speech sample is off-topic, whether the speech sample includes speech from an incorrect language, or whether the speech sample includes plagiarized material. When the sample is determined to be non-scorable, an indication of non-scorability is associated with the speech sample. When the sample is determined to be scorable, the sample is provided to a scoring model for scoring.
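The scorable/non-scorable decision can be pictured as a gate in front of the scoring model. The sketch below is one possible rule-based version. The source names the five criteria but not how they are measured, so every field name, threshold value, and the `route_sample` function are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class SpeechMetrics:
    # All fields are hypothetical measurements derived from the ASR transcript and audio.
    snr_db: float                # audio quality proxy
    speech_seconds: float        # amount of speech detected
    topic_similarity: float      # 0..1 similarity of transcript to the prompt
    target_language_prob: float  # 0..1 probability speech is in the expected language
    source_overlap: float        # 0..1 overlap of transcript with known source texts


def non_scorable_reasons(m):
    """Return the list of criteria the sample fails (empty means scorable)."""
    reasons = []
    if m.snr_db < 10.0:              # placeholder threshold
        reasons.append("audio_quality")
    if m.speech_seconds < 5.0:       # placeholder threshold
        reasons.append("insufficient_speech")
    if m.topic_similarity < 0.2:     # placeholder threshold
        reasons.append("off_topic")
    if m.target_language_prob < 0.5: # placeholder threshold
        reasons.append("incorrect_language")
    if m.source_overlap > 0.8:       # placeholder threshold
        reasons.append("plagiarized")
    return reasons


def route_sample(m, scoring_model):
    """Flag non-scorable samples; pass scorable ones to the scoring model."""
    reasons = non_scorable_reasons(m)
    if reasons:
        return {"scorable": False, "reasons": reasons, "score": None}
    return {"scorable": True, "reasons": [], "score": scoring_model(m)}


clean = SpeechMetrics(25.0, 40.0, 0.7, 0.95, 0.1)
noisy_short = SpeechMetrics(4.0, 2.0, 0.6, 0.9, 0.0)

clean_result = route_sample(clean, scoring_model=lambda m: 3.5)
noisy_result = route_sample(noisy_short, scoring_model=lambda m: 3.5)
```

Returning the failed criteria alongside the flag keeps the indication of non-scorability informative. A statistical classifier trained on labeled scorable and non-scorable samples could replace the fixed thresholds without changing the routing logic.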
Abstract:
Computer-implemented systems and methods are provided for assessing non-native speech proficiency. A non-native speech sample is processed to identify a plurality of vowel sound boundaries in the non-native speech sample. Portions of the non-native speech sample are analyzed within the vowel sound boundaries to extract vowel characteristics. The vowel characteristics are used to identify a plurality of vowel space metrics for the non-native speech sample, and the vowel space metrics are used to determine a non-native speech proficiency score for the non-native speech sample.
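The abstract does not name the vowel space metrics it uses. One widely used example is the vowel space area: the area of the polygon spanned by the mean first and second formants (F1, F2) of the corner vowels. The sketch below computes it with the shoelace formula. The choice of metric, the ARPAbet vowel labels, the formant values, and the function names are all assumptions for illustration.

```python
from collections import defaultdict


def mean_formants(measurements):
    """Average (F1, F2) per vowel from (vowel, f1_hz, f2_hz) tuples."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for vowel, f1, f2 in measurements:
        entry = sums[vowel]
        entry[0] += f1
        entry[1] += f2
        entry[2] += 1
    return {v: (s[0] / s[2], s[1] / s[2]) for v, s in sums.items()}


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula; points must be in perimeter order."""
    twice_area = 0.0
    for i in range(len(points)):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % len(points)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def vowel_space_area(measurements, corner_vowels=("IY", "AE", "AA", "UW")):
    """Area (Hz^2) of the quadrilateral spanned by the mean formants of the corner vowels."""
    means = mean_formants(measurements)
    corners = [means[v] for v in corner_vowels]  # assumed to be listed in perimeter order
    return polygon_area(corners)


# Hypothetical formant measurements taken inside detected vowel boundaries.
sample = [
    ("IY", 280.0, 2250.0), ("IY", 300.0, 2200.0),
    ("AE", 690.0, 1660.0),
    ("AA", 710.0, 1100.0),
    ("UW", 310.0, 870.0),
]
area = vowel_space_area(sample)
```

A proficiency model could take `area`, together with other vowel space metrics, as input features. How the metrics map to a proficiency score is not specified in the abstract, so that step is omitted here.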