Abstract:
Computer-implemented speech system training including displaying an icon representing a concept, prompting a user to generate a vocalization comprising any sound the user chooses to associate with the icon, confirming association of the vocalization with the icon, and saving the association of the vocalization with the icon to a computer readable medium. The invention is particularly applicable, but not limited, to the field of vehicle diagnostics, including vehicle wheel alignment and vehicle engine diagnostics.
Abstract:
The present invention combines speech recognition tutorial training with speech recognizer voice training. The system prompts the user for speech data and simulates, with predefined screenshots, what happens when speech commands are received. At each step in the tutorial process, when the user is prompted for an input, the system is configured such that only a predefined set (which may contain a single item) of user inputs will be recognized by the speech recognizer. When a recognition succeeds, the speech data is used to train the speech recognition system.
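The tutorial flow described above can be illustrated with a minimal sketch. It assumes a hypothetical recognizer that returns plain text, and it filters that text against the step's allowed set rather than constraining the recognizer itself. The names `TutorialStep`, `TutorialTrainer`, and `handle_input` are illustrative and do not come from the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class TutorialStep:
    prompt: str
    # Predefined set of inputs accepted at this step (may hold a single item).
    allowed_commands: frozenset


@dataclass
class TutorialTrainer:
    # Collected (command, speech_data) pairs to be used for voice training.
    training_samples: list = field(default_factory=list)

    def handle_input(self, step, recognized_text, speech_data):
        """Keep the speech data only if the input is allowed at this step."""
        command = recognized_text.strip().lower()
        if command not in step.allowed_commands:
            return False  # input outside the predefined set is ignored
        self.training_samples.append((command, speech_data))
        return True
```

A tutorial driver would advance to the next step, and show the next predefined screenshot, only when `handle_input` returns `True`.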
Abstract:
A computer-implemented method, data processing system, apparatus and computer program product for determining the current behavioral, psychological and speech-style characteristics of a speaker in a given situation and context, through analysis of the speaker's current speech utterances. The analysis calculates different prosodic parameters of the speech utterances, consisting of unique secondary derivatives of the primary pitch and amplitude speech parameters, and compares these parameters with pre-obtained reference speech data indicative of various behavioral, psychological and speech-style characteristics. The method includes forming the reference database of classification speech parameters, as well as analyzing the speaker's speech utterances in order to determine the current behavioral, psychological and speech-style characteristics of the speaker in the given situation.
Abstract:
A speech characteristic-amount calculation circuit 31 calculates the speech characteristic amount of each phrase in input speech. An estimation process likelihood calculation circuit 33 compares the calculated speech characteristic amount of a phrase with speech pattern sequence information for a plurality of phrases stored in a storage unit 34 and selects a plurality of candidates for the phrase, ranked from higher to lower likelihood value. A recognition filtering device 4 determines whether to reject the extracted candidates based on the likelihood difference ratio, that is, the ratio between the difference in likelihood values of the first and second candidates and the difference in likelihood values of the second and third candidates.
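The rejection test can be sketched as follows. The abstract does not state which difference is the numerator, what threshold is used, or which side of the threshold means rejection, so this sketch assumes the ratio (first - second) / (second - third) and accepts the top candidate when that ratio reaches a hypothetical `min_ratio`. The function names are illustrative.

```python
def likelihood_difference_ratio(scores):
    """Return (first - second) / (second - third) for the top three scores.

    Assumption: higher score means more likely, and this ordering of the
    two differences is the intended ratio.
    """
    if len(scores) < 3:
        raise ValueError("at least three candidate scores are required")
    first, second, third = sorted(scores, reverse=True)[:3]
    denominator = second - third
    if denominator == 0:
        # Second and third tie: the ratio is unbounded unless first ties too.
        return float("inf") if first > second else 0.0
    return (first - second) / denominator


def accept_first_candidate(scores, min_ratio=1.0):
    """Accept the best candidate when it stands out clearly (assumed rule)."""
    return likelihood_difference_ratio(scores) >= min_ratio
```

With log-likelihoods of -10, -14 and -15, the first candidate leads by 4 while the second leads the third by 1, so the ratio is 4 and the candidate passes a threshold of 2. With -10, -10.5 and -15 the ratio is about 0.11 and the candidate is rejected.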
Abstract:
The invention provides a method for automated training of a plurality of artificial neural networks for phoneme recognition using training data, wherein the training data comprises speech signals subdivided into frames, each frame associated with a phoneme label, wherein the phoneme label indicates a phoneme associated with the frame, the method comprising the steps of: providing a sequence of frames from the training data, wherein the number of frames in the sequence of frames is at least equal to the number of artificial neural networks, assigning to each of the artificial neural networks a different subsequence of the provided sequence, wherein each subsequence comprises a predetermined number of frames, determining a common phoneme label for the sequence of frames based on the phoneme labels of one or more frames of one or more subsequences of the provided sequence, and training each artificial neural network using the common phoneme label.
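The data preparation steps of this method can be sketched in a few lines. The sketch assumes consecutive, non-overlapping subsequences and a common label chosen by majority vote over the central subsequence. The abstract only requires that the label be based on one or more frames of one or more subsequences, so both choices are assumptions, as are the function names. Frames are represented as `(feature, phoneme_label)` pairs, and the networks themselves are left out.

```python
from collections import Counter


def assign_subsequences(frames, num_networks, frames_per_network):
    """Give each network its own consecutive slice of the frame sequence."""
    needed = num_networks * frames_per_network
    if len(frames) < needed:
        raise ValueError("not enough frames for the requested split")
    return [
        frames[i * frames_per_network:(i + 1) * frames_per_network]
        for i in range(num_networks)
    ]


def common_phoneme_label(subsequences):
    """Majority label of the central subsequence (an assumed rule)."""
    central = subsequences[len(subsequences) // 2]
    labels = [label for _, label in central]
    return Counter(labels).most_common(1)[0][0]


def build_training_examples(frames, num_networks, frames_per_network):
    """Return one (features, common_label) training example per network."""
    subsequences = assign_subsequences(frames, num_networks, frames_per_network)
    label = common_phoneme_label(subsequences)
    return [([feat for feat, _ in sub], label) for sub in subsequences]
```

Each returned pair would be fed to a different network, so every network is trained toward the same phoneme target while seeing a different part of the surrounding context.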
Abstract:
A method of recognizing a speaker of an utterance (602) in a speech recognition system, comprising: comparing the utterance (602) to a plurality of speaker models (604) for different speakers; determining a likelihood score (606) for each speaker model, the likelihood score (606) indicating how well the speaker model corresponds to the utterance; and, for each speaker model (604), determining a probability (609) that the utterance (602) originates from the speaker corresponding to the speaker model (604), wherein the determination of the probability (609) for a speaker model (604) is based on the likelihood scores (606) for the speaker models and takes prior knowledge (607) about the speaker model into account.
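One way to combine likelihood scores with prior knowledge is Bayes' rule, sketched below. The abstract does not say that the prior knowledge takes the form of a prior probability per speaker or that the scores are log-likelihoods; both are assumptions of this sketch, and `speaker_posteriors` is an illustrative name.

```python
import math


def speaker_posteriors(log_likelihoods, priors):
    """Posterior probability per speaker from log-likelihoods and priors.

    log_likelihoods: dict mapping speaker -> log p(utterance | speaker model)
    priors:          dict mapping speaker -> prior probability (assumed form
                     of the "prior knowledge" mentioned in the abstract)
    """
    log_joint = {
        speaker: score + math.log(priors[speaker])
        for speaker, score in log_likelihoods.items()
    }
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_joint.values())
    unnormalized = {s: math.exp(v - peak) for s, v in log_joint.items()}
    total = sum(unnormalized.values())
    return {s: u / total for s, u in unnormalized.items()}
```

When two speaker models score the utterance equally, the posteriors reduce to the priors, so a speaker known to use the system more often is preferred.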