Abstract:
A method and apparatus for synthesizing audible phrases (words) that includes capturing a spoken utterance, which may be a word, and extracting prosodic information (parameters) therefrom, then applying the prosodic parameters to a synthesized (nominal) word to produce a prosodic mimic word corresponding to the spoken utterance and the nominal word.
Abstract:
A method of extracting a subset of speech units from a larger set of speech units for use by a speech synthesizer in synthesizing speech, wherein the speech units are stored in a compressed encoded representation that was generated by a codec, the method comprising: selecting members of the subset of speech units based on an overall cost associated with using the speech synthesizer to synthesize a test set of speech, wherein the overall cost includes at least one error introduced by using the codec to decode the stored representations of the speech units; and storing the selected subset of speech units on a speech-enabled device.
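The selection step described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name a search strategy, so a greedy search is swapped in here, and the scalar unit features, the `codec_error` table, and the function name `select_units` are all hypothetical.

```python
def select_units(units, test_targets, budget, codec_error):
    """Greedily pick up to `budget` units that minimize the overall cost.

    units:        dict name -> decoded feature value (hypothetical scalar)
    test_targets: feature values of the test set to be synthesized
    codec_error:  dict name -> distortion introduced by decoding that unit
    """
    def overall_cost(subset):
        # Each target is rendered with the cheapest available unit:
        # mismatch to the target plus the codec decoding error.
        return sum(
            min(abs(units[u] - t) + codec_error[u] for u in subset)
            for t in test_targets
        )

    selected = []
    remaining = set(units)
    while remaining and len(selected) < budget:
        # sorted() makes tie-breaking deterministic.
        best = min(sorted(remaining),
                   key=lambda u: overall_cost(selected + [u]))
        selected.append(best)
        remaining.remove(best)
    return selected, overall_cost(selected)


units = {"a": 1.0, "b": 1.1, "c": 5.0}
codec_error = {"a": 0.5, "b": 0.0, "c": 0.1}
subset, cost = select_units(units, [1.0, 5.0], budget=2, codec_error=codec_error)
```

In this toy run, unit "a" matches the first target exactly but is passed over because its decoding error outweighs the match, which is the effect of including codec error in the overall cost.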
Abstract:
Statistics are measured from an initial portion of a speech utterance. Feature normalization parameters are estimated based on the measured statistics and a statistically derived mapping relating measured statistics and feature normalization parameters.
Abstract:
An apparatus for voice-activated control of an electrical device comprises a receiving arrangement for receiving audio data generated by a user. A voice recognition arrangement is provided for determining whether the received audio data is a command word for controlling the electrical device. The voice recognition arrangement includes a microprocessor for comparing the received audio data with voice recognition data previously stored in the voice recognition arrangement. The voice recognition arrangement generates at least one control signal based on the comparison when the comparison reaches a predetermined threshold value. A power control controls power delivered to the electrical device. The power control is responsive to at least one control signal generated by the voice recognition arrangement for operating the electrical device in response to the at least one audio command generated by the user. An arrangement for adjusting the predetermined threshold value is provided to cause a control signal to be generated by the voice recognition arrangement when the audio data generated by the user varies from the previously stored voice recognition data.
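The comparison against an adjustable threshold might look like the sketch below. The abstract does not specify the comparison measure, so cosine similarity is assumed here, and the class `CommandMatcher` with its fields and methods is hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class CommandMatcher:
    template: list    # previously stored voice recognition data (hypothetical features)
    threshold: float  # similarity the comparison must reach

    def similarity(self, features):
        # Assumed measure: cosine similarity between input and stored template.
        dot = sum(a * b for a, b in zip(self.template, features))
        norm = (math.sqrt(sum(a * a for a in self.template))
                * math.sqrt(sum(b * b for b in features)))
        return dot / norm if norm else 0.0

    def control_signal(self, features):
        # A control signal is generated only when the comparison reaches the threshold.
        return self.similarity(features) >= self.threshold

    def adjust_threshold(self, value):
        # Lowering the threshold tolerates more variation from the stored data.
        self.threshold = value


matcher = CommandMatcher(template=[1.0, 0.0], threshold=0.9)
utterance = [0.8, 0.6]                     # varies from the stored template
before = matcher.control_signal(utterance)
matcher.adjust_threshold(0.75)
after = matcher.control_signal(utterance)
```

With the stricter threshold the varied utterance is rejected; after the adjustment the same utterance produces a control signal.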
Abstract:
A method for automatic speech recognition includes determining for an input signal a plurality of scores representative of certainties that the input signal is associated with corresponding states of a speech recognition model, using the speech recognition model and the determined scores to compute an average signal, computing a difference value representative of a difference between the input signal and the average signal, and processing the input signal in accordance with the difference value.
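One way to read the steps above is as a score-weighted average of per-state model means, followed by a bias estimate built from the difference. The sketch below makes that reading concrete. The per-state mean vectors, the smoothing factor `alpha`, and the final subtraction are assumptions, since the abstract only says the input is processed "in accordance with the difference value".

```python
def average_signal(state_scores, state_means):
    """Score-weighted average of the model's per-state mean vectors."""
    total = sum(state_scores)
    dim = len(state_means[0])
    return [
        sum(w * m[d] for w, m in zip(state_scores, state_means)) / total
        for d in range(dim)
    ]


def difference(frame, average):
    """Difference between the input frame and the average signal."""
    return [x - a for x, a in zip(frame, average)]


def update_bias(bias, diff, alpha):
    """Assumed processing step: smooth the difference into a running bias."""
    return [(1 - alpha) * b + alpha * d for b, d in zip(bias, diff)]


scores = [0.75, 0.25]                 # certainties for two states
means = [[1.0, 2.0], [3.0, 6.0]]      # hypothetical per-state means
frame = [2.0, 3.5]                    # input feature frame

avg = average_signal(scores, means)
diff = difference(frame, avg)
bias = update_bias([0.0, 0.0], diff, alpha=0.5)
compensated = [x - b for x, b in zip(frame, bias)]
```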
Abstract:
Channel normalization for automatic speech recognition is provided. Statistics are measured from an initial portion of a speech utterance. Feature normalization parameters are estimated based on the measured statistics and a statistically derived mapping relating measured statistics and feature normalization parameters. In some examples, the measured statistics comprise measures of an energy from the initial portion of the speech utterance. In some examples, measures of the energy comprise extreme values of the energy.
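A minimal sketch of this estimation, assuming the measured statistics are the minimum and maximum frame energy over the initial portion and that the statistically derived mapping is linear. The coefficient names in `mapping`, and the offset-and-scale form of the normalization, are hypothetical. In practice such a mapping would be fitted offline on training data.

```python
def measure_stats(energies, n_initial):
    """Extreme energy values over the initial portion of the utterance."""
    head = energies[:n_initial]
    return min(head), max(head)


def estimate_norm_params(stats, mapping):
    """Map measured statistics to normalization parameters (assumed linear)."""
    lo, hi = stats
    offset = (mapping["offset_bias"]
              + mapping["offset_lo"] * lo
              + mapping["offset_hi"] * hi)
    scale = mapping["scale_bias"] + mapping["scale_range"] * (hi - lo)
    return offset, scale


def normalize(features, offset, scale):
    """Apply the estimated parameters to the feature stream."""
    return [(f - offset) / scale for f in features]


# Hypothetical mapping coefficients, standing in for ones learned offline.
mapping = {"offset_bias": 0.0, "offset_lo": 0.5, "offset_hi": 0.5,
           "scale_bias": 0.0, "scale_range": 0.5}

energies = [2.0, 6.0, 4.0, 100.0]     # only the first three frames are measured
stats = measure_stats(energies, n_initial=3)
offset, scale = estimate_norm_params(stats, mapping)
normalized = normalize([4.0, 6.0, 2.0], offset, scale)
```

Because the parameters come from the initial frames only, normalization can start before the rest of the utterance has been observed.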