Abstract:
A method and apparatus are provided for determining an instantaneous frequency and an instantaneous bandwidth of a speech resonance of a speech signal. The method includes receiving a speech signal having a real component; filtering the speech signal so as to generate a plurality of filtered signals such that the real component and an imaginary component of the speech signal are reconstructed; and generating a first estimated frequency and a first estimated bandwidth of a speech resonance of the speech signal based on both a first filtered signal of the plurality of filtered signals and a single-lag delay of the first filtered signal.
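A single-lag estimator can be sketched as follows. It assumes the filtered signal is already complex and that one resonance behaves like a damped complex exponential y[n] = A·exp((−πB + j2πF)·n/fs). Under that model the ratio y[n]/y[n−1] is a constant pole z, whose angle gives the frequency F and whose magnitude gives the bandwidth B. The function name and the averaged ratio r1/r0 are illustrative choices, not the patented method.

```python
import numpy as np


def estimate_freq_bw(y, fs):
    """Estimate frequency (Hz) and bandwidth (Hz) of one resonance from a
    complex filtered signal y, using only y[n] and its one-sample delay.

    Assumes y[n] = A * exp((-pi*B + 2j*pi*F) * n / fs), i.e. a damped
    complex exponential with pole z = exp((-pi*B + 2j*pi*F) / fs).
    """
    y = np.asarray(y, dtype=complex)
    cur, prev = y[1:], y[:-1]                 # y[n] and the single-lag delay y[n-1]
    r1 = np.sum(cur * np.conj(prev))          # lag-1 cross product
    r0 = np.sum(np.abs(prev) ** 2)            # energy of the delayed signal
    z = r1 / r0                               # equals the pole z for the model above
    freq = np.angle(z) * fs / (2 * np.pi)     # phase advance per sample -> Hz
    bw = -np.log(np.abs(z)) * fs / np.pi      # pole radius exp(-pi*B/fs) -> B
    return freq, bw
```

For a pure damped exponential the ratio r1/r0 recovers the pole exactly. With real speech it acts as a least-squares fit over the analysis window.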
Abstract:
A system for conducting a telephonic speech recognition application includes an automated telephone device for making telephonic contact with a respondent and a speech recognition device which, upon the telephonic contact being made, presents the respondent with at least one introductory prompt for the respondent to reply to; receives a spoken response from the respondent; and performs a speech recognition analysis on the spoken response to determine a capability of the respondent to complete the application. If the speech recognition device, based on the spoken response to the introductory prompt, determines that the respondent is capable of completing the application, the speech recognition device presents at least one application prompt to the respondent. If the speech recognition device, based on the spoken response to the introductory prompt, determines that the respondent is not capable of completing the application, the speech recognition device presents instructions on completing the application to the respondent.
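The call flow in this abstract can be sketched as a small branching routine. The `say`/`ask` hooks and the confidence-threshold rule for deciding whether the respondent is capable are hypothetical. The abstract states only that the decision is based on the spoken response to the introductory prompt.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical I/O hooks: `say` plays a prompt; `ask` plays a prompt and
# returns (recognized_text_or_None, recognizer_confidence).
Say = Callable[[str], None]
Ask = Callable[[str], Tuple[Optional[str], float]]


def conduct_application(say: Say, ask: Ask, intro_prompt: str,
                        app_prompts: List[str], instructions: str,
                        min_confidence: float = 0.6) -> Dict[str, object]:
    """Introductory prompt -> capability check -> application prompts or instructions."""
    text, confidence = ask(intro_prompt)
    # Assumption: the respondent counts as capable when the reply to the
    # introductory prompt is recognized with at least `min_confidence`.
    if text is not None and confidence >= min_confidence:
        answers = [ask(prompt)[0] for prompt in app_prompts]
        return {"capable": True, "answers": answers}
    # Not capable: present instructions on completing the application.
    say(instructions)
    return {"capable": False, "answers": []}
```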
Abstract:
A speech analysis system uses one or more digital processors to reconstruct a speech signal by accurately extracting speech formants from a digitized version of the speech signal. The system extracts the formants by determining an estimated instantaneous frequency and an estimated instantaneous bandwidth of speech resonances of the digitized speech signal in real time. The system digitally filters the digitized speech signal using a plurality of complex digital filters operating in parallel, with overlapping bandwidths so that substantially all of the bandwidth of the speech signal is covered. This virtual chain of overlapping complex digital filters produces a corresponding plurality of complex filtered signals. A first estimated frequency and a first estimated bandwidth are generated for each of the filtered signals, and the speech resonances of the input speech signal are identified from these estimates.
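A bank of overlapping complex filters feeding a per-channel estimator can be sketched as follows. Several choices here are assumptions, not the patented design: the Hann-windowed prototype, the centre spacing of fs/n_taps (narrower than the Hann main lobe, so adjacent bands overlap), and the single-lag estimator restated inside the block.

```python
import numpy as np


def complex_filter_bank(fs, n_taps=64):
    """Hypothetical bank of overlapping complex band-pass FIR filters.

    Each filter is a Hann low-pass prototype modulated to a centre frequency.
    Centres are spaced fs/n_taps apart, narrower than the Hann main lobe
    (4*fs/n_taps wide), so neighbouring passbands overlap.
    """
    window = np.hanning(n_taps)
    window /= window.sum()                          # unit gain at each centre frequency
    k = np.arange(n_taps)
    centres = np.arange(1, n_taps // 2) * (fs / n_taps)   # span (0, fs/2)
    filters = [window * np.exp(2j * np.pi * fc * k / fs) for fc in centres]
    return centres, filters


def single_lag_estimate(y, fs):
    """Frequency and bandwidth (Hz) from y[n] and its one-sample delay."""
    r1 = np.vdot(y[:-1], y[1:])                     # sum of conj(y[n-1]) * y[n]
    r0 = np.vdot(y[:-1], y[:-1]).real
    z = r1 / r0
    return np.angle(z) * fs / (2 * np.pi), -np.log(np.abs(z)) * fs / np.pi


def analyse(x, fs, n_taps=64):
    """Return (centre, estimated_freq, estimated_bw) for every channel."""
    centres, filters = complex_filter_bank(fs, n_taps)
    estimates = []
    for fc, h in zip(centres, filters):
        y = np.convolve(x, h, mode="valid")         # complex filtered signal, no edge transients
        freq, bw = single_lag_estimate(y, fs)
        estimates.append((fc, freq, bw))
    return estimates
```

Channels near a strong resonance tend to report nearly the same frequency, so one plausible way to identify resonances is to look for clusters of agreeing channel estimates. That grouping rule is left out here because the abstract does not specify it.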
Abstract:
A phoneme estimator in a speech-recognition system includes energy detect circuitry for detecting the segments of a speech signal that should be analyzed for phoneme content. Speech-element processors then process the speech signal segments, calculating nonlinear (powers and products) representations of the segments. The nonlinear representation data is applied to speech-element modeling circuitry which reduces the data through speech element specific modeling. The reduced data are then subjected to further nonlinear processing. The results of the further nonlinear processing are again applied to speech-element modeling circuitry, producing phoneme isotype estimates. The phoneme isotype estimates are rearranged and consolidated, that is, the estimates are uniformly labeled and duplicate estimates are consolidated, forming estimates of words or phrases containing minimal numbers of phonemes. The estimates may then be compared with stored words or phrases to determine what was spoken.
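The "powers and products" representation and the two-pass structure (nonlinear expansion, model-based reduction, a second nonlinear expansion, then modeling again) can be sketched as follows. The matrices `W1` and `w2` are hypothetical stand-ins for trained, phoneme-specific model parameters, which the abstract does not describe.

```python
import numpy as np


def nonlinear_representation(features):
    """Original elements followed by all powers and pairwise products
    x[i] * x[j] with i <= j (squares included)."""
    x = np.asarray(features, dtype=float)
    i, j = np.triu_indices(len(x))          # upper triangle incl. diagonal
    return np.concatenate([x, x[i] * x[j]])


def phoneme_score(features, W1, w2):
    """Expand, reduce with W1, expand again, and score with w2.

    W1 (reduction matrix) and w2 (scoring vector) are hypothetical
    placeholders for speech-element-specific models.
    """
    reduced = W1 @ nonlinear_representation(features)
    return float(w2 @ nonlinear_representation(reduced))
```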
Abstract:
A phoneme estimator (12) in a speech-recognition system (10) includes trigger circuitry (18, 22) for identifying the segments of speech that should be analyzed for phoneme content. Speech-element processors (24, 26, and 28) calculate the likelihoods that currently received speech contains individual phonemes, but they operate only when the trigger circuitry identifies such segments. The computation-intensive processing for determining phoneme likelihoods is thus performed on only a small subset of the received speech segments. The accuracy of the speech-element processors (24, 26, and 28) is enhanced because these processors operate by recognition of patterns not only in elements of the data-reduced representations of the received speech but also in higher-ordered products of those elements; that is, these circuits employ non-linear modeling for phoneme identification.
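Trigger-gated processing can be sketched as follows. The expensive per-segment computation runs only on frames that the trigger flags. This sketch assumes a simple short-time-energy trigger; the framing parameters and the `processor` callable are hypothetical.

```python
import numpy as np
from typing import Callable, Dict


def frame_energies(x, frame_len: int, hop: int) -> np.ndarray:
    """Short-time energy of each complete frame of x."""
    x = np.asarray(x, dtype=float)
    starts = range(0, len(x) - frame_len + 1, hop)
    return np.array([np.sum(x[s:s + frame_len] ** 2) for s in starts])


def gated_process(x, frame_len: int, hop: int, threshold: float,
                  processor: Callable[[np.ndarray], object]) -> Dict[int, object]:
    """Call the costly `processor` only on frames whose energy exceeds threshold."""
    x = np.asarray(x, dtype=float)
    results = {}
    for idx, energy in enumerate(frame_energies(x, frame_len, hop)):
        if energy > threshold:                      # trigger fires for this frame
            start = idx * hop
            results[idx] = processor(x[start:start + frame_len])
    return results
```

Only triggered frames reach `processor`, which mirrors the abstract's point that the computation-intensive stage sees a small subset of the received speech.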
Abstract:
A phoneme estimator in a speech-recognition system includes energy detect circuitry for detecting the segments of a speech signal that should be analyzed for phoneme content. Speech-element processors then process the speech signal segments, calculating nonlinear representations of the segments. The nonlinear representation data is applied to speech-element modeling circuitry which reduces the data through speech element specific modeling. The reduced data are then subjected to further nonlinear processing. The results of the further nonlinear processing are again applied to speech-element modeling circuitry, producing phoneme isotype estimates. The phoneme isotype estimates are rearranged and consolidated, that is, the estimates are uniformly labeled and duplicate estimates are consolidated, forming estimates of words or phrases containing minimal numbers of phonemes. The estimates may then be compared with stored words or phrases to determine what was spoken.
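The consolidation and lookup steps can be sketched as follows. Isotype estimates are mapped to uniform phoneme labels, repeated labels are merged, and the result is compared with stored pronunciations. Several parts of this are assumptions: the `canonical` label map, the interpretation of "duplicate" as consecutive repeats, and the exact-match lexicon lookup. A practical system would typically score partial matches instead.

```python
from itertools import groupby
from typing import Dict, List, Optional


def consolidate(isotype_estimates: List[str], canonical: Dict[str, str]) -> List[str]:
    """Uniformly label isotype estimates, then merge consecutive duplicates."""
    labelled = [canonical.get(e, e) for e in isotype_estimates]
    return [label for label, _ in groupby(labelled)]


def match_word(phonemes: List[str], lexicon: Dict[str, List[str]]) -> Optional[str]:
    """Return the stored word whose phoneme sequence equals `phonemes`, if any."""
    for word, pronunciation in lexicon.items():
        if list(pronunciation) == phonemes:
            return word
    return None
```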