Abstract:
The mobile telephone is provided with the capability for automatically adjusting the gain of a microphone of the telephone based upon the detected noise level in which the cellular telephone is operated. As the noise level increases, the gain of the microphone is automatically decreased, thereby compensating for the natural tendency of telephone users to speak more loudly in noisy environments. Also, by decreasing the microphone gain, any clipping that might otherwise occur as a result of the user speaking more loudly is avoided and the signal-to-noise ratio is not thereby decreased. Furthermore, because the microphone gain decreases, the volume level of the voice of the user as it is output from the other party's telephone is not unduly loud. Hence, the other party need not manually decrease the speaker gain of his or her telephone. In the exemplary embodiment, the cellular telephone includes a digital signal processor configured or programmed to apply the detected noise level to look-up tables relating various noise levels to appropriate speaker and microphone gain levels. Also, in the exemplary embodiment, the mobile telephone includes a speaker and the gain of the speaker is adjusted to increase in response to an increase in the background noise level. A device for adjusting gain in a microphone of a wireless communications device includes adjustable digital gain logic coupled to the microphone and a limiter coupled to the adjustable digital gain logic. The limiter performs peak detection on a speech signal that is input to the microphone.
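The look-up-table approach described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the noise thresholds, the gain values, and the name `gains_for_noise` are all assumptions chosen for the example.

```python
import bisect

# Hypothetical table. None of these numbers come from the abstract.
NOISE_THRESHOLDS_DB = [50, 60, 70, 80]   # boundaries between noise ranges
MIC_GAIN_DB = [0, -2, -4, -6, -8]        # microphone gain falls as noise rises
SPEAKER_GAIN_DB = [0, 2, 4, 6, 8]        # speaker gain rises as noise rises


def gains_for_noise(noise_db):
    """Return (mic_gain_db, speaker_gain_db) for a detected noise level."""
    # bisect_right picks the table row whose noise range contains noise_db.
    index = bisect.bisect_right(NOISE_THRESHOLDS_DB, noise_db)
    return MIC_GAIN_DB[index], SPEAKER_GAIN_DB[index]
```

Each gain table has one more entry than the threshold list, so every noise level maps to exactly one row.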
Abstract:
The mobile telephone is provided with the capability for automatically adjusting the gain of a microphone of the telephone based upon the detected noise level in which the cellular telephone is operated. As the noise level increases, the gain of the microphone is automatically decreased, thereby compensating for the natural tendency of telephone users to speak more loudly in noisy environments. Also, by decreasing the microphone gain, any clipping that might otherwise occur as a result of the user speaking more loudly is avoided and the signal-to-noise ratio is not thereby decreased. Furthermore, because the microphone gain decreases, the volume level of the voice of the user as it is output from the other party's telephone is not unduly loud. Hence, the other party need not manually decrease the speaker gain of his or her telephone. In the exemplary embodiment, the cellular telephone includes a digital signal processor configured or programmed to apply the detected noise level to look-up tables relating various noise levels to appropriate speaker and microphone gain levels. Also, in the exemplary embodiment, the mobile telephone includes a speaker and the gain of the speaker is adjusted to increase in response to an increase in the background noise level. A method of automatically adjusting the gain of a speaker in a wireless communications device includes the steps of obtaining a digital value representing the available headroom, estimating the background noise level, and adjusting the volume based on the background noise estimate and the available headroom. Thus, for example, the need for manual volume control buttons on a cellular telephone is eliminated.
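The headroom-limited speaker adjustment can be illustrated with a short sketch. The abstract does not say how the noise estimate and the headroom are combined, so the linear boost rule, the `quiet_db` reference level, and the `slope` parameter below are assumptions made only for this example.

```python
def adjust_speaker_gain_db(noise_db, headroom_db, quiet_db=50.0, slope=0.5):
    """Raise speaker gain with background noise, capped by available headroom.

    quiet_db and slope are hypothetical tuning values, not taken from the source.
    """
    # Boost grows linearly once the noise exceeds the assumed quiet level.
    wanted_boost = max(0.0, (noise_db - quiet_db) * slope)
    # Never request more gain than the headroom allows, to avoid clipping.
    return min(wanted_boost, max(0.0, headroom_db))
```

With 70 dB of noise the rule asks for a 10 dB boost, but only 6 dB is applied when that is all the headroom available.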
Abstract:
A voice recognition rejection scheme for capturing an utterance includes the steps of accepting the utterance, applying an N-best algorithm to the utterance, or rejecting the utterance. The utterance is accepted if a first predefined relationship exists between one or more closest comparison results for the utterance with respect to a stored word and one or more differences between the one or more closest comparison results and one or more other comparison results between the utterance and one or more other stored words. An N-best algorithm is applied to the utterance if a second predefined relationship exists between the one or more closest comparison results and the one or more differences between the one or more closest comparison results and the one or more other comparison results. The utterance is rejected if a third predefined relationship exists between the one or more closest comparison results and the one or more differences between the one or more closest comparison results and the one or more other comparison results. One of the one or more other comparison results may advantageously be a next-closest comparison result for the utterance and another stored word. The first, second, and third predefined relationships may advantageously be linear relationships.
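One way to picture the three linear relationships is as two lines dividing the plane of closest distance versus the gap to the next-closest distance. The sketch below assumes that smaller comparison results mean better matches. The line coefficients and the name `classify_utterance` are hypothetical.

```python
def classify_utterance(closest, next_closest,
                       accept_line=(0.5, 10.0), nbest_line=(0.25, 2.0)):
    """Return "accept", "n-best", or "reject" for one utterance.

    closest and next_closest are distances to the best and second-best
    stored words. Each line is (slope, intercept); the values are assumed.
    """
    delta = next_closest - closest
    slope_a, intercept_a = accept_line
    slope_n, intercept_n = nbest_line
    # A large gap relative to the closest distance means an unambiguous match.
    if delta >= slope_a * closest + intercept_a:
        return "accept"
    # A moderate gap: defer to an N-best list instead of deciding outright.
    if delta >= slope_n * closest + intercept_n:
        return "n-best"
    return "reject"
```

For example, with a closest distance of 20, a next-closest distance of 50 is accepted, 30 triggers the N-best path, and 22 is rejected.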
Abstract:
A method and apparatus for identifying frequency bands to compute linear phase shifts between frame prototypes in a speech coder includes partitioning the frequency spectrum of a prototype of a frame by dividing the frequency spectrum into segments, assigning one or more bands to each segment, and establishing, for each segment, a set of bandwidths for the bands. The bandwidths may be fixed and uniformly distributed in any given segment. The bandwidths may be fixed and non-uniformly distributed in any segment. The bandwidths may be variable and non-uniformly distributed in any given segment.
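The fixed, uniformly distributed case can be sketched as below. The segment edges and band counts are illustrative inputs, and `partition_bands` is a hypothetical name. The non-uniform and variable-bandwidth cases are not covered.

```python
def partition_bands(segment_edges_hz, bands_per_segment):
    """Split each spectrum segment into equal-width bands.

    segment_edges_hz lists segment boundaries in ascending order;
    bands_per_segment gives the number of bands assigned to each segment.
    Returns a list of (low_hz, high_hz) pairs.
    """
    bands = []
    segments = zip(segment_edges_hz, segment_edges_hz[1:])
    for (low, high), count in zip(segments, bands_per_segment):
        width = (high - low) / count  # uniform bandwidth within this segment
        for i in range(count):
            bands.append((low + i * width, low + (i + 1) * width))
    return bands
```

Calling `partition_bands([0, 1000, 4000], [2, 3])` yields two 500 Hz bands below 1000 Hz and three 1000 Hz bands above it.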
Abstract:
An apparatus for accurate endpointing of speech in the presence of noise includes a processor and a software module. The processor executes the instructions of the software module to compare an utterance with a first signal-to-noise-ratio (SNR) threshold value to determine a first starting point and a first ending point of the utterance. The processor then compares with a second SNR threshold value a part of the utterance that predates the first starting point to determine a second starting point of the utterance. The processor also then compares with the second SNR threshold value a part of the utterance that postdates the first ending point to determine a second ending point of the utterance. The first and second SNR threshold values are recalculated periodically to reflect changing SNR conditions. The first SNR threshold value advantageously exceeds the second SNR threshold value.
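The two-pass search can be sketched on a list of per-frame SNR values. This is a simplified illustration under stated assumptions: the frame-level SNR input and the name `find_endpoints` are hypothetical, and the periodic recalculation of the thresholds is omitted.

```python
def find_endpoints(frame_snr_db, high_db, low_db):
    """Return (start, end) frame indices of an utterance, or None.

    high_db plays the role of the first SNR threshold and low_db the second,
    with high_db > low_db.
    """
    # Pass 1: the first threshold gives conservative start and end points.
    above = [i for i, snr in enumerate(frame_snr_db) if snr >= high_db]
    if not above:
        return None
    start, end = above[0], above[-1]
    # Pass 2: extend backward over earlier frames that clear the lower threshold.
    while start > 0 and frame_snr_db[start - 1] >= low_db:
        start -= 1
    # Pass 2: extend forward over later frames in the same way.
    while end < len(frame_snr_db) - 1 and frame_snr_db[end + 1] >= low_db:
        end += 1
    return start, end
```

The lower threshold recovers weak onsets and trailing sounds that the higher threshold alone would cut off.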
Abstract:
Wideband speech signals must be converted to narrowband speech signals if the transmission medium or the destination terminal is constructed with narrowband constraints. A typical wideband-to-narrowband conversion method is the elimination of frequencies above 3400 Hz using a low pass filter and a down sampler. However, this method produces a muffled speech sound since the resulting narrowband signal has a flat frequency response. Methods and apparatus are presented herein to enhance the acoustic quality of a wideband-to-narrowband converted signal. A bandwidth switching filter is used to emphasize a mid-range frequency portion of the wideband signal so that the resulting narrowband signal has a non-flat frequency spectrum.
Abstract:
A speech signal is decoded by a vocoder and the reconstructed speech samples are provided to a decoded frame check unit. The decoded frame check unit examines the energy of the reconstructed speech and compares the energy of the reconstructed speech to a range of acceptable energy values. If the energy is not within the range of energy values, a frame erasure is declared and the decoded frame is prevented from being provided to the speaker in the telephone. In the exemplary implementation, the speech is reconstructed by a vocoder which includes a postfilter which in turn includes automatic gain control. The automatic gain control element of the postfilter includes a means for measuring the energy of the decoded speech data. This measured energy is used by the decoded frame check unit to decide whether to provide the decoded data to the user or to declare a frame erasure. This implementation reduces the amount of additional hardware necessary to implement the present invention.
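The energy check can be sketched in a few lines. The sum-of-squares energy measure, the range bounds, and the name `check_decoded_frame` are assumptions for illustration. The abstract does not specify how energy is computed or what range is acceptable.

```python
def check_decoded_frame(samples, min_energy, max_energy):
    """Return the frame if its energy is acceptable, else None (frame erasure)."""
    # Energy is taken here as the sum of squared samples (an assumed measure).
    energy = sum(s * s for s in samples)
    if min_energy <= energy <= max_energy:
        return samples
    # Out-of-range energy: declare an erasure and withhold the frame.
    return None
```

A caller would pass only non-`None` frames on to the speaker path.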
Abstract:
A method and apparatus for CELP-based to CELP-based vocoder packet translation. The apparatus includes a formant parameter translator and an excitation parameter translator. The formant parameter translator includes a model order converter and a time base converter. The method includes the steps of translating the formant filter coefficients of the input packet from the input CELP format to the output CELP format and translating the pitch and codebook parameters of the input speech packet from the input CELP format to the output CELP format. The step of translating the formant filter coefficients includes the steps of converting the model order of the formant filter coefficients from the model order of the input CELP format to the model order of the output CELP format and converting the time base of the resulting coefficients from the input CELP format time base to the output CELP format time base.
Abstract:
A method and apparatus for predictively quantizing voiced speech includes a parameter generator and a quantizer. The parameter generator is configured to extract parameters from frames of predictive speech such as voiced speech, and to transform the extracted information to a frequency-domain representation. The quantizer is configured to subtract a weighted sum of the parameters for previous frames from the parameter for the current frame. The quantizer is configured to quantize the difference value. A prototype extractor may be added to first extract a pitch period prototype to be processed by the parameter generator.
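The quantizer step can be sketched for a single scalar parameter. The uniform step size, the weights, and the name `predictive_quantize` are hypothetical. The abstract does not state the quantizer type or the weight values.

```python
def predictive_quantize(current, previous, weights, step):
    """Quantize the difference between a parameter and its prediction.

    previous holds the parameter values for earlier frames and weights the
    matching prediction weights. Returns (index, reconstructed_value).
    """
    # Prediction is the weighted sum of the previous frames' parameters.
    prediction = sum(w * p for w, p in zip(weights, previous))
    difference = current - prediction
    # Uniform scalar quantization of the difference (an assumed quantizer).
    index = round(difference / step)
    reconstructed = prediction + index * step
    return index, reconstructed
```

Only `index` would need to be transmitted, since a decoder holding the same previous values can rebuild the prediction itself.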
Abstract:
A spoken user interface for speech-enabled devices includes a processor and a set of software instructions that are executable by the processor and stored in nonvolatile memory. A user of the speech-enabled device is prompted to enter a voice tag associated with an entry in a call history of the speech-enabled device. The call history includes lists of incoming and outgoing email messages, and incoming and outgoing telephone calls. The user is prompted to enter a voice tag associated with a telephone number or email address in the call history after a user-selected number of telephone calls has been sent from the speech-enabled device to that telephone number, or has been sent from the telephone with that telephone number to the speech-enabled device, or after a user-selected number of email messages has been sent from the speech-enabled device to that email address, or has been sent from that email address to the speech-enabled device. The user may populate a phonebook of the speech-enabled device with email addresses by sending an email message to the speech-enabled device from a computer and including additional email addresses in the To: field and/or the CC: field of the email message.
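The prompting rule can be sketched as a counter over call-history entries. This reflects one reading of the abstract, in which incoming and outgoing contacts are counted separately. The class name `CallHistory` and its fields are hypothetical.

```python
from collections import Counter


class CallHistory:
    """Tracks contacts and decides when to prompt for a voice tag."""

    def __init__(self, prompt_after):
        self.prompt_after = prompt_after  # the user-selected count
        self.counts = Counter()           # keyed by (contact, direction)
        self.tagged = set()               # contacts that already have a voice tag

    def record(self, contact, direction):
        """Record one call or email; return True if the user should be prompted."""
        key = (contact, direction)
        self.counts[key] += 1
        # Prompt only for untagged contacts that reached the selected count.
        return contact not in self.tagged and self.counts[key] >= self.prompt_after
```

After the user records a tag, adding the contact to `tagged` suppresses further prompts for it.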