Abstract:
A voice recognition rejection scheme for capturing an utterance includes the steps of accepting the utterance, applying an N-best algorithm to the utterance, or rejecting the utterance. The utterance is accepted if a first predefined relationship exists between one or more closest comparison results for the utterance with respect to a stored word and one or more differences between the one or more closest comparison results and one or more other comparison results between the utterance and one or more other stored words. An N-best algorithm is applied to the utterance if a second predefined relationship exists between the one or more closest comparison results and the one or more differences between the one or more closest comparison results and the one or more other comparison results. The utterance is rejected if a third predefined relationship exists between the one or more closest comparison results and the one or more differences between the one or more closest comparison results and the one or more other comparison results. One of the one or more other comparison results may advantageously be a next-closest comparison result between the utterance and another stored word. The first, second, and third predefined relationships may advantageously be linear relationships.
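The three-way decision described in this abstract can be illustrated with a minimal sketch. It assumes that comparison results are distances (smaller means closer), that only the closest and next-closest results are used, and that the linear relationships are two boundary lines in the (closest result, difference) plane. The function name, the enum, and the slope and intercept values are hypothetical and do not come from the abstract.

```python
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"
    N_BEST = "n_best"
    REJECT = "reject"


def classify_utterance(distances, accept_line=(0.5, 0.0), reject_line=(0.1, 0.0)):
    """Classify an utterance from its comparison results against stored words.

    distances: one comparison result per stored word (assumed: smaller = closer).
    accept_line, reject_line: hypothetical (slope, intercept) pairs that define
    linear boundaries on the difference as a function of the closest result.
    """
    if len(distances) < 2:
        raise ValueError("need comparison results for at least two stored words")

    ordered = sorted(distances)
    closest, next_closest = ordered[0], ordered[1]
    difference = next_closest - closest

    accept_slope, accept_intercept = accept_line
    reject_slope, reject_intercept = reject_line

    # Large separation relative to the closest result: accept outright.
    if difference >= accept_slope * closest + accept_intercept:
        return Decision.ACCEPT
    # Very small separation: the best match is not distinctive enough.
    if difference < reject_slope * closest + reject_intercept:
        return Decision.REJECT
    # In between: defer to an N-best pass over the top candidates.
    return Decision.N_BEST
```

With the default lines above, `classify_utterance([10, 20, 30])` falls in the accept region, while a next-closest result of 12 against a closest result of 10 falls in the N-best region.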
Abstract:
A method and apparatus for identifying frequency bands to compute linear phase shifts between frame prototypes in a speech coder includes partitioning the frequency spectrum of a prototype of a frame by dividing the frequency spectrum into segments, assigning one or more bands to each segment, and establishing, for each segment, a set of bandwidths for the bands. The bandwidths may be fixed and uniformly distributed in any given segment. The bandwidths may be fixed and non-uniformly distributed in any segment. The bandwidths may be variable and non-uniformly distributed in any given segment.
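The partitioning step can be sketched for the simplest variant, where bandwidths are fixed and uniformly distributed within each segment. The function name, the representation of segments as a list of edge frequencies, and the example frequencies are assumptions made for illustration; the non-uniform and variable variants are not covered.

```python
def partition_spectrum(segment_edges, bands_per_segment):
    """Split a spectrum into segments and give each segment uniform bands.

    segment_edges: increasing frequencies that delimit the segments,
        e.g. [0, 1000, 4000] describes two segments (hypothetical values).
    bands_per_segment: number of bands assigned to each segment.
    Returns a list of (low, high) band edges covering the whole spectrum.
    """
    if len(segment_edges) != len(bands_per_segment) + 1:
        raise ValueError("need exactly one band count per segment")

    bands = []
    for start, stop, count in zip(segment_edges, segment_edges[1:], bands_per_segment):
        if stop <= start or count < 1:
            raise ValueError("segments must be non-empty with at least one band")
        width = (stop - start) / count  # fixed, uniform bandwidth in this segment
        for i in range(count):
            bands.append((start + i * width, start + (i + 1) * width))
    return bands
```

For example, `partition_spectrum([0, 1000, 4000], [4, 2])` yields four 250 Hz bands below 1000 Hz and two 1500 Hz bands above it.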
Abstract:
An apparatus for accurate endpointing of speech in the presence of noise includes a processor and a software module. The processor executes the instructions of the software module to compare an utterance with a first signal-to-noise-ratio (SNR) threshold value to determine a first starting point and a first ending point of the utterance. The processor then compares a part of the utterance that predates the first starting point with a second SNR threshold value to determine a second starting point of the utterance. The processor likewise compares a part of the utterance that postdates the first ending point with the second SNR threshold value to determine a second ending point of the utterance. The first and second SNR threshold values are recalculated periodically to reflect changing SNR conditions. The first SNR threshold value advantageously exceeds the second SNR threshold value.
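The two-threshold search can be sketched as follows. The sketch assumes the utterance is already available as a list of per-frame SNR values, and that the second pass simply widens the first endpoints outward for as long as neighbouring frames stay above the lower threshold. The function and parameter names are hypothetical, and the periodic recalculation of the thresholds is left out.

```python
def find_endpoints(frame_snrs, high_threshold, low_threshold):
    """Locate utterance endpoints with a high and then a low SNR threshold.

    Returns ((first_start, first_end), (second_start, second_end)) as frame
    indices, or None when no frame exceeds the high threshold.
    """
    if low_threshold > high_threshold:
        raise ValueError("the first (high) threshold should exceed the second")

    above = [i for i, snr in enumerate(frame_snrs) if snr > high_threshold]
    if not above:
        return None
    first_start, first_end = above[0], above[-1]

    # Look at frames that predate the first starting point.
    second_start = first_start
    while second_start > 0 and frame_snrs[second_start - 1] > low_threshold:
        second_start -= 1

    # Look at frames that postdate the first ending point.
    second_end = first_end
    while second_end < len(frame_snrs) - 1 and frame_snrs[second_end + 1] > low_threshold:
        second_end += 1

    return (first_start, first_end), (second_start, second_end)
```

The wider second pair of endpoints is meant to recover low-energy onsets and trailing sounds that the higher threshold alone would cut off.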
Abstract:
Wideband speech signals must be converted to narrowband speech signals if the transmission medium or the destination terminal is constructed with narrowband constraints. A typical wideband-to-narrowband conversion method is the elimination of frequencies above 3400 Hz using a low pass filter and a down sampler. However, this method produces a muffled speech sound since the resulting narrowband signal has a flat frequency response. Methods and apparatus are presented herein to enhance the acoustic quality of a wideband-to-narrowband converted signal. A bandwidth switching filter is used to emphasize a mid-range frequency portion of the wideband signal so that the resulting narrowband signal has a non-flat frequency spectrum.
Abstract:
A speech signal is decoded by a vocoder and the reconstructed speech samples are provided to a decoded frame check unit. The decoded frame check unit examines the energy of the reconstructed speech and compares it to a range of acceptable energy values. If the energy is not within the range of energy values, a frame erasure is declared and the decoded frame is prevented from being provided to the speaker in the telephone. In the exemplary implementation, the speech is reconstructed by a vocoder which includes a postfilter which in turn includes automatic gain control. The automatic gain control element of the postfilter includes a means for measuring the energy of the decoded speech data. This measured energy is used by the decoded frame check unit to decide whether to provide the decoded data to the user or to declare a frame erasure. This implementation reduces the amount of additional hardware necessary to implement the present invention.
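The energy check itself is simple enough to sketch. In the version below the energy is computed directly as a sum of squared samples, which stands in for the measurement that the abstract attributes to the automatic gain control element. The function names and the use of `None` to signal a frame erasure are assumptions for illustration.

```python
def frame_energy(samples):
    """Energy of one decoded frame, taken here as the sum of squared samples."""
    return sum(s * s for s in samples)


def check_decoded_frame(samples, min_energy, max_energy):
    """Return the frame if its energy is acceptable, else declare an erasure.

    min_energy and max_energy are hypothetical bounds of the acceptable range.
    None is returned to signal a frame erasure, so the frame never reaches
    the speaker.
    """
    energy = frame_energy(samples)
    if min_energy <= energy <= max_energy:
        return samples
    return None
```

A caller would route a `None` result to whatever erasure handling the decoder already has, rather than playing the frame out.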
Abstract:
A method and apparatus for CELP-based to CELP-based vocoder packet translation. The apparatus includes a formant parameter translator and an excitation parameter translator. The formant parameter translator includes a model order converter and a time base converter. The method includes the steps of translating the formant filter coefficients of the input packet from the input CELP format to the output CELP format and translating the pitch and codebook parameters of the input speech packet from the input CELP format to the output CELP format. The step of translating the formant filter coefficients includes the steps of converting the model order of the formant filter coefficients from the model order of the input CELP format to the model order of the output CELP format and converting the time base of the resulting coefficients from the input CELP format time base to the output CELP format time base.
Abstract:
A method and apparatus for predictively quantizing voiced speech includes a parameter generator and a quantizer. The parameter generator is configured to extract parameters from frames of predictive speech such as voiced speech, and to transform the extracted information to a frequency-domain representation. The quantizer is configured to subtract a weighted sum of the parameters for previous frames from the parameter for the current frame. The quantizer is configured to quantize the difference value. A prototype extractor may be added to first extract a pitch period prototype to be processed by the parameter generator.
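The quantizer step can be sketched for a single scalar parameter per frame. The sketch assumes a uniform scalar quantizer with a fixed step size and predicts from previously reconstructed values so that a decoder could form the same prediction; neither choice is stated in the abstract. The function name, the weights, and the step size are hypothetical, and the frequency-domain transform and prototype extraction are omitted.

```python
def quantize_predictively(params, weights, step):
    """Quantize the difference between each parameter and its prediction.

    params: one parameter value per frame, in frame order.
    weights: prediction weights, most recent previous frame first.
    step: step size of the (assumed) uniform scalar quantizer.
    Returns (indices, reconstructed), where indices are the quantizer outputs.
    """
    history = []        # reconstructed values of previous frames, oldest first
    indices = []
    reconstructed = []
    for value in params:
        recent = history[-len(weights):][::-1]  # most recent frame first
        # Weighted sum of the parameters for previous frames.
        prediction = sum(w * p for w, p in zip(weights, recent))
        # Quantize only the difference from the prediction.
        index = round((value - prediction) / step)
        rebuilt = prediction + index * step
        indices.append(index)
        reconstructed.append(rebuilt)
        history.append(rebuilt)
    return indices, reconstructed
```

Because voiced speech changes slowly from frame to frame, the difference values tend to be small, which is what makes coding the difference cheaper than coding the parameter itself.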
Abstract:
A spoken user interface for speech-enabled devices includes a processor and a set of software instructions that are executable by the processor and stored in nonvolatile memory. A user of the speech-enabled device is prompted to enter a voice tag associated with an entry in a call history of the speech-enabled device. The call history includes lists of incoming and outgoing email messages, and incoming and outgoing telephone calls. The user is prompted to enter a voice tag associated with a telephone number or email address in the call history after a user-selected number of telephone calls has been sent from the speech-enabled device to that telephone number, or has been sent from the telephone with that telephone number to the speech-enabled device, or after a user-selected number of email messages has been sent from the speech-enabled device to that email address, or has been sent from that email address to the speech-enabled device. The user may populate a phonebook of the speech-enabled device with email addresses by sending an email message to the speech-enabled device from a computer and including additional email addresses in the To: field and/or the CC: field of the email message.
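The prompting rule can be sketched as a small counter over the call history. The sketch assumes that incoming and outgoing traffic are counted separately for each telephone number or email address, and that no prompt is issued once a voice tag exists. The class name, method names, and direction labels are hypothetical.

```python
from collections import Counter


class CallHistory:
    """Tracks contacts and decides when to prompt for a voice tag."""

    def __init__(self, prompt_threshold):
        # User-selected number of calls or messages before prompting.
        self.prompt_threshold = prompt_threshold
        self.counts = Counter()   # keyed by (contact, direction)
        self.voice_tags = {}      # contact -> stored voice tag

    def record(self, contact, direction):
        """Log one call or email; return True if the user should be prompted.

        direction is "incoming" or "outgoing" (assumed labels).
        """
        self.counts[(contact, direction)] += 1
        reached = self.counts[(contact, direction)] >= self.prompt_threshold
        return reached and contact not in self.voice_tags

    def add_voice_tag(self, contact, tag):
        """Store the voice tag so the contact no longer triggers prompts."""
        self.voice_tags[contact] = tag
```

The same object handles telephone numbers and email addresses, since both are just keys into the history.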
Abstract:
A method and apparatus for providing feedback from the decoder to the encoder to improve performance in a predictive speech coder under frame erasure conditions includes notifying an encoder in a receiving speech coder if a decoder in the receiving speech coder fails to receive a frame transmitted by an encoder in a transmitting speech coder. A modified packet is transmitted from the encoder in the receiving speech coder to a decoder in the transmitting speech coder in response to the notification. The decoder in the transmitting speech coder notifies the encoder in the transmitting speech coder when the modified packet is received. The encoder in the transmitting speech coder then encodes a packet with a modified encoding format. The modified encoding format may be either a low-memory encoding format or a memoryless encoding format. The modified packet may have an erasure indicator bit set to a digital value of one.
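The feedback loop between the two speech coders can be sketched as a pair of flags per endpoint. The sketch assumes a lost frame is represented by `None`, that packets are plain dictionaries, and that the modified encoding format is used for exactly one packet before the encoder returns to predictive encoding. The class, field, and format names are hypothetical.

```python
class SpeechCoder:
    """One endpoint holding both an encoder and a decoder."""

    def __init__(self):
        self.report_erasure = False  # our decoder missed a frame
        self.peer_lost_frame = False  # peer told us it missed a frame

    def encode(self, payload):
        """Build the next outgoing packet and clear the pending flags."""
        packet = {
            "payload": payload,
            # Erasure indicator bit set to one in the modified packet.
            "erasure_indicator": 1 if self.report_erasure else 0,
            # Memoryless format avoids relying on state the peer has lost.
            "format": "memoryless" if self.peer_lost_frame else "predictive",
        }
        self.report_erasure = False
        self.peer_lost_frame = False
        return packet

    def decode(self, packet):
        """Process an incoming packet; None stands for an erased frame."""
        if packet is None:
            self.report_erasure = True  # notify our own encoder
            return None
        if packet["erasure_indicator"] == 1:
            self.peer_lost_frame = True  # notify our own encoder
        return packet["payload"]
```

A typical exchange: the transmitting coder's frame is lost, the receiving coder's next packet carries the erasure indicator, and the transmitting coder's following packet uses the memoryless format.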