Abstract:
A digital signal processor (100) receives a digitally vocoded signal (102), and calculates a staggered average value (404) from the frame energy of each received frame, or the product of the frame energy and a voicing value. While the staggered average value is above a threshold voice indicator value, speech is declared present.
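As a rough illustration of the detection rule described above, the Python sketch below averages a per-frame metric and compares the average with a threshold. The abstract does not say how the staggered average is computed, so the sketch substitutes a plain moving average over the last few frames. The window length, the threshold, the frame layout, and all names are hypothetical.

```python
from collections import deque


def detect_speech(frames, threshold, window=4, use_voicing=False):
    """Return one speech/no-speech flag per received frame.

    frames: iterable of (frame_energy, voicing) pairs (assumed layout).
    A plain moving average stands in for the staggered average,
    whose exact definition the abstract does not give.
    """
    recent = deque(maxlen=window)  # metrics of the last `window` frames
    flags = []
    for energy, voicing in frames:
        # The metric is the frame energy, or the product of the
        # frame energy and the voicing value.
        metric = energy * voicing if use_voicing else energy
        recent.append(metric)
        average = sum(recent) / len(recent)
        # Speech is declared present while the average is above the threshold.
        flags.append(average > threshold)
    return flags
```

Calling `detect_speech(frames, threshold=4.0, window=2)` on a list of `(energy, voicing)` pairs yields one Boolean per frame. Setting `use_voicing=True` switches the metric to the energy-voicing product.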
Abstract:
A portable communication device (104), such as a cellular telephone, is operable in a speakerphone mode. The communication device uses a digital communication scheme, and both receives and generates vocoded signals. The speakerphone provides half duplex operation to eliminate echo. When voice activity is detected, the device activates a speaker and mutes a microphone to avoid echo. When no voice activity is detected in the received signal, the speaker is muted and the microphone activated. To determine when speech activity is present in the received signal, a novel voice activity detection (VAD) algorithm is used which takes advantage of parameters provided as part of the received vocoded signal. The new voice activity algorithm includes calculating a staggered average of the frame energy value for a sequence of received frames, and determining if the staggered average value exceeds a threshold. The algorithm also includes adjusting the threshold level by basing the threshold level on the voicing value of the present vocoded frame.
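The half duplex switching and the voicing-dependent threshold might be sketched as follows in Python. The abstract only states that the threshold is based on the voicing value of the present frame. The linear mapping, its direction (a more strongly voiced frame lowers the threshold), the constants, and the function names are all assumptions made for illustration.

```python
def half_duplex_state(voice_active):
    """Map the voice activity decision to speaker and microphone states."""
    if voice_active:
        # Speech in the received signal: play it and mute the microphone.
        return {"speaker": "on", "microphone": "muted"}
    # No speech in the received signal: mute the speaker, open the microphone.
    return {"speaker": "muted", "microphone": "on"}


def threshold_for(voicing, base=100.0, scale=50.0):
    """Hypothetical threshold rule: higher voicing gives a lower threshold."""
    return base - scale * voicing


def frame_decision(average_energy, voicing):
    """Compare the (already averaged) frame energy with the adjusted threshold."""
    voice_active = average_energy > threshold_for(voicing)
    return half_duplex_state(voice_active)
```

With these placeholder constants, an average energy of 80 counts as speech for a frame with voicing value 1.0 (threshold 50) but not for a frame with voicing value 0.0 (threshold 100).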
Abstract:
In a portable communication device (104) that is able to operate in a speakerphone mode, a user interface such as a keypad (120) and display (122) is provided to allow the user to select (1104) an enhanced mode so that leading fricatives of speech are more likely to be detected as speech, and played over a speaker (214). Upon selecting the enhanced mode of operation, the voicing value of each frame of vocoded speech received is increased, thus making it more likely for energetic fricative frames to be detected as speech.
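A minimal Python sketch of the enhanced mode adjustment is given below. The abstract does not state by how much the voicing value is increased or what range it spans, so the integer range 0 to 3, the boost of 1, and the function name are hypothetical.

```python
def adjust_voicing(voicing, enhanced, boost=1, max_voicing=3):
    """Raise the voicing value of a frame when enhanced mode is selected.

    The range 0..max_voicing and the size of the boost are assumed,
    not taken from the abstract.
    """
    if not enhanced:
        return voicing
    # Clamp so the boosted value stays inside the assumed voicing range.
    return min(voicing + boost, max_voicing)
```

If detection uses the product of frame energy and voicing value, an energetic fricative frame with energy 80 and voicing value 0 gives a product of 0 in normal mode. In this sketch, enhanced mode raises the voicing value to 1 and the product to 80.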
Abstract:
In a portable communication device (100) operated in a speakerphone mode, when no speech is present in the outbound signal received from a communication system, the speaker (114) of the communication device is muted and the microphone (120) activated. When speech is detected in an inbound signal (202, 204, 206), and speech is also subsequently detected in the outbound signal (214), the speaker is left inactivated and the microphone kept in an activated state. To determine if speech is present in the inbound signal, the inbound signal is vocoded, providing a succession of frames (206), each frame having a frame energy parameter and a background noise parameter (208). If the frame energy of a given frame sufficiently exceeds an average background noise value for the given frame, then speech is declared present. To smooth this half duplex operation, a timer may be used so that brief periods of silence or fricative parts of speech do not result in a revocation of the declaration of speech in the inbound signal.
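The inbound detection rule with its smoothing timer can be sketched in Python as shown below. The abstract does not quantify "sufficiently exceeds" or the timer length, so the ratio margin, the hangover length counted in frames, and all names are assumptions.

```python
def inbound_speech_flags(frames, margin=2.0, hangover_frames=3):
    """Return one speech flag per frame of the inbound signal.

    frames: iterable of (frame_energy, background_noise) pairs (assumed layout).
    margin: hypothetical factor by which the energy must exceed the noise.
    hangover_frames: hypothetical timer length, counted in frames.
    """
    flags = []
    remaining = 0  # frames left before the speech declaration is revoked
    for energy, noise in frames:
        if energy > margin * noise:
            remaining = hangover_frames  # restart the timer on every speech frame
            flags.append(True)
        elif remaining > 0:
            remaining -= 1  # brief silence or fricative: keep speech declared
            flags.append(True)
        else:
            flags.append(False)
    return flags
```

With `margin=2.0` and `hangover_frames=2`, one loud frame followed by three quiet frames keeps speech declared for the first two quiet frames and revokes it on the third.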