Abstract:
A method and apparatus generate a pronunciation score by receiving a user phrase intended to conform to a reference phrase and processing the user phrase with at least one of an articulation-scoring engine, a duration-scoring engine, and an intonation-scoring engine. The scores from these engines are adapted into visual and/or numerical feedback that indicates correctness or incorrectness in one or more speech features, such as intonation, articulation, voicing, phoneme error, and relative word duration. This interactive feedback lets a user quickly identify the problem area and take remedial action when reciting “tutor” sentences or phrases.
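The abstract does not say how the engine scores are combined. As a minimal sketch, assuming each engine returns a score on a common scale and that a weighted average is acceptable, the combination could look like the following. The function name, the weights, and the use of `None` for an engine that was not run are all hypothetical.

```python
from typing import Dict, Optional


def pronunciation_score(
    engine_scores: Dict[str, Optional[float]],
    weights: Dict[str, float],
) -> float:
    """Weighted average over the engines that actually produced a score.

    Assumption: a score of None means that engine was not run, which
    mirrors the "at least one of" wording in the abstract.
    """
    used = {name: s for name, s in engine_scores.items() if s is not None}
    if not used:
        raise ValueError("at least one scoring engine must produce a score")
    total_weight = sum(weights[name] for name in used)
    return sum(weights[name] * s for name, s in used.items()) / total_weight


# Hypothetical usage: the intonation engine was skipped for this phrase.
scores = {"articulation": 80.0, "duration": 60.0, "intonation": None}
weights = {"articulation": 2.0, "duration": 1.0, "intonation": 1.0}
overall = pronunciation_score(scores, weights)
```

The per-engine scores stay available alongside the overall value, so feedback can still point at the specific speech feature that needs work.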
Abstract:
A computer-based system generates alternative phonetic transcriptions for a target word or phrase corresponding to specific phonological processes that replace individual phonemes or clusters of two or more phonemes with replacement phonemes. The system compares a user's speech with a list of possible transcriptions that includes the base (i.e., correct) transcription of the test target as well as the different alternative transcriptions, to identify the transcription that best matches the user's. In a speech therapy application, the system identifies the phonological process(es), if any, associated with the user's speech and generates statistics over multiple test targets that can be used to diagnose the user's specific phonological disorders. The system can also be implemented in other contexts such as foreign language instruction and automated attendant applications to cover a wide variety and range of accents and/or phonological disorders.
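The transcription-generation and matching steps can be sketched as follows. This is an illustration under stated assumptions, not the system's actual method: the user's speech is assumed to be already decoded into a phoneme sequence, edit distance stands in for whatever acoustic matching the system uses, and the two rules in `PROCESSES` are hypothetical examples.

```python
from typing import Dict, List, Tuple

Phonemes = Tuple[str, ...]

# Hypothetical rules: process name -> (phoneme cluster to replace, replacement).
PROCESSES: Dict[str, Tuple[Phonemes, Phonemes]] = {
    "gliding": (("r",), ("w",)),
    "cluster_reduction": (("s", "t"), ("t",)),
}


def apply_process(base: Phonemes, pattern: Phonemes, repl: Phonemes) -> Phonemes:
    """Replace every occurrence of `pattern` in `base` with `repl`."""
    out: List[str] = []
    i = 0
    while i < len(base):
        if base[i:i + len(pattern)] == pattern:
            out.extend(repl)
            i += len(pattern)
        else:
            out.append(base[i])
            i += 1
    return tuple(out)


def candidates(base: Phonemes) -> Dict[str, Phonemes]:
    """Base transcription plus one alternative per process that changes it."""
    result: Dict[str, Phonemes] = {"base": base}
    for name, (pattern, repl) in PROCESSES.items():
        alt = apply_process(base, pattern, repl)
        if alt != base:
            result[name] = alt
    return result


def edit_distance(a: Phonemes, b: Phonemes) -> int:
    """Levenshtein distance over phoneme symbols."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def best_match(base: Phonemes, heard: Phonemes) -> str:
    """Label of the candidate transcription closest to what was heard."""
    cands = candidates(base)
    return min(cands, key=lambda name: edit_distance(cands[name], heard))


# Hypothetical usage: "rabbit" heard as "wabbit", "stop" heard as "top".
rabbit = ("r", "ae", "b", "ih", "t")
stop = ("s", "t", "aa", "p")
label_rabbit = best_match(rabbit, ("w", "ae", "b", "ih", "t"))
label_stop = best_match(stop, ("t", "aa", "p"))
```

Counting the returned labels over many test targets would give the per-process statistics the abstract mentions.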
Abstract:
The system includes a jitter buffer for receiving speech packets in a Voice over Internet Protocol (VoIP) system, a playback device for adjusting the playback speed of the received speech packets, and a jitter buffer manager for detecting out-of-sequence packets in the jitter buffer and for sending commands to the playback device to adjust playback speed based on the detection. The speech signal is played back at the nominal speed when there are no out-of-sequence packets. The playback speed is decreased when an out-of-sequence packet is detected, thereby tending to increase the jitter buffer length. When the out-of-sequence packet arrives, the playback speed is increased in order to restore the jitter buffer to its nominal length.
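The speed decision made by the jitter buffer manager can be sketched as a small state machine. This is a minimal illustration, assuming packets carry integer sequence numbers and that "out of sequence" means a gap in those numbers. The class name, the speed factors, and the nominal length are hypothetical.

```python
class JitterBufferManager:
    """Chooses a playback speed from the state of the jitter buffer."""

    def __init__(self, nominal_len: int = 4, slow: float = 0.9, fast: float = 1.1):
        self.nominal_len = nominal_len  # target number of buffered packets
        self.slow = slow                # speed factor while a packet is awaited
        self.fast = fast                # speed factor while draining the surplus
        self.next_seq = 0               # next sequence number expected
        self.missing = set()            # sequence numbers skipped and still awaited
        self.buffer = {}                # seq -> payload

    def on_packet(self, seq: int, payload: bytes) -> None:
        if seq in self.missing:
            self.missing.discard(seq)   # the late packet finally arrived
        elif seq >= self.next_seq:
            # Any sequence numbers jumped over are now awaited.
            self.missing.update(range(self.next_seq, seq))
            self.next_seq = seq + 1
        else:
            return                      # duplicate packet: ignore it
        self.buffer[seq] = payload

    def pop_oldest(self) -> bytes:
        """Hand the oldest buffered packet to the playback device."""
        return self.buffer.pop(min(self.buffer))

    def playback_speed(self) -> float:
        if self.missing:
            return self.slow            # play slower so the buffer grows
        if len(self.buffer) > self.nominal_len:
            return self.fast            # play faster to return to nominal length
        return 1.0                      # nominal speed


# Hypothetical usage: packet 2 is delayed and arrives after packets 3 and 4.
mgr = JitterBufferManager(nominal_len=2)
mgr.on_packet(0, b"a")
mgr.on_packet(1, b"b")
speed_in_order = mgr.playback_speed()
mgr.on_packet(3, b"d")
speed_gap = mgr.playback_speed()
mgr.on_packet(4, b"e")
mgr.on_packet(2, b"c")
speed_recovered = mgr.playback_speed()
```

A real implementation would also need a timeout for packets that never arrive, which the abstract does not describe and this sketch omits.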
Abstract:
A method is provided that includes coding pictures by a video encoder in a digital camera to form a compressed video bit stream for real-time transmission to a host digital system coupled to the digital camera by a universal serial bus (USB), wherein an output data rate of the video encoder is at least sometimes higher than an operating data rate of the host digital system, and applying flow control in the digital camera to maintain an output data rate over the USB to the host digital system of the compressed video bit stream below the operating data rate of the host digital system.
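The abstract does not name a flow-control mechanism. One common way to hold an output rate under a ceiling is a token bucket, sketched below as an assumption rather than as the method claimed. The class name, the parameters, and the idea of queuing encoder output in chunks are all hypothetical.

```python
from collections import deque
from typing import Deque, List


class TokenBucketPacer:
    """Releases queued encoder output no faster than the host can consume it.

    Assumption: every chunk is no larger than `burst_bytes`; a larger
    chunk would never be released by this sketch.
    """

    def __init__(self, host_rate_bps: float, burst_bytes: int):
        self.rate_bytes_per_s = host_rate_bps / 8.0
        self.capacity = float(burst_bytes)
        self.tokens = 0.0  # start empty so total output never exceeds rate * time
        self.queue: Deque[bytes] = deque()

    def enqueue(self, chunk: bytes) -> None:
        """Accept a chunk of compressed video from the encoder."""
        self.queue.append(chunk)

    def tick(self, dt_s: float) -> List[bytes]:
        """Advance time by dt_s seconds and return the chunks to send now."""
        self.tokens = min(self.capacity, self.tokens + self.rate_bytes_per_s * dt_s)
        sent: List[bytes] = []
        while self.queue and len(self.queue[0]) <= self.tokens:
            chunk = self.queue.popleft()
            self.tokens -= len(chunk)
            sent.append(chunk)
        return sent


# Hypothetical usage: the encoder bursts 10 chunks of 250 bytes at once,
# while the host can absorb 8000 bit/s (1000 bytes/s).
pacer = TokenBucketPacer(host_rate_bps=8000, burst_bytes=500)
for _ in range(10):
    pacer.enqueue(bytes(250))
sent_bytes = 0
for _ in range(4):  # four ticks of 0.25 s = 1 second
    sent_bytes += sum(len(c) for c in pacer.tick(0.25))
```

Chunks that are not yet released stay queued in the camera, so a real design would also need to bound that queue, for example by lowering the encoder's bit rate.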
Abstract:
The intonation of speech is modified by an appropriate combination of resampling and time-domain harmonic scaling. Resampling increases (upsampling) or decreases (downsampling) the number of data points in a signal. Harmonic scaling adds or removes pitch cycles to or from a signal. The pitch of a speech signal can be increased by combining downsampling with harmonic scaling that adds an appropriate number of pitch cycles. Alternatively, pitch can be decreased by combining upsampling with harmonic scaling that removes an appropriate number of pitch cycles. The present invention can be implemented in an automated speech-therapy tool that is able to modify the intonation of prerecorded reference speech signals for playback to a user to emphasize the correct pronunciation by increasing the pitch of selected portions of words or phrases that the user had previously mispronounced.
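The bookkeeping behind this combination can be worked through numerically. The sketch below assumes a segment with a constant pitch period and a playback sample rate that stays fixed; it only computes how many samples resampling yields and how many pitch cycles harmonic scaling must then add or remove to keep the original duration. The function name and its parameters are hypothetical, and no signal processing is performed.

```python
def pitch_shift_plan(num_samples: int, period_samples: float, ratio: float) -> dict:
    """Plan a pitch change by `ratio` that preserves the segment's duration.

    ratio > 1 raises pitch (downsample, then add cycles);
    ratio < 1 lowers pitch (upsample, then remove cycles).
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    cycles_before = num_samples / period_samples
    # Resampling by `ratio` scales both the length and the pitch period.
    resampled_length = num_samples / ratio
    new_period = period_samples / ratio
    # Cycles of the new period needed to fill the original length again.
    cycles_after = num_samples / new_period
    return {
        "resampled_length": round(resampled_length),
        "new_period": new_period,
        "cycles_to_add": round(cycles_after - cycles_before),  # negative = remove
    }


# Hypothetical usage: 8000 samples with an 80-sample pitch period (100 cycles).
raise_plan = pitch_shift_plan(8000, 80, 1.25)
lower_plan = pitch_shift_plan(8000, 80, 0.8)
```

Raising the pitch by 25% shortens the segment to 6400 samples with a 64-sample period, so 25 cycles must be added to return to 8000 samples. Lowering it by 20% requires removing 20 cycles.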