Abstract:
To transmit voice data, the voice data stream is broken down into phonemes. Each phoneme is assigned a code character from a selectable language-specific and/or speaker-specific phoneme catalog (PN1, PN2), and the code characters are transmitted to a voice synthesis device (SS) located at the transmission target (SD2), which considerably reduces the amount of data to be transmitted. The voice data stream is broken down into phonemes by a neural network (NN) trained to recognize the phonemes stored in the selected phoneme catalog (PN1, PN2). At the target, the received stream of code characters is converted back into a sequence of phonemes and output by the voice synthesis device (SS).
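The code-character mapping described in this abstract can be illustrated with a minimal sketch. It assumes the phonemes have already been recognized (the neural network is out of scope here) and that each code character is simply the phoneme's index in the catalog, stored as one byte. The catalog contents and the names `CATALOG_PN1`, `encode` and `decode` are hypothetical.

```python
# Hypothetical phoneme catalog; the abstract does not list its contents.
CATALOG_PN1 = ["h", "e", "l", "o", "w", "r", "d"]


def encode(phonemes, catalog):
    """Map each recognized phoneme to a one-byte code (its catalog index)."""
    index = {phoneme: i for i, phoneme in enumerate(catalog)}
    return bytes(index[p] for p in phonemes)


def decode(codes, catalog):
    """Map received code bytes back to the phoneme sequence for synthesis."""
    return [catalog[c] for c in codes]


if __name__ == "__main__":
    sent = encode(["h", "e", "l", "o"], CATALOG_PN1)
    print(len(sent), decode(sent, CATALOG_PN1))
```

Sender and receiver must agree on the same catalog, which is why the choice of catalog matters at both ends.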
Abstract:
To code human speech for subsequent audio reproduction, a plurality of speech segments is derived from received speech and systematically stored in a database for later concatenated readout. After derivation, each speech segment is fragmented into temporally consecutive source frames. Source frames that are similar according to a predetermined similarity measure, based on an underlying parameter set, are joined, and the joined source frames are collectively mapped onto a single storage frame. Each segment is stored as a sequence of referrals to storage frames, from which the segment can be reconstituted.
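The frame-joining idea above can be sketched as follows. The sketch assumes each source frame is a tuple of parameters, uses Euclidean distance with a fixed threshold as the similarity measure, and takes the first frame of each group as the storage frame. All three choices, and the function names, are illustrative assumptions rather than details from the abstract.

```python
import math


def build_store(segments, threshold):
    """Join similar source frames and store segments as lists of referrals.

    segments: dict mapping a segment name to its list of source frames,
              each frame being a tuple of parameters (assumed layout).
    Returns (storage_frames, referrals).
    """
    storage = []
    referrals = {}
    for name, frames in segments.items():
        refs = []
        for frame in frames:
            for i, stored in enumerate(storage):
                # Assumed similarity measure: Euclidean distance.
                if math.dist(frame, stored) <= threshold:
                    refs.append(i)
                    break
            else:
                # No similar storage frame yet, so this frame starts a group.
                storage.append(frame)
                refs.append(len(storage) - 1)
        referrals[name] = refs
    return storage, referrals


def reconstitute(name, storage, referrals):
    """Rebuild a segment by following its referrals to storage frames."""
    return [storage[i] for i in referrals[name]]
```

With this layout, frames shared by several segments are stored once, and each segment costs only one index per frame.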
Abstract:
A signal including speech and background noise is encoded by first decomposing the signal into speech and noise components. A first speech encoding algorithm is then used to generate codebook indices for the speech component, and a second algorithm is applied to generate codebook indices for the noise component. The speech encoding algorithm performs better because it operates on clean speech, while a simple, very low bit rate algorithm suffices to encode the noise.
Abstract:
The present invention relates to processing speech coding parameters in a telecommunication system. The speech coding parameters of a speech frame, produced by a speech encoder, are divided into groups, so-called virtual channels, in which speech parameter error correction, channel coding and processing of error-free or erroneous speech parameters are performed independently. At the receiving end, the processing (505) of erroneous and error-free speech parameters can thus be controlled independently on each virtual transmission channel (502) according to the quality of that channel. The speech parameters of the high-quality virtual channels of a speech frame can thus be processed as error-free, and only the speech coding parameters of the low-quality virtual channels are replaced. The independently processed (505) speech parameters of the virtual channels are then reassembled (507) into a speech frame, which is passed to decoding. Because part of the information in erroneous speech frames is also used, more of the speech information received from the transmission channel can be exploited in speech decoding. This reduces, for instance, interruptions in speech compared with discarding every speech frame that is erroneous even to a slight degree. The more extensive and more focused error indication also reduces the number of undetected errors and thus significantly reduces the worst audible disturbances.
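A minimal sketch of the per-channel reassembly described above. It assumes each virtual channel delivers its parameter group together with a quality flag, and that parameters from a low-quality channel are replaced by the corresponding group of the previous frame. The abstract does not say what the replacement values are, so that substitution rule and the function name are assumptions.

```python
def reassemble_frame(received, channel_ok, previous):
    """Rebuild one speech frame from independently processed virtual channels.

    received:   list of parameter groups, one per virtual channel
    channel_ok: list of booleans, True where the channel quality was good
    previous:   parameter groups of the previous frame (assumed substitute)
    """
    frame = []
    for params, ok, prev in zip(received, channel_ok, previous):
        # Keep good groups as error-free; replace only the bad ones.
        frame.extend(params if ok else prev)
    return frame
```

Only the group on the bad channel is substituted, so the rest of the frame still contributes fresh information to the decoder.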
Abstract:
A low bit rate digital audio coding system includes an encoder which assigns codebooks to groups of quantization indexes based on their local properties, resulting in codebook application ranges that are independent of block quantization boundaries. The invention also incorporates a resolution filter bank, or a tri-mode resolution filter bank, which is selectively switchable between high and low frequency resolution modes, or between high, low and intermediate modes, for example when a transient is detected in a frame. The result is a multichannel audio signal having a significantly lower bit rate for efficient transmission or storage. The decoder is essentially an inverse of the structure and methods of the encoder, and results in a reproduced audio signal that cannot be audibly distinguished from the original signal.
Abstract:
A method of accessing a dial-up service involves the following steps: (a) dialing a service number (172); (b) speaking a number of digits to form a first utterance (174); (c) recognizing the digits using speaker-independent speaker recognition (176); (d) when a user has used the dial-up service previously, verifying the user based on the first utterance using a speaker verification system (178); (e) when the user cannot be verified, requesting that the user enter a personal identification number (182); and (f) when the personal identification number is valid (184), providing access to the dial-up service (186).
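Steps (c) through (f) above can be sketched as a small decision function. The sketch makes two assumptions the abstract does not state: a successfully verified user gets access without a PIN, and a first-time user goes straight to the PIN request. The function name and the three callbacks are hypothetical.

```python
def request_access(digits, known_users, verify_speaker, ask_pin, pin_is_valid):
    """Return True when access to the dial-up service should be provided.

    digits:      digit string recognized from the first utterance (step c)
    known_users: set of digit strings of users who used the service before
    verify_speaker, ask_pin, pin_is_valid: hypothetical callbacks
    """
    # Step (d): only a returning user can be verified by voice.
    if digits in known_users and verify_speaker(digits):
        return True  # assumption: a verified user needs no PIN
    # Steps (e) and (f): fall back to a personal identification number.
    pin = ask_pin()
    return pin_is_valid(digits, pin)
```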
Abstract:
A musical jukebox (10) is disclosed which provides for: fast archiving of songs; a flexible user interface; easy and convenient entry of, access to and/or display (22) of data relating to songs archived by the jukebox (10); easy and convenient search and locate capabilities for locating, reviewing, retrieving and/or playing songs stored in the jukebox; and low cost relative to the functionality, features, conveniences and user-friendliness provided by the jukebox (10). Fast or flush archiving of songs (as well as other data and signals) is accomplished by first saving sets of data without compression, which allows the data to be entered quickly, and then compressing later at an appropriate time. A unique MP3 bit allocation encoding scheme is used to compress data. A unique memory allocation supports fast data archiving. The user interface employs two-way communication (24) between a remote control (28) and the jukebox (10). A searchable song database is structured to enable very fast searching by music category, and also by title and artist. The jukebox (10) is provided with an on-board song track database to automatically identify new songs input to the jukebox (10).
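The save-first, compress-later archiving described above can be sketched as follows. The sketch uses `zlib` from the Python standard library purely as a stand-in compressor; it is not the MP3 bit allocation scheme the abstract mentions. The `Archive` class and its method names are hypothetical.

```python
import zlib


class Archive:
    """Store data uncompressed on entry and compress it in a later pass."""

    def __init__(self):
        self.items = {}  # title -> (is_compressed, data bytes)

    def add(self, title, raw):
        # Fast path: save without compression so entry is quick.
        self.items[title] = (False, raw)

    def compress_pending(self):
        # Deferred pass, run at an appropriate later time (e.g. when idle).
        for title, (done, data) in list(self.items.items()):
            if not done:
                self.items[title] = (True, zlib.compress(data))

    def read(self, title):
        done, data = self.items[title]
        return zlib.decompress(data) if done else data
```

Reads return the same bytes whether or not the deferred pass has run, so the compression step can be scheduled freely.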
Abstract:
A multi-user e-mail reader system (18) allows several users to access their e-mail accounts simultaneously and have the e-mail messages played back with speech synthesis. The user navigates through various functional states of the system (Fig. 4) using either touch-tone keypad commands or optionally voiced commands interpreted by a speech recognizer (60). Users can send reply e-mail messages without the use of a computer, by invoking the system's text processor (68). The text processor operates in conjunction with a keypad-to-ASCII conversion mechanism (64) that allows fully punctuated and properly addressed e-mail messages to be composed from the touch-tone phone. Digital audio sound file attachments may be recorded through the telephone handset and attached to an outgoing e-mail message. A system for storing canned messages (74) allows the user to quickly send pre-composed reply messages, either as stored or after editing using the text processor. The text processor uses a virtual cursor pointer (72) that may be indexed forward and backward at different granularities (78, 80), depending on whether the system is in play mode or record mode (76). The granularity can also be changed by the user.
Abstract:
To mask errors, the invention provides that binary representations of parameter values are precoded on the transmission side by a linear block code before transmission over an error-prone channel. The redundant information added in this way is not used on the reception side to detect errors within the binary parameter representations. Instead, it is exploited during parameter estimation to improve the quality of the estimated parameter values.
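One way to picture "redundancy used for estimation rather than error detection" is a posterior-mean estimate over the valid codewords. The sketch below is an illustration under several assumptions that are not in the abstract: a single even-parity bit as the linear block code, BPSK mapping (bit b sent as 1 - 2b), Gaussian channel noise with known variance, and equally likely parameter values.

```python
import math
from itertools import product


def parity_encode(bits):
    """Simple linear block code: append one even-parity bit."""
    return tuple(bits) + (sum(bits) % 2,)


def estimate(received, values, noise_var):
    """Posterior-mean parameter estimate from soft channel values.

    received:  real-valued channel outputs, one per code bit
    values:    parameter value for each bit pattern, in itertools.product order
    noise_var: assumed Gaussian noise variance
    """
    k = len(received) - 1
    weights = []
    for bits in product((0, 1), repeat=k):
        codeword = parity_encode(bits)
        # Squared distance between received values and this codeword's symbols.
        d2 = sum((r - (1 - 2 * b)) ** 2 for r, b in zip(received, codeword))
        weights.append(math.exp(-d2 / (2 * noise_var)))
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total
```

Because only valid codewords enter the sum, the parity bit sharpens the weights of the candidate parameter values instead of triggering a frame rejection.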
Abstract:
The present invention is a system for controlling a graphical user interface by voice commands. The system comprises a means for receiving issued voice commands from a standard voice recognition system (18), a means for monitoring the state of a target application (16), a means for determining active voice commands from the state of the target application (12), a means for determining whether an issued voice command is an active voice command, a means for associating each active voice command with a block of script code data (14), and a means for issuing the block of script code data associated with the issued voice command to the graphical user interface when the issued voice command is determined to be an active voice command.
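The state-dependent dispatch described above can be sketched with a lookup table. The state names, command phrases, script strings and the `issue_script` callback are all hypothetical; the abstract does not specify a script language or data layout.

```python
# Hypothetical table: application state -> {voice command: script block}.
COMMANDS_BY_STATE = {
    "main_window": {"open file": "menu('File').click('Open')"},
    "open_dialog": {
        "cancel": "button('Cancel').click()",
        "okay": "button('OK').click()",
    },
}


def handle_voice_command(command, app_state, issue_script):
    """Issue the script for `command` only if it is active in `app_state`."""
    active = COMMANDS_BY_STATE.get(app_state, {})
    script = active.get(command)
    if script is None:
        return False  # command is not active in this state, so ignore it
    issue_script(script)
    return True
```

A command such as "cancel" is acted on while the dialog is open and ignored in the main window, which mirrors the active-command check in the abstract.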