Abstract:
A voice messaging system includes an input device to accept a destination electronic messaging address, a voice-to-text converter to convert or transcribe a received voice message into a converted text message, and a processor to operate an electronic messaging program and to prepare the text message for automatic electronic transmission to the destination electronic messaging address. A method is also provided to convert a voice message into a text message and to transmit the same to a destination electronic messaging address. The method includes inputting a destination electronic messaging address. A received voice message is converted or transcribed into a text message and prepared as a text file. The prepared text file is automatically transmitted to the destination electronic messaging address. A log file including the text of the voice messages may be maintained with updates at a predetermined time interval. A database of the converted text messages may be generated and maintained to provide a searchable mechanism of archived messages.
Abstract:
A 1200 b/s vocoder providing a high degree of speech intelligibility and natural voice quality includes a tenth-order linear prediction analyzer, a split vector quantizer for line spectral frequencies, circuitry providing voicing classification and pitch estimation, a differential pitch and gain quantizer and a multiplexer for producing an encoded word transmitted to a receptive demultiplexer. The vocoder provides a characteristic encoded word including a first codeword, a second codeword, a pitch codeword and a gain codeword, wherein the first and second codewords are selected from respective first and second codebooks having an equal number of codewords and wherein the first and second codewords represent unequal numbers of elements of respective first and second sub-vectors. A codebook populating method for a split vector quantizer vocoder is also utilized.
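The split vector quantization step described above can be sketched roughly as follows. This is a minimal illustration, not the patented design: the 4/6 split of the ten line spectral frequencies, the toy codebooks `CB1` and `CB2`, and all function names are assumptions made for the example. The only properties taken from the abstract are that the two codebooks hold an equal number of codewords and that the two sub-vectors have unequal lengths.

```python
from typing import List, Sequence, Tuple


def nearest_index(vec: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codeword closest to vec (squared Euclidean distance)."""
    def dist(codeword: Sequence[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(vec, codeword))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def split_vq_encode(lsf: Sequence[float],
                    cb1: Sequence[Sequence[float]],
                    cb2: Sequence[Sequence[float]],
                    split: int = 4) -> Tuple[int, int]:
    """Quantize the two sub-vectors independently and return both codeword indices."""
    if len(cb1) != len(cb2):
        raise ValueError("both codebooks must hold the same number of codewords")
    return nearest_index(lsf[:split], cb1), nearest_index(lsf[split:], cb2)


def split_vq_decode(i1: int, i2: int,
                    cb1: Sequence[Sequence[float]],
                    cb2: Sequence[Sequence[float]]) -> List[float]:
    """Rebuild the full vector by concatenating the two selected codewords."""
    return list(cb1[i1]) + list(cb2[i2])


# Hypothetical toy codebooks: two codewords each, 4 and 6 elements per codeword.
CB1 = [[0.1, 0.2, 0.3, 0.4], [0.2, 0.4, 0.6, 0.8]]
CB2 = [[1.0, 1.2, 1.4, 1.6, 1.8, 2.0], [1.5, 1.7, 1.9, 2.1, 2.3, 2.5]]

lsf = [0.19, 0.41, 0.58, 0.80, 1.05, 1.20, 1.35, 1.60, 1.85, 2.00]
indices = split_vq_encode(lsf, CB1, CB2)
reconstructed = split_vq_decode(indices[0], indices[1], CB1, CB2)
```

Because the codebooks are equal in size, both indices need the same number of bits even though they cover different numbers of elements.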
Abstract:
A method of training a text-to-speech (TTS) or other system to assign intonational features, such as intonational phrase boundaries, to input text is described; the method overcomes the shortcomings of the known methods. The method of training involves taking a set of predetermined text (not speech or a signal representative of speech) and having a human annotate it with intonational feature annotations. This results in annotated text. Next, the structure of the set of predetermined text is analyzed to generate information. This information is used, along with the intonational feature annotations, to generate a statistical representation. The statistical representation may then be stored and repeatedly used to generate synthesized speech from new sets of input text without training the TTS system further. The resulting trained system and use thereof are also part of the invention.
Abstract:
A method of transmitting speech signals with reduced bandwidth requirements. With this invention an original speech signal is first converted to a textual representation, and a facsimile of the original speech is determined from the textual representation. Then a minimum error term is derived from the difference between the original speech signal and the facsimile of the original speech signal. The minimum error term is then compressed, and it is this compressed minimum error term, along with the textual representation, that is transmitted on the communications medium. At the receiving end, the textual representation and the difference representation are split through a demultiplexer. The textual representation is then passed through a synthesizer while the difference representation is passed through a mapper. The synthesizer along with synthesis parameter storage converts the textual representation into a digital representation of speech, while the mapper modifies the received difference representation by applying sub- or super-sampling corrections.
Abstract:
Techniques for implementing adaptable voice activation operations for interactive speech recognition devices and instruments. Specifically, such speech recognition devices and instruments include an input sound signal power or volume detector in communication with a CPU for bringing the CPU out of an initial sleep state upon detection of a perceived voice that exceeds a predetermined threshold volume level and is continuously perceived for at least a certain period of time. If both these conditions are satisfied, the CPU is transitioned into an active mode so that the perceived voice can be analyzed against a set of registered key words to determine if a "power on" command or similar instruction has been received. If so, the CPU maintains an active state and normal speech recognition processing ensues until a "power off" command is received. However, if the perceived and analyzed voice cannot be recognized, it is deemed to be background noise and the minimum threshold is selectively updated to accommodate the volume level of the perceived but unrecognized voice. Other aspects include tailoring the volume level of the synthesized voice response according to the perceived volume level as detected by the input sound signal power detector, as well as modifying audible response volume in accordance with updated volume threshold levels.
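The wake-up logic described above can be sketched as a small state machine. This is an illustrative sketch under stated assumptions: the class name `VoiceActivator`, the per-frame power values, and the rule of raising the threshold to the mean level of the unrecognized sound are all hypothetical. The abstract only says that the threshold is selectively updated to accommodate that level.

```python
from typing import List, Optional


class VoiceActivator:
    """Wake on sustained loud input; treat unrecognized input as background noise."""

    def __init__(self, threshold: float, min_frames: int) -> None:
        self.threshold = threshold      # minimum volume level that counts as voice
        self.min_frames = min_frames    # how many consecutive frames must exceed it
        self.active = False             # True while normal recognition is running
        self._run: List[float] = []     # levels of the current above-threshold run

    def on_frame(self, level: float) -> bool:
        """Feed one frame's power level; return True when the CPU should wake up."""
        if level > self.threshold:
            self._run.append(level)
        else:
            self._run = []
        return len(self._run) >= self.min_frames

    def on_analysis(self, recognized: Optional[str]) -> None:
        """Handle the keyword analysis result (None means nothing was recognized)."""
        if recognized == "power on":
            self.active = True
        elif recognized == "power off":
            self.active = False
        elif recognized is None and self._run:
            # Assumed update rule: raise the threshold to the mean level of the
            # unrecognized sound so similar background noise no longer wakes the CPU.
            mean_level = sum(self._run) / len(self._run)
            self.threshold = max(self.threshold, mean_level)
        self._run = []


det = VoiceActivator(threshold=10.0, min_frames=3)
wakes = [det.on_frame(x) for x in (12.0, 15.0, 14.0)]  # third frame completes the run
det.on_analysis(None)                                   # unrecognized: threshold rises
noise_wakes = [det.on_frame(12.0) for _ in range(5)]    # same noise is now ignored
loud_wakes = [det.on_frame(20.0) for _ in range(3)]     # louder voice still wakes
det.on_analysis("power on")
```

Keeping the level and duration checks in `on_frame` mirrors the two conditions in the abstract: keyword analysis runs only after both are met.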
Abstract:
Methods and apparatus for producing efficiently sized models suitable for pattern recognition purposes are described. Various embodiments are directed to the automated generation, evaluation, and selection of reduced size models from an initial model having a relatively large number of components, e.g., more components than can be stored for a particular intended application. To achieve model size reduction in an automated iterative manner, expectation maximization (EM) model training techniques are combined, in accordance with the present invention, with model size constraints. In one embodiment, a plurality of reduced size models are generated using a Lagrange multiplier from an input model and input size constraints. The plurality of reduced size models are stored in a buffer and scored using a likelihood scoring technique. The one of the reduced size models receiving the highest score may be selected as the reduced size model to be output or used as an input model during future iterations of the model size reduction process. The reduced size model to be used, e.g., for speech, image or other pattern recognition purposes, may be selected from the buffered models produced during multiple iterations of the model size reduction process.
Abstract:
A device and method in which polyphones of speech in a first language are received and stored, and a movement pattern in a person's face and/or body is registered. The movement pattern is registered by measuring movement at a number of measuring points on the face/body of the speaker, with the measurements made at the same time that the polyphones are registered. In connection with translation of a person's speech from one language into another, the polyphones and corresponding facial movement patterns are linked to a movement model of the face. An image of the real person's face is then pasted over the model, so that a movement pattern corresponding to the language is obtained. The invention consequently gives the impression that the person really speaks the language in question.
Abstract:
The invention provides a speech coding apparatus by which a good sound quality can be obtained even when the bit rate is low. The speech coding apparatus includes an excitation quantization circuit which quantizes an excitation signal using a plurality of pulses. The position of at least one of the pulses is represented by a number of bits determined in advance, and the amplitude of the pulse is determined in advance depending upon the position of the pulse.
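One way to picture the position-dependent amplitude idea is the sketch below. It assumes a lookup table `AMPLITUDE_TABLE` that fixes each pulse's amplitude from its position, so only positions (3 bits each here) would need to be coded. The table values, the frame length, and the greedy position search are hypothetical choices for illustration and are not taken from the source.

```python
from typing import List, Sequence

POSITION_BITS = 3                        # assumed: 2**3 = 8 candidate positions
FRAME_LEN = 2 ** POSITION_BITS
# Assumed table: amplitude fixed in advance for every pulse position.
AMPLITUDE_TABLE = [1.0, -1.0, 0.5, -0.5, 1.0, -1.0, 0.5, -0.5]


def select_positions(target: Sequence[float], n_pulses: int) -> List[int]:
    """Greedily pick the positions whose preset pulses reduce squared error most."""
    def gain(p: int) -> float:
        a, t = AMPLITUDE_TABLE[p], target[p]
        # Error reduction from placing amplitude a at p: t^2 - (t - a)^2
        return 2.0 * t * a - a * a
    best = sorted(range(FRAME_LEN), key=gain, reverse=True)[:n_pulses]
    return sorted(best)


def build_excitation(positions: Sequence[int]) -> List[float]:
    """Decoder side: rebuild the excitation from positions alone."""
    excitation = [0.0] * FRAME_LEN
    for p in positions:
        if not 0 <= p < FRAME_LEN:
            raise ValueError("position does not fit in POSITION_BITS bits")
        excitation[p] = AMPLITUDE_TABLE[p]
    return excitation


target = [0.9, 0.1, 0.0, -0.6, 0.2, -1.1, 0.0, 0.0]
positions = select_positions(target, n_pulses=2)
excitation = build_excitation(positions)
```

Since the decoder holds the same table, no amplitude bits are spent on these pulses, which is where the bit-rate saving in this sketch comes from.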
Abstract:
An encoding/decoding system employing vector quantization realizes high quality encoding and decoding with decreased quantizing errors, employing a small-sized codebook which faithfully represents each of the inputted waveform vectors. An encoding/decoding system includes an encoding apparatus and a decoding apparatus, each having a codebook for storing information vectors representative of a predetermined number of signal patterns and indices that determine the information vectors. The encoding apparatus compares a vector representing an object signal to be quantized with each information vector in the codebook, selects an information vector that is closest to the vector and outputs an index for the information vector. The decoding apparatus obtains an information vector corresponding to the index obtained at the encoding apparatus side by referring to the codebook and decodes the object signal. The codebook utilizes a temporary memory connected thereto. The content of the codebook is temporarily moved to the temporary memory when the identity of the speaker changes. The contents of the temporary memory are read out when the original speaker returns to the system.
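The encode, decode, and speaker-switch behavior described above can be sketched as follows. The class `SpeakerCodebook`, the per-speaker dictionary used as temporary memory, and the idea that a codebook is adapted while a speaker is active are assumptions made for illustration. The abstract does not say how the codebook contents come to differ between speakers.

```python
from typing import Dict, List, Sequence


class SpeakerCodebook:
    """Nearest-neighbour vector quantizer whose codebook is stashed per speaker."""

    def __init__(self, speaker: str, vectors: Sequence[Sequence[float]]) -> None:
        self.speaker = speaker
        self.vectors: List[List[float]] = [list(v) for v in vectors]
        self._temp: Dict[str, List[List[float]]] = {}   # stands in for temporary memory

    def encode(self, vec: Sequence[float]) -> int:
        """Return the index of the information vector closest to vec."""
        def dist(i: int) -> float:
            return sum((a - b) ** 2 for a, b in zip(vec, self.vectors[i]))
        return min(range(len(self.vectors)), key=dist)

    def decode(self, index: int) -> List[float]:
        """Look up the information vector for a received index."""
        return list(self.vectors[index])

    def switch_speaker(self, new_speaker: str,
                       initial_vectors: Sequence[Sequence[float]]) -> None:
        """Move the current codebook to temporary memory; restore the new speaker's."""
        fresh = [list(v) for v in initial_vectors]
        self._temp[self.speaker] = self.vectors
        self.vectors = self._temp.pop(new_speaker, fresh)
        self.speaker = new_speaker


INITIAL = [[0.0, 0.0], [1.0, 1.0]]          # hypothetical starting codebook
cb = SpeakerCodebook("A", INITIAL)
cb.vectors[1] = [0.9, 1.1]                  # assumed adaptation to speaker A
index = cb.encode([0.8, 1.0])
cb.switch_speaker("B", INITIAL)             # A's codebook goes to temporary memory
b_vector = cb.decode(1)
cb.switch_speaker("A", INITIAL)             # A returns: stored codebook is read back
a_vector = cb.decode(index)
```

The encoder and decoder would each run the same switch so that an index keeps referring to the same information vector on both sides.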
Abstract:
A sample speech is analyzed by a speech analyzing unit to obtain sample characteristic parameters, and a coding distortion is calculated from the sample characteristic parameters in each of a plurality of coding modules. The sample characteristic parameters and the coding distortions are statistically processed by a statistical processing unit to obtain a coding module selecting rule. Thereafter, when a speech is analyzed by the speech analyzing unit to obtain characteristic parameters, an appropriate coding module is selected by a coding module selecting unit from the coding modules according to the coding module selecting rule on condition that a coding distortion for the characteristic parameters is minimized in the appropriate coding module. Thereafter, the characteristic parameters of the speech are coded in the appropriate coding module, and a coded speech is obtained. When the coded speech is decoded, a reproduced speech is obtained. Accordingly, because an appropriate coding module can be easily selected from a plurality of coding modules according to the coding module selecting rule, any allophone occurring in a reproduced speech can be prevented with a low amount of computation.