Abstract:
A speech coder (400) for coding an information signal varies the codebook configuration based on parameters inherent in the information signal. The speech coder (400) requires no additional overhead for sending of mode parameters while allowing subframe resolution. The configurations vary not only for voicing level, but also for pitch period since different physiological traits yield different codebook configurations. A dispersion matrix (406) within the speech coder (400) facilitates a codebook search which is performed on vectors whose length can be less than a subframe length. Additionally, use of the dispersion matrix (406) allows the addition of random events for very slightly voiced speech which incurs little computational overhead but produces a rich excitation.
Abstract:
A method of synthesizing audio signals provides outputs of high subjective quality which retain the semblance of natural origin. Unlike frequency scaling methods, the pitch of a signal can be modified independently of the spectrum envelope. A set of candidate input sections is defined based on input transform-domain signal representations. A match-output transform-domain section is formed using the result of a matching process which compares candidate input sections to a reference section. The reference section for this matching process is defined based on one or more previously formed match-output sections. Main-output transform-domain signal representations are formed based on one or more match-output sections, whereby such main-output transform-domain signal representations can be inverse-transformed and combined with the output time-domain signal. This method is referred to as "Transform-Domain Match-Output Extension" (TDMOX). One embodiment of the invention implements block-transform processing using an FFT algorithm. Matching processes search over ranges of frequency shifts, ranges of time shifts, and ranges of resampling factors. Selections are based on maximum cross-correlation, maximum sum of dot products, and minimum sum of squared differences, respectively. Applications include text-to-speech synthesis, audio editing, musical effects processing, real-time low-delay voice transformation, internet telephony, voice mail, Karaoke, hearing aids, and film animation.
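The frequency-shift search with selection by maximum cross-correlation can be sketched as follows. This is only an illustration under simplifying assumptions: the sections are plain lists of real magnitudes rather than complex transform-domain representations, shifts are whole bins, and the function name `best_frequency_shift` and its parameters are hypothetical, not taken from the abstract.

```python
def best_frequency_shift(candidate, reference, max_shift):
    """Return the integer bin shift in [-max_shift, max_shift] that
    maximizes the cross-correlation between the shifted candidate
    section and the reference section (hypothetical helper)."""
    def cross_correlation(shift):
        total = 0.0
        for k, ref_value in enumerate(reference):
            j = k - shift  # candidate bin that lands on bin k after the shift
            if 0 <= j < len(candidate):
                total += candidate[j] * ref_value
        return total

    return max(range(-max_shift, max_shift + 1), key=cross_correlation)


# Toy magnitude sections: the candidate peak sits two bins below the reference peak.
reference = [0.0, 0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0]
candidate = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0, 0.0]
shift = best_frequency_shift(candidate, reference, max_shift=3)
```

The same search loop could be reused for the other two criteria named above by swapping the score function, for example a sum of squared differences that is minimized instead of maximized.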
Abstract:
A speech signal is input to an excitation signal generating section, a prediction filter and a prediction parameter calculator. The prediction parameter calculator calculates a predetermined number of prediction parameters (LPC parameters or reflection coefficients) by an autocorrelation method or covariance method, and supplies the acquired prediction parameters to a prediction parameter coder. The codes of the prediction parameters are sent to a decoder and a multiplexer. The decoder sends decoded values of the codes of the prediction parameters to the prediction filter and the excitation signal generating section. The prediction filter calculates a prediction residual signal, which is the difference between the input speech signal and its prediction from the decoded prediction parameters, and sends it to the excitation signal generating section. The excitation signal generating section calculates the pulse interval and amplitude for each of a predetermined number of subframes based on the input speech signal, the prediction residual signal and the quantized values of the prediction parameters, and sends them to the multiplexer. The multiplexer combines these codes with the codes of the prediction parameters, and sends the result as an output signal of a coding apparatus to a transmission path or the like.
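A minimal sketch of the autocorrelation method and of the prediction residual is given below. It uses the standard Levinson-Durbin recursion, which yields both LPC coefficients and reflection coefficients; the abstract does not say which recursion is used, so this is an assumption. Quantization and decoding of the parameters are omitted, and all function names are hypothetical.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of one frame of samples."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values.

    Returns (a, refl, err): a[0] = 1 and a[1..order] are the coefficients
    of A(z) = 1 + sum_j a[j] z^-j, refl holds the reflection coefficients,
    and err is the final prediction error energy.
    """
    a = [1.0] + [0.0] * order
    refl = []
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or perfectly predictable frame: stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        refl.append(k)
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= 1.0 - k * k
    return a, refl, err


def prediction_residual(x, a):
    """Residual e[n] = x[n] + sum_j a[j] * x[n - j], i.e. input minus prediction."""
    return [
        sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0)
        for n in range(len(x))
    ]


# Toy frame: a decaying exponential, which a first-order predictor models well.
frame = [0.9 ** n for n in range(200)]
r = autocorrelation(frame, 2)
a, refl, err = levinson_durbin(r, 2)
residual = prediction_residual(frame, a)
```

For this toy frame the first coefficient comes out close to -0.9 and the residual carries far less energy than the input, which is the redundancy removal that the excitation coding stage relies on.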
Abstract:
A spreadsheet reading-out/collating apparatus in which a spreadsheet preparation module uses a read-out range determining module to obtain the range to be read out from the position of a header cell specified by a read-out object specifying module, and outputs the cell data within that range, together with its display format, to a voice-generating data generation module. The voice-generating data generation module generates voice-generating data for text in which Chinese characters and Japanese characters are mixed, and a voice synthesis module outputs voice based on the voice-generating data.
Abstract:
A method and apparatus for detecting counter homeostasis oscillation perturbation signals (CHOPS) found within the waveform of human speech, which reflect either arousal in the autonomic nervous system or other biological processes. The apparatus is a speech analysis system for obtaining biofeedback information from human speech samples having variable duration. The speech analysis system comprises means for digitizing the human speech samples, storage means for receiving the digitized speech samples from the digitizing means and storing the digitized speech samples, processing means for detecting and analyzing CHOPS in the digitized speech samples and display means for presenting the analyzed speech samples in a visual representation. The speech analysis system may further include transducer means for collecting and transducing human speech samples into electrical signals and input means for configuring the analysis parameters of the processing means. The present invention does not require any electrode or probe attachment from the speech analysis system to a subject. The method provides biofeedback from physiological indicators of stress using the speech analysis system. The method includes recording a human speech sample having variable duration with the transducer means, digitizing the human speech sample with the means for digitizing, storing the digitized speech sample in the storage means, determining CHOPS in the digitized speech sample with the processing means based on pre-determined parameters and identifying relationships between the CHOPS in the digitized speech sample with the processing means.
Abstract:
A speech compression system called "Transform Predictive Coding", or TPC, provides for encoding 7 kHz wideband speech (16 kHz sampling) at a target bit-rate range of 16 to 32 kb/s (1 to 2 bits/sample). The system uses short-term and long-term prediction to remove the redundancy in speech. A prediction residual is transformed and coded in the frequency domain to take advantage of knowledge of human auditory perception. The TPC coder uses only open-loop quantization and therefore has a fairly low complexity. The speech quality of TPC is essentially transparent at 32 kb/s, very good at 24 kb/s, and acceptable at 16 kb/s.
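The bits-per-sample figures follow directly from dividing the bit rate by the sampling rate. The short check below reproduces them for the three rates mentioned; the helper name `bits_per_sample` is hypothetical, and 1 kb/s is taken to mean 1000 bits per second.

```python
SAMPLE_RATE_HZ = 16_000  # 16 kHz sampling, as stated for 7 kHz wideband speech


def bits_per_sample(bit_rate_bps, sample_rate_hz=SAMPLE_RATE_HZ):
    """Average number of coded bits available per input sample."""
    return bit_rate_bps / sample_rate_hz


# Bit rates in kb/s mapped to bits per sample.
rates = {kbps: bits_per_sample(kbps * 1000) for kbps in (16, 24, 32)}
```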
Abstract:
In speech encoding that switches between an adaptive codebook and a fixed codebook, as in PSI-CELP, the waveform distortion caused by selecting the fixed codebook when the frequency components of the input speech change significantly is reduced. An output of an adaptive codebook 21 or an output of a fixed codebook 22 is selected by a changeover selection switch 26 and summed with an output of noise codebooks 23, 24 so as to be sent to a linear prediction synthesis filter 16. A switching control circuit 19, which controls the changeover selection switch 26, operates in response to a prediction gain, which is the ratio of the linear prediction residual energy to the initial signal energy, supplied from a linear prediction analysis circuit 14. If the prediction gain is smaller than a pre-set threshold value, the switching control circuit 19 judges the input signal to be voiced and controls the changeover selection switch 26 so that the output of the adaptive codebook 21 is compulsorily selected.
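The switching rule can be sketched as below, using the definition of prediction gain given in this abstract (residual energy divided by signal energy, so a small value means the frame is well predicted). The threshold value, the string labels, the handling of silent frames, and the function names are all illustrative assumptions.

```python
def prediction_gain(residual, signal):
    """Ratio of linear prediction residual energy to input signal energy."""
    signal_energy = sum(s * s for s in signal)
    if signal_energy == 0.0:
        return 1.0  # assumption: treat a silent frame as not predictable
    return sum(e * e for e in residual) / signal_energy


def select_codebook(residual, signal, threshold, normal_choice):
    """Force the adaptive codebook when the frame is judged voiced;
    otherwise keep the choice made by the normal selection."""
    if prediction_gain(residual, signal) < threshold:
        return "adaptive"
    return normal_choice


# A well-predicted frame (small residual) and a poorly predicted one.
signal = [1.0, 1.0, 1.0, 1.0]
voiced_choice = select_codebook([0.1, 0.1, 0.1, 0.1], signal, 0.5, "fixed")
other_choice = select_codebook(signal, signal, 0.5, "fixed")
```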
Abstract:
A computer telephone relay service for interfacing between a textphone and a telephone. A textphone call is made to the computer, which makes a further call to a telephone, or vice versa. The computer then translates the messages between the textphone and the telephone using voice recognition and text-to-speech hardware.
Abstract:
The system and method of the invention relate to voice detection technology for determining instants of time at which a snapshot of noise characteristics results in improved adaptation of noise floors used in voice detection. The approach is based on the "lower envelope" of the smoothed input signal power. Incorporation of this approach in a simple time domain VAD (Voice Activity Detector) results in an effective low-complexity system which, on the basis of simulations, gives good performance down to SNR values of about 0 dB. In the invention the lower envelope also provides the updated value of the noise threshold during the presence of speech. The invention can also be embedded in other, more complex (e.g., frequency domain) VADs at low computational cost.
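One possible reading of a lower-envelope noise floor inside a simple time domain VAD is sketched below. The abstract does not give the update rule, so everything here is an assumption: the one-pole smoothing, the slow multiplicative rise of the envelope, the decision margin, and the names `lower_envelope_vad`, `smooth`, `rise` and `margin`.

```python
def lower_envelope_vad(frame_powers, smooth=0.9, rise=1.01, margin=4.0):
    """Return (decisions, floors) for a sequence of per-frame powers.

    The noise floor follows the lower envelope of the smoothed power:
    it drops immediately when the smoothed power falls below it and
    otherwise creeps upward slowly, so it keeps updating during speech.
    """
    smoothed = None
    floor = None
    decisions, floors = [], []
    for power in frame_powers:
        # One-pole smoothing of the input power.
        smoothed = power if smoothed is None else smooth * smoothed + (1.0 - smooth) * power
        if floor is None or smoothed < floor:
            floor = smoothed  # follow the signal down immediately
        else:
            floor = min(floor * rise, smoothed)  # rise slowly, never above the signal
        floors.append(floor)
        # Speech is declared when the smoothed power clears the floor by a margin.
        decisions.append(smoothed > margin * floor)
    return decisions, floors


# Toy input: noise, then a loud burst, then noise again.
powers = [1.0] * 20 + [100.0] * 20 + [1.0] * 40
decisions, floors = lower_envelope_vad(powers)
```

In this toy run the floor stays near the noise level throughout the burst, which is the property that lets the threshold be refreshed while speech is present.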
Abstract:
To avoid a predetermined amount of time and/or a certain amount of processing time prior to determining a number of frames for each speech input portion, a fast voice recognition system enables real-time frame counting based upon a comparison between a decreasing number of frames and an increasing time-dependent threshold. The real-time voice recognition also enables a substantially reduced rate of erroneous partial matching.