Abstract:
A speech information processing apparatus includes a statistical processing unit and a pitch pattern forming unit. The statistical processing unit extracts features by statistically processing two files: a feature file, formed by extracting speech features such as the fundamental frequency and its variations and the power and its variations from a speech file, and a label file that takes into account a phoneme environment comprising the accent type, the number of moras, the mora position, phonemes and the like. The pitch pattern forming unit forms a pitch pattern that takes the phoneme environment into account, based on the result of the statistical processing.
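The grouping idea can be illustrated with a minimal sketch. It assumes that the statistic is a per-environment mean of the fundamental frequency and that an environment is a tuple of (accent type, mora count, mora position, phoneme); the abstract does not name the statistic or the data layout, so the function names and the fallback value below are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_f0_by_environment(observations):
    """Average F0 (Hz) for each phoneme environment.

    observations: iterable of (environment, f0_hz) pairs, where environment
    is any hashable label. The tuple layout used in the example is an
    assumption, not something the abstract specifies.
    """
    groups = defaultdict(list)
    for environment, f0_hz in observations:
        groups[environment].append(f0_hz)
    return {environment: mean(values) for environment, values in groups.items()}


def form_pitch_pattern(environments, table, default_f0):
    """Look up one F0 value per environment; unseen environments get default_f0."""
    return [table.get(environment, default_f0) for environment in environments]


observations = [
    ((1, 3, 0, "a"), 120.0),
    ((1, 3, 0, "a"), 130.0),
    ((1, 3, 1, "k"), 100.0),
]
table = mean_f0_by_environment(observations)
pattern = form_pitch_pattern([(1, 3, 0, "a"), (0, 2, 0, "o")], table, 110.0)
```

Here the pattern for a known environment is its averaged value, and an environment absent from the training data falls back to the hypothetical default.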
Abstract:
A speech synthesis method and apparatus for synthesizing speech from a character series comprising a text and pitch information. The apparatus includes a parameter generator for generating power spectrum envelopes as parameters of a speech waveform to be synthesized representing the input text in accordance with the input character series. The apparatus also includes a pitch waveform generator for generating pitch waveforms whose period equals the pitch specified by the pitch information. The pitch waveform generator generates the pitch waveforms from the input pitch information and the power spectrum envelopes generated by the parameter generator. Also provided is a speech waveform output device for outputting the speech waveform obtained by connecting the generated pitch waveforms.
Abstract:
A speech synthesizer includes a first indicator for indicating the amplitude of a waveform by using a random number, a second indicator for indicating the superposition period for waveforms by using a random number, a waveform generator for generating first and second waveforms having an amplitude indicated by the first indicator, and a waveform superposition device for synthesizing an unvoiced speech waveform by superposing the second waveform generated by the waveform generator onto a waveform obtained by delaying the first waveform by a superposition period indicated by the second indicator. The speech synthesizer is capable of making the frequency characteristic of unvoiced speech analogous to that of white noise, and of generating synthesized speech that is natural and analogous to an actual human voice.
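A loose sketch of the idea follows: copies of a short base waveform are scaled by a random amplitude and overlap-added at randomly chosen intervals. It generalizes the two-waveform description to a whole sequence, and the uniform distributions, the base waveform, and all names are assumptions made for illustration only.

```python
import random


def synthesize_unvoiced(base, n_samples, max_period, rng):
    """Overlap-add randomly scaled copies of `base` at random intervals.

    base:       list of floats, the waveform to superpose (hypothetical shape)
    n_samples:  length of the output in samples
    max_period: largest superposition period in samples (must be >= 1)
    rng:        a random.Random instance, so that runs are reproducible
    """
    if max_period < 1:
        raise ValueError("max_period must be at least 1")
    out = [0.0] * n_samples
    position = 0
    while position < n_samples:
        amplitude = rng.uniform(-1.0, 1.0)  # random amplitude
        for i, sample in enumerate(base):
            if position + i < n_samples:
                out[position + i] += amplitude * sample
        position += rng.randint(1, max_period)  # random superposition period
    return out


noise = synthesize_unvoiced([1.0, 0.5, 0.25], 160, 4, random.Random(0))
```

Passing a seeded `random.Random` keeps the output repeatable, which is convenient when comparing spectra across parameter settings.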
Abstract:
A speech synthesis apparatus for outputting synthesized speech on the basis of a parameter sequence of a speech waveform includes a parameter generation unit which generates a parameter sequence for speech synthesis on the basis of a character sequence input by a character sequence input unit, and stores the generated parameter sequence in a parameter storage unit. A waveform generation unit is also provided that generates pitch waveforms each for one pitch period on the basis of synthesis parameters and pitch scales included in the parameter sequence, and generates a speech waveform by connecting the generated pitch waveforms in accordance with frame lengths set by a frame length setting unit.
Abstract:
A document inputting apparatus or speech outputting apparatus inputs and displays document data and specifies accent information, pronunciation information and syllable-length information for words or characters of the document data. The apparatus displays the document data in accordance with the specified information so that information such as the accent positions or accent intensities can be recognized. The document data thus formed is stored in a memory together with the accent information, the pronunciation information or the syllable-length information. When the document data is read from the memory and output as speech, the specified information is referred to during speech synthesis, so that the speech is output with the correct pronunciation.
Abstract:
A speech synthesis method and a speech synthesis apparatus include a system for synthesis by rule that prevents the quality of synthesized speech from deteriorating and reduces the number of calculations required to generate a speech waveform. The speech synthesis apparatus includes a character series input section for inputting a character series as phonetic text; a pitch waveform generator for generating a pitch waveform by calculating a product of a matrix, which has been acquired for each pitch, and the character series input by the character series input section; and a device for connecting the pitch waveforms generated by the pitch waveform generator to provide a speech waveform. This calculation method greatly reduces the number of calculations required to generate a pitch waveform. In addition, in the calculation for the generation of a pitch waveform, a function that determines a frequency response is employed to convert a spectral envelope, which is obtained from a parameter, so that the timbre of synthesized speech can be changed without parameter operations.
Abstract:
A data processing apparatus for synchronized audiovisual output has synchronizing signal bits which are assigned to bits of each sound data, represented by a 16-bit PCM code. A predetermined bit of the assigned bits having the least influence upon the human auditory sense is extracted as a synchronizing signal bit for synchronization of the image data output and sound output.
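As a minimal sketch of carrying a synchronizing bit inside a 16-bit PCM sample, the code below assumes that the bit with the least audible influence is the least significant bit and that samples are signed integers; the abstract states neither, and the function names are hypothetical.

```python
def embed_sync_bit(sample, sync):
    """Overwrite the least significant bit of a signed 16-bit PCM sample."""
    if not -32768 <= sample <= 32767:
        raise ValueError("sample is outside the signed 16-bit range")
    unsigned = sample & 0xFFFF                    # two's-complement view
    unsigned = (unsigned & 0xFFFE) | (sync & 1)   # replace the lowest bit
    # Convert back to a signed value.
    return unsigned - 0x10000 if unsigned & 0x8000 else unsigned


def extract_sync_bit(sample):
    """Read the synchronizing bit back from a sample."""
    return sample & 1


marked = embed_sync_bit(1000, 1)
cleared = embed_sync_bit(-3, 0)
```

Because only the lowest bit changes, each sample moves by at most one quantization step.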
Abstract:
An apparatus and method for processing vocal information include an extractor for extracting a plurality of pieces of spectrum information from parameters for vocal information, a vector quantizer for vector-quantizing the extracted spectrum information and for producing a plurality of parameter patterns therefrom, a memory for storing the plurality of parameter patterns so obtained, and a memory for storing positional information indicating the positions at which the plurality of parameter patterns are stored and for storing code information specifying parameter patterns and corresponding to the positional information. The parameter patterns and code information can be used to synthesize speech. Because only a small number of parameter patterns are used, only a small memory capacity is needed and vocal information can be processed efficiently.
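The encode and decode side of vector quantization can be sketched as follows, assuming a codebook of parameter patterns already exists and that the code information is simply each pattern's index in that codebook. The squared Euclidean distance and the function names are illustrative assumptions; the abstract does not say how the patterns are produced or compared.

```python
def nearest_code(vector, codebook):
    """Index of the codebook pattern closest to `vector` (squared Euclidean)."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))


def encode(vectors, codebook):
    """Replace each spectrum vector by the index of its nearest pattern."""
    return [nearest_code(vector, codebook) for vector in vectors]


def decode(codes, codebook):
    """Recover the stored pattern for each code."""
    return [codebook[code] for code in codes]


codebook = [(0.0, 0.0), (1.0, 1.0)]
codes = encode([(0.1, 0.2), (0.9, 0.8)], codebook)
patterns = decode(codes, codebook)
```

Storing one small integer per frame instead of a full spectrum vector is what yields the memory saving the abstract mentions.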
Abstract:
A method and apparatus read out a feature parameter and a driver sound source stored in a VCV (vowel-consonant-vowel) speech segment file, sequentially connect the readout parameter and the readout sound source information in accordance with a predetermined rule, and supply the connected data to a speech synthesizer, thereby generating a speech output. The apparatus includes a memory for storing the average power of each vowel, and a power controller for controlling the apparatus to normalize a VCV speech segment so that the powers at both ends of each VCV segment coincide with the average power of each vowel.
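One way to make both ends of a segment match the stored vowel averages is to compute a gain at each end and interpolate between them. The sketch below assumes per-frame power values and a linear interpolation of the gain across the segment; the abstract specifies only the end condition, so the interpolation and the function name are hypothetical.

```python
def normalize_vcv_power(frame_powers, avg_start, avg_end):
    """Scale per-frame powers so both ends match target vowel averages.

    frame_powers: list of positive per-frame powers for one VCV segment
    avg_start:    average power of the segment's first vowel
    avg_end:      average power of the segment's last vowel
    The gain varies linearly from the first frame to the last (an assumption).
    """
    n = len(frame_powers)
    if n < 2:
        raise ValueError("need at least two frames")
    if frame_powers[0] <= 0 or frame_powers[-1] <= 0:
        raise ValueError("end-frame powers must be positive")
    gain_start = avg_start / frame_powers[0]
    gain_end = avg_end / frame_powers[-1]
    return [
        power * (gain_start + (gain_end - gain_start) * i / (n - 1))
        for i, power in enumerate(frame_powers)
    ]


normalized = normalize_vcv_power([1.0, 1.0, 1.0], 2.0, 4.0)
```

When two segments that share a vowel are normalized this way, their adjoining ends have the same power, which avoids a step at the connection point.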
Abstract:
An object region detection unit (130) decides the region of a physical object of interest in a physical space image. An image manipulation unit (140) performs shading processing of an inclusion region including the decided region. A rendering unit (155) arranges a virtual object in virtual space at the position and orientation of the physical object of interest and generates a virtual space image based on the position and orientation of the user's viewpoint. A composition unit (160) generates a composite image by superimposing the virtual space image on the physical space image that has undergone the shading processing and outputs the generated composite image to an HMD (190).