Abstract:
An envelope identification section generates input envelope data (DEVin) indicative of a spectral envelope (EVin) of an input voice. A template acquisition section reads out, from a storage section, converting spectrum data (DSPt) indicative of a frequency spectrum (SPt) of a converting voice. On the basis of the input envelope data (DEVin) and the converting spectrum data (DSPt), a data generation section specifies a frequency spectrum (SPnew) that corresponds in shape to the frequency spectrum (SPt) of the converting voice and has substantially the same spectral envelope as the spectral envelope (EVin) of the input voice, and generates new spectrum data (DSPnew) indicative of the frequency spectrum (SPnew). An inverse FFT section and an output processing section generate an output voice signal (Snew) on the basis of the new spectrum data (DSPnew).
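The envelope substitution described above can be illustrated with a minimal sketch. The abstract does not say how the data generation section adjusts the spectrum, so everything here is an assumption: the envelope of the converting voice is estimated with a simple moving average, and each bin of SPt is rescaled by the ratio of the two envelopes. The names `moving_average_envelope` and `impose_envelope` are hypothetical.

```python
def moving_average_envelope(spectrum, half_width=2):
    """Rough envelope estimate: local mean of the magnitudes (assumed method)."""
    n = len(spectrum)
    envelope = []
    for k in range(n):
        lo, hi = max(0, k - half_width), min(n, k + half_width + 1)
        envelope.append(sum(spectrum[lo:hi]) / (hi - lo))
    return envelope


def impose_envelope(sp_t, ev_in, eps=1e-12):
    """Keep the fine shape of sp_t but rescale it so its envelope follows ev_in.

    sp_t  : magnitude spectrum of the converting voice (SPt)
    ev_in : spectral envelope of the input voice (EVin), same length
    """
    ev_t = moving_average_envelope(sp_t)  # envelope of the converting voice
    # Divide out the old envelope, multiply in the new one, bin by bin.
    return [s * e_in / max(e_t, eps) for s, e_in, e_t in zip(sp_t, ev_in, ev_t)]


if __name__ == "__main__":
    sp_t = [1.0, 4.0, 1.0, 4.0, 1.0, 4.0, 1.0, 4.0]   # strongly rippled spectrum
    ev_in = [8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]  # falling input envelope
    print(impose_envelope(sp_t, ev_in))
```

The result would still have to be combined with phase information and passed through an inverse FFT to obtain a time-domain signal; that step is omitted here.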
Abstract:
A voice processing apparatus has a storage device that stores registration information containing a characteristic parameter of a given voice. The voice processing apparatus is further provided with a judgment unit, a management unit and a notification unit. The judgment unit judges whether or not an input voice is appropriate for creating or updating the registration information, based on the degree of difference between an inter-band correlation matrix of the input voice acquired this time and an inter-band correlation matrix of another input voice that was judged appropriate the previous time. The management unit creates or updates the registration information based on a characteristic parameter of the input voice when the judgment unit judges that the input voice is appropriate. The notification unit notifies the speaker of the input voice when the judgment unit judges that the input voice is inappropriate.
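One way to picture the judgment step is the sketch below. It assumes, beyond what the abstract states, that the inter-band correlation matrix holds Pearson correlations between per-band energy trajectories, that the difference is measured as a mean absolute element-wise distance, and that a small difference means "appropriate". The function names and the threshold are hypothetical.

```python
import math


def interband_correlation(frames):
    """frames: list of per-frame band-energy lists -> bands x bands correlation matrix."""
    n_bands = len(frames[0])
    columns = [[frame[b] for frame in frames] for b in range(n_bands)]
    means = [sum(c) / len(c) for c in columns]
    centered = [[x - m for x in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(x * x for x in c)) for c in centered]
    matrix = []
    for i in range(n_bands):
        row = []
        for j in range(n_bands):
            denom = norms[i] * norms[j]
            dot = sum(a * b for a, b in zip(centered[i], centered[j]))
            row.append(dot / denom if denom > 0 else 0.0)
        matrix.append(row)
    return matrix


def matrix_distance(a, b):
    """Mean absolute element-wise difference (assumed difference measure)."""
    n = len(a)
    return sum(abs(a[i][j] - b[i][j]) for i in range(n) for j in range(n)) / (n * n)


def is_appropriate(current_frames, last_accepted_matrix, threshold=0.3):
    """Assumption: the input is accepted when it stays close to the last accepted one."""
    current = interband_correlation(current_frames)
    return matrix_distance(current, last_accepted_matrix) <= threshold
```

A caller would keep the matrix of the most recently accepted utterance, update the registration information when `is_appropriate` returns true, and otherwise notify the speaker.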
Abstract:
An apparatus is constructed for converting an input voice signal into an output voice signal according to a target voice signal. In the apparatus, an input device provides the input voice signal, which is composed of original sinusoidal components and original residual components other than the original sinusoidal components. An extracting device extracts original attribute data from at least the original sinusoidal components of the input voice signal. The original attribute data is characteristic of the input voice signal. A synthesizing device synthesizes new attribute data based on both the original attribute data derived from the input voice signal and target attribute data characteristic of the target voice signal, which is composed of target sinusoidal components and target residual components other than the target sinusoidal components. The target attribute data is derived from at least the target sinusoidal components. An output device produces the output voice signal based on the new attribute data and either the original residual components or the target residual components.
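As a rough illustration of synthesizing new attribute data from both sources, the sketch below blends original and target attributes with a single mixing ratio. The abstract does not specify the combination rule or the attribute set; the linear blend, the `ratio` parameter and the keys `pitch_hz` and `amplitude` are assumptions made only for this example.

```python
def blend_attributes(original, target, ratio):
    """Linear blend of attribute values (assumed rule).

    ratio = 0.0 keeps the original attributes, ratio = 1.0 adopts the target's.
    Both dictionaries must share the same keys.
    """
    return {
        key: (1.0 - ratio) * original[key] + ratio * target[key]
        for key in original
    }


if __name__ == "__main__":
    original = {"pitch_hz": 200.0, "amplitude": 0.5}  # from the input voice
    target = {"pitch_hz": 100.0, "amplitude": 1.0}    # from the target voice
    print(blend_attributes(original, target, 0.5))
```

The blended attributes would then drive resynthesis of the sinusoidal components, with one of the two residual components added back by the output device.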
Abstract:
In a voice processing device, a male voice index calculator calculates a male voice index indicating the similarity of an input sound to a male speaker sound model. A female voice index calculator calculates a female voice index indicating the similarity of the input sound to a female speaker sound model. A first discriminator discriminates the input sound between a non-human-voice sound and a human voice sound, which may be either a male voice sound or a female voice sound. When the first discriminator determines that the input sound is a human voice sound, a second discriminator discriminates the input sound between the male voice sound and the female voice sound based on the male voice index and the female voice index.
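The two-stage decision can be sketched as follows. The abstract does not say what the first discriminator bases its decision on, so this example assumes it rejects the sound as non-voice when neither index reaches a threshold; the function name, the threshold value and the returned labels are hypothetical.

```python
def classify_sound(male_index, female_index, voice_threshold=0.5):
    """Two-stage discrimination sketch.

    Stage 1 (assumed rule): treat the sound as non-voice when it resembles
    neither speaker model well enough.
    Stage 2: among human voices, pick the model with the higher index.
    """
    if max(male_index, female_index) < voice_threshold:
        return "non-voice"
    return "male" if male_index >= female_index else "female"


if __name__ == "__main__":
    print(classify_sound(0.8, 0.3))
    print(classify_sound(0.2, 0.9))
    print(classify_sound(0.1, 0.2))
```

Running the voice/non-voice test first means the male/female comparison is never applied to sounds that match neither model.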
Abstract:
Microphones are provided at an air inlet of an engine and on a vehicle-cabin-side wall surface of an engine room to pick up engine sounds. The engine sound is processed by a signal processing section, and the processed engine sound is output from a speaker provided in the vehicle cabin. The signal processing section is provided with a filter that simulates the sound insulation characteristic of the vehicle cabin and a transformation section that processes the engine sound according to the driving condition. A spectrum transformation characteristic of the transformation section is determined according to values detected by a vehicle speed sensor, an engine speed sensor, and an accelerator depression sensor, and the spectrum of the engine sound is transformed according to the specified spectrum transformation characteristic, thereby enhancing the engine sound.
Abstract:
A method for synthesizing a natural-sounding singing voice divides performance data into a transition part and a long sound part. The transition part is represented by articulation (phonemic chain) data that is read from an articulation template database and is outputted without modification. For the long sound part, a new characteristic parameter is generated by linearly interpolating characteristic parameters of the transition parts positioned before and after the long sound part and adding thereto a changing component of stationary data that is read from a constant part (stationary) template database. An associated apparatus for carrying out the singing voice synthesizing method includes a phoneme database for storing articulation data for the transition part and stationary data for the long sound part, a first device for outputting the articulation data, and a second device for outputting the newly-generated characteristic parameter of the long sound part.
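The long-sound-part rule (linear interpolation between the neighbouring transition parts plus a fluctuation taken from stationary data) can be sketched for a single scalar parameter. The function name and the use of one value per frame are assumptions for illustration; real characteristic parameters would be vectors.

```python
def long_sound_parameters(before, after, fluctuation):
    """Generate one parameter value per frame of the long sound part.

    before      : parameter value at the end of the preceding transition part
    after       : parameter value at the start of the following transition part
    fluctuation : per-frame changing component read from the stationary template
    """
    n = len(fluctuation)
    values = []
    for i, delta in enumerate(fluctuation):
        t = i / (n - 1) if n > 1 else 0.0              # position in the long sound, 0..1
        values.append(before + (after - before) * t + delta)
    return values


if __name__ == "__main__":
    # Interpolate from 100.0 to 110.0 over three frames with a small wobble.
    print(long_sound_parameters(100.0, 110.0, [0.0, 1.0, 0.0]))
    # prints [100.0, 106.0, 110.0]
```

Interpolating the endpoints keeps the long sound continuous with both transitions, while the added fluctuation restores the small variations of a sustained voice.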
Abstract:
A voice converting apparatus is constructed for converting an input voice into an output voice according to a target voice. In the apparatus, a storage section provisionally stores source data, which is associated with and extracted from the target voice. An analyzing section analyzes the input voice to extract therefrom a series of input data frames representing the input voice. A producing section produces a series of target data frames representing the target voice based on the source data, while aligning the target data frames with the input data frames to secure synchronization between the target data frames and the input data frames. A synthesizing section synthesizes the output voice according to the target data frames and the input data frames. In the recognizing feature analysis, a characteristic analyzer extracts a characteristic vector from the input voice. A memory stores target behavior data representing a behavior of the target voice. An alignment processor determines a temporal relation between the input data frames and the target data frames according to the characteristic vector and the target behavior data so as to output alignment data. A target decoder produces the target data frames according to the alignment data, the input data frames and the source data containing phonemes of the target voice.
Abstract:
A character extraction section extracts character amounts pertaining to the prosody of a voice from a voice signal sequentially, in a time-serial manner. A difference value calculation section calculates a difference value between each of the extracted character amounts and a reference value. Processing values corresponding to the individual character amounts are generated in accordance with the respective difference values, and a voice processing section controls the individual character amounts of the voice signal in accordance with the corresponding processing values, thereby generating an output signal whose prosody is changed from the prosody of the voice signal.
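A minimal sketch of the difference-based control follows. It assumes the character amount is a per-frame pitch value, that the processing value is the difference scaled by a gain, and that the output is the reference plus that processing value; none of these specifics come from the abstract, and `reshape_prosody` is a hypothetical name.

```python
def reshape_prosody(values, reference, gain):
    """Scale each value's deviation from the reference (assumed control rule).

    gain > 1 exaggerates the prosody, 0 <= gain < 1 flattens it,
    and gain == 1 leaves the values unchanged.
    """
    output = []
    for value in values:
        difference = value - reference          # difference value
        processing_value = gain * difference    # processing value derived from it
        output.append(reference + processing_value)
    return output


if __name__ == "__main__":
    pitch_hz = [90.0, 100.0, 120.0]
    print(reshape_prosody(pitch_hz, reference=100.0, gain=2.0))
    # prints [80.0, 100.0, 140.0]
```

Because each frame is handled independently, the same loop can run sequentially on values as they are extracted from the signal.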
Abstract:
Even when changes in environmental noise cannot be anticipated, a sound generating period in an audio signal can be specified with high accuracy. Sound in an audio space in which an audio signal processing system 1 is disposed is continuously collected by a microphone 20 and input to an audio signal processing device 10 as an audio signal. Before a user carries out a prescribed operation, the audio signals input from the microphone 20 are sequentially stored in a first buffer 121. After the prescribed operation is carried out, the audio signals are sequentially stored in a second buffer 122. A specifying part 114 calculates an S/N ratio by treating the level of the audio signal stored in the first buffer 121 as the level of the environmental noise and the level of the audio signal sequentially stored in the second buffer 122 as the level of the sound generated at the current time. The specifying part 114 sequentially decides whether or not the calculated S/N ratio satisfies a prescribed condition, thereby specifying the sound generating period in the audio signal.
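The two-buffer S/N calculation can be sketched as below. The abstract does not define the level measure or the prescribed condition, so this example assumes RMS levels, a ratio expressed in decibels, and a simple fixed threshold; the function names and the 6 dB default are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(noise_buffer, current_frame, eps=1e-12):
    """S/N ratio in dB: current frame level over the pre-operation noise level."""
    signal_level = max(rms(current_frame), eps)
    noise_level = max(rms(noise_buffer), eps)
    return 20.0 * math.log10(signal_level / noise_level)


def sounding_frames(noise_buffer, frames, threshold_db=6.0):
    """Mark each post-operation frame as sounding when its S/N meets the threshold."""
    return [snr_db(noise_buffer, frame) >= threshold_db for frame in frames]


if __name__ == "__main__":
    noise = [0.1, -0.1, 0.1, -0.1]                       # first buffer: before the operation
    frames = [[0.1, -0.1], [1.0, -1.0], [0.05, 0.05]]    # second buffer: after the operation
    print(sounding_frames(noise, frames))
    # prints [False, True, False]
```

Because the noise level is measured just before the user's operation, the threshold adapts to whatever environment the system is in at that moment.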