Abstract:
Artificial bandwidth expansion devices, systems, methods and computer code products are disclosed for expanding a narrowband speech signal into an artificially expanded wideband speech signal. Embodiments of the invention can operate by forming an unshaped wideband signal based on the narrowband speech signal, such as through aliasing, and shaping the wideband signal into the artificially expanded wideband speech signal by amplifying/attenuating the unshaped wideband signal using a function generated by a neural network. Weights of the neural network can be set by a training/learning subsystem which generates genomes containing the neural network weights based on simulated environments in which a device employing the artificial bandwidth expansion is expected to operate.
Abstract:
An apparatus for providing improved speech synthesis may include a processor and a memory storing executable instructions. In response to execution of the instructions by the processor, the apparatus may perform at least selecting a real glottal pulse from among one or more stored real glottal pulses based at least in part on a property associated with the real glottal pulse, utilizing the real glottal pulse selected as a basis for generation of an excitation signal, and modifying the excitation signal based on spectral parameters generated by a model to provide synthetic speech.
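The selection and excitation steps described above might be sketched as follows. This is a minimal illustration, not the apparatus's actual method. It assumes the selection property is pulse length (used here as a stand-in for fundamental period) and that the excitation is formed by overlap-adding the chosen pulse at pitch-period intervals. The names `select_pulse` and `build_excitation` are hypothetical, and the final spectral-shaping step is omitted.

```python
import numpy as np

def select_pulse(pulses, target_period):
    """Pick the stored pulse whose length (in samples) is closest to the target period.

    Assumption: pulse length is the selection property; the abstract does not say which
    property is used.
    """
    return min(pulses, key=lambda p: abs(len(p) - target_period))

def build_excitation(pulse, period, n_samples):
    """Overlap-add copies of the pulse every `period` samples (hypothetical scheme)."""
    excitation = np.zeros(n_samples)
    for start in range(0, n_samples, period):
        end = min(start + len(pulse), n_samples)
        # Truncate the last pulse if it would run past the end of the buffer.
        excitation[start:end] += pulse[:end - start]
    return excitation

# Usage: three stored pulses of different lengths, target period of 75 samples.
stored = [np.hanning(40), np.hanning(80), np.hanning(120)]
chosen = select_pulse(stored, target_period=75)
signal = build_excitation(chosen, period=75, n_samples=400)
```

In a full system the resulting excitation would then be filtered according to the model-generated spectral parameters; that stage depends on details the abstract does not give.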
Abstract:
A method and device for improving the quality of speech signals transmitted using an audio bandwidth between 300 Hz and 3.4 kHz. After the received speech signal is divided into frames, zeros are inserted between samples to double the sampling frequency. The zero insertion produces aliased frequency components in the upper band, and the level of these components is adjusted using an adaptive algorithm based on the classification of the speech frame. Sound can be classified into sibilants and non-sibilants, and a non-sibilant sound can be further classified into a voiced sound and a stop consonant. The adjustment is based on parameters, such as the number of zero-crossings and the energy distribution, computed from the spectrum of the up-sampled speech signal between 300 Hz and 3.4 kHz. A new sound with a bandwidth between 300 Hz and 7.7 kHz is obtained by inverse Fourier transforming the spectrum of the adjusted, up-sampled sound.
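The zero-insertion and level-adjustment steps can be illustrated with a short sketch. It assumes an even-length frame and replaces the adaptive, classification-based adjustment with a single fixed gain (`upper_gain`, a hypothetical parameter), so it shows only the signal path, not the adaptive algorithm.

```python
import numpy as np

def expand_frame(frame, upper_gain=0.3):
    """Zero-insert one frame to double its sampling rate, then scale the aliased upper band.

    Assumption: a fixed gain stands in for the adaptive level adjustment.
    """
    n = len(frame)
    up = np.zeros(2 * n)
    up[::2] = frame                      # insert a zero after every sample
    spectrum = np.fft.rfft(up)           # n + 1 bins from 0 Hz to the new Nyquist frequency
    old_nyquist_bin = n // 2             # bins above this hold the mirrored (aliased) image
    spectrum[old_nyquist_bin + 1:] *= upper_gain
    return np.fft.irfft(spectrum, 2 * n)

# Usage: a 20 ms frame of a 1 kHz tone sampled at 8 kHz.
fs = 8000
t = np.arange(160) / fs
frame = np.cos(2 * np.pi * 1000 * t)
wide = expand_frame(frame, upper_gain=0.3)   # 320 samples at 16 kHz
```

With this input, the 1 kHz tone reappears as a mirrored component at 7 kHz in the 16 kHz signal, and the gain controls the level of that component relative to the original.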