Abstract:
The invention relates to supporting concatenative TTS synthesis. To generate a speech database as a basis for the TTS synthesis, speech processing that includes segmental parametric speech encoding of speech data, based on a parametric modeling of speech, is performed first, resulting in compressed parameterized speech segments. The compressed parameterized speech segments are then assembled in a speech database. To synthesize output speech, compressed parameterized speech segments are selected from the speech database based on an available text and decompressed to recover parameterized speech segments. The parameterized speech segments are then concatenated in the parameter domain, and the output speech is synthesized from these concatenated parametric speech segments.
Abstract:
The effects of bad frames received over a communications channel by a speech decoder are concealed by replacing the values of the spectral parameters of the bad frames (a bad frame being either a corrupted frame or a lost frame) with values based on an at least partly adaptive mean of recently received good frames, but in case of a corrupted frame (as opposed to a lost frame), using the bad frame itself if the bad frame meets a predetermined criterion. The aim of concealment is to find the most suitable parameters for the bad frame so that subjective quality of the synthesized speech is as high as possible.
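The substitution rule described above can be sketched as follows. This is a minimal illustration, not the patented method: the function names, the `blend` weight, the fixed `long_term_mean`, and the distance-based acceptance test for a corrupted frame are all assumptions introduced here, since the abstract does not specify the "predetermined criterion" or how the mean is made "partly adaptive".

```python
def adaptive_mean(good_history):
    """Element-wise mean of the spectral parameters of recent good frames."""
    n = len(good_history)
    return [sum(column) / n for column in zip(*good_history)]


def conceal(good_history, long_term_mean, bad_frame=None, blend=0.75, max_dist=0.5):
    """Return spectral parameters to use in place of a bad frame.

    bad_frame is None for a lost frame, or the received parameters of a
    corrupted frame. All thresholds and weights are hypothetical.
    """
    last_good = good_history[-1]
    if bad_frame is not None:
        # Hypothetical criterion: keep the corrupted frame if every parameter
        # stays close to the last good frame.
        dist = max(abs(a - b) for a, b in zip(bad_frame, last_good))
        if dist <= max_dist:
            return list(bad_frame)
    # "Partly adaptive" is modeled here as a blend of the adaptive mean of
    # recent good frames with a fixed long-term mean (an assumption).
    recent = adaptive_mean(good_history)
    return [blend * r + (1 - blend) * m for r, m in zip(recent, long_term_mean)]
```

Calling `conceal(history, mean)` with no `bad_frame` models a lost frame; passing the received parameters models a corrupted frame, which is reused only when it passes the assumed closeness test.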
Abstract:
A method and corresponding codec for (channel) encoding speech or other data bits for transmission via a wireless communication channel, the method providing unequal error protection (UEP) using only a single encoder, and including: a step (51e) of determining how many bits to puncture in each of typically two protection classes (C
Abstract:
A speech coding method and device for encoding and decoding an input signal, wherein the higher frequency components of the synthesized speech are achieved by high-pass filtering and coloring an artificial signal. The processed artificial signal is scaled by a first scaling factor during the active speech periods of the input signal and a second scaling factor during the non-active speech periods, wherein the first scaling factor is characteristic of the higher frequency band of the input signal and the second scaling factor is characteristic of the lower frequency band of the input signal. In particular, the second scaling factor is estimated based on the lower frequency components of the synthesized speech and the coloring of the artificial signal is based on the linear predictive coding coefficients characteristic of the lower frequency of the input signal.
Abstract:
A wideband (WB) system includes: a linear predictive (LP) analysis module (11), responsive to the n-th frame of the wideband speech signal, for providing LP analysis filter characteristics; a WB LP analysis filter (12a), also responsive to the n-th frame of the WB speech signal, for providing a filtered WB speech input; a band-splitting module (14), responsive to the filtered WB speech input for the n-th frame, for splitting the filtered WB speech input into k bands and for providing a lower band (LB) target signal x(n); an excitation search module (16), responsive to the LB target signal x(n), for providing an LB excitation exc(n); a band-combining module (17), responsive to exc(n), for providing a WB excitation excw(n); and a WB LP synthesis filter (18), responsive to the LP analysis filter characteristics and to excw(n), for providing WB synthesized speech.
Abstract:
A method and corresponding apparatus for encoding a sequence of bits for transmission as symbols, some of the bit positions of the symbols having a higher bit error rate than other bit positions. The method includes: a step (31, 32, 41, 42) of providing a plurality of sequences of bits using a convolutional encoder (31, 41), in response to a sequence of input bits, each sequence of bits being defined by a predetermined generator polynomial having a predetermined level of sensitivity to puncturing; and a step (33, 44) of mapping the bits of each sequence of bits to symbol positions based on the level of sensitivity of the generator polynomial defining the sequence of bits. With interleaving, the mapping of bits of each sequence of bits to symbol positions (33, 44) can precede a symbol interleaving step (34), or it can follow a bit interleaving step (43).
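The mapping step can be illustrated with a small sketch. It assumes, purely for illustration, that the encoder produces exactly one output sequence per bit position of a symbol, and that the sequence most sensitive to puncturing should occupy the most reliable position; the abstract states only that the mapping depends on the sensitivity level. The function name, the sensitivity scores, and the per-position error rates are hypothetical.

```python
def map_to_symbols(sequences, sensitivities, position_ber):
    """Place each encoder output sequence on one bit position of the symbols.

    sequences[i]     -- bits produced by the i-th generator polynomial
    sensitivities[i] -- assumed puncturing sensitivity of that polynomial
    position_ber[p]  -- assumed bit error rate of symbol bit position p
    """
    # Most sensitive sequence first, most reliable bit position first.
    seq_order = sorted(range(len(sequences)), key=lambda i: -sensitivities[i])
    pos_order = sorted(range(len(position_ber)), key=lambda p: position_ber[p])
    num_symbols = len(sequences[0])
    symbols = [[None] * len(position_ber) for _ in range(num_symbols)]
    for seq_index, position in zip(seq_order, pos_order):
        for k, bit in enumerate(sequences[seq_index]):
            symbols[k][position] = bit
    return symbols
```

With three sequences and three bit positions per symbol (as in an 8-PSK symbol, for example), each returned symbol is a list of three bits ordered by bit position.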
Abstract:
A method and system for providing comfort noise (150) in the non-speech periods in speech communication. The comfort noise is generated based (28) on whether the background noise in the speech input is stationary or non-stationary. If the background noise is non-stationary, a random component is inserted (32) in the comfort noise using a dithering process. If the background noise is stationary, the dithering process is not used.
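The decision logic above can be sketched as follows. The stationarity test (spread of recent frame energies against a threshold), the uniform dither, and every name and default value are assumptions made for this illustration; the abstract does not say how stationarity is detected or which parameter is dithered.

```python
import random


def is_stationary(frame_energies, max_spread=1.0):
    """Hypothetical test: background noise is stationary if energies vary little."""
    return max(frame_energies) - min(frame_energies) <= max_spread


def comfort_noise_level(frame_energies, dither_width=0.5, rng=random):
    """Return a comfort-noise level derived from recent background-noise energies."""
    mean = sum(frame_energies) / len(frame_energies)
    if is_stationary(frame_energies):
        # Stationary background noise: no dithering.
        return mean
    # Non-stationary background noise: insert a random component (dither).
    return mean + rng.uniform(-dither_width, dither_width)
```

Passing a seeded `random.Random` instance as `rng` makes the dithered output reproducible in tests.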
Abstract:
A speech encoding or decoding arrangement (711, 721, 811, 821) comprises a speech signal input and a multiple mode speech encoder (402) or decoder (411) for encoding or decoding speech signals coupled to the speech signal input, selectably with a first encoding or decoding mode associated with a first bandwidth or a second encoding or decoding mode associated with a second bandwidth. The arrangement also comprises a soft bandwidth switching block (401, 412, 500) with an input (IN) and an output (OUT). In an encoding arrangement the input (IN) is coupled to the speech signal input and the output (OUT) is coupled to the multiple mode speech encoder (402). In a decoding arrangement the input (IN) is coupled to the multiple mode speech decoder (411) and the output (OUT) is the output of the decoding arrangement. The soft bandwidth switching block (401, 412, 500) is arranged to gradually change the bandwidth of a speech signal coupled to the multiple mode speech encoder or decoder in response to an instruction for changing speech signal bandwidth (421).
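One way to picture the gradual bandwidth change is a per-frame schedule of cutoff frequencies. The sketch below assumes a linear ramp and a hypothetical function name; the abstract does not state the shape or duration of the transition, and the 7000 Hz and 3400 Hz figures in the usage note are illustrative values only.

```python
def bandwidth_ramp(current_hz, target_hz, num_frames):
    """Return one cutoff frequency per frame, moving linearly to target_hz."""
    step = (target_hz - current_hz) / num_frames
    return [current_hz + step * (i + 1) for i in range(num_frames)]
```

For example, `bandwidth_ramp(7000, 3400, 4)` yields four intermediate cutoffs ending at 3400 Hz, which a low-pass filter in the switching block could apply frame by frame.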
Abstract:
A method and system for concealing errors in one or more bad frames in a speech sequence as part of an encoded bit stream received in a decoder. When the speech sequence is voiced, the LTP-parameters in the bad frames are replaced by the corresponding parameters in the last frame. When the speech sequence is unvoiced, the LTP-parameters in the bad frames are replaced by values calculated based on the LTP history along with an adaptively-limited random term.
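The two replacement rules can be sketched for a single LTP parameter, the pitch lag. The use of the median of the history as the base value, the limit derived from the spread of the history, and all names are assumptions for this illustration; the abstract says only that the unvoiced case uses the LTP history together with an adaptively-limited random term.

```python
import random


def conceal_ltp_lag(lag_history, voiced, rng=random):
    """Return a replacement LTP lag for a bad frame.

    lag_history holds the lags of previously received frames, oldest first.
    """
    if voiced:
        # Voiced speech: reuse the lag of the last frame.
        return lag_history[-1]
    # Unvoiced speech: base value from the history (median, an assumption)...
    base = sorted(lag_history)[len(lag_history) // 2]
    # ...plus a random term whose limit adapts to the spread of the history.
    limit = (max(lag_history) - min(lag_history)) // 2
    return base + rng.randint(-limit, limit)
```

A wider spread in recent lags allows a larger random offset, while a flat history produces no random variation at all.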