Abstract:
A signal decoding method and apparatus control the reproduction speed of a speech signal without changing its phonemes or pitch. The apparatus has a data number converter that converts the number of orthogonal transform coefficients entering a transmission signal input terminal from N to M, an inverse orthogonal transform unit that inverse orthogonal-transforms the M coefficients obtained by the data number converter, and a linear predictive coding (LPC) synthesis filter that performs predictive synthesis based on the short-term prediction residuals obtained by the inverse orthogonal transform unit. For an input signal, short-term prediction residuals are found and orthogonally transformed to form the orthogonal transform coefficients at a rate of N coefficients per transform unit. The frequency positions of the N transform coefficients may be rearranged by a factor of M/N to yield M values, or the number may be changed from N to M by oversampling. A portable radio terminal embodying the invention is also described.
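The N-to-M conversion can be illustrated with a minimal Python sketch. It assumes the orthogonal transform is an orthonormal DCT and reads "rearranged by M/N" as moving coefficient k to index round(k*M/N); neither choice is stated in the abstract, and the function names are hypothetical. Under these assumptions each component keeps its frequency in cycles per sample while the block grows or shrinks from N to M samples.

```python
import math

def dct(x):
    """Orthonormal DCT-II (assumed stand-in for the orthogonal transform)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(s * math.sqrt((1.0 if k == 0 else 2.0) / n))
    return out

def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = 0.0
        for k in range(n):
            scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
            s += scale * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(s)
    return out

def convert_count(coeffs, m):
    """Hypothetical data number conversion: move coefficient k to round(k*M/N)."""
    n = len(coeffs)
    gain = math.sqrt(m / n)  # keeps the time-domain amplitude unchanged
    out = [0.0] * m
    for k, c in enumerate(coeffs):
        j = min(m - 1, round(k * m / n))
        out[j] += gain * c  # coefficients that collide (M < N) are summed
    return out

def change_block_length(residual, m):
    """Transform N residual samples, convert N -> M coefficients, invert."""
    return idct(convert_count(dct(residual), m))
```

With M greater than N the block plays back more slowly and with M less than N more quickly. In the apparatus described, the resulting M residual samples would then drive the LPC synthesis filter, which is omitted here.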
Abstract:
A method for decoding encoded speech signals uses sine wave synthesis based on harmonics of the original speech signal. The harmonics are obtained by transforming the original speech signal from the time domain to the frequency domain, and are arranged as sequential frames, where the harmonics of a given frame have a pitch period that may or may not be the same as the pitch period of another frame. According to the decoding method, data arrays respectively containing amplitude data and phase data of the harmonics are zero-padded to give the arrays a preset number of elements. Inverse orthogonal transformation of the data arrays produces time domain information used to generate a time domain waveform signal for restoring the encoded speech signals. The different pitch periods of the frames are normalized to each other by either smooth (continuous) or acute (discontinuous) interpolation, depending on the degree of change in the pitch period between the frames.
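The zero-padding and inverse transform steps can be sketched as follows. This is a minimal illustration that assumes the inverse orthogonal transform is an inverse DFT and that the padded array holds exactly one pitch period; the function name, the default size of 64 and the naive O(size^2) transform are choices made for this example, not details from the abstract.

```python
import cmath
import math

def one_pitch_period(amps, phases, size=64):
    """Return `size` samples covering one pitch period.

    amps[h-1] and phases[h-1] describe harmonic h = 1..H. Bins above H
    stay zero, which is the zero-padding to a preset number of elements.
    """
    if len(amps) != len(phases):
        raise ValueError("amplitude and phase arrays must have equal length")
    if 2 * len(amps) >= size:
        raise ValueError("size must exceed twice the number of harmonics")
    spectrum = [0j] * size
    for h, (a, p) in enumerate(zip(amps, phases), start=1):
        spectrum[h] = 0.5 * a * cmath.exp(1j * p)
        spectrum[size - h] = spectrum[h].conjugate()  # makes the output real
    # Naive inverse DFT; the 1/size factor is folded into the 0.5 * a scaling.
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / size)
            for k in range(size)).real
        for n in range(size)
    ]
```

Because every frame is synthesized at the same array length regardless of its true pitch period, the periods can afterwards be brought to their actual lengths by interpolation, which is the normalization step the abstract mentions.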
Abstract:
A signal recording and reproducing apparatus includes an encoder that encodes an input signal to produce a first group of encoded data and a second group of encoded data, the second group being used to reproduce a signal of higher quality than the signal obtained by decoding the first group alone; a recording unit that records record-data, including the first and second groups of encoded data, onto a recording medium; a reproducing unit that reproduces the record-data from the recording medium; a decoder that decodes at least the first group of encoded data out of the record-data from the reproducing unit; and a controller that controls the operation of each part of the apparatus. The controller causes the recording unit to erase the second group of encoded data in response to a command to increase the free storage capacity of the recording medium.
Abstract:
An audio signal processing method for repairing an anomalous state such as noise, a discontinuity, and a break of sound, comprising detecting the anomalous state of an audio signal, deleting the audio signal in the anomalous segment, deducing the correct audio signal by referring to the waveform of the audio signal before and after the deleted segment, generating a repair signal for repairing the signal in the deleted segment based on the deduced result, inserting the repair signal into the deleted segment, and connecting it to the audio signal before and after the deleted segment.
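The delete, deduce and insert steps can be sketched in Python. The abstract does not say how the repair signal is deduced, so this minimal example substitutes plain linear interpolation between the last good sample before the segment and the first good sample after it; the function name and the half-open index convention are hypothetical.

```python
def repair_segment(samples, start, end):
    """Replace samples[start:end] with a linearly interpolated repair signal.

    samples[start - 1] and samples[end] are treated as the intact
    neighbours, so the repaired segment connects to both sides without a jump.
    """
    if not (0 < start < end < len(samples)):
        raise ValueError("segment must have intact samples on both sides")
    left, right = samples[start - 1], samples[end]
    span = end - start + 1
    repaired = list(samples)  # leave the caller's data untouched
    for i in range(start, end):
        t = (i - start + 1) / span
        repaired[i] = left + t * (right - left)
    return repaired
```

Linear interpolation is only reasonable for very short gaps. A practical repair would more likely extrapolate the surrounding waveform, for example with a predictive model, and cross-fade at the boundaries.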
Abstract:
The amount of processing needed to calculate a weight value for perceptually weighted vector quantization is reduced, to speed up the processing or to minimize hardware. To this end, an inverse LPC filter finds the LPC (linear predictive coding) residuals of an input speech signal, which are encoded by a sinusoidal analysis encoding unit. The resulting parameters are quantized by a vector quantizer using perceptually weighted vector quantization. For this quantization, the weight value is calculated from the results of an orthogonal transform of parameters derived from the impulse response of the transfer function of the weight.
Abstract:
A method and apparatus for voiced/unvoiced (V/UV) decision judge whether an input speech signal is voiced or unvoiced. The input parameters for the V/UV decision are judged comprehensively so that a simplified algorithm can reach a high-precision decision. The parameters include the frame-averaged energy of the input speech signal lev, the normalized autocorrelation peak value r0r, the spectral similarity degree pos, the number of zero crossings nZero, and the pitch lag pch. Denoting each parameter by x, function calculation circuits convert it using a sigmoid function g(x) = A/(1 + exp(-(x - b)/a)), where A, a, and b are constants that differ for each input parameter. Using the parameters converted by this sigmoid function, the voiced/unvoiced decision is made by a V/UV decision circuit.
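The conversion g(x) = A/(1 + exp(-(x - b)/a)) can be written directly in Python. In this minimal sketch the function name is hypothetical and any constants passed in are placeholders, since the abstract gives no values for A, a, or b. How the converted parameters are combined into the final decision is not stated either, so that step is left out.

```python
import math

def sigmoid_convert(x, A, a, b):
    """Map a raw V/UV parameter x through g(x) = A / (1 + exp(-(x - b) / a)).

    A sets the output range, b the midpoint and a the steepness.
    The two branches are algebraically identical; splitting them
    avoids overflow in exp() for inputs far below b.
    """
    z = (x - b) / a
    if z >= 0:
        return A / (1.0 + math.exp(-z))
    e = math.exp(z)
    return A * e / (1.0 + e)
```

Each of the five parameters (lev, r0r, pos, nZero, pch) would get its own constants, which puts them on a common scale of 0 to A before they are judged together.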
Abstract:
Nasalized sound effects during reproduction of low-pitch sounds are suppressed to produce playback sounds of high clarity. Amplitude data is processed with high range formant emphasis of crests and valleys of the envelope of the frequency spectrum on the high frequency range and with deepening of the valley of the frequency spectrum over the entire frequency range, above all, over the low to mid frequency range. Next, the amplitude data is processed for emphasizing the peak values of the formant of the voiced frame in the portion of the speech signal which is rising in magnitude and for unconditionally emphasizing the spectral envelope on the high frequency range. The voiced speech spectrum is generated by synthesizing the cosine wave based upon the emphasized amplitude data.
Abstract:
A method and apparatus for reproducing speech signals at a controlled speed and for synthesizing speech include a dividing unit that divides the input speech into time segments and an encoding unit that discriminates whether each of the speech segments is voiced or unvoiced. Based on the results of the discrimination, the encoding unit finds encoded parameters by performing sinusoidal synthesis and encoding for voiced segments, and vector quantization by closed-loop search for an optimum vector using an analysis-by-synthesis method for unvoiced segments. A period modification unit modifies the length of time associated with each signal segment and calculates a set of modified encoded parameters. In the speech synthesizing unit, encoded speech signal data is output from the encoding unit, and pitch data and amplitude data specifying the spectral envelope are sent via a data conversion unit to a waveform synthesis unit. The data conversion unit changes the number of amplitude data points of the spectral envelope without changing the shape of the envelope, so that the pitch of the signal may be varied without changing its phoneme. The waveform synthesis unit synthesizes the speech waveform based on the converted spectral envelope data and pitch data.
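Changing the number of amplitude data points while keeping the envelope shape can be sketched as a resampling step. This minimal Python example assumes linear interpolation over evenly spaced points with the first and last points held fixed; the abstract does not name the interpolation method, and the function name is hypothetical.

```python
def resample_envelope(amps, new_count):
    """Resample a spectral envelope to new_count points by linear interpolation.

    The first and last points are preserved, so the overall shape of the
    envelope is kept while the number of harmonics sampling it changes.
    """
    n = len(amps)
    if n == 0 or new_count <= 0:
        raise ValueError("need at least one input point and one output point")
    if n == 1 or new_count == 1:
        return [amps[0]] * new_count
    out = []
    for j in range(new_count):
        pos = j * (n - 1) / (new_count - 1)  # position on the original grid
        i = min(int(pos), n - 2)
        frac = pos - i
        out.append(amps[i] * (1.0 - frac) + amps[i + 1] * frac)
    return out
```

Fewer points over the same band correspond to more widely spaced harmonics, that is, a higher pitch, while the formant positions set by the envelope stay where they were.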
Abstract:
An encoding apparatus divides an input speech signal into blocks and encodes it in units of blocks. The encoding apparatus includes an encoding unit for performing CELP encoding that has a noise codebook memory containing codebook vectors generated by clipping Gaussian noise, together with codebook vectors obtained by learning, using the code vectors generated by clipping the Gaussian noise as initial values. The encoding apparatus enables optimum encoding for a variety of speech configurations.
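Generating the initial code vectors can be sketched as follows. The abstract does not say what kind of clipping is applied or at what level, so this minimal Python example assumes simple peak limiting of unit-variance Gaussian samples to the range [-clip, +clip]; the function name, the threshold and the seed are hypothetical. The learning step that refines these vectors is not shown.

```python
import random

def clipped_gaussian_codebook(num_vectors, dim, clip=1.0, seed=0):
    """Build num_vectors code vectors of length dim from clipped Gaussian noise.

    Each sample is drawn from N(0, 1) and limited to [-clip, +clip].
    A private Random instance keeps the result reproducible for a given seed.
    """
    rng = random.Random(seed)
    return [
        [max(-clip, min(clip, rng.gauss(0.0, 1.0))) for _ in range(dim)]
        for _ in range(num_vectors)
    ]
```

A trained codebook would start from vectors like these and update them against speech data, which matches the abstract's description of using the clipped-noise vectors as initial values.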