Abstract:
A distributed speech recognition system comprises a first communication device, which receives a speech input (34), encodes data representative of the speech input, and transmits the encoded data, and a second, remotely located communication device, which receives the encoded data and compares it with a known data set. The second communication device includes a processor with a program that controls the processor to operate according to a method of reconstructing the speech input. The method includes the step of receiving encoded data including encoded spectral data and encoded energy data. The method further includes the step of decoding the encoded spectral data and encoded energy data to determine the spectral data and energy data. The method also includes the step of combining the spectral data and energy data to reconstruct the speech input.
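The combining step can be pictured with a minimal sketch. The abstract does not say how the spectral data and energy data are represented or combined, so everything here is an assumption made for illustration: the spectral data is taken to be a list of magnitudes describing the shape of one frame, the energy data is taken to be a natural-log frame energy, and the hypothetical function `combine` simply rescales the shape so that its energy matches the decoded value.

```python
import math


def combine(spectral_shape, log_energy):
    """Scale decoded spectral magnitudes so their total energy matches the decoded energy.

    spectral_shape: list of magnitudes (assumed representation of the spectral data).
    log_energy: natural log of the frame energy (assumed representation of the energy data).
    """
    target_energy = math.exp(log_energy)
    shape_energy = sum(m * m for m in spectral_shape)
    if shape_energy == 0.0:
        # A silent shape cannot be rescaled; return silence.
        return [0.0] * len(spectral_shape)
    gain = math.sqrt(target_energy / shape_energy)
    return [gain * m for m in spectral_shape]
```

A real reconstruction would still need to turn the scaled magnitudes into a waveform, which this sketch leaves out.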
Abstract:
An improved speech coder provides a more natural-sounding replication of speech by modifying the mean-squared error criterion used to select the speech coder parameters. Specifically, the modification emphasizes the signal components that the speech coder has difficulty matching, i.e., the high frequencies. The emphasis is kept within limits to avoid over-emphasizing the speech.
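One way to read a modified mean-squared error criterion with bounded high-frequency emphasis is as a weighted error in which the weight grows with frequency and is then clamped. The sketch below follows that reading only; the linear weight ramp, the parameter names `slope` and `max_weight`, and both function names are assumptions for illustration and are not taken from the abstract.

```python
def emphasis_weights(num_bins, slope=1.0, max_weight=2.0):
    """Per-bin weights that rise linearly with frequency and are capped at max_weight."""
    span = max(num_bins - 1, 1)  # avoid dividing by zero for a single bin
    return [min(1.0 + slope * k / span, max_weight) for k in range(num_bins)]


def weighted_mse(target, candidate, weights):
    """Mean of the weighted squared differences between two spectra."""
    total = sum(w * (t - c) ** 2 for w, t, c in zip(weights, target, candidate))
    return total / len(target)
```

With these weights, the same-sized mismatch costs more in a high bin than in a low bin, so a parameter search that minimizes `weighted_mse` leans toward candidates that match the high frequencies, while the cap keeps that preference bounded.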
Abstract:
A speech encoder uses a soft interpolation decision for spectral parameters. For each frame, the encoder first calculates the residual energy for interpolated spectral parameters, and then calculates the residual energy for non-interpolated spectral parameters. The encoder then compares the two residual energies. If the interpolated spectral parameters yield the lower residual energy, the encoder indicates to a far-end decoder to use the interpolated values for the current frame. Otherwise, it indicates to the far-end decoder to use the non-interpolated values for the current frame. The encoder tells the far-end decoder which spectral parameters (interpolated or non-interpolated) to use by encoding and transmitting a special signalling bit.
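The per-frame decision described above can be sketched as follows. The sketch assumes that the spectral parameters are linear-prediction coefficients, that the interpolated set is the average of the neighbouring frames' coefficients, and that a signalling bit of 1 means "use interpolated values"; none of these details is stated in the abstract, and the function names are hypothetical. Averaging predictor coefficients directly is a simplification chosen to keep the example short.

```python
def residual_energy(frame, lpc):
    """Energy of the residual e[n] = x[n] - sum_k lpc[k] * x[n-1-k] over one frame."""
    energy = 0.0
    for n in range(len(frame)):
        prediction = sum(
            a * frame[n - 1 - k] for k, a in enumerate(lpc) if n - 1 - k >= 0
        )
        error = frame[n] - prediction
        energy += error * error
    return energy


def choose_spectral_params(frame, prev_lpc, next_lpc, own_lpc):
    """Return (signalling_bit, chosen_lpc) for the current frame."""
    interpolated = [(p + q) / 2.0 for p, q in zip(prev_lpc, next_lpc)]
    e_interpolated = residual_energy(frame, interpolated)
    e_direct = residual_energy(frame, own_lpc)
    if e_interpolated < e_direct:
        return 1, interpolated  # decoder should use interpolated values
    return 0, own_lpc  # decoder should use the frame's own values
```

Only the single bit needs to be sent when the interpolated set wins, since the decoder can rebuild that set from the neighbouring frames it already has.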
Abstract:
Lag information for use in a speech coder is developed by estimating lag values for the subframes (201) of a speech coding frame (200) and then selecting, for each subframe, lag values that both correspond closely to the estimated lag values and observe the restrictions of a selected delta-coding routine. When several candidate sets of such lag values have been developed, they are compared against one another to identify the set that appears to provide the best lag values. This information is then available for framing and transmission. In one embodiment, the candidate sets are also selected to leave room for subsequent adjustment in either a positive or negative direction. With this adjustability, closed-loop adjustments can be made to the selected values so that the lag coding ultimately transmitted yields an output that most closely represents the speech signal.
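The selection of lag values under a delta-coding restriction can be illustrated with a small sketch. It assumes a specific restriction that the abstract does not spell out: the first subframe's lag is coded absolutely and each later lag may differ from the previous one by at most `max_delta`. The candidate sets are generated by trying several starting lags near the first estimate, and "best" is taken to mean the smallest total absolute deviation from the estimates. The function names, the `search` range, and the cost measure are all hypothetical.

```python
def constrained_lags(estimates, first_lag, max_delta):
    """Build one candidate set: follow the estimates as closely as the delta limit allows."""
    lags = [first_lag]
    for estimate in estimates[1:]:
        low, high = lags[-1] - max_delta, lags[-1] + max_delta
        lags.append(min(max(estimate, low), high))
    return lags


def best_lag_set(estimates, max_delta, search=3):
    """Compare candidate sets and return the one closest to the estimated lags."""
    candidates = [
        constrained_lags(estimates, estimates[0] + offset, max_delta)
        for offset in range(-search, search + 1)
    ]
    return min(
        candidates,
        key=lambda lags: sum(abs(lag - est) for lag, est in zip(lags, estimates)),
    )
```

For example, with estimates `[40, 42, 50, 51]` and `max_delta=4`, the jump from 42 to 50 cannot be coded directly, so the selected set tracks it over two subframes instead. The closed-loop refinement mentioned in the abstract is not modelled here.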