Abstract:
The present document discloses a method and apparatus for compensating for a lost frame in a transform domain. In one method, frequency-domain coefficients of a current lost frame are calculated from the frequency-domain coefficients of one or more frames prior to the current lost frame, a frequency-time transform is performed to obtain an initially compensated signal, and waveform adjustment is then performed to obtain a compensated signal. Alternatively, extrapolation is performed for all or some of the frequency points of the current lost frame, using the phases and amplitudes of the corresponding frequency points of a plurality of previous frames, to obtain the phases and amplitudes, and hence the frequency-domain coefficients, of those frequency points; a frequency-time transform is then performed to obtain a compensated signal. A judgment algorithm can select between these methods to compensate for the current lost frame, thereby achieving a better compensation effect.
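The extrapolation variant described above can be sketched as follows. This is only a minimal illustration, assuming complex spectral coefficients (for example from a DFT-like transform), exactly two previous frames, and linear extrapolation of both amplitude and phase; the abstract specifies none of these choices, and the names `wrap_phase` and `extrapolate_bins` are hypothetical.

```python
import cmath
import math


def wrap_phase(phi):
    """Wrap an angle in radians to the interval [-pi, pi]."""
    return math.atan2(math.sin(phi), math.cos(phi))


def extrapolate_bins(prev2, prev1):
    """Estimate the lost frame's coefficients from two earlier frames.

    prev2 and prev1 are lists of complex coefficients for frames n-2 and
    n-1. Linear extrapolation is an assumption made for this sketch.
    """
    estimated = []
    for c2, c1 in zip(prev2, prev1):
        a2, p2 = cmath.polar(c2)
        a1, p1 = cmath.polar(c1)
        # Continue the amplitude trend, but never below zero.
        amp = max(0.0, 2.0 * a1 - a2)
        # Advance the phase by the wrapped frame-to-frame phase step.
        phase = p1 + wrap_phase(p1 - p2)
        estimated.append(cmath.rect(amp, phase))
    return estimated
```

A frequency-time transform of the returned coefficients would then yield the compensated signal; that step depends on the codec's transform and is not shown.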
Abstract:
Audio encoding methods/terminals, audio decoding methods/terminals, and audio codec systems are provided. A plurality of continuous audio signals is obtained. Whether each of the audio signals includes a designated signal type is determined according to an audio parameter of that audio signal. A marked audio encoding stream is obtained by marking each audio signal as having or not having the designated signal type. At a decoding terminal, the marking is used to apply an enhancement process to the one or more audio signals that have the designated signal type. The enhancement process is not applied to audio signals that do not have the designated signal type.
Abstract:
The disclosure relates to an audio classifier comprising: a first processor having hard-wired logic configured to receive an audio signal and detect audio activity from the audio signal; and a second processor having reconfigurable logic configured to classify the audio signal as a type of audio signal in response to the first processor detecting audio activity.
Abstract:
A low power sound recognition sensor is configured to receive an analog signal that may contain a signature sound. Sparse sound parameter information is extracted from the analog signal and compared to a sound parameter reference stored locally with the sound recognition sensor to detect when the signature sound is received in the analog signal. A portion of the sparse sound parameter information is differential zero crossing (ZC) counts. Differential ZC rate may be determined by measuring a number of times the analog signal crosses a threshold value during each of a sequence of time frames to form a sequence of ZC counts and taking a difference between selected pairs of ZC counts to form a sequence of differential ZC counts.
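The differential ZC computation can be illustrated with a short sketch. It assumes the analog signal has already been sampled into a list of numbers, that frames do not overlap, and that the "selected pairs" are frames a fixed lag apart; the function names and the `lag` parameter are hypothetical and not taken from the abstract.

```python
def zc_counts(samples, frame_len, threshold=0.0):
    """Count threshold crossings inside each non-overlapping frame."""
    counts = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        count = 0
        for prev, cur in zip(frame, frame[1:]):
            # A crossing occurs when consecutive samples lie on
            # opposite sides of the threshold.
            if (prev < threshold) != (cur < threshold):
                count += 1
        counts.append(count)
    return counts


def differential_zc(counts, lag=1):
    """Difference between each ZC count and the count `lag` frames earlier."""
    return [counts[i] - counts[i - lag] for i in range(lag, len(counts))]
```

Crossings that fall exactly on a frame boundary are not counted in this sketch, which keeps each frame's count independent of its neighbours.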
Abstract:
A speech recognition apparatus and a speech recognition method are provided. In the invention, whether an original voice sampling signal corresponding to a target voice frame is a noise signal is determined according to three ratios: the ratio of the energy of a first consonant frequency band signal to the energy of a second consonant frequency band signal, the ratio of the energy of the first consonant frequency band signal to the energy of the original voice sampling signal, and the ratio of the energy of the second consonant frequency band signal to the energy of the original voice sampling signal.
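The three energy ratios can be computed as in the sketch below. It assumes the two consonant band signals have already been obtained by band-pass filtering the frame (the filtering is not shown) and that energy means the sum of squared samples. The abstract gives neither the thresholds nor the direction of the noise decision, so the sketch stops at the ratios; all names are hypothetical.

```python
def energy(samples):
    """Sum of squared samples."""
    return sum(x * x for x in samples)


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 for a zero denominator."""
    return numerator / denominator if denominator > 0 else 0.0


def consonant_ratios(original, band1, band2):
    """Return (band1/band2, band1/original, band2/original) energy ratios."""
    e0 = energy(original)
    e1 = energy(band1)
    e2 = energy(band2)
    return (safe_ratio(e1, e2), safe_ratio(e1, e0), safe_ratio(e2, e0))
```

A decision stage would compare these three values against tuned thresholds to label the frame as noise or not.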
Abstract:
An example method to determine a quality-of-experience (QoE) metric for a network communication includes receiving a media signal from the network communication, wherein the media signal includes a voice component, extracting an experience indicator from the voice component, wherein the experience indicator is a voice feature descriptive of a service quality of the network communication, evaluating the experience indicator, retrieving a quality-of-service (QoS) metric if the evaluated experience indicator reflects the service quality of the network possibly being subpar, and determining the QoE metric for the network communication based on the evaluated experience indicator and the retrieved QoS metric for the network communication.
Abstract:
A speech detection apparatus and method are provided. The speech detection apparatus and method determine whether a frame is speech or not using feature information extracted from an input signal. The speech detection apparatus may estimate a situation related to an input frame and determine which feature information is required for speech detection for the input frame in the estimated situation. The speech detection apparatus may detect a speech signal using dynamic feature information that may be more suitable to the situation of a particular frame, instead of using the same feature information for each and every frame.
Abstract:
A game apparatus includes an operating switch and a microphone. A player intuitively controls a player object by using the operating switch or by inputting a sound. The number of zero crossings in the waveform of a sound input through the microphone is detected, as are the individual interval times between the zero crossings. It is then determined whether the distribution of the interval times, i.e. the frequency distribution, matches a distribution of interval times (frequency distribution) for a breath sound stored in advance. If the two match, the input sound is recognized as a breath sound, and a game process based on the breath (wind) is carried out. For example, a game screen depicting the breath or wind is displayed on an LCD.
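The matching step can be sketched as follows. This is only an illustration, assuming a sampled waveform, intervals measured in samples, a normalized histogram as the "distribution", and an L1 distance with a tolerance as the match criterion; the abstract does not state how the match is evaluated, and every name here is hypothetical.

```python
def crossing_intervals(samples):
    """Return the distances, in samples, between successive zero crossings."""
    crossings = [
        i for i in range(1, len(samples))
        if (samples[i - 1] < 0) != (samples[i] < 0)
    ]
    return [b - a for a, b in zip(crossings, crossings[1:])]


def interval_histogram(intervals, num_bins, bin_width):
    """Normalized histogram of intervals; long intervals go in the last bin."""
    hist = [0.0] * num_bins
    for d in intervals:
        hist[min(d // bin_width, num_bins - 1)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def matches_reference(hist, reference, tolerance):
    """True when the L1 distance to the stored distribution is small enough."""
    return sum(abs(h - r) for h, r in zip(hist, reference)) <= tolerance
```

In use, `reference` would be the stored breath-sound distribution, and a `True` result would trigger the breath-based game process.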
Abstract:
In a sound signal processing apparatus, a frame information generation section generates frame information for each frame of a sound signal. A storage stores the frame information generated by the frame information generation section. A first interval determination section determines a first utterance interval in the sound signal. A second interval determination section determines a second utterance interval based on the stored frame information of the first utterance interval, such that the second utterance interval is shorter than the first utterance interval and confined within it, by trimming frames from either the start point or the end point of the first utterance interval.
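The trimming step can be sketched as below. It assumes the stored frame information is a single level value per frame and that a frame is trimmed when its level is below a threshold; both are assumptions made for illustration, since the abstract does not say what the frame information contains. The name `trim_interval` is hypothetical.

```python
def trim_interval(frame_levels, start, end, threshold):
    """Shrink the inclusive interval [start, end] from both ends.

    Frames at either edge are dropped while their stored level is below
    the threshold, so the result always lies within the first interval.
    """
    while start < end and frame_levels[start] < threshold:
        start += 1
    while end > start and frame_levels[end] < threshold:
        end -= 1
    return start, end
```

If every frame is below the threshold, the sketch collapses the interval to its final frame rather than returning an empty one.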