摘要:
The present document discloses a method and apparatus for compensating for a lost frame in a transform domain, comprising: calculating frequency-domain coefficients of a current lost frame using frequency-domain coefficients of one or more frames prior to the current lost frame, and performing frequency-time transform to obtain an initially compensated signal; and performing waveform adjustment, to obtain a compensated signal. Alternatively, extrapolation is performed for all or part of frequency points of the current lost frame using phases and amplitudes of corresponding frequency points of a plurality of previous frames to obtain phases and amplitudes of the corresponding frequency points of the current lost frame, to obtain frequency-domain coefficients of the corresponding frequency points, and frequency-time transform is performed to obtain a compensated signal. The above methods can be selected through a judgment algorithm to compensate for the current lost frame, thereby achieving a better compensation effect.
摘要:
Methods and arrangements involving electronic devices, such as smartphones, tablet computers, wearable devices, etc., are disclosed. One arrangement involves a low-power processing technique for discerning cues from audio input. Another involves a technique for detecting audio activity based on the Kullback-Liebler divergence (KLD) (or a modified version thereof) of the audio input. Still other arrangements concern techniques for managing the manner in which policies are embodied on an electronic device. Others relate to distributed computing techniques. A great variety of other features are also detailed.
摘要:
What is disclosed is a system and method for determining when a subject is speaking from a respiratory signal obtained from a video of that subject. A video of a subject is received and a respiratory signal is extracted from a time-series signal is obtained from processing pixels in image frames of the video. The respiratory signal comprises an inspiratory signal and an expiratory signal. Cycle-level feature are extracted from the respiratory signal and used to identify expiratory signals during which speech is likely to have occurred. The identified expiratory signal are divided into time intervals. Frame-level features are determined for each time interval and an amount of distortion in the expiratory signal for this time interval is quantified. The amount of distortion is compared to a threshold. In response to the comparison, a determination is made that speech occurred during this interval. The process repeats for all time intervals.
摘要:
Method and apparatus for segmenting speech by detecting the pauses between the words and/or phrases, and to determine whether a particular time interval contains speech or non-speech, such as a pause.
摘要:
System(s) and method(s) for classifying noise data of human crowd are disclosed. Noise data is captured from one or more sources and features are extracted by using computation techniques. The features comprise spectral domain features and time domain features. Classification models are developed by using each of the spectral domain features and the time domain features. Discriminative information with respect to the noise data is extracted by using the classification models. A performance matrix is computed for each of the classification model. The performance matrix comprises classified noise elements with respect to the noise data. Each classified noise element is associated with a classification performance score with respect to a spectral domain feature, a time domain feature, and fusion of features and scores. The classified noise elements provide the classification of the noise data.
摘要:
A low power sound recognition sensor is configured to receive an analog signal that may contain a signature sound. Sparse sound parameter information is extracted from the analog signal. The extracted sparse sound parameter information is processed using a speaker dependent sound signature database stored in the sound recognition sensor to identify sounds or speech contained in the analog signal. The sound signature database may include several user enrollments for a sound command each representing an entire word or multiword phrase. The extracted sparse sound parameter information may be compared to the multiple user enrolled signatures using cosine distance, Euclidean distance, correlation distance, etc., for example.
摘要:
A speech recognition apparatus and a speech recognition method are provided. In the invention, whether an original voice sampling signal corresponding to a target voice frame is a consonant signal is determined according to at least one of a ratio of an energy of a low-pass sampling signal to an energy of the original voice sampling signal and a ratio value of an energy of a second consonant frequency band signal.
摘要:
The present invention addresses the problems of enabling a process of converting voice data playback speed even in a voice data playback device alone. The solution is a voice data playback speed conversion method and a voice data playback speed conversion device, comprising: a step of setting a reference zero cross point from any arbitrary zero cross point; a step of selecting a zero cross point temporally after the reference zero cross point within a first predetermined time range; a step of calculating a reference correlation function in a waveform from the reference zero cross point until a second predetermined time; and a step of calculating a correlation function in a waveform from a plurality of previously selected zero cross points until the second predetermined time, wherein a second reference zero cross point is the zero cross point of the waveform having a correlation function in which a concordance rate of the correlation value between the reference correlation function and the correlation function is the highest value, the difference between the reference zero cross point and the second reference zero cross point is calculated as a basic cycle, and the expansion and contraction of voice data is executed in basic cycle units so as to perform a process of converting the playback speed of the voice data.
摘要:
Audio encoding methods/terminals, audio decoding methods/terminals, and audio codec systems are provided. A plurality of audio signals that are continuous is obtained. It is determined whether each audio signal of the plurality of audio signals includes a designated signal type, according to an audio parameter of each audio signal. A marked audio encoding stream is obtained by performing a marking to each audio signal as having or not having the designated signal type. The marking is used, at a decoding terminal, to perform an enhancement-process to one or more audio signals having the designated signal type. The enhancement-process is not performed to audio signals that do not have the designated signal type.
摘要:
The present disclosure provides a audio signal processing apparatus including, an amplitude detector configured to detect a noise start point of an audio signal including a noise signal by comparing an amplitude value of the audio signal with a threshold value, a frequency feature calculator configured to calculate a frequency feature representing at least a frequency characteristic of the audio signal after the noise start point, and a noise determiner configured to determine a leg continuously including high-frequency components equal to or higher than a reference frequency in the audio signal after the noise start point as a noise leg based on the frequency feature.