Abstract:
Disclosed are a method for determining alcohol consumption, capable of analyzing alcohol consumption in the time domain by analyzing the formant slope of a voice signal, and a recording medium and a terminal for carrying out the method. A terminal for determining whether a person is drunk comprises: a voice input unit that receives a voice signal and generates voice frames; a voiced/unvoiced sound analysis unit that determines whether a received voice frame corresponds to a voiced sound; a formant frequency extraction unit that extracts a plurality of formant frequencies from each voice frame corresponding to a voiced sound; and an alcohol consumption determining unit that calculates a formant slope between the plurality of formant frequencies and determines the state of alcohol consumption from the formant slope. Whether a person is drunk can thus be determined by analyzing the formant slope of an input voice.
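The abstract does not define the formant slope precisely. The following is a minimal Python sketch, assuming the slope is the least-squares line fitted through the formant peaks of one voiced frame (peak level in dB against formant frequency in kHz), and assuming a hypothetical decision rule that compares the mean slope over all voiced frames to a threshold. The names `formant_slope` and `is_drunk`, the direction of the comparison and the threshold value are illustrative assumptions, not taken from the disclosure.

```python
from statistics import mean


def formant_slope(formants):
    """Least-squares slope, in dB per kHz, of formant peak level vs. frequency.

    `formants` is a list of (frequency_hz, level_db) pairs for one voiced
    frame. Fitting a line through the peaks is an assumed definition.
    """
    xs = [freq / 1000.0 for freq, _ in formants]
    ys = [level for _, level in formants]
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def is_drunk(voiced_frames, threshold_db_per_khz=-6.0):
    """Hypothetical rule: flag intoxication when the mean slope over all
    voiced frames is flatter (less negative) than the threshold."""
    slopes = [formant_slope(frame) for frame in voiced_frames]
    return mean(slopes) > threshold_db_per_khz


frame = [(500, 60.0), (1500, 50.0), (2500, 40.0)]
print(formant_slope(frame))  # -10.0
print(is_drunk([frame]))     # False
```

Only voiced frames are passed in, mirroring the voiced/unvoiced gate described above; how the formant peaks themselves are extracted is outside this sketch.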
Abstract:
A method and apparatus for segmenting speech by detecting pauses between words and/or phrases, and for determining whether a particular time interval contains speech or non-speech, such as a pause.
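A common way to decide speech vs. non-speech per interval is a short-time energy threshold. The sketch below assumes fixed-length, non-overlapping frames, an RMS threshold and a minimum pause length (all hypothetical parameters) and returns speech segments separated by pauses; the disclosure's own decision rule may differ.

```python
import math


def frame_rms(samples, frame_len):
    """RMS level of each complete, non-overlapping frame."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def segment_speech(samples, frame_len=160, threshold=0.02, min_pause_frames=3):
    """Split frames into speech segments separated by pauses.

    Returns (start_frame, end_frame) pairs with end exclusive. A run of
    low-energy frames counts as a pause only if it lasts at least
    `min_pause_frames`; shorter dips stay inside the surrounding segment.
    """
    is_speech = [rms > threshold for rms in frame_rms(samples, frame_len)]
    segments, start, silence_run = [], None, 0
    for i, speech in enumerate(is_speech):
        if speech:
            if start is None:
                start = i
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run >= min_pause_frames:
                segments.append((start, i - silence_run + 1))
                start, silence_run = None, 0
    if start is not None:
        segments.append((start, len(is_speech) - silence_run))
    return segments


audio = [1.0] * 12 + [0.0] * 12 + [1.0] * 8
print(segment_speech(audio, frame_len=4, threshold=0.5, min_pause_frames=3))
# [(0, 3), (6, 8)]
```

Requiring a minimum pause length keeps brief dips inside a word from splitting it into two segments.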
Abstract:
Provided are methods and systems for detecting the presence of a transient noise event in an audio stream using primarily or exclusively the incoming audio data. Such an approach offers improved temporal resolution and is computationally efficient. The methods and systems use a time-frequency representation of the audio signal as the basis of a predictive model to find outlying transient noise events, and interpret the true detection state as a Hidden Markov Model (HMM) to model the temporal and frequency cohesion common among transient noise events.
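A minimal sketch of the two stages, under stated assumptions: the input is an already computed magnitude spectrogram (a list of frames, each a list of band magnitudes); the predictive model is replaced by a simple stand-in, the per-band minimum over the previous few frames; frames far above that prediction are flagged as outliers; and a two-state HMM (clean/transient) with hypothetical probabilities is decoded with the Viterbi algorithm to enforce temporal cohesion. None of these parameter values come from the disclosure.

```python
import math


def outlier_flags(spectrogram, history=3, jump_db=10.0):
    """Flag frames whose band magnitudes jump far above a simple prediction.

    Assumed predictor: the per-band minimum over the previous `history` frames.
    """
    flags = []
    for t, frame in enumerate(spectrogram):
        past = spectrogram[max(0, t - history):t]
        if not past:
            flags.append(False)
            continue
        predicted = [min(f[b] for f in past) for b in range(len(frame))]
        excess_db = [20 * math.log10((m + 1e-12) / (p + 1e-12))
                     for m, p in zip(frame, predicted)]
        flags.append(sum(excess_db) / len(excess_db) > jump_db)
    return flags


def viterbi_two_state(flags, p_stay=0.9, p_hit=0.8, p_false=0.1):
    """Most likely clean (0) / transient (1) state path given binary flags."""
    log = math.log
    trans = [[log(p_stay), log(1 - p_stay)], [log(1 - p_stay), log(p_stay)]]

    def emit(state, flag):
        p = p_hit if state == 1 else p_false
        return log(p if flag else 1 - p)

    score = [log(0.5) + emit(s, flags[0]) for s in (0, 1)]
    back = []
    for flag in flags[1:]:
        ptr, new = [], []
        for s in (0, 1):
            best = max((0, 1), key=lambda r: score[r] + trans[r][s])
            ptr.append(best)
            new.append(score[best] + trans[best][s] + emit(s, flag))
        back.append(ptr)
        score = new
    state = max((0, 1), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


quiet, loud = [0.01, 0.01], [1.0, 1.0]
spec = [quiet, quiet, loud, loud, loud, quiet, quiet]
flags = outlier_flags(spec)
print(viterbi_two_state(flags))  # [0, 0, 1, 1, 1, 0, 0]
```

With these settings an isolated single-frame flag tends to be smoothed away, while a run of flagged frames is kept, which is the kind of temporal cohesion the HMM is meant to capture.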
Abstract:
A method of renormalizing high-resolution oscillator peaks extracted from windowed samples of an audio signal is disclosed. Feature vectors are generated in which variations in both the fundamental frequency and the time duration of speech are substantially mitigated. The feature vectors may be aligned within a common coordinate space, free of the variations in frequency and time duration that occur between speakers, and even within speech by a single speaker. This facilitates a simple and accurate determination of matches between AFDVs generated from a sample of the audio signal and corpus AFDVs generated for known speech at the phoneme and sub-phoneme level. The renormalized feature vectors can be combined with traditional feature vectors such as MFCCs, or used exclusively to identify voiced, semi-voiced and unvoiced sounds.
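The abstract does not give the renormalization formula. As an assumed illustration only, each oscillator peak (time, frequency, amplitude) can be mapped into a coordinate space where frequency is divided by the fundamental frequency (giving a harmonic number), time is rescaled to the 0..1 span of the segment, and amplitude is taken relative to the strongest peak. The same pattern spoken at a different pitch or rate then lands on the same coordinates. The function `renormalize_peaks` and this particular mapping are hypothetical.

```python
def renormalize_peaks(peaks, f0_hz, t_start, duration):
    """Map (time_s, freq_hz, amplitude) peaks to
    (relative_time, harmonic_number, relative_amplitude).

    Assumed mapping: time scaled to the segment span, frequency divided by
    the fundamental, amplitude divided by the largest peak amplitude.
    """
    max_amp = max(amp for _, _, amp in peaks)
    return [((t - t_start) / duration, f / f0_hz, amp / max_amp)
            for t, f, amp in peaks]


# The same three-harmonic pattern at two pitches and two speaking rates.
low = [(0.10, 200.0, 1.0), (0.20, 400.0, 0.5), (0.30, 600.0, 0.25)]
high = [(0.0, 300.0, 2.0), (0.05, 600.0, 1.0), (0.10, 900.0, 0.5)]
a = renormalize_peaks(low, f0_hz=200.0, t_start=0.10, duration=0.20)
b = renormalize_peaks(high, f0_hz=300.0, t_start=0.0, duration=0.10)
for pa, pb in zip(a, b):
    print(pa, pb)
```

Both inputs map to approximately (0, 1, 1), (0.5, 2, 0.5), (1, 3, 0.25), up to floating-point rounding in the time axis.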
Abstract:
An exemplary sound processor 1) identifies at least one frequency bin, included in a plurality of frequency bins included in a frequency spectrum of an audio signal that is presented to a cochlear implant patient, that contains spectral energy above a modified spectral envelope, 2) identifies each frequency bin that contains spectral energy below the modified spectral envelope, 3) enhances the spectral energy contained in the at least one frequency bin identified as containing spectral energy above the modified spectral envelope, and 4) compresses the spectral energy contained in each frequency bin identified as containing spectral energy below the modified spectral envelope.
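A minimal sketch of the four steps on one spectrum expressed in dB, under explicit assumptions: the "modified" spectral envelope is taken to be a moving average of the dB spectrum raised by a fixed offset; enhancement is a fixed dB gain; and compression pulls a bin's level toward a floor by a ratio. The function names and every parameter value are hypothetical and only illustrate the above/below-envelope split described in the abstract.

```python
def modified_envelope(spectrum_db, half_width=2, offset_db=3.0):
    """Assumed envelope: centered moving average of the dB spectrum + offset."""
    n = len(spectrum_db)
    env = []
    for k in range(n):
        window = spectrum_db[max(0, k - half_width):min(n, k + half_width + 1)]
        env.append(sum(window) / len(window) + offset_db)
    return env


def enhance_and_compress(spectrum_db, gain_db=6.0, ratio=2.0, floor_db=0.0,
                         half_width=2, offset_db=3.0):
    """Boost bins above the envelope; compress bins at or below it."""
    env = modified_envelope(spectrum_db, half_width, offset_db)
    out = []
    for level, e in zip(spectrum_db, env):
        if level > e:
            out.append(level + gain_db)  # steps 1 and 3: enhance
        else:
            out.append(floor_db + (level - floor_db) / ratio)  # steps 2 and 4
    return out


print(enhance_and_compress([20.0, 20.0, 50.0, 20.0, 20.0]))
# [10.0, 10.0, 56.0, 10.0, 10.0]
```

The net effect is a larger peak-to-valley contrast, which is one plausible reason for treating the two groups of bins differently.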
Abstract:
An electronic apparatus and method for automatically acquiring and revising the minutes of a meeting are disclosed. The method includes identifying one or more speakers from audio signals recorded during a meeting, based on pre-sampled audio signals and a voice feature table stored in a non-transitory storage medium. The audio signals are converted to text and divided into paragraphs, each paragraph being attributable to one speaker, and each speaker is given a user name. Original minutes of the meeting are prepared from the text and a meeting minutes template stored in the non-transitory storage medium, then revised and issued to all relevant persons.
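The speaker-attribution and drafting steps can be sketched as follows, assuming each transcribed segment already carries a voice feature vector, the voice feature table maps user names to stored vectors, and a speaker is identified by nearest Euclidean distance. The feature extraction, the matching rule, the function names and the template string are all hypothetical stand-ins.

```python
import math


def identify_speaker(features, feature_table):
    """Return the user name whose stored feature vector is nearest (Euclidean)."""
    return min(feature_table,
               key=lambda name: math.dist(features, feature_table[name]))


def build_paragraphs(segments, feature_table):
    """Group consecutive (features, text) segments by speaker into paragraphs."""
    paragraphs = []
    for features, text in segments:
        speaker = identify_speaker(features, feature_table)
        if paragraphs and paragraphs[-1][0] == speaker:
            paragraphs[-1] = (speaker, paragraphs[-1][1] + " " + text)
        else:
            paragraphs.append((speaker, text))
    return paragraphs


def draft_minutes(paragraphs, template="MEETING MINUTES\n{body}"):
    """Fill a (hypothetical) minutes template with one line per paragraph."""
    body = "\n".join(f"{name}: {text}" for name, text in paragraphs)
    return template.format(body=body)


table = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
segments = [([0.9, 0.1], "Hello."), ([0.8, 0.2], "Let's start."),
            ([0.1, 0.9], "Agreed.")]
print(draft_minutes(build_paragraphs(segments, table)))
```

`math.dist` requires Python 3.8 or later. The revision and distribution steps are not shown.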
Abstract:
A computing device and method for automatically acquiring and revising the minutes of a meeting are disclosed. The method includes the steps of: identifying one or more silences or notional silences (unvoiced segments) in voice data; determining that a segment is a satisfactory unvoiced segment if the silence lasts for a period equal to or longer than a predetermined period; dividing the audio data, or text representing the audio data, into one or more passages of text according to the satisfactory unvoiced segments; and creating original minutes of the meeting from the audio data or representative text divided into passages and a meeting minutes template stored in a non-transitory storage medium.
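The passage-splitting rule can be sketched as follows, assuming the text is available as words with start and end timestamps (for example from a speech recognizer) and treating any gap between consecutive words of at least `min_gap` seconds as a satisfactory unvoiced segment. The timestamped input and the threshold value are assumptions.

```python
def split_passages(words, min_gap=1.0):
    """Split (start_s, end_s, text) words into passages at long silences.

    A gap between consecutive words of at least `min_gap` seconds is treated
    as a satisfactory unvoiced segment and starts a new passage.
    """
    passages, current, prev_end = [], [], None
    for start, end, text in words:
        if current and start - prev_end >= min_gap:
            passages.append(" ".join(current))
            current = []
        current.append(text)
        prev_end = end
    if current:
        passages.append(" ".join(current))
    return passages


words = [(0.0, 0.4, "Good"), (0.5, 0.9, "morning."),
         (2.5, 2.8, "First"), (2.9, 3.3, "item.")]
print(split_passages(words))  # ['Good morning.', 'First item.']
```

Shorter gaps, such as the 0.1 s pause between "Good" and "morning.", do not qualify and stay inside the passage.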
Abstract:
The present disclosure is directed towards a method relating to speech intelligibility. The method may include receiving, at one or more computing devices, a first speech input from a first user and performing voice activity detection on the first speech input. The method may also include analyzing a spectral tilt associated with the first speech input, wherein the analyzing includes computing an impulse response of a linear predictive coding (“LPC”) synthesis filter in a linear pulse code modulation (“PCM”) domain, and wherein the one or more computing devices include an adaptive high-pass filter configured to recalculate one or more linear prediction coefficients.
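One way to measure spectral tilt from an LPC synthesis filter's impulse response is sketched below. It assumes LPC coefficients obtained by the autocorrelation method with the Levinson-Durbin recursion, computes the impulse response h[n] of the all-pole filter 1/A(z), and takes the tilt as the first normalized autocorrelation of h (near +1 for low-pass, near -1 for high-pass spectra). The choice of tilt measure, the order and the response length are assumptions, not details from the disclosure; the adaptive high-pass filter is not modeled.

```python
def autocorrelation(x, max_lag):
    """Unnormalized autocorrelation r[0..max_lag] of a frame."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return a = [1, a1, ..., ap] for A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def synthesis_impulse_response(a, length=64):
    """Impulse response of the all-pole LPC synthesis filter 1 / A(z)."""
    h = []
    for n in range(length):
        value = 1.0 if n == 0 else 0.0
        value -= sum(a[j] * h[n - j] for j in range(1, len(a)) if n - j >= 0)
        h.append(value)
    return h


def spectral_tilt(x, order=2, length=64):
    """First normalized autocorrelation of h: near +1 low-pass, near -1 high-pass."""
    a = levinson_durbin(autocorrelation(x, order), order)
    h = synthesis_impulse_response(a, length)
    return sum(h[n] * h[n + 1] for n in range(length - 1)) / sum(v * v for v in h)


lowpass = [0.9 ** n for n in range(50)]
highpass = [(-0.9) ** n for n in range(50)]
print(spectral_tilt(lowpass), spectral_tilt(highpass))
```

For these two synthetic frames the tilt comes out close to +0.9 and -0.9 respectively.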
Abstract:
The present disclosure envisages a computer-implemented system for identifying significant speech frames within speech signals to facilitate speech recognition. The system receives an input speech signal having a plurality of feature vectors and passes it through a spectrum analyzer. The spectrum analyzer divides the input speech signal into a plurality of speech frames and computes the spectral magnitude of each speech frame. A suitability engine computes, for each speech frame, a suitability measure corresponding to the spectral flatness measure (SFM), energy normalized variance (ENV), entropy, signal-to-noise ratio (SNR) and a similarity measure. The suitability engine further computes a weighted suitability measure for each speech frame.
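Two of the listed per-frame measures have standard definitions that can be sketched directly: the spectral flatness measure (geometric mean over arithmetic mean of the power spectrum) and the normalized spectral entropy. The weighted combination below is a hypothetical weighted sum; the abstract does not say how the measures are oriented or weighted, and ENV, SNR and the similarity measure are omitted.

```python
import math


def spectral_flatness(power, eps=1e-12):
    """SFM: geometric mean over arithmetic mean of the power spectrum (0..1)."""
    log_mean = sum(math.log(p + eps) for p in power) / len(power)
    return math.exp(log_mean) / (sum(power) / len(power) + eps)


def spectral_entropy(power):
    """Shannon entropy of the normalized power spectrum, scaled to 0..1."""
    total = sum(power)
    probs = [p / total for p in power if p > 0]
    return -sum(q * math.log2(q) for q in probs) / math.log2(len(power))


def weighted_suitability(measures, weights):
    """Hypothetical weighted sum of per-frame measures keyed by name."""
    return sum(weights[name] * value for name, value in measures.items())


frame_power = [4.0, 1.0, 1.0, 0.25]
measures = {"sfm": spectral_flatness(frame_power),
            "entropy": spectral_entropy(frame_power)}
print(weighted_suitability(measures, {"sfm": 0.5, "entropy": 0.5}))
```

A flat (noise-like) spectrum scores near 1 on both measures, while a single dominant peak scores near 0.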
Abstract:
A method for signal level matching by an electronic device is described. The method includes capturing a plurality of audio signals from a plurality of microphones and determining a difference signal based on an inter-microphone subtraction. The difference signal includes multiple harmonics. The method also includes determining whether a harmonicity of the difference signal exceeds a harmonicity threshold, preserving the harmonics to determine an envelope, and applying the envelope to a noise-suppressed signal.
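The harmonicity test on the inter-microphone difference can be sketched as follows. It assumes harmonicity is measured as the peak normalized autocorrelation of the difference signal over a plausible pitch-lag range. The lag range, the threshold and the function names are hypothetical, and the envelope-preservation and noise-suppression steps are not shown.

```python
import math
import random


def difference_signal(mic_a, mic_b):
    """Inter-microphone subtraction, sample by sample."""
    return [a - b for a, b in zip(mic_a, mic_b)]


def harmonicity(x, min_lag=10, max_lag=40):
    """Peak normalized autocorrelation over an assumed pitch-lag range."""
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        head, tail = x[lag:], x[:-lag]
        num = sum(p * q for p, q in zip(head, tail))
        den = math.sqrt(sum(p * p for p in head) * sum(q * q for q in tail))
        if den > 0:
            best = max(best, num / den)
    return best


def is_harmonic(mic_a, mic_b, threshold=0.6):
    """True when the difference signal's harmonicity exceeds the threshold."""
    return harmonicity(difference_signal(mic_a, mic_b)) > threshold


n = range(400)
tone = [math.sin(2 * math.pi * i / 20) for i in n]
shared = [0.3 * math.cos(2 * math.pi * i / 7) for i in n]
mic_a = [t + s for t, s in zip(tone, shared)]
mic_b = shared
print(is_harmonic(mic_a, mic_b))  # True: the difference is a periodic tone

rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in n]
print(is_harmonic(noise, [0.0] * 400))
```

The shared component cancels in the subtraction, so only the periodic tone remains and its autocorrelation peaks at the 20-sample period.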