Abstract:
An apparatus for detecting a sound in an acoustical environment includes a microphone array configured to detect an audio signal in the acoustical environment. The apparatus also includes a processor configured to determine an angular location of a sound source of the audio signal. The angular location is relative to the microphone array. The processor is also configured to determine at least one reverberation characteristic of the audio signal. The processor is further configured to determine a distance, relative to the microphone array, of the sound source along an axis associated with the angular location based on the at least one reverberation characteristic.
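The distance determination described above can be illustrated with a toy direct-to-reverberant-ratio (DRR) mapping. This is a minimal sketch, not the patent's method: the helper names, the 5 ms direct-path window, and the free-field relation DRR ≈ 20·log10(d_c/d) (with d_c the room's critical distance) are all assumptions.

```python
import math

def direct_to_reverberant_ratio(impulse_response, fs, direct_ms=5.0):
    """Split an impulse response into direct and reverberant parts and
    return their energy ratio in dB (the window length is an assumption)."""
    split = int(fs * direct_ms / 1000.0)
    direct = sum(s * s for s in impulse_response[:split])
    reverb = sum(s * s for s in impulse_response[split:])
    return 10.0 * math.log10(direct / reverb)

def distance_from_drr(drr_db, critical_distance_m=1.0):
    """Invert the free-field relation DRR = 20 * log10(d_c / d)."""
    return critical_distance_m * 10.0 ** (-drr_db / 20.0)
```

Under this model a source twice as far from the array halves its direct-path pressure while the diffuse reverberant level stays roughly constant, so the DRR falls by about 6 dB per doubling of distance.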
Abstract:
A method transforms a noisy audio signal to an enhanced audio signal, by first acquiring the noisy audio signal from an environment. The noisy audio signal is processed by an enhancement network having network parameters to jointly produce a magnitude mask and a phase estimate. Then, the magnitude mask and the phase estimate are used to obtain the enhanced audio signal.
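The final reconstruction step can be sketched per time-frequency bin. This is an illustrative fragment, not the patent's enhancement network: the function name and the flat bin lists are assumptions, and a real system would apply this across a full STFT and then invert the transform.

```python
import cmath

def apply_mask_and_phase(noisy_magnitudes, magnitude_mask, phase_estimates):
    """Combine the jointly produced magnitude mask and phase estimate
    into enhanced complex bins: S = mask * |Y| * exp(j * phase)."""
    return [m * y * cmath.exp(1j * p)
            for y, m, p in zip(noisy_magnitudes, magnitude_mask, phase_estimates)]
```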
Abstract:
A drinking-state determination method includes: detecting a plurality of valid frames of an input speech signal; detecting a difference signal of the original signal for each valid frame; detecting, for each valid frame, the average energy of the original signal and the average energy of the difference signal; and determining, for each valid frame, a drinking state based on the difference between the average energy of the original signal and the average energy of the difference signal. Accordingly, since the difference-signal energy method applied to the speech signal can determine whether, and to what degree, a remote driver or vessel operator is intoxicated, accidents caused by drunk driving or drunk operation can be prevented.
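The per-frame quantities in this method can be sketched as follows. This is a minimal illustration of the difference-signal energy idea, assuming simple non-overlapping frames and hypothetical helper names; the actual decision threshold would be calibrated on labeled data.

```python
def difference_signal(signal):
    """First-order difference of the original signal."""
    return [b - a for a, b in zip(signal, signal[1:])]

def frame_energies(signal, frame_len):
    """Average energy of each non-overlapping frame."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    return [sum(s * s for s in f) / frame_len for f in frames]
```

The decision step would then threshold, per valid frame, the gap between the original signal's average energy and the difference signal's average energy.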
Abstract:
A volume leveler controller and controlling method are disclosed. In one embodiment, a volume leveler controller includes an audio content classifier for identifying the content type of an audio signal in real time, and an adjusting unit for adjusting a volume leveler in a continuous manner based on the identified content type. The adjusting unit may be configured to positively correlate the dynamic gain of the volume leveler with informative content types of the audio signal, and to negatively correlate the dynamic gain with interfering content types of the audio signal.
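The correlation rule can be sketched as a smooth gain adjustment. A minimal sketch with hypothetical names and parameters: a real controller would smooth the classifier confidences over time before applying them.

```python
def adjust_dynamic_gain(base_gain, p_informative, p_interfering, strength=1.0):
    """Raise the leveler's dynamic gain with the confidence of informative
    content (e.g. speech) and lower it with the confidence of interfering
    content (e.g. background noise), continuously rather than by hard switch."""
    factor = 1.0 + strength * (p_informative - p_interfering)
    return max(0.0, base_gain * factor)
```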
Abstract:
A system receives monaural sound which includes speech and background noises. The received sound is divided by frequency and time into time-frequency units (TFUs). Each TFU is classified as speech or non-speech by a processing unit. The processing unit for each frequency range includes at least one of a deep neural network (DNN) or a linear support vector machine (LSVM). The DNN extracts and classifies the features of the TFU and includes a pre-trained stack of Restricted Boltzmann Machines (RBMs), each of which includes a visible and a hidden layer. The LSVM classifies each TFU based on features extracted from the DNN, including those from the visible layer of the first RBM and those from the hidden layer of the last RBM in the stack. The LSVM and DNN are trained with a plurality of training noises. Each TFU classified as speech is output.
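The LSVM stage can be sketched as follows. This is an illustrative fragment with hypothetical names and toy feature vectors, not the patent's trained models: the feature vector concatenates the first RBM's visible-layer activations with the last RBM's hidden-layer activations, and a linear decision rule then labels the TFU.

```python
def tfu_features(first_rbm_visible, last_rbm_hidden):
    """Concatenate visible-layer features of the first RBM with
    hidden-layer features of the last RBM in the stack."""
    return list(first_rbm_visible) + list(last_rbm_hidden)

def lsvm_is_speech(features, weights, bias):
    """Linear SVM decision rule: speech if w . x + b > 0."""
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return score > 0
```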
Abstract:
A signal recognition process, including: receiving signal data representing a signal; filtering the signal data to generate filtered data representing signal amplitudes as a function of time and one or more other dimensions represented by the signal data; setting signal amplitudes exceeding a saturation threshold to a saturation value representing reinforcement; and applying lateral inhibition across each of the one or more other dimensions to generate, for each said other dimension, inhibitive signal amplitude values at values of said dimension flanking dominant ones of the signal amplitudes along said dimension.
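The two operations in the process above can be sketched on a one-dimensional amplitude profile. A minimal illustration with assumed names and a simple local-maximum notion of "dominant"; the patent applies the inhibition across each non-time dimension of the filtered representation.

```python
def saturate(amplitudes, threshold, saturation_value):
    """Clip amplitudes above the saturation threshold to a fixed value,
    representing reinforcement of strong responses."""
    return [saturation_value if a > threshold else a for a in amplitudes]

def lateral_inhibition(amplitudes, strength=0.5):
    """Subtract a fraction of each dominant (local-maximum) amplitude
    from its flanking neighbours, producing inhibitive values there."""
    out = list(amplitudes)
    for i in range(1, len(amplitudes) - 1):
        a = amplitudes[i]
        if a > 0 and a >= amplitudes[i - 1] and a >= amplitudes[i + 1]:
            out[i - 1] -= strength * a
            out[i + 1] -= strength * a
    return out
```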
Abstract:
A method for mimicking the auditory system's response to the rhythm of an input signal having a time-varying structure, comprising the steps of: receiving a time-varying input signal x(t) into a network of n nonlinear oscillators, each oscillator having a different natural frequency of oscillation and obeying a dynamical equation of the given form (formula omitted), wherein ω represents the response frequency, r is the amplitude of the oscillator, and φ is the phase of the oscillator; and generating at least one frequency output from said network useful for describing said time-varying structure.
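Since the abstract omits the dynamical equation itself, the oscillator bank can only be sketched with a stand-in dynamics. The sketch below uses the standard forced Hopf normal form, written in complex (Cartesian) rather than polar (r, φ) form, with assumed parameter values and function names; it shows the key behaviour, namely that the oscillator tuned nearest a frequency present in x(t) resonates most strongly.

```python
import math

def hopf_bank_response(x, freqs_hz, fs, alpha=-2.0, beta=-1.0, coupling=1.0):
    """Euler-integrate a bank of forced Hopf-style oscillators
        dz/dt = z * (alpha + 2*pi*i*f + beta*|z|**2) + coupling * x(t)
    and return the final amplitude |z| of each oscillator."""
    dt = 1.0 / fs
    zs = [0j for _ in freqs_hz]
    for sample in x:
        zs = [z + dt * (z * (alpha + 2j * math.pi * f + beta * abs(z) ** 2)
                        + coupling * sample)
              for z, f in zip(zs, freqs_hz)]
    return [abs(z) for z in zs]
```

Driving the bank with a 4 Hz sinusoid, the oscillator whose natural frequency is 4 Hz ends up with the largest amplitude; that per-frequency resonance profile is the kind of frequency output the claim describes.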
Abstract:
The present invention is directed to systems and methods designed to ascertain the structure of acoustic signals. The approach involves an alternative transform of an acoustic input signal, utilizing a network of nonlinear oscillators in which each oscillator is tuned to a distinct frequency. Each oscillator receives input and interacts with the other oscillators in the network, yielding nonlinear resonances that are used to identify structure in an acoustic input signal. The output of the nonlinear frequency transform can be used as input to a system that will provide further analysis of the signal. According to one embodiment, the nonlinear responses are defined as a network of n expanded canonical oscillators z_i with an input, for each oscillator, as a function of an external stimulus. In this way, the response of oscillators to inputs that are not close to their natural frequencies is accounted for.
Abstract:
Apparatus and method for detecting a single source contribution in an input signal comprising contributions from more than one source. An input analysis device (3) receives the input signal, for providing a time-frequency representation of the input signal. A neural preprocessing device (5) is connected to the input analysis device (3), for separating a foreground signal from background signals in the time-frequency representation of the input signal. A feature estimation device (7) is connected to the neural preprocessing device (5) for detecting specific features in the foreground signal. A model activation device (11) is connected to the feature estimation device (7) for activating one or more of a set of models based on the detected specific features. A decision device (8) is connected to the model activation device (11) for monitoring the possible activation of a specific one of the models and generating an output based on the monitoring.