Abstract:
A terminal uses machine learning to select an output mode based on context information of a user. An embodiment of the terminal may include an audio output unit, a display, and a controller configured to obtain context information of a user, set an output mode of the terminal based on the obtained context information, convert communication information of a first type received from an external device to a second type associated with the set output mode when the first type and the second type are different, and control the audio output unit or the display to output the communication information, wherein the audio output unit or the display is used to output the communication information based on the output mode. An embodiment may include a data learning unit configured to store data to implement machine learning and logic-based determinations for selecting the output mode.
Abstract:
In accordance with an example embodiment of the invention, there is provided an apparatus for detecting voice activity in an audio signal. The apparatus comprises a first voice activity detector (6b) for making a first voice activity detection decision (D2) based at least in part on the voice activity of a first audio signal (A1) received from a first microphone (1a). The apparatus also comprises a second voice activity detector (6a) for making a second voice activity detection decision (D1) based at least in part on an estimate of a direction of the first audio signal (A1) and an estimate of a direction of a second audio signal (A2) received from a second microphone (1b). The apparatus further comprises a classifier (6c) for making a third voice activity detection decision (D3) based at least in part on the first and second voice activity detection decisions.
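The three-stage decision described above can be illustrated with a minimal sketch. It rests on several assumptions that are not in the abstract: the first detector is a simple energy threshold, the direction estimate is approximated by the inter-microphone lag found by cross-correlation, and the classifier is a plain logical AND. All function names and thresholds are hypothetical.

```python
def energy_vad(frame, threshold=0.01):
    """First decision (D2): mean-square energy test on the first microphone.
    The energy test is an assumed stand-in for the first detector."""
    return sum(x * x for x in frame) / len(frame) > threshold


def best_lag(a, b, max_lag):
    """Lag in samples at which b best matches a (brute-force cross-correlation)."""
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, x in enumerate(a):
            m = n + lag
            if 0 <= m < len(b):
                score += x * b[m]
        if score > best_score:
            best, best_score = lag, score
    return best


def direction_vad(a, b, expected_lag, tolerance=1, max_lag=8):
    """Second decision (D1): the inter-microphone lag is used as a proxy for
    direction; voice is assumed to arrive near expected_lag (hypothetical)."""
    return abs(best_lag(a, b, max_lag) - expected_lag) <= tolerance


def classify(d1, d2):
    """Third decision (D3): assumed here to be a logical AND of D1 and D2."""
    return d1 and d2
```

A caller would evaluate `classify(direction_vad(a1, a2, expected_lag), energy_vad(a1))` once per frame. A real classifier could weight or smooth the two decisions instead of requiring both.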
Abstract:
A personal audio device includes a personal audio device housing; a transducer mounted on the housing for reproducing an audio signal including an anti-noise signal (anti-noise) for countering effects of ambient audio sounds in an acoustic output of the transducer; a reference microphone mounted on the housing for providing a reference microphone signal (ref) indicative of the ambient audio sounds; and a processing circuit (30) within the housing. The processing circuit (30) adaptively generates the anti-noise signal (anti-noise) from the reference microphone signal (ref) such that the anti-noise signal (anti-noise) causes substantial cancellation of the ambient audio sounds. The processing circuit (30), in response to determining that an amplitude of acoustic leakage of the source audio into the reference microphone is substantial with respect to an amplitude of the ambient audio sounds, takes action to prevent improper generation of the anti-noise signal (anti-noise).
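One way to picture the protective action is to freeze adaptation while leakage dominates. The sketch below is an illustration under stated assumptions, not the device's actual method: a plain LMS coefficient update stands in for the adaptive filter, the secondary acoustic path is ignored, and the leakage test is a simple amplitude ratio. The names `leakage_is_substantial`, `lms_step`, and the `ratio` parameter are hypothetical.

```python
def leakage_is_substantial(leak_amp, ambient_amp, ratio=0.5):
    """Assumed test: leakage counts as substantial when its amplitude
    exceeds a fixed fraction of the ambient amplitude."""
    return leak_amp > ratio * ambient_amp


def lms_step(weights, ref_window, error, mu, adapt=True):
    """One plain LMS update of the anti-noise filter coefficients.
    With adapt=False the coefficients are returned unchanged, which
    is one possible way to prevent improper adaptation."""
    if not adapt:
        return list(weights)
    return [w + mu * error * x for w, x in zip(weights, ref_window)]


def guarded_update(weights, ref_window, error, mu, leak_amp, ambient_amp):
    """Adapt only while leakage is not substantial relative to ambient sound."""
    adapt = not leakage_is_substantial(leak_amp, ambient_amp)
    return lms_step(weights, ref_window, error, mu, adapt=adapt)
```

Freezing is only one candidate response. The abstract leaves the action open, so resetting coefficients or reducing the step size would fit the same description.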
Abstract:
One method of operation includes beamforming a plurality of microphone outputs to obtain a plurality of virtual microphone audio channels. Each virtual microphone audio channel corresponds to a beamform. The virtual microphone audio channels include at least one voice channel and at least one noise channel. The method includes performing voice activity detection on the at least one voice channel and adjusting a corresponding voice beamform until voice activity detection indicates that voice is present on the at least one voice channel. Another method beamforms the plurality of microphone outputs to obtain a plurality of virtual microphone audio channels, where each virtual microphone audio channel corresponds to a beamform, and with at least one voice channel and at least one noise channel. The method performs voice recognition on the at least one voice channel and adjusts the corresponding voice beamform to improve a voice recognition confidence metric.
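The first method, adjusting a voice beamform until voice activity is detected, can be sketched as a search over candidate steering delays. The assumptions here are loud ones: a delay-and-sum beamformer with integer sample delays stands in for the beamformer, an energy threshold stands in for voice activity detection, and the noise channel is omitted. All names are hypothetical.

```python
def delay_and_sum(channels, delays):
    """Form one virtual microphone channel: advance each microphone
    output by its integer delay, then average across microphones."""
    n = len(channels[0])
    out = [0.0] * n
    for channel, delay in zip(channels, delays):
        for i in range(n):
            j = i + delay
            if 0 <= j < n:
                out[i] += channel[j]
    return [v / len(channels) for v in out]


def energy_vad(frame, threshold):
    """Assumed stand-in for voice activity detection: mean-square energy test."""
    return sum(x * x for x in frame) / len(frame) > threshold


def steer_until_voice(channels, candidate_delays, threshold):
    """Try candidate beamforms in order and stop at the first one on
    which the detector reports voice. Returns (delays, beam) or (None, None)."""
    for delays in candidate_delays:
        beam = delay_and_sum(channels, delays)
        if energy_vad(beam, threshold):
            return delays, beam
    return None, None
```

The second method in the abstract would replace the boolean detector with a recognition confidence score and keep the candidate that maximizes it.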
Abstract:
In one example, a controller comprises logic, at least partially including hardware logic, configured to detect speech activity in an audio signal received in a non-aerial microphone and, in response to the speech activity, to apply a noise cancellation algorithm to a speech input received in an aerial microphone. Other examples may be described.
Abstract:
In a system and method for maintaining the spatial stability of a sound field, a balance gain may be calculated for two or more microphone signals. The balance gain may be associated with a spatial image in the sound field. Signal values may be calculated for each of the microphone signals. The signal values may be signal estimates or signal gains calculated to improve a characteristic of the microphone signals. The differences between the signal values associated with each microphone signal may be limited, although some difference between signal values may be allowable. One or more microphone signals are adjusted responsive to the two or more balance gains and the signal gains to maintain the spatial stability of the sound field. The adjustments of one or more microphone signals may include mixing of two or more microphone signals. The signal gains are applied to the two or more microphone signals.
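Limiting the difference between per-microphone signal gains can be shown with a short sketch. It assumes gains expressed in decibels and one particular limiting rule, in which no gain may fall more than a fixed spread below the largest gain. The abstract does not specify the rule, and the names `limit_gain_spread`, `apply_gain_db`, and `max_spread_db` are hypothetical.

```python
def limit_gain_spread(gains_db, max_spread_db):
    """Raise any gain that lies more than max_spread_db below the
    largest gain, so the spatial image shifts by a bounded amount."""
    floor_db = max(gains_db) - max_spread_db
    return [max(g, floor_db) for g in gains_db]


def apply_gain_db(signal, gain_db):
    """Scale a signal by a gain given in decibels (amplitude convention)."""
    factor = 10.0 ** (gain_db / 20.0)
    return [x * factor for x in signal]


def adjust_microphones(signals, gains_db, max_spread_db):
    """Apply the limited gains to each microphone signal."""
    limited = limit_gain_spread(gains_db, max_spread_db)
    return [apply_gain_db(s, g) for s, g in zip(signals, limited)]
```

For example, gains of -3 dB and -12 dB with a 6 dB allowed spread become -3 dB and -9 dB, so some difference remains but the image cannot swing by the full 9 dB.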
Abstract:
[Object] To provide a speech intelligibility improving apparatus capable of generating highly intelligible speech in various environments without unnecessarily amplifying sound volume. [Solution] A speech intelligibility improving apparatus 250 includes: an envelope surface extracting unit 292 extracting, from a spectrum of speech signal 254 as an object of processing, a curve representing a general outline of peaks of spectral envelope in contact with or along local peaks of spectral envelope of the spectrum; a noise adapting unit 300 modifying spectrum of speech signal 254 based on the curve extracted by envelope surface extracting unit 292; and a sinusoidal wave speech synthesizing unit 305 generating a modified speech signal 260 for the speech improved in intelligibility based on the spectrum modified by noise adapting unit 300.
Abstract:
An audio signal processing device that includes: a processor configured to execute a procedure, the procedure comprising: detecting a speech segment of an audio signal; suppressing noise in the audio signal; and adjusting an amount of suppression of noise such that the amount of suppression during a specific period, which starts from a position based on a terminal end of the detected speech segment and is a period shorter than a period spanning from the terminal end of the detected speech segment to a starting end of a next speech segment, becomes greater than in other segments, and a memory configured to store audio signals before and after noise suppression and the amount of suppression before and after adjustment.
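The adjustment rule can be made concrete as a per-frame schedule of suppression amounts. This is a sketch under assumptions: speech detection is given as a list of booleans per frame, the specific period starts exactly at the first frame after a speech segment ends, and the period is truncated so it is always shorter than the gap before the next segment. The names `suppression_schedule`, `base_db`, `boost_db`, and `hold` are hypothetical.

```python
def suppression_schedule(speech, base_db, boost_db, hold):
    """Return one suppression amount per frame. Frames in the period
    right after a speech segment ends get boost_db, all others base_db."""
    n = len(speech)
    amounts = [base_db] * n
    i = 1
    while i < n:
        if speech[i - 1] and not speech[i]:
            # Frame i is the first frame after the terminal end of a segment.
            j = i
            while j < n and not speech[j]:
                j += 1
            gap = j - i
            # If another segment follows, keep the period shorter than the gap.
            length = min(hold, gap - 1) if j < n else min(hold, gap)
            for k in range(i, i + length):
                amounts[k] = boost_db
            i = j
        else:
            i += 1
    return amounts
```

Because it looks ahead to the next speech segment, this version suits offline processing. A streaming variant would have to pick `hold` conservatively instead.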
Abstract:
A noise suppressor comprises a first (401) and a second transformer (403) for generating a first and second frequency domain signal from a frequency transform of a first and second microphone signal. A gain unit (405, 407, 409) determines time frequency tile gains in response to a difference measure for magnitude time frequency tile values of the first frequency domain signal and magnitude time frequency tile values of the second frequency domain signal. A scaler (411) generates a third frequency domain signal by scaling time frequency tile values of the first frequency domain signal by the time frequency tile gains; and the resulting signal is converted to the time domain by a third transformer (413). A designator (405, 407, 415) designates time frequency tiles of the first frequency domain signal as speech tiles or noise tiles; and the gain unit (409) determines the gains in response to the designation of the time frequency tiles as speech tiles or noise tiles.
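The dependence of the tile gains on both the magnitude difference and the speech/noise designation can be sketched as follows. The specific gain formula is an assumption: a spectral-subtraction-style gain with a heavier subtraction factor for noise tiles, a ratio test for the designation, and a gain floor. The names and default values are hypothetical and do not come from the abstract.

```python
def is_speech_tile(m1, m2, ratio_threshold=2.0):
    """Assumed designation rule: a tile is speech when the first
    microphone magnitude clearly exceeds the second."""
    return m1 > ratio_threshold * m2


def tile_gain(m1, m2, speech_factor=1.0, noise_factor=2.0, floor=0.1):
    """Gain from the magnitude difference m1 - factor * m2, normalized
    by m1. Noise tiles use a larger factor, so they are attenuated more."""
    if m1 <= 0.0:
        return floor
    factor = speech_factor if is_speech_tile(m1, m2) else noise_factor
    gain = (m1 - factor * m2) / m1
    return min(1.0, max(floor, gain))


def scale_tiles(tiles1, tiles2):
    """Scale complex tiles of the first signal by their gains, giving
    the third frequency domain signal prior to the inverse transform."""
    return [x * tile_gain(abs(x), abs(y)) for x, y in zip(tiles1, tiles2)]
```

The forward and inverse transforms are left out. In practice `tiles1` and `tiles2` would come from a short-time Fourier transform of the two microphone signals.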