Abstract:
A sound processing apparatus includes a first calculator that calculates a first power based on a first signal received by a first microphone of a pair consisting of the first microphone and a second microphone; a second calculator that calculates a second power based on a second signal received by the second microphone; a gain calculator that calculates a gain on the basis of the ratio of the first power to the second power; and a multiplier that processes the second signal using the gain calculated by the gain calculator.
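The gain step in this abstract can be illustrated with a minimal Python sketch. The abstract only states that the gain is based on the ratio of the two powers; the square-root mapping used here, the function names, and the `eps` guard are assumptions made for illustration.

```python
import math

def mean_power(signal):
    """Mean-square power of a block of samples."""
    return sum(x * x for x in signal) / len(signal)

def gain_from_power_ratio(first, second, eps=1e-12):
    # Assumption: use an amplitude gain of sqrt(P1 / P2), which would bring
    # the second signal to the same power as the first.
    p1 = mean_power(first)
    p2 = mean_power(second)
    return math.sqrt(p1 / (p2 + eps))

def apply_gain(signal, gain):
    """The 'multiplier' stage: scale every sample of the second signal."""
    return [gain * x for x in signal]

# Hypothetical sample blocks from the two microphones.
first = [0.5, -0.5, 0.5, -0.5]
second = [0.25, -0.25, 0.25, -0.25]
g = gain_from_power_ratio(first, second)
scaled = apply_gain(second, g)
```

With these sample blocks the second signal has a quarter of the power of the first, so the sketch yields a gain close to 2.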
Abstract:
A sound processing device includes: a plurality of sound input units; a detecting unit for detecting a frequency component of each sound input to the plurality of sound input units, each sound arriving from a direction approximately perpendicular to a line determined by the arrangement positions of two sound input units among the plurality of sound input units; a correction coefficient unit for obtaining, based on the sound of the detected frequency component, a correction coefficient for correcting the level of at least one of the sound signals generated from the input sounds by the two sound input units so that the levels of the sound signals match each other; a correcting unit for correcting the level of at least one of the sound signals using the obtained correction coefficient; and a processing unit for performing a sound process based on the sound signal with the corrected level.
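One possible reading of the detection and correction steps is sketched below in Python. It assumes that a frequency component "arrives from a perpendicular direction" when the phase difference between the two microphones is near zero, and that the correction coefficient is the ratio of summed magnitudes over those components. The thresholds and all names are hypothetical.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (fine for short illustrative frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]

def correction_coefficient(sig_a, sig_b, max_phase_diff=0.1, min_mag=1e-6):
    # Assumption: a bin counts as broadside (perpendicular arrival) when the
    # inter-microphone phase difference is close to zero.
    spec_a, spec_b = dft(sig_a), dft(sig_b)
    sum_a = sum_b = 0.0
    for a, b in zip(spec_a[1:], spec_b[1:]):  # skip the DC bin
        if abs(a) < min_mag or abs(b) < min_mag:
            continue  # ignore bins with no usable energy
        phase_diff = cmath.phase(a * b.conjugate())
        if abs(phase_diff) <= max_phase_diff:
            sum_a += abs(a)
            sum_b += abs(b)
    if sum_b == 0.0:
        return 1.0  # nothing detected: leave the level unchanged
    return sum_a / sum_b

# Hypothetical frame: the same tone reaches both microphones in phase,
# but microphone B is half as sensitive as microphone A.
n = 32
mic_a = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
mic_b = [0.5 * s for s in mic_a]
coeff = correction_coefficient(mic_a, mic_b)
corrected_b = [coeff * s for s in mic_b]
```

Here the coefficient comes out close to 2, so the corrected signal from microphone B matches the level of microphone A.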
Abstract:
A state detecting apparatus includes: a processor to execute acquiring utterance data related to uttered speech, computing a plurality of statistical quantities for feature parameters regarding features of the utterance data, creating, on the basis of the plurality of statistical quantities regarding the utterance data and another plurality of statistical quantities regarding reference utterance data based on other uttered speech, pseudo-utterance data having at least one statistical quantity equal to a statistical quantity in the other plurality of statistical quantities, computing a plurality of statistical quantities for synthetic utterance data synthesized on the basis of the pseudo-utterance data and the utterance data, and determining, on the basis of a comparison between statistical quantities of the synthetic utterance data and statistical quantities of the reference utterance data, whether the speaker who produced the uttered speech is in a first state or a second state; and a memory.
Abstract:
A microphone array device includes: a first sound reception unit configured to obtain a first sound signal that is input from a first microphone; a second sound reception unit configured to obtain a second sound signal that is input from a second microphone; a noise state evaluation unit configured to compare the first sound signal and the second sound signal and to obtain, according to a result of the comparison, an evaluation parameter to evaluate an influence of a non-target sound included in the second sound signal on a target sound included in the first sound signal; a subtraction adjustment unit configured to set a suppression amount for the second sound signal based on the evaluation parameter and to generate a third sound signal; and a subtraction unit configured to generate a signal to be output based on the third sound signal and the first sound signal.
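The chain of evaluation, adjustment, and subtraction can be sketched as follows. The abstract does not say how the evaluation parameter or the suppression amount is computed; this sketch assumes a simple power ratio and a clamped linear mapping, and every name in it is hypothetical.

```python
def mean_power(signal):
    """Mean-square power of a block of samples."""
    return sum(x * x for x in signal) / len(signal)

def evaluation_parameter(first, second, eps=1e-12):
    # Assumption: compare the signals by the power ratio of second to first.
    return mean_power(second) / (mean_power(first) + eps)

def suppression_amount(param, max_amount=1.0):
    # Assumption: the suppression amount follows the parameter, clamped
    # to the range [0, max_amount].
    return max(0.0, min(max_amount, param))

def process(first, second):
    alpha = suppression_amount(evaluation_parameter(first, second))
    third = [alpha * x for x in second]           # adjusted second signal
    return [a - b for a, b in zip(first, third)]  # subtraction unit

# Hypothetical sample blocks.
first = [1.0, -1.0, 1.0, -1.0]
second = [0.5, -0.5, 0.5, -0.5]
output = process(first, second)
```

A real device would more likely work per frequency band and per frame; the block-level version is only meant to show how the three units connect.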
Abstract:
An utterance state detection device includes a user voice stream data input unit that acquires user voice stream data of a user, a frequency element extraction unit that extracts high frequency elements by frequency-analyzing the user voice stream data, a fluctuation degree calculation unit that calculates a fluctuation degree of the extracted high frequency elements for every unit time, a statistic calculation unit that calculates a statistic for every certain interval based on a plurality of the fluctuation degrees in a certain period of time, and an utterance state detection unit that detects an utterance state of a specified user based on the statistic obtained from user voice stream data of the specified user.
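The pipeline of fluctuation degree, statistic, and detection can be sketched in Python as below. The abstract does not define these quantities; this sketch assumes the fluctuation degree is the variance of high-band energy within a unit time, the statistic is the mean of fluctuation degrees over an interval, and detection is a threshold test. The state labels and all names are hypothetical.

```python
from statistics import mean, pvariance

def fluctuation_degrees(high_band_energy, frames_per_unit):
    # Assumption: fluctuation degree = population variance of the
    # high-frequency energy inside each unit of time.
    last_start = len(high_band_energy) - frames_per_unit
    return [pvariance(high_band_energy[i:i + frames_per_unit])
            for i in range(0, last_start + 1, frames_per_unit)]

def interval_statistics(degrees, units_per_interval):
    # Assumption: the statistic is the mean fluctuation degree per interval.
    last_start = len(degrees) - units_per_interval
    return [mean(degrees[i:i + units_per_interval])
            for i in range(0, last_start + 1, units_per_interval)]

def detect_state(statistic, threshold):
    # Hypothetical labels; the abstract only speaks of an "utterance state".
    return "unusual" if statistic > threshold else "ordinary"

# Hypothetical per-frame high-band energies (already extracted).
energy = [1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 3.0,
          2.0, 2.0, 2.0, 2.0, 0.0, 4.0, 0.0, 4.0]
degrees = fluctuation_degrees(energy, frames_per_unit=4)
stats = interval_statistics(degrees, units_per_interval=2)
states = [detect_state(s, threshold=1.0) for s in stats]
```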
Abstract:
Provided are a speech recognition system, a method, and a storage medium capable of recognizing the speech of each individual speaker, even when plural speakers input superimposed speech, and of allowing the speakers to share a single application program during execution. In a speech recognition system that receives speech from plural speakers to execute a predetermined application program, the received speech is separated by speaker if necessary, the speech of each individual speaker is recognized, the recognition results are matched with data items necessary for executing the application program, one of the recognition results that the matching finds to be overlapping is selected, and the recognition results that the matching finds not to be overlapping are linked to the selected recognition result.
Abstract:
A signal processing apparatus includes: two sound input units; an orthogonal transformer to transform two sound signals input from the two sound input units into respective spectral signals in a frequency domain; a phase difference calculator to calculate a phase difference between the spectral signals in the frequency domain; a range determiner to determine a coefficient responsive to a frequency in the phase difference as a function of frequency, and to determine, on a per-frequency basis, a suppression range related to a phase in response to the coefficient; and a filter to phase-shift a component of one of the spectral signals on a per-frequency basis to generate a phase-shifted spectral signal when the phase difference at each frequency falls within the suppression range, and to synthesize the phase-shifted spectral signal and the other of the spectral signals to generate a filtered spectral signal.
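For a single frequency bin, the filter described here can be sketched as follows. The sketch assumes that the suppression range scales linearly with frequency (two slope coefficients in radians per hertz) and that "synthesizing" means subtracting the phase-aligned component. Both choices, and all names, are assumptions rather than details given in the abstract.

```python
import cmath

def suppression_range(freq_hz, lo_slope, hi_slope):
    # Assumption: the range of phase differences to suppress grows
    # linearly with frequency; the slopes act as the coefficients.
    return lo_slope * freq_hz, hi_slope * freq_hz

def filter_bin(x1, x2, freq_hz, lo_slope=0.0, hi_slope=1e-3):
    """Filter one bin; x1 and x2 are the two microphones' complex spectra."""
    phase_diff = cmath.phase(x1 * x2.conjugate())  # phase(x1) - phase(x2)
    lo, hi = suppression_range(freq_hz, lo_slope, hi_slope)
    if lo <= phase_diff <= hi:
        # Phase-shift x2 so that it lines up with x1, then subtract it.
        shifted = x2 * cmath.exp(1j * phase_diff)
        return x1 - shifted
    return x1  # outside the suppression range: pass through unchanged

# Hypothetical bin at 1 kHz, so the range is [0, 1.0] radians.
x1 = cmath.rect(1.0, 0.5)
inside = filter_bin(x1, cmath.rect(1.0, 0.0), 1000.0)   # difference +0.5 rad
outside = filter_bin(x1, cmath.rect(1.0, 1.0), 1000.0)  # difference -0.5 rad
```

In the first call the component falls inside the range and is cancelled almost completely; in the second it falls outside and the bin is returned unchanged.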
Abstract:
A sound processor includes: a conversion unit that converts a reference sound signal corresponding to a base of sound to be output, and an observation sound signal based on each of the sound signals output by a plurality of sound receiving units, into frequency components; an echo suppression unit that estimates echo derived from sound based on the converted reference sound signal and suppresses the estimated echo in the converted observation sound signal; a noise suppression unit that estimates noise based on an arrival direction of sound and suppresses the estimated noise in the converted observation sound signal; and an integrating process unit that suppresses, for each frequency component, echo and noise in the converted observation sound signal based on the observation sound signal obtained after echo suppression and the observation sound signal obtained after noise suppression.
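The integrating step can be illustrated with a short Python sketch. It assumes that each suppressor's effect is expressed as a per-bin gain (suppressed magnitude divided by observed magnitude) and that the two gains are combined by multiplication. The abstract does not specify the combination rule, so this is only one plausible choice, and the names are hypothetical.

```python
def integrate_suppression(observed, after_echo, after_noise, eps=1e-12):
    """Combine echo and noise suppression bin by bin (complex spectra)."""
    out = []
    for obs, echo_out, noise_out in zip(observed, after_echo, after_noise):
        mag = abs(obs)
        # Per-bin gain implied by each suppressor, capped at 1.
        g_echo = min(1.0, abs(echo_out) / (mag + eps))
        g_noise = min(1.0, abs(noise_out) / (mag + eps))
        # Assumption: apply both gains to the original observation.
        out.append(obs * g_echo * g_noise)
    return out

# Hypothetical two-bin spectra.
observed = [1 + 0j, 2j]
after_echo = [0.5 + 0j, 2j]   # echo suppressor halved bin 0
after_noise = [1 + 0j, 1j]    # noise suppressor halved bin 1
integrated = integrate_suppression(observed, after_echo, after_noise)
```

Taking the smaller of the two gains instead of their product would be an equally simple alternative with less aggressive suppression.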
Abstract:
An echo suppressing system includes: a sound output device for outputting sound based on a sound signal, including a passing section for allowing passage of a component of a different frequency band, and a plurality of sound output sections, each of which outputs sound based on each of the plurality of sound signals passed through the passing section; a summer for summing the plurality of sound signals to generate a reference sound signal; a sound input device for converting input sound into a sound signal; and an echo suppressor for suppressing echo based on the sound output by the sound output device, including an input section to which a sound signal is input from the sound input device as an observation sound signal, and a correction section for correcting the observation sound signal so as to suppress echo included in the observation sound signal.
Abstract:
A computer-readable medium recording a program allowing a computer to execute: setting a plurality of frames on a common time axis between a first waveform of an input to audio processing and a second waveform of an output from the audio processing; detecting a voice frame and a noise frame in the first and second waveforms; calculating first and second spectra from the first and second waveforms; adjusting the level of the first or second spectrum of the noise frame, and setting the adjusted first and second spectra of the noise frame as third and fourth spectra; calculating a distortion amount of the noise frame from the third and fourth spectra; estimating a noise model spectrum from the first or second spectrum; and calculating a distortion amount of the voice frame from the first and second spectra of the voice frame at the selected frequency.