Abstract:
A mixed audio separation system (100) which separates a specific audio from a mixed audio (S100) includes a local frequency information generation unit (105) which obtains pieces of local frequency information (S103) corresponding to local reference waveforms (S102), based on the local reference waveforms (S102) and an analysis waveform which is the waveform of the mixed audio (S100). Each of the local reference waveforms (S102) (i) constitutes a part of a reference waveform for analyzing a predetermined frequency, (ii) has a predetermined temporal/spatial resolution, and (iii) includes at least one of an amplitude spectrum and a phase spectrum at the predetermined frequency. The system further includes: a specific audio's frequency feature value extraction unit (106) which performs pattern matching between a first set, which is the pieces of local frequency information, and a second set of pieces of frequency information (S103) of a predetermined specific audio, and extracts the first set of the pieces of local frequency information (S103) based on a result of the pattern matching; and an audio signal generation unit which generates a signal of the specific audio based on the first set of the pieces of local frequency information (S103) extracted by the specific audio's frequency feature value extraction unit.
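The idea of local frequency information can be illustrated with a minimal sketch. It assumes, purely for illustration, that the reference waveform is a complex sinusoid at the analysis frequency, that each local reference waveform is a fixed-length segment of it, and that the local frequency information is the amplitude and phase obtained from the inner product of that segment with the analysis waveform. The function name and parameters are hypothetical and are not taken from the abstract.

```python
import cmath
import math

def local_frequency_info(waveform, freq_hz, sample_rate, segment_len):
    """Return (amplitude, phase) pairs, one per non-overlapping segment.

    Assumption: each local reference waveform is a segment_len-sample piece
    of the complex sinusoid exp(-j*2*pi*freq_hz*n/sample_rate).
    """
    infos = []
    for start in range(0, len(waveform) - segment_len + 1, segment_len):
        acc = 0j
        for n in range(start, start + segment_len):
            # Inner product of the analysis waveform with the local reference.
            acc += waveform[n] * cmath.exp(-2j * math.pi * freq_hz * n / sample_rate)
        acc *= 2.0 / segment_len  # scale so a unit-amplitude cosine gives amplitude 1
        infos.append((abs(acc), cmath.phase(acc)))
    return infos

# Usage: a 100 Hz cosine sampled at 800 Hz, analyzed one period (8 samples) at a time.
signal = [math.cos(2 * math.pi * 100 * n / 800) for n in range(32)]
info = local_frequency_info(signal, freq_hz=100, sample_rate=800, segment_len=8)
```

Shorter segments give finer temporal resolution at the cost of coarser frequency selectivity, which is one way to read the "predetermined temporal/spatial resolution" of each local reference waveform.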
Abstract:
A sound identification apparatus which reduces the chance of a drop in the identification rate, including: a frame sound feature extraction unit which extracts a sound feature per frame of an inputted audio signal; a frame likelihood calculation unit which calculates a frame likelihood of the sound feature in each frame, for each of a plurality of sound models; a confidence measure judgment unit which judges a confidence measure based on the frame likelihood; a cumulative likelihood output unit time determination unit which determines a cumulative likelihood output unit time based on the confidence measure; a cumulative likelihood calculation unit which calculates a cumulative likelihood in which the frame likelihoods of the frames included in the cumulative likelihood output unit time are cumulated, for each sound model; a sound type candidate judgment unit which determines, for each cumulative likelihood output unit time, a sound type corresponding to the sound model that has a maximum cumulative likelihood; a sound type frequency calculation unit which calculates the frequency of the sound type candidate; and a sound type interval determination unit which determines the sound type of the inputted audio signal and the interval of the sound type, based on the frequency of the sound type.
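The chain from frame likelihoods to a sound type decision can be sketched as follows. This is a simplified illustration, not the apparatus itself: the cumulative likelihood output unit time is passed in as a fixed number of frames (the abstract derives it from a confidence measure), likelihoods are treated as log-likelihoods so that cumulating them is a sum, and the function and model names are hypothetical.

```python
from collections import Counter

def identify_sound_type(frame_loglik, unit_frames):
    """frame_loglik maps a model name to its per-frame log-likelihoods.

    Returns (most frequent sound type, list of per-unit candidates).
    Assumption: unit_frames is fixed here rather than derived from confidence.
    """
    n_frames = len(next(iter(frame_loglik.values())))
    candidates = []
    for start in range(0, n_frames, unit_frames):
        # Cumulative log-likelihood of each sound model over this unit time.
        cumulative = {
            model: sum(values[start:start + unit_frames])
            for model, values in frame_loglik.items()
        }
        # The candidate is the model with the maximum cumulative likelihood.
        candidates.append(max(cumulative, key=cumulative.get))
    # Decide the sound type from how often each candidate appears.
    counts = Counter(candidates)
    return counts.most_common(1)[0][0], candidates

# Usage: one noisy frame (index 2) flips a single unit but not the final decision.
loglik = {
    "speech": [-1, -1, -5, -1, -1, -1],
    "music": [-2, -2, -1, -2, -2, -2],
}
sound_type, per_unit = identify_sound_type(loglik, unit_frames=2)
```

Voting over candidate frequencies is what keeps a single unreliable unit from changing the identified sound type.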
Abstract:
A target sound analysis apparatus analyzes whether or not a target sound is contained in an evaluation sound, and is capable of distinguishing the target sound from a different sound that has the same fundamental period. The apparatus includes: a target sound preparation unit that prepares a target sound, which is an analysis waveform used for analyzing a fundamental period; an evaluation sound preparation unit that prepares an evaluation sound, which is the analyzed waveform whose fundamental period is to be analyzed; and an analysis unit that temporally shifts the target sound with respect to the evaluation sound to sequentially calculate differential values between the evaluation sound and the target sound at corresponding points in time, calculates an iterative interval between the points in time at which the differential value is equal to or lower than a predetermined threshold value, and judges whether or not the target sound exists in the evaluation sound based on a period of the iterative interval and the fundamental period of the target sound.
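The shift-and-compare analysis can be sketched as below. The sketch assumes the differential value is the mean absolute difference over the length of the target waveform and that the judgment requires every interval between low-difference shifts to match the fundamental period within a tolerance. Both choices, and all names, are illustrative assumptions rather than details from the abstract.

```python
import math

def low_difference_shifts(target, evaluation, threshold):
    """Shifts at which the mean absolute difference is at or below threshold."""
    shifts = []
    for shift in range(len(evaluation) - len(target) + 1):
        window = evaluation[shift:shift + len(target)]
        diff = sum(abs(e - t) for e, t in zip(window, target)) / len(target)
        if diff <= threshold:
            shifts.append(shift)
    return shifts

def contains_target(target, evaluation, period, threshold, tolerance=0):
    """Judge presence from the intervals between low-difference shifts."""
    shifts = low_difference_shifts(target, evaluation, threshold)
    intervals = [b - a for a, b in zip(shifts, shifts[1:])]
    return bool(intervals) and all(abs(i - period) <= tolerance for i in intervals)

# Usage: a sine with an 8-sample period versus a square wave with the same period.
period = 8
target = [math.sin(2 * math.pi * n / period) for n in range(period)]
sine_eval = [math.sin(2 * math.pi * n / period) for n in range(4 * period)]
square_eval = [1.0 if (n % period) < period // 2 else -1.0 for n in range(4 * period)]

found_in_sine = contains_target(target, sine_eval, period, threshold=0.01)
found_in_square = contains_target(target, square_eval, period, threshold=0.01)
```

The square wave shares the fundamental period but never matches the waveform closely, so no low-difference shifts are found and it is rejected.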
Abstract:
An audio restoration apparatus restores an audio to be restored which is included in a mixed audio and has a missing audio part. The audio restoration apparatus includes: a mixed audio separation unit which extracts the audio to be restored included in the mixed audio; an audio structure analysis unit which generates at least one of a phoneme sequence, a character sequence and a musical note sequence of the missing audio part in the extracted audio to be restored, based on an audio structure knowledge database in which semantics of audio are registered; an unchanged audio characteristic domain analysis unit which segments the extracted audio to be restored into time domains in each of which an audio characteristic remains unchanged; an audio characteristic extraction unit which identifies a time domain where the missing audio part is located, from among the segmented time domains, and extracts audio characteristics of the identified time domain in the audio to be restored; and an audio restoration unit which restores the missing audio part in the audio to be restored, using the extracted audio characteristics and the generated one or more of the phoneme sequence, character sequence and musical note sequence.
Abstract:
An audio restoration apparatus is provided which restores an audio to be restored having a missing audio part and being included in a mixed audio. The audio restoration apparatus includes: a mixed audio separation unit which extracts the audio to be restored included in the mixed audio; an audio structure analysis unit which generates at least one of a phoneme sequence, a character sequence and a musical note sequence of the missing audio part; an unchanged audio characteristic domain analysis unit which segments the extracted audio to be restored into time domains in each of which an audio characteristic remains unchanged; an audio characteristic extraction unit which identifies a time domain where the missing audio part is located, and extracts audio characteristics of the identified time domain in the audio to be restored; and an audio restoration unit which restores the missing audio part in the audio to be restored.
Abstract:
A sound source direction detector comprises: FFT analysis sections (103(1) to 103(3)) for generating, for each of the acoustic signals collected by two or more microphones arranged apart from one another, a frequency spectrum in at least one frequency band of the acoustic signal; detection sound identifying sections (104(1) to 104(3)) for identifying, from the frequency spectrum in the frequency band, a time portion of the frequency spectrum of a detection sound whose sound source direction is to be obtained; and a direction detecting section (105) for obtaining the difference between the times at which the detection sound reaches the microphones, obtaining the sound source direction from the time difference, the distance between the microphones, and the sound velocity, and outputting the sound source direction depending on the degree of coincidence, between the microphones, of the frequency spectrum in the time portion identified by the detection sound identifying sections (104(1) to 104(3)) within a time interval which is the time unit for detecting the sound source direction.
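How a direction follows from the time difference, the microphone distance, and the sound velocity can be shown with a short sketch. It assumes the common far-field, two-microphone relation sin(theta) = c * dt / d, which the abstract does not state explicitly. The function name and the default sound velocity of 343 m/s are illustrative assumptions, and the spectrum coincidence check is not modeled.

```python
import math

def direction_from_delay(delay_s, mic_distance_m, sound_speed_mps=343.0):
    """Angle in degrees from broadside, under a far-field (plane wave) assumption."""
    ratio = sound_speed_mps * delay_s / mic_distance_m
    # Clamp to the valid range so measurement noise cannot break asin().
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))

# Usage: microphones 0.2 m apart; the delay corresponds to a source 30 degrees off broadside.
delay = 0.2 * math.sin(math.radians(30.0)) / 343.0
angle = direction_from_delay(delay, mic_distance_m=0.2)
```

A delay of zero corresponds to a source directly in front of the microphone pair, and a delay of d / c corresponds to a source on the axis through the two microphones.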
Abstract:
A speech recognition apparatus appropriately performs speech recognition by generating, in real time, language models adapted to a new topic even when the topic changes. The speech recognition apparatus includes: a word specification unit for obtaining and specifying a word; a language model information storage unit for storing language models for recognizing speech and the respectively corresponding pieces of tag information; a combination coefficient calculation unit for calculating the weights of the respective language models, as combination coefficients, according to the word obtained by the word specification unit, based on the relevance degree between the word obtained by the word specification unit and the tag information of each language model; a language probability calculation unit for calculating the probabilities of word appearance by combining the respective language models according to the calculated combination coefficients; and a speech recognition unit for recognizing speech using the calculated probabilities of word appearance.
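The combination step can be sketched as a weighted mixture of language models. The sketch assumes the relevance degrees are already available as non-negative scores, that combination coefficients are those scores normalized to sum to one, and that models are combined by linear interpolation of unigram probabilities. The abstract says only that the models are combined according to the coefficients, so these choices and all names are illustrative.

```python
def combination_coefficients(relevance):
    """Normalize per-model relevance degrees into weights that sum to 1."""
    total = sum(relevance.values())
    if total == 0:
        # No model is relevant: fall back to equal weights.
        return {name: 1.0 / len(relevance) for name in relevance}
    return {name: score / total for name, score in relevance.items()}

def combined_probability(word, models, coefficients):
    """Linear interpolation of word appearance probabilities across models."""
    return sum(
        coefficients[name] * models[name].get(word, 0.0)
        for name in models
    )

# Usage: hypothetical relevance of a specified word to two tagged models.
relevance = {"news": 3.0, "sports": 1.0}
models = {
    "news": {"goal": 0.1, "election": 0.4},
    "sports": {"goal": 0.5, "election": 0.0},
}
weights = combination_coefficients(relevance)
p_goal = combined_probability("goal", models, weights)
```

Because only the coefficients change when a new word is specified, the stored language models can be reused as-is while the combined probabilities track the current topic.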