Abstract:
A speech recognition device is provided which includes: a language model storage unit which stores a language model indicating appearance probabilities of words or word sequences; an acoustic feature amount extracting unit and a checking unit which extract a feature amount of an inputted speech signal and identify the word or word sequence corresponding to the speech signal by checking the extracted feature amount against the language model stored in the language model storage unit; an obtained word signal receiving/analyzing unit which obtains and analyzes a word; and a language model adjusting unit which identifies the appearance probability of the obtained word based on the time elapsed since the obtained word signal receiving/analyzing unit obtained the word, and which adjusts the language model by reflecting the identified appearance probability in the language model stored in the language model storage unit.
Abstract:
An audio identifying device reliably conveys audio information that is important to a user, according to an importance level of input audio information that varies with the user's action. The device includes: a checking unit 104 which judges the type of the input audio; a user action obtainment unit 108 which detects an action of the user; an output mode determination unit 106 which determines an output mode of an audio identification result for the input audio by checking the result judged by the checking unit 104 and the result detected by the user action obtainment unit 108 against output mode definition information stored in an output mode definition information storage unit 107; and an audio identification result output processing unit 110 which outputs the audio identification result after processing it according to the determined output mode, the processing being selected by checking the output mode determined by the output mode determination unit 106 against output processing method definition information stored in an output processing method definition information storage unit 111.
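The two lookups described in this abstract can be sketched as a pair of tables. This is only an illustration under stated assumptions: the audio types, user actions, output modes, processing parameters, and the fallback to a "suppressed" mode are all hypothetical and do not come from the abstract.

```python
# Hypothetical output mode definition information:
# (audio type, user action) -> output mode.
OUTPUT_MODE_DEFINITIONS = {
    ("horn", "driving"): "urgent",
    ("horn", "parked"): "normal",
    ("siren", "driving"): "urgent",
    ("siren", "parked"): "urgent",
}

# Hypothetical output processing method definition information:
# output mode -> how the identification result is presented.
OUTPUT_PROCESSING_DEFINITIONS = {
    "urgent": {"volume": 1.0, "repeat": 2},
    "normal": {"volume": 0.5, "repeat": 1},
    "suppressed": {"volume": 0.0, "repeat": 0},
}


def determine_output_mode(audio_type, user_action):
    """Look up the output mode for a judged audio type and a detected user action.

    Unknown combinations fall back to "suppressed" (an assumption of this sketch).
    """
    return OUTPUT_MODE_DEFINITIONS.get((audio_type, user_action), "suppressed")


def output_identification_result(audio_type, user_action):
    """Attach the processing selected for the determined output mode to the result."""
    mode = determine_output_mode(audio_type, user_action)
    processing = OUTPUT_PROCESSING_DEFINITIONS[mode]
    return {"audio_type": audio_type, "mode": mode, **processing}
```

With these tables, the same horn sound is presented at full volume while the user is driving and at reduced volume while parked, which is the action-dependent behavior the abstract describes.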
Abstract:
Provided is a speech recognition device which appropriately applies limitations on target words to be recognized that are obtained from outside the speech recognition device, and which eliminates the uncomfortable feeling caused by the limitation processing. The speech recognition device includes: a language model storage unit (104) which stores a language model indicating appearance probabilities of words or word sequences; an acoustic feature amount extracting unit (101) and a checking unit (102) which extract a feature amount of an inputted speech signal and identify the word or word sequence corresponding to the speech signal by checking the extracted feature amount against the language model stored in the language model storage unit (104); an obtained word signal receiving/analyzing unit (105) which obtains and analyzes a word; and a language model adjusting unit (110) which identifies the appearance probability of the obtained word based on the time elapsed since the obtained word signal receiving/analyzing unit (105) obtained the word, and which adjusts the language model by reflecting the identified appearance probability in the language model stored in the language model storage unit (104).
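As a minimal sketch of a time-dependent adjustment, the following assumes a unigram language model held in a dictionary and an exponential decay of an obtained word's appearance probability. The abstract does not state the decay function, so the half-life, the initial and floor probabilities, and both function names are hypothetical.

```python
def appearance_probability(elapsed_seconds, initial_probability=0.2,
                           floor_probability=0.01, half_life_seconds=60.0):
    """Decay an obtained word's appearance probability from its initial value toward a floor.

    The exponential form and all default values are assumptions of this sketch.
    """
    if elapsed_seconds < 0:
        raise ValueError("elapsed_seconds must be non-negative")
    decay = 0.5 ** (elapsed_seconds / half_life_seconds)
    return floor_probability + (initial_probability - floor_probability) * decay


def adjust_language_model(language_model, obtained_at, now):
    """Return a renormalized copy of a unigram model with obtained words re-weighted.

    language_model: word -> probability
    obtained_at:    word -> time (in seconds) at which the word was obtained
    """
    adjusted = dict(language_model)
    for word, timestamp in obtained_at.items():
        adjusted[word] = appearance_probability(now - timestamp)
    total = sum(adjusted.values())
    return {word: probability / total for word, probability in adjusted.items()}
```

Because the weight fades gradually instead of being switched off at a fixed cutoff, a word obtained a moment ago is strongly favored while an older one is only mildly favored, which is one way to avoid an abrupt change in recognition behavior.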
Abstract:
A mixed audio separation system (100) which separates a specific audio from a mixed audio (S100) includes a local frequency information generation unit (105) which obtains pieces of local frequency information (S103) corresponding to local reference waveforms (S102), based on the local reference waveforms (S102) and an analysis waveform which is the waveform of the mixed audio (S100). Each of the local reference waveforms (S102) (i) constitutes a part of a reference waveform for analyzing a predetermined frequency, (ii) has a predetermined temporal/spatial resolution, and (iii) includes at least one of an amplitude spectrum and a phase spectrum at the predetermined frequency. The system also includes: a specific audio's frequency feature value extraction unit (106) which performs pattern matching between a first set, which is the pieces of local frequency information, and a second set of pieces of frequency information (S103) of a predetermined specific audio, and which extracts the first set of the pieces of local frequency information (S103) based on a result of the pattern matching; and an audio signal generation unit which generates a signal of the specific audio based on the first set of the pieces of local frequency information (S103) extracted by the specific audio's frequency feature value extraction unit.
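One way to picture the local frequency information is as a per-segment correlation between the analysis waveform and the matching part of a sinusoidal reference waveform. The sketch below assumes a complex exponential as the reference waveform and fixed, non-overlapping segments; the function name, its parameters, and the normalization are hypothetical, and the pattern matching and signal generation stages are not covered.

```python
import cmath
import math


def local_frequency_information(analysis_waveform, frequency_hz, sample_rate_hz,
                                segment_length):
    """Return one (magnitude, phase) pair per segment of the analysis waveform.

    Each segment is correlated with the part of the reference waveform
    exp(-j*2*pi*f*n/fs) that covers the same samples (the local reference waveform).
    """
    results = []
    last_start = len(analysis_waveform) - segment_length
    for start in range(0, last_start + 1, segment_length):
        accumulator = 0j
        for n in range(start, start + segment_length):
            reference = cmath.exp(-2j * math.pi * frequency_hz * n / sample_rate_hz)
            accumulator += analysis_waveform[n] * reference
        accumulator /= segment_length
        results.append((abs(accumulator), cmath.phase(accumulator)))
    return results
```

For a unit-amplitude cosine at the analyzed frequency and a segment spanning whole periods, each segment yields a magnitude of about 0.5, while a frequency that is orthogonal over the segment yields a magnitude near zero.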
Abstract:
An audio restoration apparatus restores an audio to be restored that has a missing audio part and is included in a mixed audio. The audio restoration apparatus includes: a mixed audio separation unit which extracts the audio to be restored from the mixed audio; an audio structure analysis unit which generates at least one of a phoneme sequence, a character sequence, and a musical note sequence of the missing audio part in the extracted audio to be restored, based on an audio structure knowledge database in which semantics of audio are registered; an unchanged audio characteristic domain analysis unit which segments the extracted audio to be restored into time domains in each of which an audio characteristic remains unchanged; an audio characteristic extraction unit which identifies, from among the segmented time domains, the time domain where the missing audio part is located, and extracts audio characteristics of the identified time domain in the audio to be restored; and an audio restoration unit which restores the missing audio part in the audio to be restored, using the extracted audio characteristics and the generated one or more of the phoneme sequence, the character sequence, and the musical note sequence.
Abstract:
A sound identification apparatus which reduces the chance of a drop in the identification rate includes: a frame sound feature extraction unit which extracts a sound feature per frame of an inputted audio signal; a frame likelihood calculation unit which calculates a frame likelihood of the sound feature in each frame, for each of a plurality of sound models; a confidence measure judgment unit which judges a confidence measure based on the frame likelihood; a cumulative likelihood output unit time determination unit which determines a cumulative likelihood output unit time based on the confidence measure; a cumulative likelihood calculation unit which calculates, for each sound model, a cumulative likelihood in which the frame likelihoods of the frames included in the cumulative likelihood output unit time are cumulated; a sound type candidate judgment unit which determines, for each cumulative likelihood output unit time, a sound type candidate corresponding to the sound model that has the maximum cumulative likelihood; a sound type frequency calculation unit which calculates the frequency of the sound type candidate; and a sound type interval determination unit which determines the sound type of the inputted audio signal and the interval of the sound type, based on the frequency of the sound type candidate.
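The chain from frame likelihoods to a sound type can be sketched as follows. The sketch assumes per-frame log-likelihoods supplied as dictionaries with at least two sound models, uses the gap between the best and second-best model as the confidence measure, and picks between two hypothetical unit lengths; none of these choices are stated in the abstract, and interval determination is left out.

```python
from collections import Counter


def choose_unit_length(frame_log_likelihoods, margin_threshold=1.0, short=2, long=4):
    """Pick the cumulative likelihood output unit (in frames) from a confidence measure.

    Assumption: confidence is the mean gap between the best and second-best model
    per frame, and confident input gets the shorter unit.
    """
    gaps = []
    for frame in frame_log_likelihoods:
        ranked = sorted(frame.values(), reverse=True)
        gaps.append(ranked[0] - ranked[1])
    mean_gap = sum(gaps) / len(gaps)
    return short if mean_gap >= margin_threshold else long


def sound_type_candidates(frame_log_likelihoods, unit_length):
    """For each unit, sum the frame log-likelihoods per model and keep the best model."""
    candidates = []
    for start in range(0, len(frame_log_likelihoods), unit_length):
        cumulative = {}
        for frame in frame_log_likelihoods[start:start + unit_length]:
            for model, value in frame.items():
                cumulative[model] = cumulative.get(model, 0.0) + value
        candidates.append(max(cumulative, key=cumulative.get))
    return candidates


def most_frequent_sound_type(candidates):
    """Return the candidate that appears most often across the units."""
    return Counter(candidates).most_common(1)[0][0]
```

Voting over per-unit candidates means a short burst that favors the wrong model changes only a few candidates and does not overturn the overall decision.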
Abstract:
A target sound analysis apparatus analyzes whether or not a target sound is included in an evaluation sound, and is capable of distinguishing the target sound from a different sound that has the same fundamental period as the target sound. The apparatus includes: a target sound preparation unit that prepares a target sound, which is an analysis waveform to be used for analyzing a fundamental period; an evaluation sound preparation unit that prepares an evaluation sound, which is an analyzed waveform whose fundamental period is to be analyzed; and an analysis unit that temporally shifts the target sound with respect to the evaluation sound to sequentially calculate differential values between the evaluation sound and the target sound at corresponding points in time, calculates an iterative interval between the points in time where the differential value is equal to or lower than a predetermined threshold value, and judges whether or not the target sound exists in the evaluation sound based on a period of the iterative interval and the fundamental period of the target sound.
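The shift-and-compare analysis can be sketched on sampled waveforms as below. It assumes the differential value is the mean absolute difference over the length of the target waveform and that the judgment requires every interval between matching shifts to equal the fundamental period within a tolerance; the function names, the threshold, and the tolerance are hypothetical.

```python
def matching_shifts(evaluation, target, threshold):
    """Return the shifts at which the mean absolute difference to the target is within threshold."""
    shifts = []
    for shift in range(len(evaluation) - len(target) + 1):
        total = sum(abs(evaluation[shift + i] - target[i]) for i in range(len(target)))
        if total / len(target) <= threshold:
            shifts.append(shift)
    return shifts


def contains_target(evaluation, target, fundamental_period, threshold=0.05, tolerance=0):
    """Judge whether the target recurs in the evaluation sound at its fundamental period."""
    shifts = matching_shifts(evaluation, target, threshold)
    intervals = [later - earlier for earlier, later in zip(shifts, shifts[1:])]
    if not intervals:
        return False
    return all(abs(interval - fundamental_period) <= tolerance for interval in intervals)
```

Because the comparison is made against the waveform itself and not only its periodicity, a sound that repeats with the same period but has a different shape produces no matching shifts and is rejected.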