Abstract:
Provided is a noise-robust voice activity segmentation device which updates parameters used in the determination of voice-active segments without burdening the user, and also provided are a voice activity segmentation method and a voice activity segmentation program. The voice activity segmentation device comprises: a first voice activity segmentation means for determining a voice-active segment (first voice-active segment) and a voice-inactive segment (first voice-inactive segment) in a time-series of input sound by comparing a threshold value and a feature value of the time-series of the input sound; a second voice activity segmentation means for determining, after a reference speech acquired from a reference speech storage means has been superimposed on a time-series of the first voice-inactive segment, a voice-active segment and a voice-inactive segment in the time-series of the superimposed first voice-inactive segment by comparing the threshold value and a feature value of the time-series of the superimposed first voice-inactive segment; and a threshold value update means for updating the threshold value in such a way that a discrepancy rate between the determination result of the second voice activity segmentation means and a correct segmentation calculated from the reference speech is decreased.
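The threshold update loop described in this abstract can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the function names, the use of frame energy as the feature value, the assumption that superimposing reference speech simply adds frame energies, and the three-candidate search with a fixed `step` are all hypothetical choices made for the example.

```python
def segment(features, threshold):
    """Mark a frame voice-active (True) when its feature value exceeds the threshold."""
    return [value > threshold for value in features]


def discrepancy_rate(predicted, correct):
    """Fraction of frames where the determination differs from the correct segmentation."""
    mismatches = sum(p != c for p, c in zip(predicted, correct))
    return mismatches / len(correct)


def update_threshold(threshold, noise_energy, reference_energy, reference_labels, step=0.5):
    """Return whichever of threshold, threshold - step, threshold + step has the lowest discrepancy."""
    # Assumption: superimposing reference speech on a noise frame adds the frame energies.
    mixed = [n + r for n, r in zip(noise_energy, reference_energy)]
    # The current threshold is listed first so that it wins ties.
    candidates = [threshold, threshold - step, threshold + step]
    return min(candidates, key=lambda t: discrepancy_rate(segment(mixed, t), reference_labels))


# Hypothetical frame energies of the input sound.
input_energy = [1.0, 1.25, 6.0, 7.0, 1.0, 0.75, 1.0, 1.25]
threshold = 2.0

# First segmentation: keep only the frames judged voice-inactive.
first_pass = segment(input_energy, threshold)
noise_energy = [e for e, active in zip(input_energy, first_pass) if not active]

# Hypothetical reference speech energies and their known correct labels.
reference_energy = [0.0, 0.0, 4.0, 5.0, 0.75, 0.0]
reference_labels = [False, False, True, True, True, False]

new_threshold = update_threshold(threshold, noise_energy, reference_energy, reference_labels)
print(new_threshold)
```

With these made-up values the quiet reference frame is missed at the current threshold, so the lower candidate is selected. Because the correct labels come from the stored reference speech, no labeling effort from the user is needed.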
Abstract:
A plurality of pruning measures (PM) are calculated from a feature amount (CV) of test data (TD) which is input, a plurality of isopycnic surfaces (EC) are plotted and set on a threshold space (SS), a threshold curved surface (SC) in which a decrease in at least one of a plurality of pruning measures (PM) causes an increase in at least one thereof is generated using a portion of one isopycnic surface (EC) as a part, a hypothesis curved surface (HC) of subject data (CD) is generated on the threshold space (SS) to set a position intersecting the threshold curved surface (SC) to a pruning threshold (PS), and a plurality of hypotheses of the subject data (CD) are pruned. Thereby, there is provided a data processing device of which at least one of the recognition speed and the recognition accuracy is higher than in the related art.
Abstract:
A speech recognition unit (102) includes a phrase determination unit (103) which determines a phrase boundary based on the comparison between the hypothetical word group generated by speech recognition and set words representing phrase boundaries. In this speech processing device, the speech recognition unit (102) outputs recognition results for each phrase based on a phrase boundary determined by the phrase determination unit (103).
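As a rough sketch of the phrase determination step, the hypothesized words can be compared against a set of words registered as phrase boundaries, and a result emitted for each phrase. The function name, the example words, and the rule that a boundary word closes the phrase it belongs to are assumptions made for illustration; the abstract does not specify them.

```python
def split_into_phrases(hypothesis_words, boundary_words):
    """Group recognized words into phrases, closing a phrase at each boundary word."""
    phrases, current = [], []
    for word in hypothesis_words:
        current.append(word)
        # Assumption: a registered boundary word ends the current phrase.
        if word in boundary_words:
            phrases.append(current)
            current = []
    if current:
        # Words after the last boundary form a trailing, still-open phrase.
        phrases.append(current)
    return phrases


# Hypothetical recognition hypothesis and hypothetical boundary words.
hypothesis = ["turn", "left", "okay", "then", "stop", "okay", "thanks"]
phrases = split_into_phrases(hypothesis, {"okay"})
print(phrases)
```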
Abstract:
Provided is a text processing system capable of avoiding a decline in processing efficiency when analyzing text that does not contain breaks. This text processing system comprises: a linking means for generating linked data in which acquired text is linked after the link object analysis results, which are the results of analyzing text acquired prior to the acquired text; an analysis means for carrying out language analysis on the linked data, using at least a portion of the link object analysis results; and a determination means for determining a prescribed unit break included in the linked data, on the basis of the results of the analysis by the analysis means. The link object analysis results are the results of the analysis after the break that is determined by the determination means.
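The linking, analysis, and determination steps can be sketched as a small stateful segmenter. This is a hedged illustration only: the class name, whitespace tokenization standing in for language analysis, and a fixed set of break tokens standing in for break determination are all assumptions, not details taken from the abstract.

```python
class StreamingSegmenter:
    """Carry the analysis results after the last break over to the next chunk of text."""

    def __init__(self, break_tokens):
        self.break_tokens = set(break_tokens)
        self.carry = []  # analysis results following the most recent break

    def feed(self, text):
        # Linking: newly acquired text is placed after the carried-over results.
        # The carried tokens are reused as-is rather than analyzed again.
        tokens = self.carry + text.split()
        units, start = [], 0
        for i, token in enumerate(tokens):
            # Determination: a break token closes the current unit.
            if token in self.break_tokens:
                units.append(tokens[start:i + 1])
                start = i + 1
        # Only what follows the last break is kept for the next call.
        self.carry = tokens[start:]
        return units


segmenter = StreamingSegmenter({"."})
first = segmenter.feed("the meeting starts")
second = segmenter.feed("at noon . please bring")
print(first, second, segmenter.carry)
```

Keeping only the results after the last determined break bounds the amount of state, so the cost of each call does not grow with the total length of unbroken input seen so far.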
Abstract:
The present invention provides a speech recognition device that includes: a threshold value candidate generation unit which extracts a feature indicating likeliness of being speech from a temporal sequence of input sound, and generates a plurality of threshold value candidates for discriminating between speech and non-speech; a speech determination unit which, by comparing the feature indicating likeliness of being speech with the plurality of threshold value candidates, determines respective speech sections, and outputs determination information as a result of the determination; a search unit which corrects each of the speech sections represented by the determination information, using a speech model and a non-speech model; and a parameter update unit which estimates a threshold value for determining a speech section, on the basis of distribution profiles of the feature respectively in utterance sections and in non-utterance sections, within each of the corrected speech sections, and makes an update with the threshold value.
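The final parameter update step, estimating a threshold from the feature distributions in utterance and non-utterance sections, might look like the sketch below. The abstract does not say how the distribution profiles are combined; placing the threshold the same number of standard deviations away from each mean is an assumption chosen for illustration, and the function name and sample values are hypothetical.

```python
from statistics import mean, pstdev


def estimate_threshold(speech_features, non_speech_features):
    """Place the threshold equally many standard deviations from both class means."""
    mu_s, mu_n = mean(speech_features), mean(non_speech_features)
    sd_s, sd_n = pstdev(speech_features), pstdev(non_speech_features)
    if sd_s + sd_n == 0:
        # Degenerate case: both distributions are single points, so use the midpoint.
        return (mu_s + mu_n) / 2
    # Solves (t - mu_n) / sd_n == (mu_s - t) / sd_s for t.
    return mu_n + (mu_s - mu_n) * sd_n / (sd_s + sd_n)


# Hypothetical feature values taken from corrected sections.
speech = [7.0, 9.0, 11.0]
non_speech = [1.0, 2.0, 3.0]
threshold = estimate_threshold(speech, non_speech)
print(round(threshold, 3))
```

Under this rule the threshold sits closer to the class with the narrower spread, which here is the non-speech class.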
Abstract:
Disclosed is an information display system provided with: a signal analyzing unit which analyzes the audio signals obtained from a predetermined location and which generates ambient sound information regarding the sound generated at the predetermined location; and an ambient expression selection unit which selects an ambient expression which expresses the content of what a person is feeling from the sound generated at the predetermined location on the basis of the ambient sound information.
Abstract:
An apparatus of this invention is a speech processing apparatus that acquires pseudo speech from a mixture sound including desired speech and noise. The speech processing apparatus includes a first microphone that inputs a first mixture sound including desired speech and noise and outputs a first mixture signal, a second microphone that is opened to the same sound space as that of the first microphone, inputs a second mixture sound including the desired speech and the noise at a ratio different from the first mixture sound, and outputs a second mixture signal, a sound insulator that is disposed between the first microphone and the second microphone, and a noise suppression circuit that suppresses an estimated noise signal based on the first mixture signal and the second mixture signal and outputs a pseudo speech signal. With this arrangement, it is possible to, in a single sound space where desired speech and noise mix, correctly estimate the noise and reconstruct pseudo speech close to the desired speech.