Abstract:
In a system that implements image retrieval by performing speech recognition on voice information attached to an image, speech recognition is triggered by an event, such as an image upload event, that is not an explicit request for speech recognition. The system obtains the voice information attached to an image, detects an event, and performs speech recognition on the obtained voice information in response to a specific event, even if the detected event is not an explicit request for speech recognition.
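The event-triggered behavior described above can be illustrated with a small sketch. It assumes a hypothetical `ImageStore` that receives named events, a hypothetical set `TRIGGER_EVENTS` of events that start recognition, and a `recognize` callable standing in for a real speech recognizer. None of these names come from the abstract.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Image:
    image_id: str
    voice: Optional[bytes] = None       # voice annotation attached to the image
    transcript: Optional[str] = None    # filled in once speech recognition has run


# Hypothetical events that trigger recognition even though none of them is an
# explicit "recognize this speech" request.
TRIGGER_EVENTS = {"upload", "retrieval_request"}


@dataclass
class ImageStore:
    recognize: Callable[[bytes], str]   # stand-in for a real speech recognizer
    images: Dict[str, Image] = field(default_factory=dict)

    def add(self, image: Image) -> None:
        self.images[image.image_id] = image

    def on_event(self, event: str, image_id: str) -> None:
        image = self.images[image_id]
        # Recognize only on trigger events, only if voice exists, and only once.
        if event in TRIGGER_EVENTS and image.voice is not None and image.transcript is None:
            image.transcript = self.recognize(image.voice)

    def search(self, keyword: str) -> List[str]:
        # Retrieve images whose recognized voice annotation contains the keyword.
        return [
            img.image_id
            for img in self.images.values()
            if img.transcript is not None and keyword in img.transcript
        ]


store = ImageStore(recognize=lambda audio: audio.decode("utf-8"))  # toy "recognizer"
store.add(Image("img1", voice=b"beach at sunset"))
store.add(Image("img2", voice=b"mountain trail"))
store.on_event("upload", "img1")   # trigger event: img1 is transcribed
store.on_event("view", "img2")     # not a trigger event: img2 stays untranscribed
print(store.search("sunset"))      # ['img1']
print(store.search("mountain"))    # []
```

Recognizing at upload time means the transcript is already indexed when a later search arrives, so retrieval does not wait on the recognizer.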
Abstract:
In an information processing method that recognizes a handwritten figure or character in combination with speech input so as to increase recognition accuracy, a given target is subjected to figure recognition to obtain a first candidate figure list. Input speech information is phonetically recognized to obtain a second candidate figure list. A most likely figure is then selected on the basis of both the figure candidates obtained by figure recognition and those obtained by speech recognition.
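One way to select the most likely figure from the two candidate lists is a weighted log-linear combination of their scores. This is a minimal sketch under assumptions not stated in the abstract: each recognizer returns `(label, score)` pairs with scores in (0, 1], `speech_weight` balances the two sources, and a hypothetical `floor` score is used for a figure that only one recognizer proposed.

```python
import math
from typing import Dict, List, Tuple

Candidates = List[Tuple[str, float]]  # (figure label, score in (0, 1])


def select_figure(figure_cands: Candidates, speech_cands: Candidates,
                  speech_weight: float = 0.5, floor: float = 1e-3) -> str:
    """Pick the figure with the highest combined log-score.

    A candidate missing from one list receives the `floor` score for that list,
    so a figure proposed by only one recognizer is penalized but not excluded.
    """
    fig: Dict[str, float] = dict(figure_cands)
    spc: Dict[str, float] = dict(speech_cands)
    labels = set(fig) | set(spc)

    def combined(label: str) -> float:
        return ((1.0 - speech_weight) * math.log(fig.get(label, floor))
                + speech_weight * math.log(spc.get(label, floor)))

    return max(labels, key=combined)


figure_list = [("circle", 0.50), ("ellipse", 0.45), ("square", 0.05)]
speech_list = [("ellipse", 0.80), ("circle", 0.20)]
print(select_figure(figure_list, speech_list))  # ellipse
```

Here the shape recognizer slightly prefers "circle", but the spoken word tips the decision toward "ellipse". Setting `speech_weight=0.0` falls back to the figure recognizer alone.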
Abstract:
A speech recognition apparatus includes a word dictionary having recognition target words, a first acoustic model which expresses a reference pattern of a speech unit by one or more states, a second acoustic model which is lower in precision than said first acoustic model, selection means for selecting one of said first acoustic model and said second acoustic model on the basis of a parameter associated with a state of interest, and likelihood calculation means for calculating a likelihood of an acoustic feature parameter with respect to said acoustic model selected by said selection means.
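The per-state model selection above can be sketched as follows. The abstract does not say which parameter drives the selection. This sketch assumes, hypothetically, that it is the log-likelihood from the low-precision model, so the costlier evaluation is spent only on promising states. It also assumes the first model is a diagonal-covariance Gaussian mixture and the second a single diagonal Gaussian.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DiagGaussian:
    mean: List[float]
    var: List[float]

    def log_likelihood(self, x: Sequence[float]) -> float:
        # log N(x; mean, diag(var))
        return -0.5 * sum(
            math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
            for xi, m, v in zip(x, self.mean, self.var)
        )


@dataclass
class Mixture:
    weights: List[float]
    components: List[DiagGaussian]

    def log_likelihood(self, x: Sequence[float]) -> float:
        # log sum_k w_k N_k(x), computed with the log-sum-exp trick for stability
        terms = [math.log(w) + c.log_likelihood(x)
                 for w, c in zip(self.weights, self.components)]
        top = max(terms)
        return top + math.log(sum(math.exp(t - top) for t in terms))


@dataclass
class State:
    precise: Mixture       # first (higher-precision) acoustic model
    coarse: DiagGaussian   # second (lower-precision) acoustic model


def state_log_likelihood(state: State, x: Sequence[float], threshold: float) -> float:
    # Hypothetical selection parameter: the coarse model's score for this frame.
    coarse_score = state.coarse.log_likelihood(x)
    if coarse_score >= threshold:
        return state.precise.log_likelihood(x)  # promising state: use precise model
    return coarse_score                         # unpromising state: keep cheap score
```

Raising `threshold` trades accuracy for speed, because more states keep the single-Gaussian score instead of evaluating the full mixture.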
Abstract:
The segment set to be updated is read, and clustering that takes the phoneme environment into account is performed on it. For each resulting cluster, a representative segment is generated from the segments belonging to that cluster. The segment set is then updated by replacing, in each cluster, the segments belonging to the cluster with its representative segment.
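The update procedure above can be sketched with a deliberately simple clustering rule. The abstract does not specify the clustering algorithm. Here, as a stand-in, segments are grouped by their center phoneme plus a hypothetical coarse class (vowel or consonant) of each neighbor. Each segment is assumed to be a fixed-length, time-normalized waveform, and the representative is taken to be the medoid, i.e. the real segment closest to the cluster mean.

```python
from collections import defaultdict
from dataclasses import dataclass, replace
from typing import Dict, List, Sequence, Tuple

VOWELS = set("aiueo")  # hypothetical coarse phoneme classes


def phone_class(p: str) -> str:
    return "V" if p in VOWELS else "C"


@dataclass(frozen=True)
class Segment:
    left: str                   # preceding phoneme
    center: str                 # phoneme this segment realizes
    right: str                  # following phoneme
    samples: Tuple[float, ...]  # fixed-length (time-normalized) waveform


def cluster_key(seg: Segment) -> Tuple[str, str, str]:
    # Phoneme-environment clustering: center phoneme plus neighbor classes.
    return (phone_class(seg.left), seg.center, phone_class(seg.right))


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def update_segment_set(segments: List[Segment]) -> List[Segment]:
    clusters: Dict[Tuple[str, str, str], List[int]] = defaultdict(list)
    for i, seg in enumerate(segments):
        clusters[cluster_key(seg)].append(i)

    updated = list(segments)
    for members in clusters.values():
        n = len(segments[members[0]].samples)
        mean = [sum(segments[i].samples[k] for i in members) / len(members)
                for k in range(n)]
        # Representative: the medoid, the member closest to the cluster mean.
        rep = min(members, key=lambda i: sq_dist(segments[i].samples, mean))
        for i in members:
            # Keep each segment's labels but replace its waveform.
            updated[i] = replace(segments[i], samples=segments[rep].samples)
    return updated
```

A medoid is used instead of the raw mean because averaging waveforms tends to smear their phase; a real system might instead use decision-tree clustering over richer phonetic contexts.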
Abstract:
In a speech synthesis process, micro-segments are cut from acquired waveform data by applying a window function. The micro-segments are rearranged to realize a desired prosody, and superposed data is generated by superposing the rearranged micro-segments to obtain synthetic speech waveform data. A spectrum correction filter is formed on the basis of the acquired waveform data, and at least one of the waveform data, the micro-segments, and the superposed data is corrected with this filter. This reduces the “blur” of the speech spectrum caused by the window function used to obtain the micro-segments, and realizes high-quality speech synthesis.
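The cut, rearrange, and superpose steps above resemble pitch-synchronous overlap-add (PSOLA). The sketch below covers only those steps. It assumes pitch marks are already known and that each micro-segment is a Hann-windowed span of two pitch periods. The spectrum correction filter is not sketched, because the abstract does not specify how it is designed. `map_marks` is a hypothetical helper that picks, for each new pitch mark, the nearest original micro-segment.

```python
import math
from typing import List, Sequence


def hann(n: int) -> List[float]:
    # Periodic Hann window of length n; copies spaced n/2 apart sum to 1.
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / n) for k in range(n)]


def cut_micro_segments(wave: Sequence[float], marks: Sequence[int],
                       half: int) -> List[List[float]]:
    """Cut one windowed micro-segment of length 2*half centered on each pitch mark."""
    win = hann(2 * half)
    segments = []
    for m in marks:
        seg = []
        for k in range(2 * half):
            idx = m - half + k
            sample = wave[idx] if 0 <= idx < len(wave) else 0.0  # zero-pad at edges
            seg.append(sample * win[k])
        segments.append(seg)
    return segments


def map_marks(analysis_marks: Sequence[int], synthesis_marks: Sequence[int]) -> List[int]:
    # For each new mark, choose the nearest original segment (repeats or skips segments).
    return [min(range(len(analysis_marks)), key=lambda i: abs(analysis_marks[i] - s))
            for s in synthesis_marks]


def overlap_add(segments: Sequence[Sequence[float]], new_marks: Sequence[int],
                half: int, length: int) -> List[float]:
    """Superpose micro-segments centered on the rearranged pitch marks."""
    out = [0.0] * length
    for seg, m in zip(segments, new_marks):
        for k, v in enumerate(seg):
            idx = m - half + k
            if 0 <= idx < length:
                out[idx] += v
    return out
```

To raise the pitch, place the synthesis marks closer together than the analysis marks, use `map_marks` to choose a segment for each one, and pass the reordered segments to `overlap_add`. When synthesis marks equal analysis marks spaced one period apart, the windows sum to 1 and the interior of the waveform is reconstructed unchanged.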
Abstract:
Signal detection that is robust against various types of background noise is achieved. In the signal detection apparatus and method of this invention, a feature amount of an input signal sequence and a feature amount of a noise component contained in the signal sequence are extracted. Then a first likelihood, indicating the probability that the signal sequence is detected, and a second likelihood, indicating the probability that the noise component is detected, are calculated on the basis of a predetermined signal-to-noise ratio and the extracted feature amount of the signal sequence. A likelihood ratio between the first likelihood and the second likelihood is then calculated, and whether the signal sequence is detected is determined on the basis of this ratio.
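A minimal sketch of the likelihood-ratio decision described above follows. It assumes frame log-energy in dB as the feature amount and estimates the noise feature as the mean over the leading frames, which are assumed to be noise only. It models both hypotheses as Gaussians with the same spread, with the signal mean offset from the noise mean by the predetermined SNR. These modeling choices, and the names `estimate_noise_db` and `detect`, are illustrative assumptions.

```python
import math
from typing import List, Sequence


def log_energy_db(frame: Sequence[float]) -> float:
    power = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(power + 1e-12)  # small floor avoids log(0)


def estimate_noise_db(frames: Sequence[Sequence[float]], n_leading: int = 5) -> float:
    # Hypothetical noise estimate: mean log-energy of the first few frames,
    # assumed to contain no signal.
    leading = frames[:n_leading]
    return sum(log_energy_db(f) for f in leading) / len(leading)


def gaussian_logpdf(x: float, mean: float, std: float) -> float:
    return -0.5 * math.log(2.0 * math.pi * std * std) - (x - mean) ** 2 / (2.0 * std * std)


def detect(frames: Sequence[Sequence[float]], noise_db: float, snr_db: float = 10.0,
           std_db: float = 3.0, threshold: float = 0.0) -> List[bool]:
    """Flag frames whose log-likelihood ratio (signal vs. noise) exceeds threshold."""
    decisions = []
    for frame in frames:
        e = log_energy_db(frame)
        l_signal = gaussian_logpdf(e, noise_db + snr_db, std_db)  # first likelihood
        l_noise = gaussian_logpdf(e, noise_db, std_db)            # second likelihood
        decisions.append(l_signal - l_noise > threshold)          # log likelihood ratio
    return decisions
```

With equal spreads and `threshold=0.0`, the test reduces to checking whether the frame energy exceeds the noise estimate by more than half the assumed SNR. Because the decision is made relative to the noise estimate, the same SNR setting adapts to quiet and loud backgrounds alike.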
Abstract:
A signal processing apparatus and method for robust endpoint detection of a signal are provided. An input signal sequence is divided into frames, each of a predetermined time length, and the presence of the signal is detected in each frame. A smoothing filter that uses the detection results of past frames is then applied to the detection result of the current frame. The filter output is compared with a predetermined threshold, and the state of the signal sequence in the current frame is determined on the basis of the comparison result.
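The framing, per-frame detection, smoothing, and thresholding steps above can be sketched as follows. The abstract does not specify the raw detector or the filter. This sketch assumes a mean-power test for per-frame detection and a first-order recursive (exponential) smoother. `alpha`, `energy_threshold`, and `state_threshold` are hypothetical parameters.

```python
from typing import List, Sequence


def split_frames(signal: Sequence[float], frame_len: int) -> List[Sequence[float]]:
    # Non-overlapping frames of fixed length; a trailing partial frame is dropped.
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


def frame_detect(frame: Sequence[float], energy_threshold: float) -> float:
    # Raw per-frame decision: 1.0 if mean power exceeds the threshold, else 0.0.
    power = sum(x * x for x in frame) / len(frame)
    return 1.0 if power > energy_threshold else 0.0


def smoothed_states(signal: Sequence[float], frame_len: int, energy_threshold: float,
                    alpha: float = 0.6, state_threshold: float = 0.5) -> List[bool]:
    """Return True (signal present) or False for each frame after recursive smoothing.

    y[t] = alpha * y[t-1] + (1 - alpha) * d[t], so each output depends on the
    detection results of past frames.
    """
    y = 0.0
    states = []
    for frame in split_frames(signal, frame_len):
        d = frame_detect(frame, energy_threshold)
        y = alpha * y + (1.0 - alpha) * d
        states.append(y > state_threshold)
    return states
```

With `alpha=0.6`, a single isolated detection never lifts the filter output above 0.5, so one-frame noise bursts are ignored. A sustained signal switches the state on after a one-frame delay and keeps it on for one extra frame after the signal ends.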