Abstract:
An utterance state detection device includes a user voice stream data input unit that acquires voice stream data of a user, a frequency element extraction unit that extracts high-frequency elements by frequency-analyzing the user voice stream data, a fluctuation degree calculation unit that calculates, for every unit time, a fluctuation degree of the extracted high-frequency elements, a statistic calculation unit that calculates a statistic at every certain interval based on a plurality of the fluctuation degrees in a certain period of time, and an utterance state detection unit that detects an utterance state of a specified user based on the statistic obtained from the user voice stream data of the specified user.
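The pipeline in this abstract (high-frequency extraction, per-frame fluctuation degree, interval statistic, state decision) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not define the fluctuation degree, the statistic, or the state labels, so the log-energy difference, the mean, the threshold, and the labels "ordinary" and "unusual" are all hypothetical choices.

```python
import cmath
import math
import statistics


def high_band_energy(frame, sample_rate, cutoff_hz):
    """Naive DFT: sum of squared magnitudes of bins at or above cutoff_hz."""
    n = len(frame)
    energy = 0.0
    for k in range(n // 2 + 1):
        if k * sample_rate / n < cutoff_hz:
            continue
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        energy += abs(coeff) ** 2
    return energy


def fluctuation_degrees(samples, sample_rate, frame_len, cutoff_hz):
    """One value per unit time (frame): absolute change in high-band
    log energy relative to the previous frame (an assumed definition)."""
    energies = [
        math.log(high_band_energy(samples[i:i + frame_len],
                                  sample_rate, cutoff_hz) + 1e-12)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    return [abs(b - a) for a, b in zip(energies, energies[1:])]


def interval_statistics(degrees, window, step):
    """Statistic (assumed: the mean) over `window` degrees, every `step` frames."""
    return [statistics.fmean(degrees[i:i + window])
            for i in range(0, len(degrees) - window + 1, step)]


def detect_state(stats, threshold):
    """Hypothetical two-way decision on each statistic."""
    return ["unusual" if s > threshold else "ordinary" for s in stats]
```

A real device would presumably use an FFT and a threshold calibrated per user; the naive DFT here only keeps the sketch free of third-party dependencies.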
Abstract:
A speech recognition device includes: a speech recognition section that searches, by speech recognition, audio data stored in a first memory section to extract word-spoken portions in which each of a plurality of transferred words is spoken and that, of the extracted word-spoken portions, rejects any word-spoken portion for a word designated as a rejecting object; an acquisition section that obtains a derived word of a designated search target word, the derived word being generated in accordance with a derived word generation rule stored in a second memory section or read out from the second memory section; a transfer section that transfers the derived word and the search target word to the speech recognition section, the derived word having been set as an outputting object or a rejecting object by the acquisition section; and an output section that outputs the word-spoken portions extracted and not rejected in the search.
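The flow of derived words through the search can be illustrated with a small sketch. It assumes a token list in place of audio and exact token matching in place of acoustic search; the suffix rules, the role names, and every identifier are hypothetical and are not taken from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    word: str
    start: int  # token index standing in for a time offset
    end: int


# Hypothetical derived word generation rules: (suffix, role).
RULES = [("s", "output"), ("less", "reject")]


def derive(target, rules=RULES):
    """Acquisition step: map the target and its derived words to a role."""
    roles = {target: "output"}
    for suffix, role in rules:
        roles[target + suffix] = role
    return roles


def search(tokens, words):
    """Stand-in for the recognizer: exact token match, not acoustic search."""
    return [Hit(tok, i, i + 1) for i, tok in enumerate(tokens) if tok in words]


def spoken_portions(tokens, target):
    """Transfer all words, search, then drop hits for rejecting-object words."""
    roles = derive(target)
    hits = search(tokens, set(roles))
    return [h for h in hits if roles[h.word] == "output"]
```

For example, searching for "error" in a stream containing "error", "errors", and "errorless" would keep the first two portions and reject the third under these assumed rules.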
Abstract:
In a spoken term detection apparatus, processing performed by a processor includes: a feature extraction process of extracting an acoustic feature from speech data accumulated in an accumulation part and storing the extracted acoustic feature in an acoustic feature storage part; a first calculation process of calculating a standard score from a similarity between the acoustic feature stored in the acoustic feature storage part and an acoustic model stored in an acoustic model storage part; a second calculation process of comparing an acoustic model corresponding to an input keyword with the acoustic feature stored in the acoustic feature storage part to calculate a score of the keyword; and a retrieval process of retrieving speech data including the keyword from the speech data accumulated in the accumulation part based on the score of the keyword calculated by the second calculation process and the standard score stored in a standard score storage part.
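One way to read the two scores is that the standard score is a keyword-independent baseline computed once per frame, and the keyword score is compared against it at retrieval time. The sketch below assumes scalar features, one phoneme per frame, a negative squared distance as the similarity, and a baseline equal to the best similarity over all models; none of these details come from the abstract.

```python
def similarity(feature, mean):
    """Assumed similarity: negative squared distance to a model mean."""
    return -(feature - mean) ** 2


def standard_scores(features, models):
    """First calculation: best similarity over all models, per frame.
    Computed once and stored, independent of any keyword."""
    return [max(similarity(f, m) for m in models.values()) for f in features]


def keyword_score(features, start, phonemes, models):
    """Second calculation: score of the keyword's phoneme sequence aligned
    one phoneme per frame (a simplification of real alignment)."""
    return sum(similarity(features[start + i], models[p])
               for i, p in enumerate(phonemes))


def retrieve(features, std, phonemes, models, threshold):
    """Retrieval: keep start frames whose keyword score is within
    `threshold` of the stored standard score over the same frames."""
    n = len(phonemes)
    hits = []
    for start in range(len(features) - n + 1):
        gap = (keyword_score(features, start, phonemes, models)
               - sum(std[start:start + n]))
        if gap >= threshold:
            hits.append(start)
    return hits
```

Because the baseline is stored ahead of time, only the keyword score has to be computed when a new keyword arrives.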
Abstract:
A speech recognition system includes the following: a feature calculating unit; a sound level calculating unit that calculates an input sound level in each frame; a decoding unit that matches the feature of each frame with an acoustic model and a linguistic model, and outputs a recognized word sequence; a start-point detector that determines a start frame of a speech section based on a reference value; an end-point detector that determines an end frame of the speech section based on a reference value; and a reference value updating unit that updates the reference value in accordance with variations in the input sound level. The start-point detector updates the start frame every time the reference value is updated. The decoding unit starts matching before being notified of the end frame and corrects the matching results every time it is notified of the start frame. The speech recognition system can suppress a delay in response time while performing speech recognition based on a proper speech section.
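The interaction between the reference value and the start frame can be sketched as below. The update rule (reference value tracks the loudest level seen minus a margin), the floor, the hangover count, and all names are assumptions for illustration; the abstract only states that the reference value follows variations in the input sound level and that the start frame is revisited on each update.

```python
def detect_speech_section(levels, floor=30.0, margin=20.0, hangover=2):
    """Return (events, reference) for a list of per-frame sound levels.

    events holds ("start", frame) each time the start frame is set or
    revised, and ("end", frame) once, where frame is the first frame of
    the trailing run below the reference value.
    """
    events = []
    reference = floor
    start = None
    below = 0
    for i, level in enumerate(levels):
        candidate = max(floor, level - margin)
        if candidate > reference:
            reference = candidate  # reference value update
            if start is not None:
                # Re-evaluate the start frame against the new reference.
                revised = next(j for j in range(start, i + 1)
                               if levels[j] >= reference)
                if revised != start:
                    start = revised
                    events.append(("start", start))
        if start is None:
            if level >= reference:
                start = i
                events.append(("start", start))
        else:
            below = below + 1 if level < reference else 0
            if below >= hangover:
                events.append(("end", i - hangover + 1))
                break
    return events, reference
```

A decoder consuming these events could begin matching at the first "start" event and redo or correct its partial results whenever a revised "start" arrives, rather than waiting for "end".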
Abstract:
Provided are a voice recognition system and a voice processing system in which a self-repair utterance can be input and recognized accurately, as in human conversation, when a user makes a self-repair utterance. A signal processing unit converts speech voice data into a feature, a voice section detecting unit detects voice sections in the speech voice data, and a priority determining unit selects a voice section that includes a self-repair utterance from among the voice sections according to a priority criterion, without using any result of recognizing a speech vocabulary sequence. Priority criteria can include the length of the voice section, its signal-to-noise ratio, its chronological order, and the speech speed. A decoder calculates a matching score with a recognition vocabulary using the feature of the selected voice section and an acoustic model.
Abstract:
A voice recognition system and a voice processing system are provided in which a self-repair utterance can be input and recognized accurately, as in a conversation between humans, when a user makes the self-repair utterance. The systems include a signal processing unit for converting speech voice data into a feature, a voice section detecting unit for detecting voice sections in the speech voice data, a priority determining unit for selecting a voice section to be given priority from among the voice sections detected by the voice section detecting unit according to a predetermined priority criterion, and a decoder for calculating a degree of matching with a recognition vocabulary using the feature of the voice section selected by the priority determining unit and an acoustic model. The priority determining unit uses as the predetermined priority criterion at least one selected from the group consisting of (1) a length of the voice section, (2) a power or an S/N ratio of the voice section, and (3) a chronological order of the voice section.
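The priority determining step can be illustrated with a short sketch. The abstract lists the criteria but not their direction, so preferring the longest section, the highest S/N ratio, or the latest section is an assumption here, as are the type and function names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceSection:
    start: float   # seconds
    end: float     # seconds
    snr_db: float  # estimated S/N ratio of the section

    @property
    def length(self):
        return self.end - self.start


# Assumed direction for each criterion: larger key wins.
CRITERIA = {
    "length": lambda s: s.length,   # (1) prefer the longest section
    "snr": lambda s: s.snr_db,      # (2) prefer the cleanest section
    "latest": lambda s: s.start,    # (3) prefer the most recent section
}


def select_priority_section(sections, criterion="length"):
    """Pick one detected voice section without using any recognition result."""
    if not sections:
        raise ValueError("no voice sections detected")
    return max(sections, key=CRITERIA[criterion])
```

Only the selected section would then be passed to the decoder, so a false start followed by a correction does not have to be recognized as a single utterance.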
Abstract:
An apparatus for continuously reproducing a plurality of sound data has a start end/terminal end determination unit for determining the start end and terminal end of each of the consecutive sound data, a fade-in/fade-out unit for carrying out a fade-in process at the start end of each sound data and/or a fade-out process at its terminal end, a data output unit for continuously outputting the plurality of sound data that have been subjected to the fade-in process and/or fade-out process, and a reproduction unit for reproducing the output sound data. When the plurality of sound data are reproduced continuously, no noise is generated at the joints between adjacent sound data.
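A minimal sketch of the fade-and-join idea follows, assuming each clip is a list of float samples whose start and terminal ends are already known, and assuming a linear gain ramp. The abstract does not specify the fade curve or length, and the function names are hypothetical.

```python
def fade(samples, fade_len):
    """Apply a linear fade-in at the start end and fade-out at the terminal end."""
    out = list(samples)
    n = min(fade_len, len(out) // 2)  # never let the two ramps overlap
    for i in range(n):
        gain = i / n  # 0.0 at the very edge, rising toward 1.0
        out[i] *= gain
        out[-1 - i] *= gain
    return out


def concatenate(clips, fade_len):
    """Output the faded clips back to back as one continuous stream."""
    joined = []
    for clip in clips:
        joined.extend(fade(clip, fade_len))
    return joined
```

Because every clip starts and ends at zero amplitude after the fade, the sample-to-sample jump at each joint disappears, which is what removes the click that an abrupt level change would otherwise produce.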