Abstract:
An apparatus for continuously reproducing plural sound data has a start end/terminal end determination unit for determining the start end/terminal end of the continued respective sound data, a fade-in/fade-out unit for carrying out fade-in process at the start end of plural respective sound data and/or fade-out process at the terminal end of the same, a data output unit for continuously outputting the plural sound data which have been subjected to fade-in process and/or fade-out process, and a reproduction unit for reproducing the outputted plural sound data. In reproducing continuously the plural sound data, no noise is generated at the joint portion of the adjacent sound data.
Abstract:
A speech recognition apparatus and a method are provided in which even when a speech recognition grammar is employed independently or alternatively, speech recognition response is improved. Voice data is received. Then, an output suspended state for a speech recognition result is maintained until the duration of a silence interval that follows the utterance part reaches a criterion time. Information indicating whether a last word of a word sequence is a final word is stored. On the basis of the language model, a recognition candidate word sequence is extracted. When the last word of the extracted word sequence is determined as being a final word, the speech recognition result is output in a time shorter than the criterion time, while when determined as not being a final word, the speech recognition result is output at the time that the criterion time has elapsed.
Abstract:
A non-speech section detecting device generating a plurality of frames having a given time length on the basis of sound data obtained by sampling sound, and detecting a non-speech section having a frame not containing voice data based on speech uttered by a person, the device including: a calculating part calculating a bias of a spectrum obtained by converting sound data of each frame into components on a frequency axis; a judging part judging whether the bias is greater than or equal to a given threshold or alternatively smaller than or equal to a given threshold; a counting part counting the number of consecutive frames judged as having a bias greater than or equal to the threshold or alternatively smaller than or equal to the threshold; a count judging part judging whether the obtained number of consecutive frames is greater than or equal to a given value.
Abstract:
An information processing apparatus for speech recognition includes a first speech dataset storing speech data uttered by low recognition rate speakers; a second speech dataset storing speech data uttered by a plurality of speakers; a third speech dataset storing speech data to be mixed with the speech data of the second speech dataset; a similarity calculating part obtaining, for each piece of the speech data in the second speech dataset, a degree of similarity to a given average voice in the first speech dataset; a speech data selecting part recording the speech data, the degree of similarity of which is within a given selection range, as selected speech data in the third speech dataset; and an acoustic model generating part generating a first acoustic model using the speech data recorded in the second speech dataset and the third speech dataset.
Abstract:
A grammar update method for storing grammar data for speech interaction used for recognizing speech data and newly recognizing the speech data without using the grammar data, includes determining whether or not a new-recognition result in the newly-recognizing operation can be accepted, and in the case where the new-recognition result cannot be accepted, specifying a portion to be added and updated from the stored grammar data, thereby adding and updating the grammar data.
Abstract:
There is provided an association apparatus for associating a plurality of voice data converted from voices produced by speakers, comprising: a word/phrase similarity deriving section which derives an appearance ratio of a common word/phrase that is common among the voice data based on a result of speech recognition processing on the voice data, as a word/phrase similarity; a speaker similarity deriving section which derives a result of comparing characteristics of voices extracted from the voice data, as a speaker similarity; an association degree deriving section which derives a possibility of the plurality of the voice data, which are associated with one another, based on the derived word/phrase similarity and the speaker similarity, as an association degree; and an association section which associates the plurality of the voice data with one another, the derived association degree of which is equal to or more than a preset threshold.
Abstract:
An utterance state detection device includes an user voice stream data input unit that gets user voice stream data of an user, a frequency element extraction unit that extracts high frequency elements by frequency-analyzing the user voice stream data, a fluctuation degree calculation unit that calculates a fluctuation degree of the high frequency elements thus extracted every unit time, a statistic calculation unit that calculates a statistic every certain interval based on a plurality of the fluctuation degrees in a certain period of time, and an utterance state detection unit that detects an utterance state of a specified user based on the statistic obtained from user voice stream data of the specified user.
Abstract:
A speech recognition device includes, a speech recognition section that conducts a search, by speech recognition, on audio data stored in a first memory section to extract word-spoken portions where plural words transferred are each spoken and, of the word-spoken portions extracted, rejects the word-spoken portion for the word designated as a rejecting object; an acquisition section that obtains a derived word of a designated search target word, the derived word being generated in accordance with a derived word generation rule stored in a second memory section or read out from the second memory section; a transfer section that transfers the derived word and the search target word to the speech recognition section, the derived word being set to the outputting object or the rejecting object by the acquisition section; and an output section that outputs the word-spoken portion extracted and not rejected in the search.
Abstract:
An information processing apparatus for speech recognition includes a first speech dataset storing speech data uttered by low recognition rate speakers; a second speech dataset storing speech data uttered by a plurality of speakers; a third speech dataset storing speech data to be mixed with the speech data of the second speech dataset; a similarity calculating part obtaining, for each piece of the speech data in the second speech dataset, a degree of similarity to a given average voice in the first speech dataset; a speech data selecting part recording the speech data, the degree of similarity of which is within a given selection range, as selected speech data in the third speech dataset; and an acoustic model generating part generating a first acoustic model using the speech data recorded in the second speech dataset and the third speech dataset.
Abstract:
A searching device includes a history storing unit storing a search target obtained by a search and a search date in a storage unit; a relevancy storing unit storing in the storage unit a previous searching keyword including a plurality of date-related words as well as the search target and an attribute of the search target in association with one another; a change unit changing the previous searching keyword, based on the search date stored in the storage unit and a date output from a clock unit; a reception unit receiving a previous searching keyword and the search target or attribute that are entered by voice; and an extraction unit extracting a search target corresponding to the previous searching keyword and the search target or attribute received by the reception unit, by referring to the previous searching keyword that is obtained after changing, the search target and the attribute.