Abstract:
The present invention provides a device that performs online self-adaption of anchor models for an acoustic space, and a method thereof, the anchor models being used for categorization of an AV stream which is performed based on an audio stream in the AV stream. The device divides an input audio stream into audio segments, each being estimated to have a single acoustic feature, and estimates a single probability model for each audio segment. Then, the device performs clustering on the estimated probability models and probability models stored therein, thereby generating a new anchor model.
Abstract:
A string of acoustic feature parameters of each recognition-desired word and a string of acoustic feature parameters of each reception word are registered in advance. When an uttered word is received, a string of acoustic feature parameters is extracted from the uttered word, the string of acoustic feature parameters of the uttered word is compared with the string of acoustic feature parameters of each recognition-desired word, and a recognition-desired word recognition score indicating the degree of similarity between the uttered word and each recognition-desired word is calculated. Also, a reception word recognition score indicating the degree of similarity between the uttered word and each reception word is calculated. When a particular recognition-desired word recognition score corresponding to a particular recognition-desired word is higher than the highest reception word recognition score, the uttered word is recognized as the particular recognition-desired word, and an operation of an electric apparatus is controlled according to the particular recognition-desired word. In contrast, when a particular reception word recognition score corresponding to a particular reception word is higher than the highest recognition-desired word recognition score, the uttered word is recognized as the particular reception word and is rejected, so that the electric apparatus is not operated.
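The accept-or-reject rule described above can be sketched as follows. This is a minimal illustration only: the function name `recognize`, the dictionary inputs, and the handling of ties (treated here as a rejection) are assumptions, and the scores are taken as already computed, since the abstract does not specify how the similarity is calculated.

```python
from typing import Dict, Optional


def recognize(target_scores: Dict[str, float],
              reception_scores: Dict[str, float]) -> Optional[str]:
    """Return the recognized recognition-desired word, or None if rejected.

    Both dictionaries map a registered word to its recognition score and
    are assumed to be non-empty (hypothetical input format).
    """
    # Best-scoring recognition-desired word and its score.
    best_target = max(target_scores, key=target_scores.get)
    best_target_score = target_scores[best_target]
    # Highest score among the reception (rejection) words.
    best_reception_score = max(reception_scores.values())

    # Accept only when the recognition-desired word beats every reception word.
    # Assumption: a tie is treated as a rejection; the abstract does not say.
    if best_target_score > best_reception_score:
        return best_target
    return None
```

For example, `recognize({"on": 0.9, "off": 0.4}, {"um": 0.5})` returns `"on"`, while `recognize({"on": 0.3}, {"um": 0.5})` returns `None`, in which case the apparatus would not be operated.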
Abstract:
Moving images are classified using audio signals. An audio signal is acquired, and a section feature relating to an audio frequency distribution is extracted for each of a plurality of sections of predetermined length contained in the acquired audio signal. Each extracted section feature is compared with each of a set of reference section features to calculate a section similarity indicating the degree of correlation between that section feature and that reference section feature. An integrated feature relating to the plurality of sections is then extracted from the acquired audio signal, based on the section similarities calculated for the plurality of sections. The extracted integrated feature is compared with each of one or more reference integrated features, and the audio signal is classified based on the comparison result. The classification result is then used for moving image classification.
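The two-stage comparison above can be sketched as follows. The abstract does not fix the similarity measure or the form of the integrated feature, so the choices here are assumptions made purely for illustration: cosine similarity as the section similarity, an integrated feature equal to the fraction of sections best matched by each reference section feature, and nearest-reference classification by Euclidean distance. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Section similarity (assumed here to be cosine similarity)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def integrated_feature(sections: List[Sequence[float]],
                       references: List[Sequence[float]]) -> List[float]:
    """Fraction of sections whose most similar reference is each reference."""
    if not sections:
        raise ValueError("at least one section is required")
    counts = [0] * len(references)
    for sec in sections:
        sims = [cosine(sec, ref) for ref in references]
        counts[sims.index(max(sims))] += 1
    return [c / len(sections) for c in counts]


def classify(feature: Sequence[float],
             reference_integrated: Dict[str, Sequence[float]]) -> str:
    """Label of the reference integrated feature closest to `feature`."""
    return min(reference_integrated,
               key=lambda label: math.dist(feature, reference_integrated[label]))
```

With reference section features `[[1, 0], [0, 1]]` and sections `[[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]`, two of the three sections match the first reference best, so the integrated feature leans toward whichever category has a reference integrated feature weighted on that first component.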
Abstract:
A call voice other than the voice of the conversation partner, and various other sounds, are detected from audio signals input from plural microphones without deteriorating voice recognition precision. A hearing aid apparatus according to the present invention corrects a frequency characteristic of the call voice other than the conversation partner voice based on an arrival direction of that call voice, which is estimated from the audio signals converted by the plural microphones. It checks the call voice whose frequency characteristic has been corrected by the frequency characteristic correction processing unit against a call word standard pattern, which represents features of phonemes and syllables based on other voice data picked up by microphones having one characteristic, to determine whether the call voice is a call word, and forms a directivity in a direction other than the arrival direction of the voice of the conversation partner. The hearing aid apparatus corrects the frequency characteristic of the call voice other than the conversation partner voice so that it has the same characteristic as that of the microphones used when the audio standard pattern was created.
Abstract:
An azimuth and distance calculator calculates the relative direction and distance to the next intersection to be guided, based on information on the intersection supplied from a storage for received information on an object to be guided and on information on the movement history of the user. The calculator then converts the relative direction into a horizontal angle and the distance into an elevation angle, and passes the angles to a stereophony generator. The stereophony generator creates output sound information having a sound image localized outside the headphone and outputs the information to the headphone. In this manner, the user can accurately understand the distance to the object.
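The conversion from direction and distance to a pair of angles can be sketched as follows. This is an illustration under stated assumptions, not the calculator's actual method: positions are taken as planar (x, y) coordinates, the user's heading is inferred from the last two points of the movement history, and the distance is mapped linearly to an elevation angle with clamping. The names and the `max_dist` and `max_elev` parameters are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def relative_direction_and_distance(prev_pos: Point, cur_pos: Point,
                                    target: Point) -> Tuple[float, float]:
    """Return (relative direction in degrees, distance) to the target.

    The heading is estimated from the movement prev_pos -> cur_pos.
    Positive angles are counterclockwise (to the user's left).
    """
    heading = math.atan2(cur_pos[1] - prev_pos[1], cur_pos[0] - prev_pos[0])
    bearing = math.atan2(target[1] - cur_pos[1], target[0] - cur_pos[0])
    rel = math.degrees(bearing - heading)
    rel = (rel + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
    dist = math.hypot(target[0] - cur_pos[0], target[1] - cur_pos[1])
    return rel, dist


def distance_to_elevation(dist: float, max_dist: float = 500.0,
                          max_elev: float = 60.0) -> float:
    """Map a distance to an elevation angle in degrees.

    Assumption: a linear mapping, clamped to [0, max_elev].
    """
    ratio = min(max(dist / max_dist, 0.0), 1.0)
    return ratio * max_elev
```

A user who moved from (0, 0) to (0, 1) and is guided to an intersection at (1, 1) gets a relative direction of about -90 degrees (directly to the right) at distance 1. Both angles would then be handed to the stereophony generator.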
Abstract:
The present invention aims at extracting keywords from a conversation without advance preparation based on anticipating those keywords. A keyword extracting device of the present invention includes: an audio input section 101 through which a speech sound made by a speaker is input; a speech segment determination section 102 that determines a speech segment for each speaker from the input speech sound; a speech recognition section 103 that recognizes the speech sound of the determined speech segment for each speaker; an interrupt detection section 104 that detects, on the basis of another speaker's response to each speaker's speech, a feature of the speech response suggesting the presence of a keyword, namely an interrupt in which a preceding speech and a subsequent speech overlap; a keyword extraction section 105 that extracts the keyword from the speech in the speech segment specified on the basis of the interrupt; a keyword search section 106 that performs a keyword search using the keyword; and a display section 107 that displays the result of the keyword search.
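The interrupt detection step, in which a subsequent speech overlaps a preceding speech by another speaker, can be sketched as follows. The `Segment` type, the timestamps in seconds, and the function name `find_interrupts` are assumptions for illustration; how the keyword is then extracted from the specified segment is not covered here.

```python
from typing import List, NamedTuple, Tuple


class Segment(NamedTuple):
    """A speech segment of one speaker (hypothetical representation)."""
    speaker: str
    start: float  # seconds
    end: float    # seconds


def find_interrupts(segments: List[Segment]) -> List[Tuple[Segment, Segment]]:
    """Return (preceding, subsequent) pairs where a different speaker
    starts talking before the preceding segment has ended."""
    ordered = sorted(segments, key=lambda s: s.start)
    interrupts = []
    for i, prev in enumerate(ordered):
        for nxt in ordered[i + 1:]:
            # Segments are sorted by start time, so once one starts after
            # `prev` ends, no later segment can overlap `prev` either.
            if nxt.start >= prev.end:
                break
            if nxt.speaker != prev.speaker:
                interrupts.append((prev, nxt))
    return interrupts
```

Given segments A (0.0 to 5.0), B (4.5 to 7.0) and A (8.0 to 9.0), only the pair of the first A segment and the B segment is reported, because B begins while A is still speaking.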
Abstract:
Provided is a lifestyle collecting apparatus that collects information for determining a lifestyle of a user, and includes: an object information detecting unit configured to detect object information representing an object around the user; a relevance degree calculating unit configured to calculate a relevance degree of the user to the object, using the object information; an appearance information extracting unit configured to extract appearance information from the object information, and add the relevance degree to the extracted appearance information, the appearance information representing an appearance of the object; and a lifestyle database which stores the appearance information to which the relevance degree has been added, as the information for determining the lifestyle of the user.
Abstract:
A voice output apparatus enhances the robustness of the interface between a user and the apparatus by transmitting information to the user via both text message and voice message. The voice output apparatus includes a display unit (107) that displays a text message that is apparatus-transmitting information to be transmitted to the user, and a delay unit (105) and a voice output unit (106) that estimate a delay time necessary for the action the user takes to visually identify the text message displayed by the display unit (107), and output the apparatus-transmitting information via voice message when the delay time (T) has passed after the text message is displayed.
Abstract:
The voice output apparatus, which enhances the robustness of the interface between a user and the apparatus by transmitting information to the user via both text message and voice message, comprises: a display unit (107) for displaying a text message that is apparatus-transmitting information to be transmitted to the user; and a delay unit (105) together with a voice output unit (106) for estimating a delay time necessary for the action the user takes to visually identify the text message displayed by the display unit (107), and for outputting the apparatus-transmitting information via voice message when the delay time (T) has passed after the text message is displayed.