Abstract:
To classify moving images using audio signals. An audio signal is acquired, and a section feature relating to an audio frequency distribution is extracted for each of a plurality of sections of predetermined length contained in the acquired audio signal. Each extracted section feature is compared with each of a plurality of reference section features to calculate a section similarity indicating the degree of correlation between that section feature and that reference section feature. An integrated feature relating to the plurality of sections, calculated from the section similarities obtained for the respective sections, is then extracted from the acquired audio signal. The extracted integrated feature is compared with each of one or more reference integrated features, and the audio signal is classified based on the comparison result. The classification result is then used to classify the moving image.
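The two-stage comparison described in this abstract can be sketched as follows. This is a minimal illustration, not the claimed method: the cosine similarity measure, the averaging used to form the integrated feature, and all function names and labels are assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def integrated_feature(section_features, reference_sections):
    """Combine per-section similarities into one vector for the whole signal."""
    # Section similarity for every (section, reference section) pair.
    sims = [[cosine(s, r) for r in reference_sections] for s in section_features]
    # Assumption: integrate by averaging the similarity to each reference
    # section feature over all sections.
    count = len(sims)
    return [sum(row[j] for row in sims) / count
            for j in range(len(reference_sections))]


def classify(section_features, reference_sections, reference_integrated):
    """Return the label of the most similar reference integrated feature."""
    feature = integrated_feature(section_features, reference_sections)
    return max(reference_integrated,
               key=lambda label: cosine(feature, reference_integrated[label]))


# Hypothetical data: two reference section features and two category labels.
reference_sections = [[1.0, 0.0], [0.0, 1.0]]
reference_integrated = {"music": [1.0, 0.0], "speech": [0.0, 1.0]}
sections = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
label = classify(sections, reference_sections, reference_integrated)
```

Here the label assigned to the audio signal would then be carried over to the moving image that contains it.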
Abstract:
A speech recognition method and apparatus of the present application collate an input utterance with an acoustic model corresponding to a hypothesis, thereby obtaining a recognition score. The hypothesis is expressed as a connection of utterance segments, such as phonemes or syllables, and is developed according to the length of the input utterance by an inter-word connection rule. Within a word, all hypotheses whose score lies within a predetermined threshold of the maximum score are held up to the word end, irrespective of the number of hypotheses. At a word end, the hypotheses are narrowed to a predetermined number of top-ranking hypotheses in descending order of score.
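The two pruning rules in this abstract can be sketched as below. The tuple layout, the parameter names `beam_width` and `max_word_end`, and the choice to take the maximum score over in-word hypotheses only are assumptions for illustration; the abstract does not specify them.

```python
def prune(hypotheses, beam_width, max_word_end):
    """Prune hypotheses given as (score, at_word_end, label) tuples.

    Higher scores are better. In-word hypotheses are kept if they lie within
    beam_width of the best in-word score, however many there are. Word-end
    hypotheses are cut to the max_word_end best ones.
    """
    in_word = [h for h in hypotheses if not h[1]]
    word_end = [h for h in hypotheses if h[1]]

    kept = []
    if in_word:
        # Assumption: the reference maximum is taken over in-word hypotheses.
        best = max(h[0] for h in in_word)
        kept.extend(h for h in in_word if h[0] >= best - beam_width)

    # At a word end, keep only a fixed number of top-ranking hypotheses.
    ranked = sorted(word_end, key=lambda h: h[0], reverse=True)
    kept.extend(ranked[:max_word_end])
    return kept


# Hypothetical log-score hypotheses.
hyps = [
    (-1.0, False, "a"), (-2.0, False, "b"), (-9.0, False, "c"),
    (-1.5, True, "d"), (-3.0, True, "e"), (-0.5, True, "f"),
]
survivors = [label for _, _, label in prune(hyps, beam_width=2.0, max_word_end=2)]
```

The threshold rule lets the number of in-word hypotheses vary with how close the competing scores are, while the count limit bounds how many hypotheses fan out into following words.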
Abstract:
To provide a presentation content generation device that generates various types of presentation content by dynamically generating a template appropriate for the substance of each content set. The presentation content generation device includes: an attribute information extraction unit 2 that extracts attribute information indicating image features from a content set stored in a local data storage unit 1; a design type determination unit 4 that determines a base land pattern and a color of a template based on the extracted attribute information; a selection index type determination unit 5 that, based on the extracted attribute information, selects one or more contents to be placed on the template and determines the respective placement positions of the selected contents on the template; and a view format conversion unit 6 that places the selected contents at the respective placement positions to generate a presentation content.
Abstract:
An interesting section identifying device identifies an interesting section of a video file based on an audio signal included in the video file, the interesting section being a section in which a user is estimated to express interest. The device includes: an interesting section candidate extracting unit that extracts, from the video file, an interesting section candidate, which is a candidate for the interesting section; a detailed structure determining unit that determines whether the interesting section candidate includes a specific detailed structure; and an interesting section identifying unit that, when the detailed structure determining unit determines that the candidate includes the detailed structure, identifies the interesting section by analyzing a specific section that includes the detailed structure and is shorter than the interesting section candidate.
Abstract:
The present invention provides a device that performs online self-adaption of anchor models for an acoustic space, and a method thereof, the anchor models being used for categorization of an AV stream which is performed based on an audio stream in the AV stream. The device divides an input audio stream into audio segments, each being estimated to have a single acoustic feature, and estimates a single probability model for each audio segment. Then, the device performs clustering on the estimated probability models and probability models stored therein, thereby generating a new anchor model.
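The clustering step in this abstract can be sketched as below. The abstract does not say what form the probability models take or which clustering algorithm is used, so this example assumes one-dimensional Gaussians stored as `(mean, variance, weight)` tuples and a greedy merge of the two closest means; all names are hypothetical.

```python
def merge(a, b):
    """Moment-matched merge of two weighted 1-D Gaussians (mean, var, weight)."""
    (m1, v1, n1), (m2, v2, n2) = a, b
    n = n1 + n2
    mean = (n1 * m1 + n2 * m2) / n
    var = (n1 * (v1 + (m1 - mean) ** 2) + n2 * (v2 + (m2 - mean) ** 2)) / n
    return (mean, var, n)


def update_anchors(stored, estimated, num_anchors):
    """Cluster stored and newly estimated models down to num_anchors models."""
    models = list(stored) + list(estimated)
    while len(models) > num_anchors:
        # Assumption: closeness is measured by the distance between means.
        pairs = ((i, j) for i in range(len(models))
                 for j in range(i + 1, len(models)))
        i, j = min(pairs, key=lambda p: abs(models[p[0]][0] - models[p[1]][0]))
        merged = merge(models[i], models[j])
        models = [m for k, m in enumerate(models) if k not in (i, j)]
        models.append(merged)
    return models


# Hypothetical stored anchor models and one model estimated from a new segment.
stored = [(0.0, 1.0, 10), (10.0, 1.0, 10)]
estimated = [(0.5, 1.0, 10)]
anchors = update_anchors(stored, estimated, num_anchors=2)
```

In this sketch the new segment's model is absorbed into the nearest stored anchor, which shifts that anchor toward the newly observed audio.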
Abstract:
A string of acoustic feature parameters of each of recognition-desired words and a string of acoustic feature parameters of each of reception words are registered in advance. When an uttered word is received, a string of acoustic feature parameters is extracted from the uttered word, the string of acoustic feature parameters of the uttered word is compared with the string of acoustic feature parameters of each recognition-desired word, and a recognition-desired word recognition score indicating a degree of similarity between the uttered word and each recognition-desired word is calculated. A reception word recognition score indicating a degree of similarity between the uttered word and each reception word is also calculated. In cases where a particular recognition-desired word recognition score corresponding to a particular recognition-desired word is higher than the highest reception word recognition score, the uttered word is recognized as the particular recognition-desired word, and an operation of an electric apparatus is controlled according to the particular recognition-desired word. In contrast, in cases where a particular reception word recognition score corresponding to a particular reception word is higher than the highest recognition-desired word recognition score, the uttered word is recognized as the particular reception word and is rejected, so that the electric apparatus is not operated.
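The accept-or-reject decision in this abstract can be sketched as below. The similarity measure (negative Euclidean distance between equal-length parameter strings), the handling of ties as rejection, and all names are assumptions for illustration.

```python
import math


def similarity(a, b):
    """Assumed score: negative Euclidean distance, so higher means closer."""
    return -math.dist(a, b)


def recognize(uttered, desired_words, reception_words):
    """Return the recognized recognition-desired word, or None if rejected.

    desired_words and reception_words map each registered word to its string
    of acoustic feature parameters (here a flat list of floats).
    """
    desired_scores = {w: similarity(uttered, p) for w, p in desired_words.items()}
    reception_scores = {w: similarity(uttered, p) for w, p in reception_words.items()}

    best_desired = max(desired_scores, key=desired_scores.get)
    best_reception_score = max(reception_scores.values())

    # Accept only if a recognition-desired word beats every reception word.
    # Assumption: a tie is treated as a rejection.
    if desired_scores[best_desired] > best_reception_score:
        return best_desired
    return None


# Hypothetical registered words.
desired = {"on": [1.0, 0.0], "off": [0.0, 1.0]}
reception = {"noise": [5.0, 5.0]}
accepted = recognize([0.9, 0.1], desired, reception)
rejected = recognize([4.8, 5.1], desired, reception)
```

A caller would operate the electric apparatus only when the result is not `None`.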
Abstract:
The image classification apparatus extracts first features of each received image (S22) and second features of a relevant image relevant to each received image (S25). Subsequently, the image classification apparatus obtains a third feature by calculation using locality of the extracted first and second features, the third feature being distinctive of a target object of each received image (S26), and creates model data based on the obtained third feature (S27).
Abstract:
An audio processing device, including a feature calculation unit, a boundary calculation unit, and a judgment unit, detects points of change of audio features from an audio signal in an AV content. The feature calculation unit calculates, for each unit section of the audio signal, section feature data expressing features of the audio signal in the unit section. The boundary calculation unit calculates, for each target unit section among the unit sections of the audio signal, a piece of boundary information relating to at least one boundary of a similarity section. The similarity section consists of consecutive unit sections, inclusive of the target unit section, each of which has similar section feature data. The judgment unit calculates a priority of each boundary indicated by one or more of the pieces of boundary information and judges whether the boundary is a scene change point based on the priority.
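The boundary voting described in this abstract can be sketched as below. For simplicity the example assumes scalar section feature data, treats two unit sections as similar when their values differ by at most `max_diff`, records only the start boundary of each similarity section, and uses the vote count as the priority. None of these choices is stated in the abstract.

```python
from collections import Counter


def similarity_section_start(features, target, max_diff):
    """Index of the first unit section of the similarity section around target."""
    start = target
    # Extend backwards while the preceding unit section resembles the target.
    while start > 0 and abs(features[start - 1] - features[target]) <= max_diff:
        start -= 1
    return start


def scene_change_points(features, max_diff, min_priority):
    """Return boundary indices judged to be scene change points."""
    votes = Counter()
    for target in range(len(features)):
        votes[similarity_section_start(features, target, max_diff)] += 1

    # Assumption: priority is the number of target sections that indicated the
    # boundary; index 0 is the start of the signal, not a change point.
    return sorted(b for b, priority in votes.items()
                  if b > 0 and priority >= min_priority)


# Hypothetical per-unit-section feature values with one clear change.
features = [0.1, 0.1, 0.2, 0.9, 1.0, 0.9]
changes = scene_change_points(features, max_diff=0.2, min_priority=2)
```

Because every unit section after the change reports the same start boundary, that boundary accumulates a high priority, while a boundary reported by a single isolated section would fall below the threshold.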
Abstract:
A call other than the conversation partner's voice and various sounds are detected from audio signals input from plural microphones without degrading voice recognition accuracy. A hearing aid apparatus according to the present invention corrects a frequency characteristic of a call voice other than the conversation partner's voice based on the arrival direction of that call voice, which is estimated from the audio signals converted by the plural microphones. The apparatus checks the call voice whose frequency characteristic has been corrected by the frequency characteristic correction processing unit against a call word standard pattern, which represents features of phonemes and syllables based on other voice data picked up using microphones having one characteristic, to determine whether the call voice is a call word, and forms directivity in a direction other than the arrival direction of the conversation partner's voice. The hearing aid apparatus corrects the frequency characteristic of the call voice so that it has the same characteristic as that of the microphones used when the audio standard pattern was created.