Abstract:
A manual annotation system for multi-modal characteristics in multimedia files. An arrangement is provided for selecting an observation modality (video with audio, video without audio, audio with video, or audio without video) to be used to annotate multimedia content. While annotating video or audio features in isolation results in less confidence in the identification of features, observing both audio and video simultaneously and annotating that observation results in a higher confidence level.
Abstract:
An arrangement for yielding enhanced audio features towards the provision of enhanced audio-visual features for speech recognition. Input is provided in the form of noisy audio-visual features and noisy audio features related to the noisy audio-visual features.
Abstract:
A method and system for sequence independent configuration of adapters installed in a data processing system. Adapters such as disk drive controllers, Token Ring adapters, terminal emulators and the like each include multiple choices associated therewith which specify selected memory allocations which must be utilized in configuring the adapters. A determination is first made of the number of possible combinations of such choices which exist, and if that number is not substantial, an exhaustive evaluation of each possible combination is made to determine if a conflict exists. In the absence of a conflict, each combination is examined for an optimum allocation of memory which maximizes the number of sixteen kilobyte free memory pages remaining within the system memory after configuration for utilization by an expanded memory system. If the number of possible combinations exceeds a predetermined number, only a predetermined number of random combinations are evaluated and an optimum allocation is selected from those random combinations. In order to minimize the probability of choosing combinations with conflicts arising from system utilization of duplicate adapters, a random choice for each successive adapter is selected which, with a high degree of probability, is not identical to a choice selected for a previous adapter.
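The selection procedure in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: each choice is modeled as a `(start, length)` byte range, the names `has_conflict`, `free_pages`, `random_combo`, and `configure` are hypothetical, the thresholds `max_exhaustive` and `samples` are arbitrary, and "avoid a choice already taken by a previous adapter" is simplified to strict avoidance whenever an unused choice exists.

```python
import itertools
import math
import random

PAGE = 16 * 1024  # sixteen-kilobyte page size named in the abstract


def has_conflict(ranges):
    """Return True if any two (start, length) byte ranges overlap."""
    ordered = sorted(ranges)
    return any(a_start + a_len > b_start
               for (a_start, a_len), (b_start, _) in zip(ordered, ordered[1:]))


def free_pages(ranges, window_start, window_len):
    """Count 16 KB pages in the window that no range touches."""
    free = 0
    for page in range(window_start, window_start + window_len, PAGE):
        if all(start + length <= page or start >= page + PAGE
               for start, length in ranges):
            free += 1
    return free


def random_combo(adapters, rng):
    """Pick one choice per adapter, preferring choices not already picked
    (a simplification of the duplicate-adapter rule; assumption)."""
    picked = []
    for choices in adapters:
        unused = [c for c in choices if c not in picked]
        picked.append(rng.choice(unused or list(choices)))
    return tuple(picked)


def configure(adapters, window_start, window_len,
              max_exhaustive=1000, samples=200, rng=None):
    """Return (best_combination, free_page_count), or (None, -1) if every
    evaluated combination conflicts."""
    rng = rng or random.Random()
    total = math.prod(len(choices) for choices in adapters)
    if total <= max_exhaustive:
        # Few combinations: evaluate all of them.
        candidates = itertools.product(*adapters)
    else:
        # Too many combinations: evaluate a fixed number of random ones.
        candidates = (random_combo(adapters, rng) for _ in range(samples))
    best, best_free = None, -1
    for combo in candidates:
        if has_conflict(combo):
            continue
        count = free_pages(combo, window_start, window_len)
        if count > best_free:
            best, best_free = combo, count
    return best, best_free


# Hypothetical example: two adapters sharing a 128 KB window at 0xC0000.
adapters = [
    [(0xC2000, 0x4000), (0xC4000, 0x4000)],  # first adapter: two choices
    [(0xC8000, 0x4000)],                     # second adapter: one choice
]
best, free = configure(adapters, 0xC0000, 0x20000)
```

In this example the page-aligned choice for the first adapter touches one page instead of two, so it leaves more free pages and is the one selected.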
Abstract:
A mechanism is provided for cross-linking information sources using multiple modalities. Text documents, images, audio sources, video, and other media are analyzed to determine media descriptors, which are metadata describing the content of the media sources. The media descriptors from all modalities are collated and cross-linked. A query processing and presentation module, which receives queries and presents results, may also be provided. A query may consist of textual keywords from user input. Alternatively, a query may derive from a media source, such as a text document, image, audio source, or video source.
Abstract:
A computer implemented method in a language independent system generates audio-driven facial animation given a speech recognition system for just one language. The method is based on the recognition that once alignment is generated, the mapping and the animation hardly have any language dependency in them. Translingual visual speech synthesis can be achieved if the first step of alignment generation can be made speech independent. Given a speech recognition system for a base language, the method synthesizes video with speech of any novel language as the input.
Abstract:
The combination of audio and video speech recognition in a manner to improve the robustness of speech recognition systems in noisy environments. Contemplated are methods and apparatus in which a video signal associated with a video source and an audio signal associated with the video signal are processed, the most likely viseme associated with the audio signal and video signal is determined and, thereafter, the most likely phoneme associated with the audio signal and video signal is determined.
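The two-stage decision described above (viseme first, then phoneme) might be sketched as follows. This is only an illustration under stated assumptions: the viseme-to-phoneme table, the weighted sum of audio and video log-likelihoods, the `audio_weight` value, and the restriction of the phoneme search to the chosen viseme class are all hypothetical and are not specified by the abstract.

```python
# Hypothetical viseme classes and their member phonemes (assumption).
VISEME_TO_PHONEMES = {
    "bilabial": ["p", "b", "m"],
    "labiodental": ["f", "v"],
}


def most_likely(audio_scores, video_scores, candidates, audio_weight=0.7):
    """Pick the candidate with the best weighted sum of audio and video
    log-likelihoods (the fusion rule is an assumption)."""
    return max(
        candidates,
        key=lambda c: audio_weight * audio_scores[c]
        + (1.0 - audio_weight) * video_scores[c],
    )


def decode_frame(viseme_audio, viseme_video, phoneme_audio, phoneme_video,
                 audio_weight=0.7):
    """Determine the most likely viseme, then the most likely phoneme
    among the phonemes belonging to that viseme."""
    viseme = most_likely(viseme_audio, viseme_video,
                         VISEME_TO_PHONEMES, audio_weight)
    phoneme = most_likely(phoneme_audio, phoneme_video,
                          VISEME_TO_PHONEMES[viseme], audio_weight)
    return viseme, phoneme


# Made-up log-likelihoods: noisy audio slightly favors "f" as a phoneme,
# but the video strongly indicates a bilabial mouth shape.
result = decode_frame(
    viseme_audio={"bilabial": -1.0, "labiodental": -1.2},
    viseme_video={"bilabial": -0.2, "labiodental": -3.0},
    phoneme_audio={"p": -2.0, "b": -1.0, "m": -1.5, "f": -0.5, "v": -3.0},
    phoneme_video={"p": -1.0, "b": -1.0, "m": -1.0, "f": -4.0, "v": -4.0},
)
```

With these made-up scores the video evidence rules out "f" even though its audio score is the highest, which is the kind of robustness to audio noise the abstract aims for.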
Abstract:
Methods and arrangements for annotating digital input. Digital media input is accepted, with the input being arranged in frames; in annotating, at least one of the following is performed: the presentation of frames for annotation in non-linear fashion; and the employment of a cached annotation lexicon for applying labels to frames.
Abstract:
Disclosed is a system and method for presenting and browsing information, comprising the steps of classifying the information into a plurality of classes and sub-classes, each class having at least one sub-class; and presenting the plurality of classes of information to a user. The system and method are capable of interactively controlling the presentation of the sub-classes.
Abstract:
A trainable radio scanner, including a station monitoring circuit to scan a plurality of radio frequencies and extract audio samples of a predetermined duration from each one of the plurality of radio frequencies having a signal strength above a reception threshold; a memory storing audio classification data and the plurality of audio samples; an audio analyzer to analyze each one of the plurality of audio samples using the audio classification data and classify each audio sample into a musical style category; and a style discriminator to control a radio station scanning operation of the radio receiver to tune only to preferred radio stations having a radio frequency at which the corresponding audio sample is classified in at least one preferred musical style category.
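The scanning logic in this abstract reduces to a filter over frequencies, which a short sketch can make concrete. The function name `preferred_stations`, the `(frequency, strength, sample)` tuple layout, and the `classify` callable standing in for the audio analyzer are all hypothetical; the abstract does not say how classification is performed.

```python
def preferred_stations(readings, classify, preferred_styles, reception_threshold):
    """Return the frequencies worth tuning to.

    readings: iterable of (frequency, signal_strength, audio_sample) tuples.
    classify: callable mapping an audio sample to a musical style label
              (a stand-in for the audio analyzer; assumption).
    """
    stations = []
    for frequency, strength, sample in readings:
        # Only frequencies above the reception threshold are sampled.
        if strength <= reception_threshold:
            continue
        # Keep the station only if its sample falls in a preferred style.
        if classify(sample) in preferred_styles:
            stations.append(frequency)
    return stations


# Toy data: a lookup table plays the role of the trained classifier.
toy_styles = {"s1": "jazz", "s2": "jazz", "s3": "rock"}
readings = [(88.1, 0.9, "s1"), (94.5, 0.2, "s2"), (101.3, 0.8, "s3")]
tuned = preferred_stations(readings, toy_styles.get, {"jazz"}, 0.5)
```

Here the second station is skipped for weak signal and the third for its style, so only the first frequency remains.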
Abstract:
Automated decision making techniques are provided. For example, a technique for generating a decision associated with an individual or an entity includes the following steps. First, two or more data streams associated with the individual or the entity are captured. Then, at least one time-varying measure is computed in accordance with the two or more data streams. Lastly, a decision is computed based on the at least one time-varying measure. One form of the time-varying measure may include a measure of the coverage of a model associated with previously-obtained training data by at least a portion of the captured data. Another form of the time-varying measure may include a measure of the stability of at least a portion of the captured data. While either measure may be employed alone to compute a decision, preferably both the coverage and stability measures are employed. The technique may be used to authenticate a speaker.
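One way to picture a decision that uses both a coverage and a stability measure is the sketch below. The abstract does not define either measure, so the definitions here are assumptions chosen for illustration: coverage is taken as the fraction of model components matched by at least one captured frame, stability as an inverse function of the spread of recent match scores, and the thresholds and function names are hypothetical.

```python
from statistics import pstdev


def coverage(assignments, num_components):
    """Fraction of model components matched by at least one captured frame
    (illustrative definition; assumption)."""
    return len(set(assignments)) / num_components


def stability(scores, window=5):
    """1 / (1 + standard deviation) of the most recent scores, so 1.0 means
    the scores are perfectly steady (illustrative definition; assumption)."""
    recent = scores[-window:]
    return 1.0 / (1.0 + pstdev(recent))


def decide(assignments, scores, num_components,
           min_coverage=0.6, min_stability=0.8):
    """Accept only when both measures clear their thresholds."""
    return (coverage(assignments, num_components) >= min_coverage
            and stability(scores) >= min_stability)


# Frames captured so far, each mapped to the index of its closest model
# component, plus a per-frame match score (made-up values).
assignments = [0, 1, 2, 1, 3]
steady = decide(assignments, [0.9, 0.9, 0.9, 0.9, 0.9], num_components=5)
erratic = decide(assignments, [0.1, 0.9, 0.1, 0.9, 0.1], num_components=5)
```

Because both functions take the data captured so far, they can be re-evaluated as more of the streams arrive, which is one reading of "time-varying" in the abstract.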