Abstract:
A method for classifying audio data. For a given piece of audio data a location or position for the given audio data within a mood space is generated and compared to a comparison mood space location. As a result of the comparison, comparison data are generated and provided as a classification result with respect to the given audio data.
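The comparison step can be illustrated with a minimal sketch. It assumes a two-dimensional mood space (valence, arousal), Euclidean distance as the comparison, and the nearest labelled location as the classification result. The abstract fixes none of these choices, and the names `mood_distance`, `classify_by_mood`, and `REFERENCES` are hypothetical.

```python
import math


def mood_distance(location, reference):
    """Euclidean distance between two locations in the mood space."""
    if len(location) != len(reference):
        raise ValueError("mood space locations must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(location, reference)))


def classify_by_mood(location, references):
    """Compare a location with each labelled comparison location.

    Returns (label, distance) for the nearest comparison location; in this
    sketch that pair plays the role of the comparison data.
    """
    distances = {
        label: mood_distance(location, ref) for label, ref in references.items()
    }
    label = min(distances, key=distances.get)
    return label, distances[label]


# Hypothetical comparison locations as (valence, arousal), each in [-1, 1].
REFERENCES = {"happy": (0.8, 0.6), "calm": (0.6, -0.7), "sad": (-0.7, -0.5)}

label, dist = classify_by_mood((0.7, 0.5), REFERENCES)
```

How a piece of audio is mapped to its mood space location is outside this sketch; only the comparison is shown.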
Abstract:
A method for predicting a misrecognition in a speech recognition system is based on the insight that variations in a speech input signal differ depending on whether the signal originates from a speech or a non-speech event. The method comprises steps for receiving a speech input signal, extracting at least one signal variation feature of the speech input signal, and applying a signal variation meter to the speech input signal for deriving a signal variation measure.
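One possible reading of a signal variation measure is sketched below. It assumes frame-wise log energy as the signal variation feature and the mean absolute change between consecutive frames as the measure. Both choices, and the names `frame_log_energies` and `signal_variation_measure`, are assumptions made for illustration and are not taken from the abstract.

```python
import math


def frame_log_energies(samples, frame_len):
    """Log energy of each non-overlapping frame (assumed variation feature)."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        # A small floor avoids log(0) on silent frames.
        energies.append(math.log(energy + 1e-10))
    return energies


def signal_variation_measure(samples, frame_len=4):
    """Mean absolute change of the feature between consecutive frames."""
    feats = frame_log_energies(samples, frame_len)
    if len(feats) < 2:
        return 0.0
    return sum(abs(b - a) for a, b in zip(feats, feats[1:])) / (len(feats) - 1)


# Toy signals: one with constant energy, one alternating loud and quiet frames.
steady = [0.5, -0.5] * 8
bursty = [0.5, -0.5, 0.5, -0.5, 0.01, -0.01, 0.01, -0.01] * 2

steady_measure = signal_variation_measure(steady)
bursty_measure = signal_variation_measure(bursty)
```

A misrecognition predictor would then compare such a measure against a threshold or feed it to a classifier; the abstract does not say which.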
Abstract:
Based on the insight that variations in a speech input signal differ depending on whether the signal originates from a speech or a non-speech event, the present invention proposes a method for predicting a misrecognition in a speech recognition system with steps for receiving a speech input signal, extracting at least one signal variation feature of the speech input signal, and applying a signal variation meter to the speech input signal for deriving a signal variation measure.
Abstract:
An apparatus for automatic dissection of segmented audio signals, wherein at least one information signal identifies programs included in said audio signals and contents included in said programs. A content detection device detects programs and the contents belonging to the respective programs in the information signal. A program weighting device weights each program included in the information signal based on the contents of the respective program detected by the content detection device. A program ranking device identifies programs of the same category and ranks said programs based on the weighting result for each program provided by the program weighting device.
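The weighting and ranking stages can be sketched as follows. The sketch assumes that each detected content type carries a fixed numeric weight and that a program's weight is the sum over its contents. The weights, the record layout, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical per-content weights; the abstract does not specify any values.
CONTENT_WEIGHTS = {"goal": 3.0, "interview": 1.0, "commercial": -1.0}


def weight_program(contents, content_weights):
    """Weight a program by summing the weights of its detected contents."""
    return sum(content_weights.get(c, 0.0) for c in contents)


def rank_programs(programs, content_weights):
    """Group programs by category and rank each group by descending weight."""
    by_category = defaultdict(list)
    for program in programs:
        weight = weight_program(program["contents"], content_weights)
        by_category[program["category"]].append((weight, program["title"]))
    return {
        category: [
            title
            for _, title in sorted(items, key=lambda item: item[0], reverse=True)
        ]
        for category, items in by_category.items()
    }


programs = [
    {"title": "Match B", "category": "sports", "contents": ["goal", "interview"]},
    {"title": "Match A", "category": "sports",
     "contents": ["goal", "goal", "commercial"]},
    {"title": "Evening news", "category": "news", "contents": ["interview"]},
]

ranking = rank_programs(programs, CONTENT_WEIGHTS)
```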
Abstract:
The present invention discloses an apparatus for automatic extraction of important events in audio signals comprising: signal input means for supplying audio signals; audio signal fragmenting means for partitioning audio signals supplied by the signal input means into audio fragments of a predetermined length and for allocating a sequence of one or more audio fragments to a respective audio window; feature extracting means for analyzing acoustic characteristics of the audio signals comprised in the audio fragments and for analyzing acoustic characteristics of the audio signals comprised in the audio windows; and important event extraction means for extracting important events in audio signals supplied by the audio signal fragmenting means based on predetermined important event classifying rules depending on acoustic characteristics of the audio signals comprised in the audio fragments and on acoustic characteristics of the audio signals comprised in the audio windows, wherein each important event extracted by the important event extraction means comprises a discrete sequence of cohesive audio fragments corresponding to an important event included in the audio signals.
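The fragment and window structure, and the idea of an important event as a cohesive run of fragments, can be sketched as below. It assumes non-overlapping fragments and windows, and a per-fragment boolean flag produced by some classifying rule that is not shown. The function names are hypothetical.

```python
def fragment(samples, fragment_len):
    """Partition samples into non-overlapping fragments of a fixed length."""
    return [
        samples[i:i + fragment_len]
        for i in range(0, len(samples) - fragment_len + 1, fragment_len)
    ]


def windows(fragments, fragments_per_window):
    """Allocate consecutive fragments to windows; the last one may be shorter."""
    return [
        fragments[i:i + fragments_per_window]
        for i in range(0, len(fragments), fragments_per_window)
    ]


def cohesive_runs(flags):
    """Return (start, end) fragment index pairs, end exclusive, of flagged runs."""
    runs, start = [], None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(flags)))
    return runs


frags = fragment(list(range(10)), fragment_len=2)
wins = windows(frags, fragments_per_window=2)
# Flags would come from the important event classifying rules (assumed here).
events = cohesive_runs([False, True, True, False, True])
```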
Abstract:
The present invention provides a method, a computer-software-product and an apparatus for enabling a determination of speech related audio data within a record of digital audio data. The method comprises steps for extracting audio features from the record of digital audio data, for classifying one or more subsections of the record of digital audio data, and for marking at least a part of the record of digital audio data classified as speech. The classification of the digital audio data record is performed on the basis of the extracted audio features and with respect to at least one predetermined audio class. The extraction of the at least one audio feature as used by a method according to the invention comprises steps for partitioning the record of digital audio data into adjoining frames, defining a window for each frame which is formed by a sequence of adjoining frames containing the frame under consideration, determining for the frame under consideration and at least one further frame of the window a spectral-emphasis-value which is related to the frequency distribution contained in the digital audio data of the respective frame, and assigning a presence-of-speech indicator value to the frame under consideration based on an evaluation of the differences between the spectral-emphasis-values determined for the frame under consideration and at least one further frame of the window.
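The frame, window, and indicator steps can be sketched as follows. The spectral-emphasis-value here is a crude stand-in (energy of the first difference divided by total energy, which grows with high-frequency content), and the indicator is the largest absolute difference between the frame's value and the values of the other frames in its window. Both definitions are assumptions; the abstract only states that the value relates to the frequency distribution and that the differences are evaluated.

```python
def spectral_emphasis(frame):
    """Assumed proxy: first-difference energy over total energy of the frame."""
    total = sum(s * s for s in frame)
    if total == 0.0:
        return 0.0
    diff = sum((b - a) ** 2 for a, b in zip(frame, frame[1:]))
    return diff / total


def presence_of_speech(frames, half_width=1):
    """Indicator per frame from emphasis differences within its window.

    The window of a frame is the run of adjoining frames reaching half_width
    frames to either side, clipped at the ends of the record.
    """
    values = [spectral_emphasis(f) for f in frames]
    indicators = []
    for i, v in enumerate(values):
        lo = max(0, i - half_width)
        hi = min(len(values), i + half_width + 1)
        indicators.append(max(abs(v - w) for w in values[lo:hi]))
    return indicators


low = [1.0, 1.0, 1.0, 1.0]     # no sample-to-sample change
high = [1.0, -1.0, 1.0, -1.0]  # maximal sample-to-sample change

varying = presence_of_speech([low, high, low])
uniform = presence_of_speech([low, low, low])
```

Marking a frame as speech would then require a decision rule on the indicator, for example a threshold, which is not specified in the abstract.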
Abstract:
An audio data segmentation apparatus for segmenting audio data, which supplies audio data, divides the supplied audio data into audio clips of a predetermined length, discriminates the audio clips into predetermined audio classes, the audio classes identifying the kind of audio data included in the respective audio clip, and segments the audio data into audio meta patterns based on a sequence of audio classes of consecutive audio clips, each meta pattern being allocated to a predetermined type of contents of the audio data. It is difficult to achieve good results with known methods for segmentation of audio data into meta patterns, since the rules for allocating the meta patterns are unsatisfactory. This problem is solved by the inventive audio data segmentation apparatus, which further includes a program database including program data units to identify a certain kind of program, a plurality of respective audio meta patterns being allocated to each program data unit, wherein the audio data are segmented into corresponding audio meta patterns on the basis of the program data units of the program database.
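A minimal sketch of program-dependent segmentation is given below. It assumes that each meta pattern is defined as an exact sequence of audio classes and that matching is greedy, longest pattern first. The database contents, the `"unknown"` fallback label, and the function name are hypothetical.

```python
# Hypothetical program database: program type -> meta pattern -> class sequence.
PROGRAM_DATABASE = {
    "news": {
        "jingle": ("music", "music"),
        "report": ("speech", "speech", "speech"),
        "break": ("silence",),
    },
}


def segment_meta_patterns(class_sequence, program, database):
    """Segment a per-clip class sequence using only the program's meta patterns.

    Returns (pattern_name, start, end) tuples with end exclusive. Clips that
    match no pattern are emitted one at a time as "unknown".
    """
    ordered = sorted(
        database[program].items(), key=lambda item: len(item[1]), reverse=True
    )
    segments, i = [], 0
    while i < len(class_sequence):
        for name, pattern in ordered:
            if tuple(class_sequence[i:i + len(pattern)]) == pattern:
                segments.append((name, i, i + len(pattern)))
                i += len(pattern)
                break
        else:
            segments.append(("unknown", i, i + 1))
            i += 1
    return segments


classes = ["music", "music", "speech", "speech", "speech", "silence", "noise"]
segments = segment_meta_patterns(classes, "news", PROGRAM_DATABASE)
```

Restricting the candidate patterns to those of the identified program is the point of the sketch: the same class sequence could be segmented differently under another program data unit.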
Abstract:
To increase the robustness and/or the recognition rate of methods for recognizing speech, it is proposed to include phone boundary verification measure features in the process of obtaining and/or generating confidence measures for obtained recognition results.