Abstract:
A system enables a user to query a text document, such as a presentation slide file, and an associated audio stream for key words and phrases. The audio stream can be derived, for example, from an audio-video recording made of a presenter while the slides are shown to an audience. A graphical user interface displays the query results for both the text document and the audio stream in a time-aligned format, so that the user can easily browse the text document and the accompanying time-aligned audio stream based on the key words and phrases.
Abstract:
A system and method for indexing an audio stream for subsequent information retrieval and for skimming, gisting, and summarizing the audio stream includes using special audio prefiltering such that only relevant speech segments that are generated by a speech recognition engine are indexed. Specific indexing features are disclosed that improve the precision and recall of an information retrieval system used after indexing for word spotting. The invention includes rendering the audio stream into intervals, with each interval including one or more segments. For each segment of an interval it is determined whether the segment exhibits one or more predetermined audio features such as a particular range of zero crossing rates, a particular range of energy, and a particular range of spectral energy concentration. The audio features are heuristically determined to represent respective audio events including silence, music, speech, and speech on music. Also, it is determined whether a group of intervals matches a heuristically predefined meta pattern such as continuous uninterrupted speech, concluding ideas, hesitations and emphasis in speech, and so on, and the audio stream is then indexed based on the interval classification and meta pattern matching, with only relevant features being indexed to improve subsequent precision of information retrieval. Also, alternatives for longer terms generated by the speech recognition engine are indexed along with respective weights, to improve subsequent recall.
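The per-segment feature test described above can be illustrated with a minimal Python sketch. It covers only two of the named features (zero crossing rate and energy) and only two of the audio events (silence and speech). The function names, the threshold values, and the "other" label are hypothetical choices made for illustration; the abstract does not state the heuristics actually used.

```python
from typing import List, Tuple


def zero_crossing_rate(samples: List[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


def mean_energy(samples: List[float]) -> float:
    """Mean squared amplitude of the segment."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def classify_segment(
    samples: List[float],
    silence_energy: float = 1e-4,                  # assumed threshold
    speech_zcr: Tuple[float, float] = (0.02, 0.35),  # assumed range
) -> str:
    """Label a segment by checking its features against fixed ranges."""
    if mean_energy(samples) < silence_energy:
        return "silence"
    low, high = speech_zcr
    zcr = zero_crossing_rate(samples)
    return "speech" if low <= zcr <= high else "other"
```

A fuller version would add a spectral energy concentration feature and the music and speech-on-music events, then pass the per-interval labels to the meta pattern matching step.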
Abstract:
Browsing of digital video data is performed using a fast forward or fast reverse play mode. The digital video is analyzed and processed to produce a content-based variable-rate video playback sequence for fast browsing. To create the playback sequence, each shot in a video is sped up at a relatively slow rate at the beginning of the shot by selecting many frames, and the speedup rate is then increased as the shot progresses by selecting progressively fewer frames. This method and apparatus of variable-rate frame selection can be used to add an index to a video, to play an original video in fast forward/backward mode, or to create a new video: a fast forward playback video summary.
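One way to picture the variable-rate frame selection is the Python sketch below. It assumes shot boundaries are already known and that the gap between selected frames grows geometrically up to a cap. The growth schedule, the parameter names, and the default values are assumptions for illustration, not the schedule the abstract describes.

```python
from typing import List, Tuple


def select_frames(
    shot_start: int,
    shot_end: int,
    initial_step: float = 1.0,  # assumed: dense sampling at shot start
    growth: float = 1.5,        # assumed: geometric widening of the gap
    max_step: float = 16.0,     # assumed: cap on the gap between frames
) -> List[int]:
    """Pick frame indices in [shot_start, shot_end) with widening gaps.

    Expects initial_step >= 1 and growth >= 1 so indices never repeat.
    """
    selected = []
    position = float(shot_start)
    step = initial_step
    while position < shot_end:
        selected.append(int(position))
        position += step
        step = min(step * growth, max_step)
    return selected


def playback_sequence(shots: List[Tuple[int, int]], **kwargs) -> List[int]:
    """Concatenate the per-shot selections; the step resets at each shot."""
    frames = []
    for start, end in shots:
        frames.extend(select_frames(start, end, **kwargs))
    return frames
```

For example, `select_frames(0, 20, initial_step=1, growth=2, max_step=8)` keeps frames 0, 1, 3, 7, and 15: many frames near the start of the shot and fewer as it progresses.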
Abstract:
Preferred implementations of the invention permit a user to seamlessly switch from a first media stream to a second media stream in a synchronized way, such that the second media stream picks up where the first media stream left off. In this way, the user experiences events chronologically but without interruption. In a preferred implementation, a user watching a skim video switches to a full length video when, for example, the skim video reaches a frame that is of particular interest to the user. The full length video begins at a point corresponding to the frame in the skim video that is of interest to the user, without skipping over video segments, so that the user does not experience any time gaps in the story line.
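The synchronized switch can be sketched in Python under one assumption: that the skim video is built from excerpts of the full length video, and that a table records where each excerpt came from. The table layout `(skim_start, full_start, duration)` and the function name are hypothetical, since the abstract does not say how the correspondence is stored.

```python
import bisect
from typing import List, Tuple

# Each entry: (start time in skim video, start time in full video, duration),
# all in seconds, sorted by skim start time. Hypothetical layout.
Segment = Tuple[float, float, float]


def skim_to_full(segments: List[Segment], skim_time: float) -> float:
    """Map a playback position in the skim video to the full video."""
    starts = [segment[0] for segment in segments]
    index = bisect.bisect_right(starts, skim_time) - 1
    if index < 0:
        raise ValueError("skim_time precedes the first segment")
    skim_start, full_start, duration = segments[index]
    # Clamp so a position past the excerpt's end maps to the excerpt's end.
    offset = min(skim_time - skim_start, duration)
    return full_start + offset
```

With `segments = [(0.0, 10.0, 5.0), (5.0, 60.0, 5.0), (10.0, 200.0, 4.0)]`, a viewer who switches at 7.0 seconds into the skim video would resume the full length video at 62.0 seconds.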
Abstract:
A system and associated method automatically discover salient segments in a speech transcript and focus on the segmentation of an audio/video source into topically cohesive segments based on Automatic Speech Recognition (ASR) transcriptions. The word n-grams are extracted from the speech transcript using a three-phase segmentation algorithm based on the following sequence or combination of boundary-based and content-based methods: a boundary-based method; a rate of arrival of feature method; and a content-based method. In the first two segmentation passes, the temporal proximity and the rate of arrival of features are analyzed to compute an initial segmentation. In the third segmentation pass, changes in the set of content-bearing words used by adjacent segments are detected in order to validate the initial segments and merge them where appropriate, which prevents over-segmentation.
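The third, content-based pass can be illustrated with a small Python sketch. It assumes the initial segments are already available as lists of content-bearing words, and it measures change between adjacent segments with Jaccard similarity of their word sets. The similarity measure, the threshold, and the function names are assumptions chosen for illustration; the abstract does not specify them.

```python
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def merge_similar_segments(
    segments: List[List[str]],
    threshold: float = 0.3,  # assumed cutoff for "same topic"
) -> List[List[str]]:
    """Merge each segment into its predecessor when their word sets overlap."""
    merged: List[List[str]] = []
    for words in segments:
        if merged and jaccard(set(merged[-1]), set(words)) >= threshold:
            merged[-1] = merged[-1] + list(words)
        else:
            merged.append(list(words))
    return merged
```

Adjacent segments that share most of their content words collapse into one, while a segment with a different vocabulary starts a new topic.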
Abstract:
A system and method for visualizing and navigating dynamic documents including data from an ongoing process and including instances of specified search terms. A summary view including a condensed abstract representation of a dynamic document provides a global overview of the distribution of search terms. The invention updates the document and aggregates the instances of search terms when the representation includes a nonlinear scale or uses multiple display regions having different resolution levels. The invention supports rapid skimming of dynamic documents and dynamic document collections, including enhancements triggered by cursor brushing, while keeping the user in context. Navigation to a segment of the dynamic document by selecting a corresponding portion of the summary view can replace the use of conventional scrolling techniques.
Abstract:
A system and method for visualizing and navigating document content using a condensed representation of a document to provide both a global overview of the distribution of key search terms as well as their immediate context. The invention supports rapid skimming of documents and document collections and enables efficient information finding, in some cases entirely eliminating the need to scroll within a document as with a conventional browser tool. The invention is of particular utility with personal digital assistants, which generally have small displays and limited storage capacity and communication bandwidth in comparison to personal computers. Documents may include conventional text and image files, web pages, audio files, and video files. The invention may also apply to collections of documents.
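The global overview of search term distribution can be sketched in Python as a fixed number of bins over the document, each counting the hits that fall inside it. The binning by character offset, the case-insensitive matching, and the function name are assumptions for illustration rather than the representation the abstract describes.

```python
from typing import List


def term_distribution(text: str, term: str, bins: int = 10) -> List[int]:
    """Count occurrences of `term` in each of `bins` equal slices of `text`."""
    counts = [0] * bins
    haystack, needle = text.lower(), term.lower()
    if not haystack or not needle:
        return counts
    position = haystack.find(needle)
    while position != -1:
        # Map the character offset of the hit to a bin index.
        counts[min(position * bins // len(haystack), bins - 1)] += 1
        position = haystack.find(needle, position + 1)
    return counts
```

A small display could draw one mark per bin, scaled by its count, and jump to the matching slice of the document when a bin is selected.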
Abstract:
A system and method for automatically generating a hierarchical table of contents or outline for indexing a document and identifying clusters of related information in the document. The document may comprise text, audio, video, or a multimedia presentation. The invention employs a unique and novel combination of two kinds of techniques: latent semantic indexing techniques, which identify related blocks and major topic changes within the document, and scale space segmentation techniques, which identify self-similar blocks within the document and thus find topic changes of various sizes at block edges. The invention then produces a visual presentation of the semantic structure of the document.
Abstract:
A signal processing system determines the characteristic of a signal for encoding or decoding by examining and classifying such signal, and then applies a transformation or inverse transformation to such signal. Depending on classification of the signal, various transforms or inverse transforms are applicable adaptively thereto.