Abstract:
In an embodiment of the invention, an electronic document (e-document) can be located by capturing an image of the printed document. Instead of typing in a file name or searching through multiple directories, the user simply takes a picture of the document with a camera and the system uses the document image to locate the e-document. In an alternative embodiment of the invention, an image of a printed document can be useful for remote document sharing. In various embodiments of the invention, sharing an image of a printed document can be used to email a high-quality paper document, send a high-quality fax, or open a document to a page containing an annotation. Through co-design of the feature extraction and search algorithm in the system, the image feature detection robustness and search speed are improved at the same time.
Abstract:
Recorded video is accessed from printed notes or summaries derived from the video. Summaries may be created automatically by analyzing the recorded video, and annotations are made by a user on a device for note-taking with digital ink and video. The notes and/or summaries are printed along with data glyphs that provide time based indexes or offsets into the recorded video. The indexes or offsets are retrieved by scanning the glyph on the printout. The glyph information can be embedded in the printouts in many ways. One method is to associate block glyphs with annotations or images on the printed pages. Another method is to provide an address carpet in an annotated timeline. Yet another method is to provide a two-dimensional address carpet with X-Y position mapped to time which can be used to provide selected access to the video. The accessed video may be played back on the note-taking device on a pen computer, or on a summary interface on a Web browser-type device.
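The two-dimensional address carpet described above maps an X-Y position to a time in the video. A minimal sketch of one such mapping follows, assuming the carpet is a grid of cells read in row-major order and spread evenly over the video duration. The function name `carpet_to_time`, its parameters, and the row-major layout are illustrative assumptions, not details taken from the abstract.

```python
def carpet_to_time(x, y, cols, rows, duration):
    """Map a decoded carpet cell (x, y) to a video offset in seconds.

    Assumption: cells are laid out in row-major order, so moving right and
    then down the carpet moves forward in time at a constant rate.
    """
    if not (0 <= x < cols and 0 <= y < rows):
        raise ValueError("cell lies outside the address carpet")
    cell_index = y * cols + x  # position of the cell in reading order
    return duration * cell_index / (cols * rows)
```

For example, on a hypothetical 10 by 6 carpet covering a 600-second recording, each cell corresponds to 10 seconds of video.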
Abstract:
A system that represents a video sequence comprising a plurality of video clips as a number of images. The plurality of video clips is represented as a reduced representation of video images. A video clip is represented as a keyframe, wherein multiple keyframes may then be arranged according to chronological order. All or only a representative portion of the video clips can be represented as keyframes. The size of the keyframe may be configured to represent the length or importance of the video clip. The keyframe may depict an entire frame of a video clip, or a region of meaningful information within a frame of a video clip. Multiple keyframes may be arranged in a two-dimensional array, in an S-shaped curve, or some other pattern. The keyframes may depict motion of an object occurring over time in the video clip by configuring groups of pixels in the keyframe. Configuring groups of pixels may include colorizing pixel groups and depicting pixel groups at a semi-transparent level according to the number of frames between the keyframe and the frame containing the object in motion.
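Two of the layout ideas above, sizing a keyframe by clip length and arranging keyframes along an S-shaped curve, can be sketched as follows. This is only an illustration under stated assumptions: the scaling rule (a whole-number multiple of a base size, capped at `max_scale`) and the function names are hypothetical, and the S-curve is interpreted here as a grid whose rows alternate direction.

```python
import math

def keyframe_sizes(clip_lengths, base=64, max_scale=3):
    """Return a keyframe edge length in pixels for each clip.

    Assumption: the longest clip gets base * max_scale pixels and shorter
    clips get proportionally smaller whole multiples of base (at least 1).
    """
    longest = max(clip_lengths)
    return [base * max(1, math.ceil(max_scale * n / longest))
            for n in clip_lengths]

def s_curve_layout(count, cols):
    """Return (row, col) grid positions for keyframes in chronological order.

    Even rows run left to right and odd rows run right to left, so
    consecutive keyframes are always adjacent on the page.
    """
    positions = []
    for i in range(count):
        row, offset = divmod(i, cols)
        col = offset if row % 2 == 0 else cols - 1 - offset
        positions.append((row, col))
    return positions
```

Alternating the row direction keeps chronological neighbors next to each other, which a plain row-major grid does not do at the end of each row.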
Abstract:
A method and apparatus for providing multi-resolution video to multiple users under hybrid human and automatic control. Initial environment and close-up images are captured using a first camera and a PTZ camera. The initial images are then stored in memory. Current environment and close-up images are captured, and an estimated difference between the initial and current images and the true image is determined. The estimated differences are weighted and compared, and the stored images are updated. A close-up image is then provided to each user of the system. The close-up camera is then directed to a portion of the environment image having high distortion, and current environment and close-up images are captured again.
Abstract:
When dynamically grouping a plurality of graphic objects, such as those displayed on a graphic input display apparatus, a cluster tree is formed for the plurality of graphic objects. The cluster tree is based on a plurality of different types of distance measures. These include a time distance and a spatial distance. These distances are combined to form a distance metric indicating a distance between a pair of the graphic objects. Each level of the cluster tree defines a new cluster of the graphic objects. At least one of the graphic objects is selected. The different cluster levels of the cluster tree containing the selected graphic object are displayable. The displayed cluster of the graphic objects can be modified to increase or decrease the cluster level of the cluster containing the selected graphic object.
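The cluster tree described above can be illustrated with a small sketch. It assumes a weighted sum of Euclidean spatial distance and absolute time difference as the combined metric, and single-linkage agglomerative merging to build the tree. Both choices, along with the `Stroke` type and all function names, are assumptions made for illustration; the abstract does not specify how the distances are combined or how the tree is built.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Stroke:
    name: str
    x: float
    y: float
    t: float  # creation time in seconds

def combined_distance(a, b, w_space=1.0, w_time=1.0):
    """Weighted sum of spatial and time distance (assumed combination rule)."""
    spatial = math.hypot(a.x - b.x, a.y - b.y)
    temporal = abs(a.t - b.t)
    return w_space * spatial + w_time * temporal

def build_cluster_tree(strokes):
    """Merge the two closest clusters repeatedly (single linkage).

    Returns one list of clusters per tree level, from singletons to one root.
    """
    by_name = {s.name: s for s in strokes}
    clusters = [frozenset([s.name]) for s in strokes]
    levels = [list(clusters)]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(combined_distance(by_name[p], by_name[q])
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        levels.append(list(clusters))
    return levels

def clusters_containing(levels, name):
    """List the distinct clusters holding `name`, smallest first."""
    result = []
    for level in levels:
        cluster = next(c for c in level if name in c)
        if not result or result[-1] != cluster:
            result.append(cluster)
    return result
```

Stepping forward or backward through the list returned by `clusters_containing` corresponds to increasing or decreasing the cluster level around a selected object.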
Abstract:
In one aspect, the present invention is directed to a method and an apparatus for organizing digital media, particularly digital photos, using face recognition. According to a first aspect of the present invention, a computer-based method for organizing digital photos comprises: extracting objects of interest from a plurality of photographs; cropping said plurality of photographs to generate images of isolated objects of interest; applying a recognition algorithm to determine the similarity of isolated objects of interest with a reference; displaying a plurality of objects arranged as a function of the determined similarity; and receiving user input to associate said objects with a particular classification.
Abstract:
The invention provides for quickly browsing through a large set of video clips to locate video clips of interest. In an embodiment of the present invention, hierarchical clustering of the video clips can be undertaken enabling the user to successively identify the subgroup of video clips of interest. This approach generates a video summary for the contents of each cluster by selecting representative video clips from individual videos and lower level clusters within the cluster. Links are added between the more general, higher-level clusters and the elements they contain. Thus, starting at the top of the set of videos being browsed or returned by the search engine and continuing at each subsequent cluster level, the user is presented with video summaries for the relevant parts of videos and those of next lower-level clusters. The user can then follow the navigational link to the desired video or lower-level cluster.
Abstract:
Embodiments of the present invention provide the ability to navigate, view, and manipulate a collection of digital images utilizing a GUI that has the familiar context of a calendar. Graphical objects representative of digital images are displayed within a particular day displayed in a calendar-based GUI. A user may group digital images into groups, modify the date with which a digital image is associated and perform various other manipulations using embodiments of a calendar-based GUI.
Abstract:
Video shot boundaries are detected using a Video Segmenting Hidden Markov Model that models the sequence of states of a video. The Video Segmenting Hidden Markov Model determines the state sequence based on feature values. Using Hidden Markov Model techniques allows for automatic learning and use of multiple features, including motion vectors, audio differences and histogram differences, without the need for manual adjustment of thresholds.
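One of the features named above, the histogram difference between consecutive frames, can be sketched as follows. This is an illustration only: it assumes each frame is a flat list of 8-bit grayscale pixel values, uses a normalized L1 distance between histograms, and shows only the feature computation, not the Hidden Markov Model itself. The function names and bin count are hypothetical.

```python
def histogram(frame, bins=4, max_value=256):
    """Count pixels per intensity bin for a flat list of grayscale values."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    return counts

def histogram_difference(frame_a, frame_b, bins=4):
    """Normalized L1 distance between two frame histograms, in [0, 1].

    Assumes both frames contain the same number of pixels.
    """
    hist_a = histogram(frame_a, bins)
    hist_b = histogram(frame_b, bins)
    total = sum(abs(a - b) for a, b in zip(hist_a, hist_b))
    return total / (2 * len(frame_a))
```

A value near 1 suggests an abrupt change in content, such as a cut, while a value near 0 suggests the two frames belong to the same shot. In the approach described above, such values would be fed to the model as observations rather than compared against a hand-tuned threshold.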
Abstract:
A method is provided for segmenting audio data, comprising speech from a plurality of individual speakers, according to speaker. The method comprises providing individual HMMs for each individual speaker, each individual HMM including at least one state, and constructing a speaker network HMM by connecting the individual HMMs in parallel. The audio data is then divided into segments by determining a most likely sequence of states through the speaker network HMM, each of the segments being associated with one of the individual HMMs. Afterward, the speaker of each of the segments is identified. The segmented data may be used to form an index into the audio data according to speaker.
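The segmentation step above can be illustrated with a heavily simplified sketch. It assumes each speaker HMM has a single state with a one-dimensional Gaussian emission, that the parallel network allows a switch between any two speakers with a fixed probability, and that the most likely state sequence is found with the Viterbi algorithm. The function names, the scalar features, and the `stay_prob` parameter are assumptions for illustration, not details from the abstract.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def segment_by_speaker(frames, speakers, stay_prob=0.9):
    """Viterbi decoding over a parallel network of one-state speaker models.

    frames:   list of scalar features (assumed, for brevity)
    speakers: dict of name -> (mean, variance); needs at least two entries
    Returns a list of (start, end_exclusive, speaker_name) segments.
    """
    names = list(speakers)
    n = len(names)
    log_stay = math.log(stay_prob)
    log_switch = math.log((1 - stay_prob) / (n - 1))

    def trans(i, j):
        return log_stay if i == j else log_switch

    score = [math.log(1 / n) + log_gauss(frames[0], *speakers[s])
             for s in names]
    backpointers = []
    for x in frames[1:]:
        new_score, pointers = [], []
        for j, s in enumerate(names):
            best_i = max(range(n), key=lambda i: score[i] + trans(i, j))
            new_score.append(score[best_i] + trans(best_i, j)
                             + log_gauss(x, *speakers[s]))
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)

    # Trace the best path backward from the best final state.
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()

    # Collapse runs of the same state into labeled segments.
    segments, start = [], 0
    for k in range(1, len(path) + 1):
        if k == len(path) or path[k] != path[start]:
            segments.append((start, k, names[path[start]]))
            start = k
    return segments
```

The returned segments can serve directly as a speaker index into the audio: each tuple gives a frame range and the speaker model that best explains it.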