Abstract:
Systems and methods for detecting people or speakers in an automated fashion are disclosed. A pool of features including more than one type of input (such as audio input and video input) may be identified and used with a learning algorithm to generate a classifier that identifies people or speakers. The resulting classifier may be evaluated to detect people or speakers.
Abstract:
The present invention leverages spatial relationships to provide a systematic means to recognize text and/or graphics. This allows augmentation of a sketched shape with its symbolic meaning, enabling numerous features including smart editing, beautification, and interactive simulation of visual languages. The spatial recognition method simultaneously groups and recognizes sketched shapes by performing a search-based optimization over a large space of possible groupings. The optimization utilizes a classifier that assigns a class label to a collection of strokes. The overall grouping optimization inherits the properties of the classifier, so that if the classifier is scale and rotation invariant, the optimization is as well. Instances of the present invention employ a variant of AdaBoost to help recognize and classify symbols. Instances of the present invention employ dynamic programming and/or A-star search to perform the optimization. The present invention applies to both hand-sketched shapes and printed handwritten text, and even heterogeneous mixtures of the two.
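The dynamic-programming variant of the grouping optimization can be sketched as follows. This is a minimal illustration, not the patented method: it assumes strokes are grouped only into contiguous runs in drawing order, and it uses a hypothetical `cost(group)` function (lower is better) in place of the classifier's score. The names `best_grouping`, `cost`, `max_group`, and `toy_cost` are illustrative and do not come from the abstract.

```python
from typing import Callable, List, Sequence, Tuple


def best_grouping(
    strokes: Sequence[object],
    cost: Callable[[Sequence[object]], float],
    max_group: int = 4,
) -> Tuple[float, List[Tuple[int, int]]]:
    """Split strokes into contiguous groups with minimum total cost.

    Returns (total_cost, groups); each group is a half-open index
    range (start, end) into strokes.
    """
    n = len(strokes)
    # best[i] is the minimum cost of grouping the first i strokes.
    best = [0.0] + [float("inf")] * n
    # back[i] is the start index of the last group in that optimum.
    back = [0] * (n + 1)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_group), end):
            candidate = best[start] + cost(strokes[start:end])
            if candidate < best[end]:
                best[end] = candidate
                back[end] = start
    # Walk the back-pointers to recover the chosen groups.
    groups: List[Tuple[int, int]] = []
    end = n
    while end > 0:
        groups.append((back[end], end))
        end = back[end]
    groups.reverse()
    return best[n], groups


def toy_cost(group: Sequence[object]) -> float:
    """Toy stand-in for a classifier: two-stroke groups are cheap."""
    return 0.1 if len(group) == 2 else 1.0
```

Because the optimization only ever calls `cost`, any invariance of the scoring function (for example to scale or rotation) carries over to the chosen grouping, which is the property the abstract describes.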
Abstract:
Image recognition is utilized to help score parse trees for two-dimensional recognition tasks. Trees and subtrees are rendered as images and then utilized to determine parsing scores. Other instances of the subject invention can also incorporate additional features, such as stroke curvature and/or nearby white space, as rendered images. Geometric constraints can also be employed to increase the performance of a parsing process, substantially improving parsing speed and in some cases making the parse resolvable in polynomial time. Additional performance enhancements can be achieved in yet other instances of the subject invention by employing constellations of integral images and/or integral images of document features.
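For readers unfamiliar with integral images (summed-area tables), the following is a minimal sketch of the general technique, not of the specific constellations described in the abstract. The function names `integral_image` and `box_sum` are illustrative assumptions.

```python
from typing import List


def integral_image(img: List[List[int]]) -> List[List[int]]:
    """Return a (rows + 1) x (cols + 1) summed-area table for img.

    table[r][c] holds the sum of all pixels above and to the left
    of (r, c), i.e. the sum over img[0:r][0:c].
    """
    rows = len(img)
    cols = len(img[0]) if rows else 0
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += img[r][c]
            table[r + 1][c + 1] = table[r][c + 1] + row_sum
    return table


def box_sum(table: List[List[int]], top: int, left: int,
            bottom: int, right: int) -> int:
    """Sum of the rectangle rows top..bottom-1, cols left..right-1."""
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])
```

Once the table is built, the sum over any rectangle costs four lookups regardless of its size, which is why such features are cheap to evaluate many times during parsing.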
Abstract:
A system for organizing images includes an extraction component that extracts visual information (e.g., faces, scenes, etc.) from the images. The extracted visual information is provided to a comparison component, which computes similarity confidence data between items of the extracted visual information. The similarity confidence data is an indication of the likelihood that items of extracted visual information are similar. The comparison component then generates a visual distribution of the extracted visual information based upon the similarity confidence data. The visual distribution can include groupings of the extracted visual information based on the computed similarity confidence data. For example, the visual distribution can be a two-dimensional layout of faces organized based on the computed similarity confidence data, with faces in closer proximity being those computed to have a greater probability of representing the same person. The visual distribution can then be utilized by a user to sort, organize and/or tag images.
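One simple way to turn pairwise similarity confidence into groupings is sketched below. It is an assumption-laden illustration rather than the system's actual comparison component: the `similarity` callable, the `threshold` value, and the function name `group_by_similarity` are hypothetical, and the linking rule (join any two items whose confidence meets the threshold) is just one possible choice.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def group_by_similarity(
    items: Sequence[T],
    similarity: Callable[[T, T], float],
    threshold: float = 0.8,
) -> List[List[T]]:
    """Group items whose pairwise similarity confidence >= threshold.

    Groups are the connected components of the "similar enough"
    relation, tracked with a small union-find structure.
    """
    parent = list(range(len(items)))

    def find(i: int) -> int:
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if similarity(items[i], items[j]) >= threshold:
                parent[find(j)] = find(i)

    groups: Dict[int, List[T]] = {}
    for i, item in enumerate(items):
        groups.setdefault(find(i), []).append(item)
    return list(groups.values())
```

A user-facing layout could then place the members of each group near one another, leaving the final sorting and tagging decisions to the user.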
Abstract:
A computer-implemented method and apparatus are provided for populating an electronic form from an electronic image. The method and apparatus identify a size, orientation and position of an object within the electronic image, and identify information elements from pixels within the image that correspond to the object. Fields of the electronic form are displayed to a user along with the identified information elements through a graphical user interface. The information elements are parsed into tagged groups of different information types. At least some of the fields of the electronic form are populated with the tagged groups to produce a populated form. The user is allowed to edit the populated fields through the graphical user interface.
Abstract:
An audio element cache is provided that is capable of caching audio elements for each user in a personal radio server system. In operation, customized radio content is provided to remote listeners in a personal radio server system by: storing a plurality of audio elements in a file server; retrieving a subset of the plurality of audio elements from the file server by predicting the content desired by a remote listener based on a user profile of the remote listener; storing the subset of the plurality of audio elements in an audio element cache; selecting audio elements to provide to a remote listener from the audio element cache; and transmitting the audio elements to the remote listener. In an embodiment, the subset of audio elements is stored in the audio element cache when a remote listener logs on to the personal radio server system.
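The store, predict, cache, and select steps above can be sketched as follows. This is a toy illustration under stated assumptions: the file server is modeled as an in-memory dictionary, the "prediction" is a plain genre match against the user profile, and the class and method names (`AudioElementCache`, `on_logon`, `select`) are hypothetical rather than taken from the abstract.

```python
from typing import Dict, List, Set, Tuple


class AudioElementCache:
    """Per-user cache filled from a file server when a listener logs on."""

    def __init__(self, file_server: Dict[str, Tuple[str, bytes]]) -> None:
        # Assumed layout: element id -> (genre, audio data).
        self.file_server = file_server
        self.per_user: Dict[str, Dict[str, bytes]] = {}

    def on_logon(self, user: str, profile: Set[str]) -> None:
        """Prefetch the elements predicted to interest this user.

        The prediction here is an assumption: keep every element
        whose genre appears in the user's profile.
        """
        self.per_user[user] = {
            element_id: audio
            for element_id, (genre, audio) in self.file_server.items()
            if genre in profile
        }

    def select(self, user: str, limit: int) -> List[Tuple[str, bytes]]:
        """Return up to `limit` cached elements to transmit to the user."""
        cached = self.per_user.get(user, {})
        return list(cached.items())[:limit]
```

Filling the cache at log-on means later selections are served from the per-user cache rather than from the file server.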
Abstract:
A two-dimensional representation of a document is leveraged to extract a hierarchical structure that facilitates recognition of the document. The visual structure is grammatically parsed utilizing two-dimensional adaptations of statistical parsing algorithms. This allows recognition of layout structures (e.g., columns, authors, titles, footnotes, etc.) and the like such that structural components of the document can be accurately interpreted. Additional techniques can also be employed to facilitate document layout recognition. For example, grammatical parsing techniques that utilize machine learning, parse scoring based on image representations, boosting techniques, and/or “fast features” and the like can be employed to facilitate document recognition.
Abstract:
Systems and methods are described for face recognition using discriminatively trained orthogonal rank one tensor projections. In an exemplary system, images are treated as tensors, rather than as conventional vectors of pixels. During runtime, the system designs visual features, embodied as tensor projections, that minimize intraclass differences between instances of the same face while maximizing interclass differences between the face and faces of different people. Tensor projections are pursued sequentially over a training set of images and take the form of a rank one tensor, i.e., the outer product of a set of vectors. An exemplary technique ensures that the tensor projections are orthogonal to one another, thereby increasing the ability to generalize and discriminate image features compared with conventional techniques. Orthogonality among tensor projections is maintained by iteratively solving an ortho-constrained eigenvalue problem in one dimension of a tensor while solving unconstrained eigenvalue problems in additional dimensions of the tensor.
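To make the rank one projection concrete for the two-dimensional case, the sketch below treats an image as a matrix X and a projection as a pair of vectors (u, v), so the feature value is u^T X v, the inner product of X with the outer product of u and v. It covers only the projection and the orthogonality check, not the training procedure, and the function names `project` and `rank_one_inner` are illustrative assumptions.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Ordinary dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project(image: List[List[float]], u: Sequence[float],
            v: Sequence[float]) -> float:
    """Rank one projection u^T X v of a 2-D image X.

    Equivalent to the inner product of X with the outer product
    of u and v, computed without forming that outer product.
    """
    return sum(u[i] * dot(image[i], v) for i in range(len(u)))


def rank_one_inner(u1: Sequence[float], v1: Sequence[float],
                   u2: Sequence[float], v2: Sequence[float]) -> float:
    """Inner product of two rank one tensors (u1 v1^T) and (u2 v2^T).

    It factorizes as (u1 . u2) * (v1 . v2), so the two projections
    are orthogonal as soon as the vectors in one dimension are.
    """
    return dot(u1, u2) * dot(v1, v2)
```

The factorization in `rank_one_inner` is consistent with the abstract's approach of constraining only one dimension: if `u1` and `u2` are orthogonal, the projections are orthogonal whatever `v1` and `v2` are.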
Abstract:
A computer-implemented word processing system comprises an interface component that receives a features vector associated with an electronic document. An analysis component communicatively coupled to the interface component analyzes the features vector and determines a viewing mode in which to display the electronic document. In accordance with one aspect of the subject invention, the viewing mode can be one of a conventional viewing mode and a viewing mode associated with enhanced readability.