摘要:
A system for organizing images includes an extraction component that extracts visual information (e.g., faces, scenes, etc.) from the images. The extracted visual information is provided to a comparison component which computes similarity confidence data between the extracted visual information. The similarity confidence data is an indication of the likelihood that items of extracted visual information are similar. The comparison component then generates a visual distribution of the extracted visual information based upon the similarity confidence data. The visual distribution can include groupings of the extracted visual information based on computed similarity confidence data. For example, the visual distribution can be a two-dimensional layout of faces organized based on the computed similarity confidence data—with faces in closer proximity faces computed to have a greater probability of representing the same person. The visual distribution can then be utilized by a user to sort, organize and/or tag images.
摘要:
A system for organizing images includes an extraction component that extracts visual information (e.g., faces, scenes, etc.) from the images. The extracted visual information is provided to a comparison component which computes similarity confidence data between the extracted visual information. The similarity confidence data is an indication of the likelihood that items of extracted visual information are similar. The comparison component then generates a visual distribution of the extracted visual information based upon the similarity confidence data. The visual distribution can include groupings of the extracted visual information based on computed similarity confidence data. For example, the visual distribution can be a two-dimensional layout of faces organized based on the computed similarity confidence data—with faces in closer proximity faces computed to have a greater probability of representing the same person. The visual distribution can then be utilized by a user to sort, organize and/or tag images.
摘要:
Systems and methods are described for face recognition using discriminatively trained orthogonal rank one tensor projections. In an exemplary system, images are treated as tensors, rather than as conventional vectors of pixels. During runtime, the system designs visual features—embodied as tensor projections—that minimize intraclass differences between instances of the same face while maximizing interclass differences between the face and faces of different people. Tensor projections are pursued sequentially over a training set of images and take the form of a rank one tensor, i.e., the outer product of a set of vectors. An exemplary technique ensures that the tensor projections are orthogonal to one another, thereby increasing ability to generalize and discriminate image features over conventional techniques. Orthogonality among tensor projections is maintained by iteratively solving an ortho-constrained eigenvalue problem in one dimension of a tensor while solving unconstrained eigenvalue problems in additional dimensions of the tensor.
摘要:
In embodiments consistent with the subject matter of this disclosure, a user may input strokes as digital ink to a processing device. The processing device may partition the input strokes into multiple regions of strokes. A first recognizer and a second recognizer may score grammar objects included in regions and represented by chart entries. The scores may be converted to a converted score, which may have at least a near standard normal distribution. The processing device may present a recognition result based on highest converted scores according to a recurrence formula. The processing device may receive a correction hint with respect to misrecognized strokes and may add a penalty score with respect to chart entries representing grammar objects breaking the correction hint. Incremental recognition may be performed when a pause is detected during inputting of strokes.
摘要:
Annotation recognition and parsing is accomplished by first recognizing and grouping shapes such that relationships between the annotations and the underlying text and/or images can be determined. The recognition and grouping is followed by categorization of recognized annotations according to predefined types. The classification may be according to functionality, relation to content, and the like. In a third phase, the annotations are anchored to the underlying text or images they are found to be related to.
摘要:
A method for efficiently comparing two trinary logic representations, including the steps of creating a first data structure (a VALUE data structure) representative of a first set of properties; creating a second data structure (a KNOWN data structure) representative of whether the first set of properties is known; creating a third data structure (a TARGET data structure) representative of a target set of properties; creating a fourth data structure (a WANT data structure) representative of whether the target set of properties is wanted; and comparing the first, second, third, and fourth data structures using bit-wise binary operations to determine whether the first set of known properties are wanted as a target set of properties. In exemplary embodiments, the bit-wise binary operations are performed according to the Boolean equation: (not WANT) or (KNOWN and ((TARGET xor VALUE))). Alternatively, the bit-wise binary operation are performed according to the Boolean equation: (not WANT) or (KNOWN and ((TARGET and VALUE) or ((not TARGET) and (not (VALUE))). These data structures may be any size computer word, including 16 and 32-bit words.
摘要:
A method detects an object, such a face, in an image. The image is first partitioned into patches of various sizes using either an integral image or a Gaussian pyramid. Features in each patch are evaluated to determine a cumulative score. The evaluating is repeated while the cumulative score is within a range of an acceptance threshold and a rejection threshold, and otherwise the image is rejected when the accumulated score is less than the rejection threshold and accepted as including the object when the cumulative score is greater than the acceptance threshold.
摘要:
Systems and methods for detecting people or speakers in an automated fashion are disclosed. A pool of features including more than one type of input (like audio input and video input) may be identified and used with a learning algorithm to generate a classifier that identifies people or speakers. The resulting classifier may be evaluated to detect people or speakers.
摘要:
A two-dimensional representation of a document is leveraged to extract a hierarchical structure that facilitates recognition of the document. The visual structure is grammatically parsed utilizing two-dimensional adaptations of statistical parsing algorithms. This allows recognition of layout structures (e.g., columns, authors, titles, footnotes, etc.) and the like such that structural components of the document can be accurately interpreted. Additional techniques can also be employed to facilitate document layout recognition. For example, grammatical parsing techniques that utilize machine learning, parse scoring based on image representations, boosting techniques, and/or “fast features” and the like can be employed to facilitate in document recognition.
摘要:
A computer-implemented word processing system comprises an interface component that receives a features vector associated with an electronic document. An analysis component communicatively coupled to the interface component analyzes the features vector and determines a viewing mode in which to display the electronic document. In accordance with one aspect of the subject invention, the viewing mode can be one of a conventional viewing mode and a viewing mode associated with enhanced readability.