Abstract:
A method and system efficiently and accurately detect humans in a test image and classify their poses. In a training stage, a probabilistic model is derived in an unsupervised or semi-supervised manner such that at least some poses are not manually labeled. The model provides two sets of model parameters that describe the statistics of images containing humans and of images of background scenes. In a testing stage, the probabilistic model is used to determine whether a human is present in the image and to classify the human's pose based on the poses in the training images. Using the same probabilistic model for both problems provides an efficient solution to human detection and pose classification alike.
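As an illustration of how one probabilistic model can serve both detection and pose classification, here is a minimal sketch. The abstract does not specify the form of the model; the diagonal Gaussian mixture over pose clusters, the single Gaussian for background, and all names (`log_gauss`, `detect_and_classify`, `threshold`) are assumptions made for this example only.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a diagonal Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def logsumexp(values):
    """Numerically stable log of a sum of exponentials."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def detect_and_classify(x, pose_components, background, threshold=0.0):
    """Return (is_human, pose_index or None) for one feature vector.

    pose_components: list of (weight, mean, var), one per pose cluster
                     (hypothetical parameterization of the human model).
    background:      (mean, var) for the background-scene model.
    """
    # Joint log score of each pose cluster with the observed features.
    per_pose = [math.log(w) + log_gauss(x, m, v) for w, m, v in pose_components]
    log_human = logsumexp(per_pose)
    log_background = log_gauss(x, *background)
    # Detection: log-likelihood ratio of human model vs. background model.
    is_human = (log_human - log_background) > threshold
    # Pose classification reuses the same per-pose scores.
    pose = max(range(len(per_pose)), key=per_pose.__getitem__) if is_human else None
    return is_human, pose
```

The per-pose scores are computed once and reused: their sum drives the detection decision, and their maximum gives the pose label.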
Abstract:
Methods and systems are described for three-dimensional pose estimation. A training module determines a mapping function between a training image sequence and pose representations of a subject in the training image sequence. The training image sequence is represented by a set of appearance and motion patches. A set of filters is applied to the appearance and motion patches to extract features of the training images. Based on the extracted features, the training module learns a multidimensional mapping function that maps the motion and appearance patches to the pose representations of the subject. A testing module produces a fast human pose estimate by applying the learned mapping function to a test image sequence.
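The training and testing flow described above can be sketched roughly as follows. This is not the method of the abstract, which does not say how the mapping is learned: the sketch swaps in a plain linear map fitted by per-sample gradient descent, treats a frame difference as the motion patch, and uses dot products as filter responses. All function names and parameters are hypothetical.

```python
def motion_patch(prev_patch, curr_patch):
    """Frame difference as a crude motion patch (an assumption of this sketch)."""
    return [c - p for p, c in zip(prev_patch, curr_patch)]

def extract_features(appearance, motion, filters):
    """Apply every filter to both patches; each response is a dot product."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return [dot(f, appearance) for f in filters] + [dot(f, motion) for f in filters]

def fit_linear_map(features, poses, lr=0.05, epochs=2000):
    """Fit pose ~ W * feature by per-sample gradient steps on squared error."""
    n_in, n_out = len(features[0]), len(poses[0])
    W = [[0.0] * n_in for _ in range(n_out)]
    for _ in range(epochs):
        for x, y in zip(features, poses):
            for j in range(n_out):
                err = sum(w * xi for w, xi in zip(W[j], x)) - y[j]
                W[j] = [w - lr * err * xi for w, xi in zip(W[j], x)]
    return W

def predict_pose(W, x):
    """Testing stage: apply the learned map to a new feature vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]
```

At test time only `extract_features` and `predict_pose` run, which is what makes the estimate fast: there is no search over candidate poses.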
Abstract:
Disclosed techniques include receiving an electronic image containing depictions of characters, segmenting at least some of the depictions of characters using a first segmentation technique to produce a first segmented portion, and performing a first character recognition on the first segmented portion to determine a first sequence of characters. The techniques also include determining, based on the performing the first character recognition, that the first sequence of characters does not match the depictions of characters. The techniques further include segmenting at least some of the depictions of characters using a second segmentation technique, based on the determining, to produce a second segmented portion, and performing a second character recognition on at least a portion of the second segmented portion to produce a second sequence of characters. The techniques also include outputting a third sequence of characters based on at least part of the second sequence of characters.
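The fallback flow above can be sketched as a small control-flow function. The segmentation techniques, the recognizer, and the mismatch test are passed in as callables because the abstract does not define them; `segment_a`, `segment_b`, `recognize`, and `looks_valid` are hypothetical names, and using a validity check to decide that the first result does not match is an assumption.

```python
def recognize_with_fallback(image, segment_a, segment_b, recognize, looks_valid):
    """Try one segmentation; re-segment with a second technique on a mismatch.

    All four callables are placeholders supplied by the caller.
    """
    # First pass: first segmentation technique, then character recognition.
    first_text = recognize(segment_a(image))
    if looks_valid(first_text):
        return first_text
    # The first sequence was judged not to match the depicted characters,
    # so segment again with the second technique and recognize once more.
    second_text = recognize(segment_b(image))
    # The output is based on the second sequence (here, used unchanged).
    return second_text
```

The second segmentation runs only when the first result is rejected, so the common case costs a single recognition pass.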
Abstract:
A text recognition server is configured to recognize text in a sparse text image. Specifically, given an image, the server specifies a plurality of “patches” (blocks of pixels within the image). The server applies a text detection algorithm to the patches to determine how many of the patches contain text. The result of this text detection is used both to estimate the orientation of the image and to determine whether the image is textually sparse or textually dense. If the image is determined to be textually sparse, textual patches are identified and grouped into text regions, each of which is then processed separately by an OCR algorithm, and the recognized text for each region is combined into a result for the image as a whole.
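The sparse-versus-dense decision and the grouping of textual patches can be sketched as below. The input is a grid of booleans marking which patches a text detector flagged. The 0.3 fraction threshold and the use of 4-connected components for grouping are assumptions of this sketch, as are the function names; orientation estimation is left out.

```python
def is_sparse(grid, max_fraction=0.3):
    """Treat the image as textually sparse if few patches contain text."""
    cells = [cell for row in grid for cell in row]
    return sum(cells) / len(cells) <= max_fraction

def group_regions(grid):
    """Group textual patches into 4-connected regions of (row, col) cells."""
    seen, regions = set(), []
    for r, row in enumerate(grid):
        for c, flag in enumerate(row):
            if not flag or (r, c) in seen:
                continue
            # Flood fill from this unvisited textual patch.
            stack, region = [(r, c)], []
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                region.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < len(grid) and 0 <= nx < len(grid[ny])
                    if inside and grid[ny][nx] and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            regions.append(sorted(region))
    return regions
```

Each returned region would then be cropped and passed to OCR on its own, with the per-region results concatenated into the result for the image.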
Abstract:
Embodiments of this invention relate to detecting and blurring images. In an embodiment, a system detects objects in a photographic image. The system includes an object detector module configured to detect regions of the photographic image that include objects of a particular type at least based on the content of the photographic image. The system further includes a false positive detector module configured to determine whether each region detected by the object detector module includes an object of the particular type at least based on information about the context in which the photographic image was taken.
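A two-stage arrangement like the one described, a content-based detector followed by a context-based false positive filter, might look like this. The abstract does not say what context information is used; the height range derived from capture context (`min_height_px`, `max_height_px`) is a hypothetical stand-in, and `Region`, `filter_false_positives`, and `detect_objects` are illustrative names.

```python
from dataclasses import dataclass

@dataclass
class Region:
    x: int
    y: int
    width: int
    height: int

def filter_false_positives(regions, context):
    """Keep regions whose height fits the range the capture context allows."""
    low, high = context["min_height_px"], context["max_height_px"]
    return [r for r in regions if low <= r.height <= high]

def detect_objects(image, detector, context):
    """Run the content-based detector, then the context-based filter."""
    candidates = detector(image)  # stage 1: based on image content
    return filter_false_positives(candidates, context)  # stage 2: based on context
```

Keeping the two stages separate lets the detector be tuned for high recall, since implausible detections are discarded afterwards.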
Abstract:
A system and method are provided for automatically recognizing building numbers in street level images. In one aspect, a processor selects a street level image that is likely to be near an address of interest. The processor identifies those portions of the image that are visually similar to street numbers, and then extracts the numeric values of the characters displayed in such portions. If an extracted value corresponds to the building number of the address of interest, for example by being substantially equal to it, the extracted value and the image portion are displayed to a human operator. The human operator confirms, by looking at the image portion, whether the image portion appears to be a building number that matches the extracted value. If so, the processor stores a value that associates that building number with the street level image.
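The matching and confirmation steps can be sketched as follows. Interpreting "substantially equal" as a numeric tolerance is an assumption, as are the function names, the `(portion_id, value)` tuples, and the dictionary used as the store; the operator is modeled as a callable.

```python
def candidates_for_review(extracted, target_number, tolerance=0):
    """Select extracted values close enough to the building number of interest.

    extracted: list of (portion_id, numeric_value) pairs (hypothetical shape).
    """
    return [(pid, value) for pid, value in extracted
            if abs(value - target_number) <= tolerance]

def confirm_and_store(candidates, operator_confirms, image_id, store):
    """Record building number -> image only after the operator confirms it."""
    for portion_id, value in candidates:
        if operator_confirms(portion_id, value):
            store[value] = image_id
    return store
```

Only values that pass the automatic match reach the operator, so human effort is spent on a short list rather than on every number-like image portion.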
Abstract:
Systems and methods for selecting interest point descriptors for object recognition. In an embodiment, the present invention estimates performance of local descriptors by (1) receiving a local descriptor relating to an object in a first image; (2) identifying one or more nearest neighbor descriptors relating to one or more images different from the first image, the nearest neighbor descriptors comprising nearest neighbors of the local descriptor; (3) calculating a quality score for the local descriptor based on the number of nearest neighbor descriptors that relate to images showing the object; and (4) determining, on the basis of the quality score, if the local descriptor is effective in identifying the object.
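Steps (1) to (4) can be sketched in a few lines. The abstract says only that the score is based on the number of nearest neighbors that relate to images showing the object; taking the fraction among the `k` nearest by Euclidean distance, and the 0.5 cutoff, are assumptions of this sketch, as are the names `quality_score` and `is_effective`.

```python
import math

def quality_score(descriptor, database, object_id, k=3):
    """Fraction of the k nearest neighbors that come from images of the object.

    database: list of (vector, object_id_of_source_image) pairs taken from
              images other than the one the descriptor came from.
    """
    nearest = sorted(database, key=lambda entry: math.dist(descriptor, entry[0]))[:k]
    if not nearest:
        return 0.0
    hits = sum(1 for _, source_object in nearest if source_object == object_id)
    return hits / len(nearest)

def is_effective(score, min_score=0.5):
    """Decide whether the descriptor is useful for identifying the object."""
    return score >= min_score
```

A descriptor whose neighbors mostly come from other images of the same object scores high; one that matches unrelated images scores low and can be dropped.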