Abstract:
Embodiments are directed to an object detection system having at least one processor circuit configured to receive a series of image regions and apply to each image region in the series a detector, which is configured to determine a presence of a predetermined object in the image region. The object detection system performs a method of selecting and applying the detector from among a plurality of foreground detectors and a plurality of background detectors in a repeated pattern that includes sequentially selecting a selected one of the plurality of foreground detectors; sequentially applying the selected one of the plurality of foreground detectors to one of the series of image regions until all of the plurality of foreground detectors have been applied; selecting a selected one of the plurality of background detectors; and applying the selected one of the plurality of background detectors to one of the series of image regions.
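The repeated selection pattern can be illustrated with a minimal Python sketch. It assumes that background detectors are taken in round-robin order (the abstract only says that one is selected per cycle) and that each detector is a callable returning a presence decision. The names `detector_schedule`, `apply_schedule`, `fg_a`, `bg_x`, and so on are hypothetical.

```python
from itertools import cycle
from typing import Callable, Iterator, List, Sequence

# Hypothetical detector type: takes an image region, returns True if the
# predetermined object is judged present.
Detector = Callable[[object], bool]


def detector_schedule(foreground: Sequence[Detector],
                      background: Sequence[Detector]) -> Iterator[Detector]:
    """Yield detectors in the repeated pattern: all foreground detectors in
    order, then the next background detector (round-robin), forever."""
    if not foreground or not background:
        raise ValueError("need at least one foreground and one background detector")
    next_background = cycle(background)
    while True:
        yield from foreground            # every foreground detector, one per region
        yield next(next_background)      # then a single background detector


def apply_schedule(regions: Sequence[object],
                   foreground: Sequence[Detector],
                   background: Sequence[Detector]) -> List[str]:
    """Apply one scheduled detector to each region; return the detector names used."""
    schedule = detector_schedule(foreground, background)
    used = []
    for region in regions:
        detector = next(schedule)
        detector(region)                 # presence decision (discarded in this demo)
        used.append(detector.__name__)
    return used


# Demo with placeholder detectors that always return False.
def fg_a(region): return False
def fg_b(region): return False
def bg_x(region): return False
def bg_y(region): return False


used = apply_schedule(range(8), [fg_a, fg_b], [bg_x, bg_y])
print(used)
```

With two foreground and two background detectors, the pattern repeats every six regions, and each background detector is applied once per six regions.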
Abstract:
Foreground feature data and motion feature data are determined for frames of video data acquired from a train track area region of interest. A frame is labeled “train present” if its foreground feature data value meets a threshold value, else “train absent”; and “motion present” if its motion feature data meets a motion threshold, else “static.” The labels are used to classify segments of the video data comprising groups of consecutive video frames: groups with “train absent” and “static” labels fall within a “no train present” segment; groups with “train present” and “motion present” labels fall within a “train present and in transition” segment; and groups with “train present” and “static” labels fall within a “train present and stopped” segment. The presence or motion state of a train at a time of inquiry is thereby determined from the respective segment classification.
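A minimal sketch of the labeling and segment classification follows. It assumes scalar per-frame foreground and motion features are already computed, reads "meets a threshold" as greater than or equal, and treats each maximal run of identically labeled frames as one segment (no temporal smoothing). Function names and sample values are hypothetical.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

# Segment classes named in the abstract, keyed by (presence, motion) label pair.
SEGMENT_CLASS = {
    ("train absent", "static"): "no train present",
    ("train present", "motion present"): "train present and in transition",
    ("train present", "static"): "train present and stopped",
}


def label_frame(foreground: float, motion: float,
                fg_threshold: float, motion_threshold: float) -> Tuple[str, str]:
    """Label one frame; 'meets a threshold' is read here as >=."""
    presence = "train present" if foreground >= fg_threshold else "train absent"
    movement = "motion present" if motion >= motion_threshold else "static"
    return presence, movement


def classify_segments(labels: Sequence[Tuple[str, str]]) -> List[Tuple[int, int, str]]:
    """Group runs of consecutive frames sharing a label pair.

    Returns (first_frame, last_frame, segment_class). The pair
    ("train absent", "motion present") is not covered by the abstract and is
    reported as "unclassified" in this sketch.
    """
    segments = []
    start = 0
    for key, run in groupby(labels):
        length = sum(1 for _ in run)
        segments.append((start, start + length - 1,
                         SEGMENT_CLASS.get(key, "unclassified")))
        start += length
    return segments


def state_at(segments: Sequence[Tuple[int, int, str]], frame: int) -> str:
    """Return the segment class covering the frame of inquiry."""
    for first, last, cls in segments:
        if first <= frame <= last:
            return cls
    raise IndexError(f"frame {frame} is outside the analyzed video")


# Demo with made-up per-frame feature values and thresholds of 0.5.
fg_values = [0.1, 0.1, 0.8, 0.9, 0.9, 0.9]
motion_values = [0.0, 0.0, 0.7, 0.6, 0.1, 0.0]
labels = [label_frame(f, m, 0.5, 0.5) for f, m in zip(fg_values, motion_values)]
segments = classify_segments(labels)
print(segments)
print(state_at(segments, 3))
```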
Abstract:
Techniques for object detection are provided that employ limited learned attribute ranges. One or more objects are initially detected for a full range of one or more attributes at each location of an image. Thereafter, a set of positional constraints is generated, indicating an expected range of values of one or more of the attributes for each position in the image, based on the detected objects and a geometric model of the scene in the image. Objects are then detected in the image using the expected range of values for each position in the image for the one or more attributes. The attributes comprise, for example, one or more of size, pose, and rotation of the objects. A best fit to the geometric model is computed to generate the set of positional constraints, for example, using a least squares approach.
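For the size attribute, the least squares step can be sketched as follows. The sketch assumes a simple geometric model in which an object's image height varies linearly with the image row of its base (a common approximation for a camera viewing a ground plane) and a fixed ±20% tolerance around the fitted value. Pose and rotation constraints are not modeled, and all names and numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_size_model(detections: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Least squares fit of size = a * row + b to (row, size) detections."""
    n = len(detections)
    if n < 2:
        raise ValueError("need at least two detections")
    mean_row = sum(r for r, _ in detections) / n
    mean_size = sum(s for _, s in detections) / n
    sxx = sum((r - mean_row) ** 2 for r, _ in detections)
    if sxx == 0:
        raise ValueError("detections must span more than one image row")
    sxy = sum((r - mean_row) * (s - mean_size) for r, s in detections)
    a = sxy / sxx
    return a, mean_size - a * mean_row


def expected_size_range(model: Tuple[float, float], row: float,
                        tolerance: float = 0.2) -> Tuple[float, float]:
    """Positional constraint: allowed size interval at a given image row."""
    a, b = model
    predicted = a * row + b
    return max(0.0, predicted * (1 - tolerance)), predicted * (1 + tolerance)


def filter_candidates(candidates: Sequence[Tuple[float, float]],
                      model: Tuple[float, float],
                      tolerance: float = 0.2) -> List[Tuple[float, float]]:
    """Keep (row, size) candidates whose size falls inside the constraint."""
    kept = []
    for row, size in candidates:
        low, high = expected_size_range(model, row, tolerance)
        if low <= size <= high:
            kept.append((row, size))
    return kept


# Demo: sizes of the initial full-range detections grow linearly with row.
model = fit_size_model([(100, 20), (200, 40), (300, 60)])
size_range = expected_size_range(model, 250)
kept = filter_candidates([(250, 52), (250, 120), (120, 25)], model)
print(model, size_range, kept)
```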
Abstract:
A method and system for real-time processing of a sequence of video frames. A current frame in the sequence and at least one frame in the sequence occurring prior to the current frame are analyzed. Each frame includes a two-dimensional array of pixels. The sequence of video frames is received in synchronization with a recording of the video frames in real time. The analyzing includes performing a background subtraction on the at least one frame, which determines a background image and a static region mask associated with a static region consisting of a contiguous distribution of pixels in the current frame. When superimposed on the current frame, the static region mask identifies each pixel in the static region. A determination is made that a persistence requirement, both a non-persistence duration requirement and a persistence duration requirement, or a combination thereof has been satisfied.
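The abstract does not define the persistence requirements. One plausible reading, sketched here purely as an assumption, is that a region must be present in the static region mask for an accumulated persistence duration while no single absence (for example, a brief occlusion) exceeds a non-persistence duration. The function name and its parameters are hypothetical.

```python
from typing import Iterable, Optional


def first_satisfied_frame(present: Iterable[bool], fps: float,
                          persistence_s: float, max_gap_s: float) -> Optional[int]:
    """Index of the first frame at which a static region has accumulated at
    least persistence_s seconds of presence with no absence gap longer than
    max_gap_s; None if the requirements are never met.

    present[i] says whether the region appears in the static region mask of
    frame i (an assumed input format).
    """
    present_frames = 0  # accumulated frames in which the region was present
    gap_frames = 0      # length of the current run of absent frames
    for i, is_present in enumerate(present):
        if is_present:
            present_frames += 1
            gap_frames = 0
        else:
            gap_frames += 1
            if gap_frames / fps > max_gap_s:
                present_frames = 0  # non-persistence limit exceeded: start over
        if present_frames / fps >= persistence_s:
            return i
    return None


# Demo at 1 frame/s: 3 s of presence required, absences of up to 1 s tolerated.
hit = first_satisfied_frame([True, False, False, True, True, False, True],
                            fps=1.0, persistence_s=3.0, max_gap_s=1.0)
miss = first_satisfied_frame([True, False, False, True],
                             fps=1.0, persistence_s=3.0, max_gap_s=1.0)
print(hit, miss)  # 6 None
```

In the first sequence the two-frame gap resets the count, after which three presences separated by a one-frame gap satisfy the requirement at frame 6.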
Abstract:
A method and system for real-time processing of a sequence of video frames. A current frame in the sequence and at least one frame in the sequence occurring prior to the current frame are analyzed. The sequence of video frames is received in synchronization with a recording of the video frames in real time. The analyzing includes performing a background subtraction on the at least one frame, which determines a background image and a static region mask associated with a static region consisting of a contiguous distribution of pixels in the current frame. When superimposed on the current frame, the static region mask identifies each pixel in the static region. The status of a static object is determined to be either abandoned, if the static object is an abandoned object, or removed, if the static object is a removed object.
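The abstract does not say how the abandoned/removed decision is made. A heuristic often used for this task, swapped in here purely for illustration, compares edge strength along the static region's boundary: an object that has just appeared adds edges in the current frame, while a removed object leaves its edges only in the background image. Plain nested lists stand in for grayscale images, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple

Image = Sequence[Sequence[float]]   # grayscale intensities, row-major
Mask = Sequence[Sequence[bool]]     # True inside the static region


def boundary_pixels(mask: Mask) -> List[Tuple[int, int]]:
    """Pixels inside the mask that touch the outside (4-connectivity)."""
    h, w = len(mask), len(mask[0])
    points = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or not mask[ny][nx]:
                    points.append((y, x))
                    break
    return points


def gradient_magnitude(img: Image, y: int, x: int) -> float:
    """Central-difference gradient magnitude, clamped at the image border."""
    h, w = len(img), len(img[0])
    gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
    gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
    return (gx * gx + gy * gy) ** 0.5


def static_object_status(current: Image, background: Image, mask: Mask) -> str:
    """'abandoned' if the region boundary has more edge energy in the current
    frame than in the background image, otherwise 'removed'."""
    points = boundary_pixels(mask)
    if not points:
        raise ValueError("empty static region mask")
    current_energy = sum(gradient_magnitude(current, y, x) for y, x in points)
    background_energy = sum(gradient_magnitude(background, y, x) for y, x in points)
    return "abandoned" if current_energy > background_energy else "removed"


# Demo on 5x5 images: a bright 3x3 block appears (or disappears) in the center.
size = 5
flat = [[0.0] * size for _ in range(size)]
block = [[100.0 if 1 <= y <= 3 and 1 <= x <= 3 else 0.0 for x in range(size)]
         for y in range(size)]
mask = [[1 <= y <= 3 and 1 <= x <= 3 for x in range(size)] for y in range(size)]
appeared = static_object_status(block, flat, mask)
disappeared = static_object_status(flat, block, mask)
print(appeared, disappeared)  # abandoned removed
```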
Abstract:
Images are retrieved and ranked according to their relevance to the attributes of a multi-attribute query by training image attribute detectors for different attributes annotated in a training dataset of images. Pair-wise correlations between pairs of the annotated attributes are learned from the training dataset. Image datasets are searched via the trained attribute detectors for images comprising the attributes of a multi-attribute query. The retrieved images are ranked as a function of comprising attributes that are not within the query subset plurality of attributes but are paired to one of the query subset plurality of attributes by the pair-wise correlations, wherein the ranking is an order of likelihood that the different ones of the attributes will appear in an image with the paired one of the query subset plurality of attributes.
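The ranking step can be sketched as follows, assuming the trained detectors output a set of attribute names per image and that pair-wise correlations are estimated as conditional co-occurrence frequencies P(b | a) from the training annotations. Summing the best correlation for each non-query attribute is one simple scoring choice, not necessarily the patented one; attribute and function names are hypothetical.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, Iterable, List, Set, Tuple


def learn_pairwise(training: Iterable[Set[str]]) -> Dict[Tuple[str, str], float]:
    """Estimate P(b | a) for every ordered attribute pair (a, b) that co-occurs
    in the annotated training images."""
    single: Counter = Counter()
    pair: Counter = Counter()
    for attrs in training:
        single.update(attrs)
        pair.update(permutations(attrs, 2))
    return {(a, b): n / single[a] for (a, b), n in pair.items()}


def rank_images(images: Dict[str, Set[str]], query: Set[str],
                corr: Dict[Tuple[str, str], float]) -> List[str]:
    """Retrieve images whose detected attributes include the whole query, then
    rank them by how likely their extra attributes are to co-occur with a
    query attribute."""
    if not query:
        raise ValueError("query must contain at least one attribute")

    def score(attrs: Set[str]) -> float:
        return sum(max(corr.get((q, b), 0.0) for q in query) for b in attrs - query)

    hits = [name for name, attrs in images.items() if query <= attrs]
    return sorted(hits, key=lambda name: score(images[name]), reverse=True)


# Demo with made-up annotations and detector outputs.
corr = learn_pairwise([{"beard", "glasses", "hat"}, {"beard", "hat"},
                       {"beard"}, {"glasses"}])
detected = {"img1": {"beard", "glasses"}, "img2": {"beard", "hat"},
            "img3": {"hat"}, "img4": {"beard"}}
ranking = rank_images(detected, {"beard"}, corr)
print(ranking)  # ['img2', 'img1', 'img4']
```

Here P(hat | beard) = 2/3 exceeds P(glasses | beard) = 1/3, so the image with a hat ranks first; `img3` lacks the query attribute and is not retrieved.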
Abstract:
Automatic object retrieval from input video is based on learned, complementary detectors created for each of a plurality of different motionlet clusters. The motionlet clusters are partitioned from a dataset of training vehicle images as a function of determining that vehicles within each of the scenes of the images in each cluster share similar two-dimensional motion direction attributes within their scenes. To train the complementary detectors, a first detector is trained on motion blobs of vehicle objects detected and collected within each of the training dataset vehicle images within the motionlet cluster via a background modeling process; a second detector is trained on each of the training dataset vehicle images within the motionlet cluster that have motion blobs of the vehicle objects but are misclassified by the first detector; and the training repeats until all of the training dataset vehicle images have been eliminated as false positives or correctly classified.
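The training loop for one motionlet cluster can be illustrated with a toy example in which each training image is reduced to a single scalar feature and a decision stump stands in for the real detector (both assumptions; the abstract does not name a learner). Combining the trained detectors with a logical OR at test time is likewise an assumption made only for the demo.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, bool]          # (feature value, is_vehicle)
Detector = Callable[[float], bool]


def train_stump(samples: Sequence[Sample]) -> Detector:
    """Toy weak learner: the threshold/polarity with the fewest errors."""
    best = None
    for t in sorted({x for x, _ in samples}):
        for polarity in (1, -1):
            errors = sum((polarity * (x - t) >= 0) != y for x, y in samples)
            if best is None or errors < best[0]:
                best = (errors, t, polarity)
    _, t, polarity = best
    return lambda x: polarity * (x - t) >= 0


def train_complementary(samples: Sequence[Sample],
                        max_rounds: int = 10) -> Tuple[List[Detector], List[Sample]]:
    """Train a detector, keep only the samples it misclassifies, train the next
    detector on those, and repeat until none remain (or no progress is made)."""
    detectors: List[Detector] = []
    remaining = list(samples)
    while remaining and len(detectors) < max_rounds:
        detector = train_stump(remaining)
        detectors.append(detector)
        misclassified = [(x, y) for x, y in remaining if detector(x) != y]
        if len(misclassified) == len(remaining):
            break  # this round fixed nothing; stop instead of looping forever
        remaining = misclassified
    return detectors, remaining


# Demo: positives at both ends of the feature axis, negatives in the middle.
data = [(1, True), (2, True), (4, False), (5, False), (6, False), (8, True), (9, True)]
detectors, leftover = train_complementary(data)
combined = [any(d(x) for d in detectors) for x, _ in data]  # OR, an assumption
print(len(detectors), leftover, combined == [y for _, y in data])  # 2 [] True
```

No single threshold separates this data, so the first stump (feature <= 2) misses the two high positives and the second stump (feature >= 8) is trained on exactly those.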
Abstract:
Objects within two-dimensional video data are modeled by three-dimensional models as a function of object type and motion through manually calibrating a two-dimensional image to the three spatial dimensions of a three-dimensional modeling cube. Calibrated three-dimensional locations of an object in motion in the two-dimensional image field of view of a video data input are determined and used to determine a heading direction of the object as a function of the camera calibration and determined movement between the determined three-dimensional locations. The two-dimensional object image is replaced in the video data input with an object-type three-dimensional polygonal model having a projected bounding box that best matches a bounding box of an image blob, the model oriented in the determined heading direction. The bounding box of the replacing model is then scaled to fit the object image blob bounding box, and rendered with extracted image features.
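Two of the geometric steps can be sketched briefly: the heading from consecutive calibrated positions, and the choice and scaling of the model whose projected bounding box best matches the blob. The sketch assumes the calibrated 3D locations lie on a ground plane, so only their (X, Y) components matter, and it uses intersection over union as the matching score, which the abstract does not specify. Model names and coordinates are hypothetical; projection and rendering are not shown.

```python
import math
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


def heading_deg(prev_xy: Tuple[float, float], curr_xy: Tuple[float, float]) -> float:
    """Heading on the ground plane, degrees counterclockwise from +X."""
    dx, dy = curr_xy[0] - prev_xy[0], curr_xy[1] - prev_xy[1]
    if dx == 0 and dy == 0:
        raise ValueError("object did not move")
    return math.degrees(math.atan2(dy, dx)) % 360.0


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def best_model(projected: Dict[str, Box], blob: Box) -> str:
    """Name of the model whose projected bounding box best overlaps the blob."""
    return max(projected, key=lambda name: iou(projected[name], blob))


def scale_to_blob(model_box: Box, blob: Box) -> Tuple[float, float]:
    """Per-axis factors that stretch the model's box onto the blob's box."""
    return ((blob[2] - blob[0]) / (model_box[2] - model_box[0]),
            (blob[3] - blob[1]) / (model_box[3] - model_box[1]))


# Demo with hypothetical calibrated ground-plane positions and projected boxes.
heading = heading_deg((0.0, 0.0), (3.0, 3.0))
blob = (10.0, 10.0, 50.0, 30.0)
projected = {"sedan": (12.0, 12.0, 48.0, 30.0), "truck": (10.0, 0.0, 60.0, 40.0)}
chosen = best_model(projected, blob)
scale = scale_to_blob(projected[chosen], blob)
print(round(heading, 6), chosen, scale)
```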
Abstract:
Foreground object image features are extracted from input video via application of a background subtraction mask, and optical flow image features are extracted from a region of the input video image data defined by the extracted foreground object image features. If estimated movement features indicate that the underlying object is in motion, a dominant moving direction of the underlying object is determined. If the dominant moving direction is parallel to an orientation of the second, crossed thoroughfare, an event alarm indicating that a static object is blocking travel on the crossing second thoroughfare is not generated. If the estimated movement features indicate that the underlying object is static, or that its determined dominant moving direction is not parallel to the second thoroughfare, an appearance of the foreground object region is determined and a static-ness timer is run while the foreground object region comprises the extracted foreground object image features.
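The motion branch of this decision can be sketched as below. The sketch assumes the dominant moving direction is the direction of the summed optical-flow vectors (a histogram of flow orientations would be an alternative), that "parallel" allows a ±20 degree tolerance in either direction of travel, and that the static-ness timer resets whenever the region disappears. Appearance modeling is omitted, and all names and thresholds are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

Flow = Sequence[Tuple[float, float]]  # optical-flow vectors (dx, dy) in the region


def dominant_direction_deg(flow: Flow) -> Optional[float]:
    """Direction of the summed flow vector, in degrees [0, 360)."""
    sx = sum(dx for dx, _ in flow)
    sy = sum(dy for _, dy in flow)
    if sx == 0 and sy == 0:
        return None
    return math.degrees(math.atan2(sy, sx)) % 360.0


def is_parallel(direction_deg: float, orientation_deg: float, tol_deg: float = 20.0) -> bool:
    """True if a direction runs along an axis (either way), within tol_deg."""
    diff = abs(direction_deg - orientation_deg) % 180.0
    return min(diff, 180.0 - diff) <= tol_deg


def assess_region(flow: Flow, crossing_orientation_deg: float,
                  motion_threshold: float = 1.0) -> str:
    """Suppress the alarm for objects moving along the crossing thoroughfare;
    otherwise hand the region to the static-ness timer."""
    mean_speed = sum(math.hypot(dx, dy) for dx, dy in flow) / len(flow)
    if mean_speed >= motion_threshold:
        direction = dominant_direction_deg(flow)
        if direction is not None and is_parallel(direction, crossing_orientation_deg):
            return "no alarm"
    return "run timer"


class StaticnessTimer:
    """Accumulates time while the foreground region persists."""

    def __init__(self, alarm_after_s: float) -> None:
        self.alarm_after_s = alarm_after_s
        self.elapsed = 0.0

    def update(self, region_present: bool, dt: float) -> bool:
        """Advance by dt seconds; return True once the alarm time is reached."""
        self.elapsed = self.elapsed + dt if region_present else 0.0
        return self.elapsed >= self.alarm_after_s


# Demo: the crossing thoroughfare runs along 0 degrees in image coordinates.
along = assess_region([(3.0, 0.2), (2.5, -0.1)], 0.0)
across = assess_region([(0.0, 3.0), (0.1, 2.8)], 0.0)
still = assess_region([(0.1, 0.0), (0.0, 0.1)], 0.0)
timer = StaticnessTimer(alarm_after_s=3.0)
alarms = [timer.update(True, 1.0) for _ in range(3)]
print(along, across, still, alarms)
```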
Abstract:
View-specific object detectors are learned as a function of scene geometry and object motion patterns. Motion directions are determined for object images extracted from a training dataset and collected from different camera scene viewpoints. The object images are categorized into clusters as a function of similarities of their determined motion directions; the object images in each cluster are acquired from the same camera scene viewpoint. Zenith angles are estimated for object image poses in the clusters relative to a position of a horizon in the cluster camera scene viewpoint, and azimuth angles of the poses are estimated as a function of a relation of the determined motion directions of the clustered images to the cluster camera scene viewpoint. Detectors are thus built for recognizing objects in input video, one for each of the clusters, each associated with the estimated zenith and azimuth angles of the poses of its cluster.
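The clustering step can be sketched as quantizing each training object's motion direction within its own camera viewpoint. The sketch assumes the motion direction is taken from the first and last tracked image positions and uses fixed 45-degree bins in place of a similarity-based clustering. Zenith and azimuth estimation and detector training are not shown, and all names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Track = Sequence[Tuple[float, float]]  # image positions of one object over time


def motion_direction_deg(track: Track) -> float:
    """Direction from the first to the last tracked position, degrees [0, 360)."""
    (x0, y0), (x1, y1) = track[0], track[-1]
    return math.degrees(math.atan2(y1 - y0, x1 - x0)) % 360.0


def cluster_by_direction(samples: Sequence[Tuple[str, str, Track]],
                         bin_deg: float = 45.0) -> Dict[Tuple[str, int], List[str]]:
    """Group image ids by (camera viewpoint, quantized motion direction).

    Bins are centered on 0, bin_deg, 2 * bin_deg, ... degrees, so every
    cluster holds images from a single viewpoint with similar directions.
    """
    clusters: Dict[Tuple[str, int], List[str]] = defaultdict(list)
    for image_id, viewpoint, track in samples:
        direction = motion_direction_deg(track)
        direction_bin = int(((direction + bin_deg / 2) % 360.0) // bin_deg)
        clusters[(viewpoint, direction_bin)].append(image_id)
    return dict(clusters)


# Demo with hypothetical tracks from two camera viewpoints.
clusters = cluster_by_direction([
    ("a", "cam1", [(0, 0), (10, 0)]),
    ("b", "cam1", [(0, 0), (10, 1)]),
    ("c", "cam1", [(0, 0), (0, 10)]),
    ("d", "cam2", [(0, 0), (10, 0)]),
])
print(clusters)
```

Keying clusters on the viewpoint as well as the direction bin keeps identical motion directions from different cameras apart, matching the requirement that each cluster comes from a single camera scene viewpoint.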