Abstract:
Described is a system for visual activity recognition that includes one or more processors and a memory, the memory being a non-transitory computer-readable medium having executable instructions encoded thereon, such that upon execution of the instructions, the one or more processors perform operations including detecting a set of objects of interest in video data and determining an object classification for each object in the set of objects of interest, the set including at least one object of interest. The one or more processors further perform operations including forming a corresponding activity track for each object in the set of objects of interest by tracking each object across frames. The one or more processors further perform operations including, for each object of interest and using a feature extractor, determining a corresponding feature in the video data. The system may provide a report to a user's cell phone or central monitoring facility.
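The detect-classify-track flow described above can be sketched in a few lines. This is an illustrative stand-in, not the patented system: the detector/classifier are assumed to already produce `(box, label)` pairs per frame, and the nearest-center linking rule with a fixed distance threshold is a hypothetical simplification of the tracking step.

```python
def form_activity_tracks(detections_per_frame, link_dist=20.0):
    """Link per-frame detections into activity tracks.

    detections_per_frame: list (one entry per frame) of lists of
    ((x1, y1, x2, y2), label) pairs from a detector/classifier.
    A detection joins the existing same-label track whose last box
    center is nearest (within link_dist); otherwise it starts a track.
    """
    tracks = []  # each track: {"label": str, "boxes": [box, ...]}
    for detections in detections_per_frame:
        for box, label in detections:
            cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
            best = None
            for t in tracks:
                lb = t["boxes"][-1]
                lcx, lcy = (lb[0] + lb[2]) / 2, (lb[1] + lb[3]) / 2
                d = ((cx - lcx) ** 2 + (cy - lcy) ** 2) ** 0.5
                if (t["label"] == label and d <= link_dist
                        and (best is None or d < best[0])):
                    best = (d, t)
            if best is not None:
                best[1]["boxes"].append(box)
            else:
                tracks.append({"label": label, "boxes": [box]})
    return tracks
```

Each resulting track (an object's boxes across frames) would then be passed to the feature extractor for activity classification and reporting.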
Abstract:
Described is a system for converting a convolutional neural network (CNN) designed and trained for color (RGB) images to one that works on infrared (IR) or grayscale images. The converted CNN comprises a series of convolution layers of neurons arranged in a set of kernels having corresponding depth slices. The converted CNN is used for performing object detection. A mechanical component of an autonomous device is controlled based on the object detection.
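One plausible way to perform such a conversion on the first convolution layer is to collapse each kernel's three RGB depth slices into a single slice by summation. This is a hedged sketch, not necessarily the patented method; the summation rule is exact only under the stated assumption that the grayscale/IR image would otherwise have been replicated into three identical channels.

```python
def collapse_rgb_kernel(kernel):
    """kernel: [K_r, K_g, K_b], each an H x W list of weights
    (the three depth slices of one first-layer kernel).
    Returns a single H x W slice: the per-position sum of the slices.

    If a 1-channel image were fed to the RGB network by replicating it
    across three channels, the response sum_c(K_c * I) equals
    (sum_c K_c) * I, so this collapsed kernel gives the identical
    response directly on the single-channel input.
    """
    r, g, b = kernel
    return [[r[i][j] + g[i][j] + b[i][j]
             for j in range(len(r[0]))] for i in range(len(r))]
```

Deeper layers would be unaffected, since only the input layer's depth depends on the number of image channels.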
Abstract:
Described is an object recognition system. Using an integral channel features (ICF) detector, the system extracts a candidate target region (having an associated original confidence score representing a candidate object) from an input image of a scene surrounding a platform. A modified confidence score is generated based on a location and height of detection of the candidate object. The candidate target regions are classified based on the modified confidence score using a trained convolutional neural network (CNN) classifier, resulting in classified objects. The classified objects are tracked using a multi-target tracker for final classification of each classified object as a target or non-target. If the classified object is a target, a device can be controlled based on the target.
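The confidence-modification step can be illustrated with a simple geometric re-scoring rule. Everything here is a hypothetical example: the ground-plane prior `expected_height_at` and the linear penalty are assumptions standing in for whatever location/height model the system actually uses.

```python
def modified_confidence(score, box, expected_height_at, tol=0.5):
    """Re-score a detection using its image location and height.

    box: (x1, y1, x2, y2) with y2 the bottom of the detection.
    expected_height_at: callable mapping the bottom-row coordinate to
    the pixel height a true target should have there (a simple
    ground-plane prior). Detections whose height deviates from that
    expectation are linearly down-weighted; beyond the relative
    tolerance `tol` the score drops to zero.
    """
    h = box[3] - box[1]
    exp_h = expected_height_at(box[3])
    ratio = abs(h - exp_h) / max(exp_h, 1e-6)
    return score * max(0.0, 1.0 - ratio / tol)
```

The CNN classifier would then operate only on candidate regions whose modified score survives this geometric filtering.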
Abstract:
Described is a system for real-time object recognition. During operation, the system extracts convolutional neural network (CNN) feature vectors from an input image. The input image reflects a scene proximate the system, with the feature vector representing an object in the input image. The CNN feature vector is matched against feature vectors stored in a feature dictionary to identify k nearest neighbors for each object class stored in the feature dictionary. The matching results in a probability distribution over object classes stored in the feature dictionary. The probability distribution provides a confidence score that each of the object classes in the feature dictionary is representative of the object in the input image. Based on the confidence scores, the object in the input image is then recognized as being a particular object class when the confidence score for the particular object class exceeds a threshold.
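The k-nearest-neighbor matching and threshold test can be sketched as below. The fraction-of-neighbors probability estimate and the Euclidean metric are illustrative assumptions; the patented system may weight neighbors differently.

```python
import math
from collections import Counter

def knn_class_probabilities(query, dictionary, k=3):
    """dictionary: list of (feature_vector, class_label) pairs.
    Returns {label: probability}, estimating each class probability
    as its fraction of the k nearest neighbors to the query vector.
    """
    nearest = sorted(
        (math.dist(query, feat), label) for feat, label in dictionary
    )[:k]
    counts = Counter(label for _, label in nearest)
    return {label: c / k for label, c in counts.items()}

def recognize(query, dictionary, k=3, threshold=0.6):
    """Return the top class label if its confidence exceeds the
    threshold, else None (no recognition)."""
    probs = knn_class_probabilities(query, dictionary, k)
    label, p = max(probs.items(), key=lambda kv: kv[1])
    return label if p >= threshold else None
```

In the described system the query vector would come from the CNN feature extractor rather than being handed in directly.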
Abstract:
Described is a system for collision detection and avoidance estimation using sub-region based optical flow. During operation, the system estimates time-to-contact (TTC) values for an obstacle in multiple regions of interest (ROIs) in successive image frames as obtained from a monocular camera. Based on the TTC values, the system detects if there is an imminent obstacle. If there is an imminent obstacle, a path for avoiding the obstacle is determined based on the TTC values in the multiple ROIs. Finally, a mobile platform is caused to move in the path as determined to avoid the obstacle.
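The TTC estimate and the path choice can be illustrated as follows, under stated assumptions: the apparent size of the obstacle in each ROI is taken as already measured (in practice it would come from the optical-flow expansion), and the "steer toward the ROI with the largest TTC" policy is a hypothetical stand-in for the patented path determination.

```python
def time_to_contact(size_prev, size_curr, dt):
    """Monocular TTC from the expansion of an obstacle's apparent
    size between two frames: TTC ~ s * dt / (s_curr - s_prev).
    Returns infinity when the size is not growing (not approaching).
    """
    expansion = size_curr - size_prev
    if expansion <= 0:
        return float("inf")
    return size_curr * dt / expansion

def pick_avoidance_path(ttc_by_roi, ttc_threshold=2.0):
    """ttc_by_roi: {roi_name: ttc_seconds}. If any ROI's TTC falls
    below the threshold (imminent obstacle), return the name of the
    ROI with the largest TTC as the avoidance direction; otherwise
    return None (no avoidance needed)."""
    if min(ttc_by_roi.values()) >= ttc_threshold:
        return None
    return max(ttc_by_roi, key=ttc_by_roi.get)
```

The returned ROI name would then be translated into a steering command for the mobile platform.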
Abstract:
Described is a system for object detection in images or videos using spiking neural networks. An intensity saliency map is generated from an intensity of an input image having color components using a spiking neural network. Additionally, color saliency maps are generated from a plurality of colors in the input image, each using a spiking neural network. An object detection model is generated by combining the intensity saliency map and the multiple color saliency maps. The object detection model is used to detect multiple objects of interest in the input image.
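The combination step can be sketched as a per-pixel weighted sum of the maps. This fusion rule is an illustrative assumption; in the described system each input map would be produced by a spiking neural network, which is not modeled here.

```python
def combine_saliency_maps(maps, weights=None):
    """maps: list of H x W saliency maps (intensity map plus one map
    per color channel of interest). Returns a single H x W detection
    map as a per-pixel weighted sum; weights default to uniform."""
    if weights is None:
        weights = [1.0 / len(maps)] * len(maps)
    h, w = len(maps[0]), len(maps[0][0])
    return [[sum(wt * m[i][j] for wt, m in zip(weights, maps))
             for j in range(w)] for i in range(h)]
```

Peaks in the combined map would then be taken as candidate object locations.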
Abstract:
Described is a system for real-time automated exposure adjustment of a camera using contrast entropy. Multiple images are captured using an image sensor, where each of the images is captured at a distinct exposure value. A contrast entropy is determined for each of the multiple images. The image having the largest contrast entropy is selected from among the images and output.
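The selection step can be sketched by reading "contrast entropy" as the Shannon entropy of the intensity histogram, which is maximized when intensities are spread across many levels; this reading is an assumption, as the patent may define the metric differently.

```python
import math
from collections import Counter

def contrast_entropy(pixels):
    """Shannon entropy (bits) of the intensity histogram of a flat
    list of pixel values. Higher entropy means intensities are spread
    over more levels, i.e. more usable contrast."""
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def select_best_exposure(images):
    """images: list of (exposure_value, pixel_list) pairs, one per
    bracketed capture. Returns the exposure value whose image has the
    largest contrast entropy."""
    return max(images, key=lambda item: contrast_entropy(item[1]))[0]
```

A saturated or under-exposed capture collapses most pixels into a few levels, so its entropy (and hence its score) is low.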
Abstract:
Described is a system for improving object recognition. Object detection results and classification results for a sequence of image frames are received as input. Each object detection result is represented by a detection box and each classification result is represented by an object label corresponding to the object detection result. A pseudo-tracklet is formed by linking object detection results representing the same object in consecutive image frames. The system determines whether there are any inconsistent object labels or missing object detection results in the pseudo-tracklet. Finally, the object detection results and the classification results are improved by correcting any inconsistent object labels and filling in any missing object detection results.
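The two correction steps can be sketched as below. Both rules are illustrative assumptions: label correction is done by a simple majority vote over the pseudo-tracklet, and a missing detection is filled by linear interpolation over single-frame gaps.

```python
from collections import Counter

def correct_tracklet_labels(tracklet):
    """tracklet: list of (frame_idx, box, label) for one object.
    Replaces any inconsistent labels with the tracklet's majority
    label (assumes most classifications along the track are right)."""
    majority, _ = Counter(lbl for _, _, lbl in tracklet).most_common(1)[0]
    return [(f, box, majority) for f, box, _ in tracklet]

def fill_missing_detections(tracklet):
    """Fills a single-frame gap in the tracklet by linearly
    interpolating the detection box between its neighbors."""
    out = [tracklet[0]]
    for prev, curr in zip(tracklet, tracklet[1:]):
        if curr[0] - prev[0] == 2:  # exactly one frame missing
            mid_box = tuple((a + b) / 2 for a, b in zip(prev[1], curr[1]))
            out.append((prev[0] + 1, mid_box, curr[2]))
        out.append(curr)
    return out
```

Larger gaps, or disagreement about which detections belong to the same object, would need the fuller linking logic the abstract alludes to.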
Abstract:
Described is a system for selective color processing for vision systems. The system receives a multi-band image as input. As an optional step, the multi-band image is preprocessed, and a transformation is performed to transform the multi-band image into a color space. A metric function is applied to the transformed image to generate a distance map comprising intensities which vary based on a similarity between an intensity of a pixel color and an intensity of a target color. A contrast enhancement process is applied to the distance map to normalize the distance map to a range of values. The range of values is expanded near the intensity of the target color. Finally, an output response map for the target color of interest is generated, such that the output response map has high responses in regions which are similar to the target color to aid in detection and recognition processes.
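The distance-map and contrast-enhancement steps can be sketched as follows. The Euclidean RGB metric and the power-curve normalization are illustrative choices: the metric function, color space, and enhancement curve in the described system may differ.

```python
def color_distance_map(image, target, max_dist=441.673):
    """image: H x W list of (r, g, b) pixels. Returns a per-pixel
    similarity map in [0, 1], with 1.0 exactly at the target color
    (Euclidean RGB distance, scaled by the maximum possible distance,
    then inverted so similar colors score high)."""
    def sim(px):
        d = sum((a - b) ** 2 for a, b in zip(px, target)) ** 0.5
        return 1.0 - d / max_dist
    return [[sim(px) for px in row] for row in image]

def enhance_contrast(dist_map, gamma=4.0):
    """Normalizes the map to [0, 1], then applies a power curve that
    expands the value range near the target color (high similarities)
    while compressing dissimilar regions toward zero."""
    flat = [v for row in dist_map for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0
    return [[((v - lo) / span) ** gamma for v in row] for row in dist_map]
```

The enhanced map is the output response map: bright where pixels match the target color, dark elsewhere, ready for downstream detection and recognition.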