Abstract:
A method for training an automated classifier of input images includes: receiving, by a processing device, a convolutional neural network (CNN) model; receiving, by the processing device, training images and corresponding classes, each of the corresponding classes being associated with several of the training images; preparing, by the processing device, the training images, including separating the training images into a training set of the training images and a testing set of the training images; and training, by the processing device, the CNN model utilizing the training set, the testing set, and the corresponding classes to generate the automated classifier.
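As a concrete illustration of this flow, the following is a minimal sketch in PyTorch with a synthetic dataset; the model architecture, split sizes, and hyperparameters are illustrative assumptions, not the patent's specifics.

```python
# Minimal sketch: receive a CNN, split images into train/test sets, train.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset, random_split

# Hypothetical stand-in data: 1000 RGB 32x32 training images, 10 classes.
images = torch.randn(1000, 3, 32, 32)
labels = torch.randint(0, 10, (1000,))
dataset = TensorDataset(images, labels)

# Prepare the training images: separate into a training set and a testing set.
train_set, test_set = random_split(dataset, [800, 200])
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
test_loader = DataLoader(test_set, batch_size=32)

# Receive a CNN model (a small illustrative one).
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(32 * 8 * 8, 10),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Train the CNN on the training set to produce the automated classifier.
for epoch in range(5):
    for x, y in train_loader:
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()

# Evaluate the classifier on the testing set.
model.eval()
with torch.no_grad():
    correct = sum((model(x).argmax(1) == y).sum().item() for x, y in test_loader)
print(f"test accuracy: {correct / len(test_set):.3f}")
```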
Abstract:
Described is a system for converting convolutional neural networks to spiking neural networks. A convolutional neural network (CNN) is adapted to fit a set of requirements of a spiking neural network (SNN), resulting in an adapted CNN. The adapted CNN is trained to obtain a set of learned weights, and the set of learned weights is then applied to a converted SNN having an architecture similar to the adapted CNN. The converted SNN is then implemented on neuromorphic hardware, resulting in reduced power consumption.
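A common realization of such a conversion is rate-based weight transfer: the learned weights are copied unchanged and analog activations are replaced with integrate-and-fire spiking neurons, so that spike counts approximate the CNN's activation rates. The sketch below illustrates that idea for a single layer; the neuron model, input encoding, and all names are assumptions for illustration, not the patent's exact procedure.

```python
# Sketch of rate-based weight transfer from a trained layer to a spiking one.
import numpy as np

def if_neuron_layer(weights, bias, spikes_in, v_mem, threshold=1.0):
    """One timestep of an integrate-and-fire layer driven by input spikes."""
    v_mem += spikes_in @ weights.T + bias        # integrate weighted input
    spikes_out = (v_mem >= threshold).astype(float)
    v_mem[spikes_out > 0] -= threshold           # reset by subtraction
    return spikes_out, v_mem

# Learned weights from the adapted (trained) CNN; here a single dense layer.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(10, 64))
b = np.zeros(10)

# Run the converted SNN for T timesteps; spike counts approximate CNN rates.
x = rng.random(64)                               # input intensities in [0, 1]
v = np.zeros(10)
counts = np.zeros(10)
T = 100
for _ in range(T):
    in_spikes = (rng.random(64) < x).astype(float)   # Poisson-like encoding
    out_spikes, v = if_neuron_layer(W, b, in_spikes, v)
    counts += out_spikes
print("estimated firing rates:", counts / T)
```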
Abstract:
Described is a system for visual activity recognition. In operation, the system detects a set of objects of interest (OI) in video data and determines an object classification for each object in the set of OI, the set including at least one OI. A corresponding activity track is formed for each object in the set of OI by tracking each object across frames. Using a feature extractor, the system determines a corresponding feature in the video data for each OI, which is then used to determine a corresponding initial activity classification for each OI. One or more OI are then detected in each activity track via foveation, and the initial object detections and foveated object detections are appended to a new detected-objects list. Finally, a final classification is provided for each activity track by using the new detected-objects list and filtering the initial activity classification results with contextual logic.
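The stage-by-stage structure of this pipeline can be sketched as follows; every component here (detector, tracker, feature extractor, classifiers, foveation, contextual filter) is a trivial hypothetical stand-in that shows how the stages compose, not the patent's actual models.

```python
# Structural sketch of the detect -> track -> classify -> foveate -> filter flow.
from dataclasses import dataclass, field

@dataclass
class Track:
    object_id: int
    object_class: str
    boxes: list                                    # one box per frame
    detected_objects: list = field(default_factory=list)

def detect_objects(frame):                         # stand-in detector + classifier
    return [((10, 10, 50, 50), "person")]

def track_object(frames, box, cls, oid):           # stand-in tracker
    return Track(oid, cls, boxes=[box] * len(frames))

def extract_features(frames, track):               # stand-in feature extractor
    return [len(track.boxes)]

def classify_activity(feature):                    # stand-in activity classifier
    return "walking"

def foveated_detect(frames, track):                # re-detect inside the track
    return ["person"]

def contextual_filter(activity, detected):         # contextual-logic filtering
    return activity if "person" in detected else "unknown"

frames = ["frame0", "frame1", "frame2"]
tracks = [track_object(frames, box, cls, i)
          for i, (box, cls) in enumerate(detect_objects(frames[0]))]
for t in tracks:
    initial = classify_activity(extract_features(frames, t))
    t.detected_objects += foveated_detect(frames, t)    # new detected-objects list
    print(t.object_id, contextual_filter(initial, t.detected_objects))
```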
Abstract:
Described is a system for location recognition for mobile platforms, supporting applications such as autonomous robotic exploration. In operation, an image in front of the platform is converted into a high-dimensional feature vector. The image reflects a scene proximate the mobile platform. A candidate location identification of the scene is then determined and stored in a history buffer. Upon receiving a cue, the system then determines whether the candidate location identification is a known location or a new location.
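A minimal sketch of this loop follows, assuming cosine similarity over L2-normalized feature vectors, a fixed-length history buffer, and an illustrative similarity threshold; the encoder is a random stand-in for whatever network produces the high-dimensional feature vector.

```python
# Sketch: encode scene -> candidate location id -> history buffer -> cue decision.
from collections import deque
import numpy as np

rng = np.random.default_rng(1)

def encode(image):
    """Stand-in for converting an image to a high-dimensional feature vector."""
    return rng.normal(size=512)

known_locations = {}                  # location id -> reference feature vector
history = deque(maxlen=10)            # buffer of candidate identifications
THRESHOLD = 0.8                       # assumed similarity cutoff

def candidate_location(image):
    v = encode(image)
    v /= np.linalg.norm(v)
    best_id, best_sim = None, -1.0
    for loc_id, ref in known_locations.items():
        sim = float(v @ ref)
        if sim > best_sim:
            best_id, best_sim = loc_id, sim
    candidate = best_id if best_sim >= THRESHOLD else f"new-{len(known_locations)}"
    history.append((candidate, v))
    return candidate

def on_cue():
    """On a cue, commit the buffered candidate as a known or new location."""
    candidate, v = history[-1]
    known_locations.setdefault(candidate, v)
    return candidate

print(candidate_location("image_in_front_of_platform"))
print(on_cue())
```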
Abstract:
Described is a system and method for accurate image and/or video scene classification. More specifically, described is a system that makes use of a specialized technique based on a convolutional neural network (hereafter CNN) for the fusion of bottom-up whole-image features and top-down entity classification. When the two parallel and independent processing paths are fused, the system provides an accurate classification of the scene as depicted in the image or video.
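One simple way to realize such a fusion is to concatenate the outputs of the two parallel paths and classify the joint feature, as sketched below in PyTorch; the path architectures, dimensions, and the concatenation-based fusion rule are assumptions for illustration, not the patent's exact design.

```python
# Sketch: fuse bottom-up whole-image features with top-down entity scores.
import torch
import torch.nn as nn

class FusionSceneClassifier(nn.Module):
    def __init__(self, num_scenes=5, num_entities=20):
        super().__init__()
        # Bottom-up path: whole-image features from a small CNN.
        self.bottom_up = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),    # -> 16-d image feature
        )
        # Top-down path: entity classification scores (stand-in head).
        self.top_down = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, num_entities),              # -> entity scores
        )
        # Fusion of the two independent paths into a scene label.
        self.fuse = nn.Linear(16 + num_entities, num_scenes)

    def forward(self, x):
        return self.fuse(torch.cat([self.bottom_up(x), self.top_down(x)], dim=1))

scores = FusionSceneClassifier()(torch.randn(2, 3, 64, 64))
print(scores.shape)  # torch.Size([2, 5])
```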
Abstract:
Described is a system for converting a convolutional neural network (CNN) designed and trained for color (RGB) images into one that works on infrared (IR) or grayscale images. The converted CNN comprises a series of convolution layers of neurons arranged in a set of kernels having corresponding depth slices. The converted CNN is used for performing object detection. A mechanical component of an autonomous device is controlled based on the object detection.
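A common way to perform such a conversion, shown below as an assumption rather than the patent's exact rule, is to collapse the first convolution layer's three input depth slices into one by averaging the learned RGB kernels, leaving all deeper layers unchanged.

```python
# Sketch: adapt an RGB-trained first conv layer to single-channel IR input.
import torch
import torch.nn as nn

rgb_conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)   # trained RGB layer

ir_conv = nn.Conv2d(1, 16, kernel_size=3, padding=1)    # converted IR layer
with torch.no_grad():
    # Average the kernel weights over the three RGB input channels (dim=1).
    ir_conv.weight.copy_(rgb_conv.weight.mean(dim=1, keepdim=True))
    ir_conv.bias.copy_(rgb_conv.bias)

ir_image = torch.randn(1, 1, 64, 64)                    # grayscale/IR input
print(ir_conv(ir_image).shape)                          # torch.Size([1, 16, 64, 64])
```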
Abstract:
Described is a system for detecting moving objects using multi-frame motion history images. An input video sequence of consecutive registered image frames is received. The sequence comprises forward and backward registered image frames registered to a coordinate system of a reference image frame. Frame differences are computed between each of the consecutive registered image frames and the reference image frame. The frame differences are accumulated based on characteristics of the input video sequence to compute a corresponding motion response value. A selected threshold value is then applied to the motion response value to produce at least one binary image used for detection of moving objects in the input video sequence. Additionally, the invention includes a system for adaptive parameter optimization through input image characterization, wherein parameters derived from characteristics of the image influence the motion detection process.
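A minimal sketch of the differencing, accumulation, and thresholding steps follows, assuming the frames are already registered to the reference frame's coordinate system; the mean-based accumulation and the threshold rule are illustrative choices, not the patent's exact parameters.

```python
# Sketch: frame differences -> accumulated motion response -> binary image.
import numpy as np

rng = np.random.default_rng(2)
reference = rng.random((120, 160))
registered = [reference + rng.normal(scale=0.01, size=reference.shape)
              for _ in range(8)]            # forward/backward registered frames

# Frame differences against the reference frame.
diffs = [np.abs(f - reference) for f in registered]

# Accumulate differences into a motion response value per pixel.
motion_response = np.mean(diffs, axis=0)

# Apply a selected threshold to produce a binary detection image.
threshold = motion_response.mean() + 3 * motion_response.std()
binary = motion_response > threshold
print("moving pixels:", int(binary.sum()))
```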
Abstract:
Described is a system for object detection in images or videos using spiking neural networks. An intensity saliency map is generated from the intensity of an input image having color components using a spiking neural network. Additionally, color saliency maps are generated from a plurality of colors in the input image using spiking neural networks. An object detection model is generated by combining the intensity saliency map and the multiple color saliency maps. The object detection model is used to detect multiple objects of interest in the input image.
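A highly simplified sketch of the map combination follows, using rate-coded spike counts as stand-ins for the SNN outputs; the opponency channels, spiking dynamics, and averaging-based combination rule are all assumptions for illustration.

```python
# Sketch: intensity + color saliency maps from rate-coded spikes, then combine.
import numpy as np

rng = np.random.default_rng(3)
image = rng.random((64, 64, 3))                       # RGB input image

def spike_rate_map(channel, T=50):
    """Rate-code a channel: each pixel spikes with probability = intensity."""
    spikes = rng.random((T,) + channel.shape) < channel
    return spikes.mean(axis=0)                        # spike-count map in [0, 1]

intensity = image.mean(axis=2)
intensity_saliency = spike_rate_map(intensity)

# One color saliency map per color-opponency channel (illustrative choice).
red_green = np.abs(image[..., 0] - image[..., 1])
blue_yellow = np.abs(image[..., 2] - image[..., :2].mean(axis=2))
color_saliency = [spike_rate_map(red_green), spike_rate_map(blue_yellow)]

# Combine the intensity map with the multiple color maps.
saliency = (intensity_saliency + sum(color_saliency)) / (1 + len(color_saliency))

# Detect objects of interest as peaks above a threshold.
detections = np.argwhere(saliency > saliency.mean() + 2 * saliency.std())
print("salient pixels:", len(detections))
```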
Abstract:
Described is a system for object tracking with integrated motion-based object detection and enhanced Kalman-type filtering. The system detects a location of a moving object in an image frame using a MogS object detection module, thereby generating an object detection. For each image frame in a sequence of image frames, the system predicts the location of the moving object in the next image frame using a Kalman filter prediction module to generate a predicted object location. The predicted object location is refined using a Kalman filter updating module, and the Kalman filter updating module is controlled by a controller module that monitors the similarity between the predicted object location and the moving object's location in a previous image frame. Finally, a set of detected moving object locations in the sequence of image frames is output.
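The predict/update/control loop can be sketched as follows, assuming a constant-velocity motion model and a distance-based similarity gate; the gate value, noise covariances, and detections are illustrative assumptions rather than the patent's parameters.

```python
# Sketch: Kalman predict, controller-gated update, per-frame track output.
import numpy as np

dt = 1.0
F = np.array([[1, 0, dt, 0], [0, 1, 0, dt],
              [0, 0, 1, 0], [0, 0, 0, 1]], float)   # constant-velocity model
H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], float)   # observe position only
Q = np.eye(4) * 0.01                                # process noise
R = np.eye(2) * 0.5                                 # measurement noise

x = np.array([0.0, 0.0, 1.0, 1.0])                  # state [px, py, vx, vy]
P = np.eye(4)
prev_detection = np.array([0.0, 0.0])
GATE = 5.0                                          # assumed similarity gate

detections = [np.array([1.0, 1.1]), np.array([2.1, 1.9]), np.array([3.0, 3.1])]
for z in detections:                                # z from the detection module
    # Predict the object's location in the next frame.
    x = F @ x
    P = F @ P @ F.T + Q
    # Controller: apply the update only if the prediction stays similar to
    # the moving object's location in the previous frame.
    if np.linalg.norm(H @ x - prev_detection) < GATE:
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (z - H @ x)
        P = (np.eye(4) - K @ H) @ P
        prev_detection = z
    print("track position:", np.round(x[:2], 2))
```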
Abstract:
Described is a method for object cueing in motion imagery. Keypoints and features are extracted from motion imagery, and features between consecutive image frames of the motion imagery are compared to identify similar image frames. A candidate set of matching keypoints is generated by matching keypoints between the similar image frames. A ground plane homography model that fits the candidate set of matching keypoints is determined to generate a set of correct matching keypoints. Each image frame of a set of image frames within a selected time window is registered into a reference frame's coordinate system using the homography transformation. A difference image is obtained between the reference frame and each registered image frame, resulting in multiple difference images. The difference images are then accumulated to calculate a detection image, which is used for detection of salient regions. Object cues for surveillance use are produced based on the detected salient regions.
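A minimal sketch of the registration-and-differencing core follows, using OpenCV's ORB keypoints and RANSAC homography fitting as stand-ins for the patent's keypoint extraction and model-fitting steps; the frames here are synthetic, and a real pipeline would accumulate difference images over the full time window.

```python
# Sketch: match keypoints, fit homography, register frame, difference, threshold.
import cv2
import numpy as np

rng = np.random.default_rng(4)
reference = (rng.random((240, 320)) * 255).astype(np.uint8)
# A "later" frame: the reference shifted slightly, simulating camera motion.
M = np.float32([[1, 0, 3], [0, 1, 2]])
frame = cv2.warpAffine(reference, M, (320, 240))

orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(reference, None)
kp2, des2 = orb.detectAndCompute(frame, None)

# Match keypoints between the similar frames to get candidate correspondences.
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
src = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)

# Fit a ground-plane homography with RANSAC to keep only correct matches.
H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# Register the frame into the reference frame's coordinate system.
registered = cv2.warpPerspective(frame, H, (320, 240))

# Difference image; accumulating several of these yields the detection image.
diff = cv2.absdiff(registered, reference)
detection_image = diff.astype(np.float32)           # accumulate over a window
_, salient = cv2.threshold(detection_image, 40, 255, cv2.THRESH_BINARY)
print("salient pixels:", int((salient > 0).sum()))
```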