Abstract:
A method for determining an emotional state of a subject taking an assessment. The method includes eliciting predicted facial expressions from a subject by administering questions, each intended to elicit a certain facial expression that conveys a baseline characteristic of the subject; receiving a video sequence capturing the subject answering the questions; determining an observable physical behavior exhibited by the subject across a series of frames corresponding to each question; associating the observed behavior with the emotional state that corresponds to the facial expression; and training a classifier using the associations. The method further includes receiving a second video sequence capturing the subject during an assessment and applying features extracted from the second video sequence to the classifier to determine the emotional state of the subject in response to an assessment item administered during the assessment.
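A minimal sketch of the two-phase flow this abstract describes: an elicitation phase that pairs observed behavior features with the emotion each question was designed to elicit, and an assessment phase that applies the trained classifier to new features. The feature extractor is a stub and all names, dimensions, and the choice of logistic regression are illustrative assumptions, not from the patent.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def extract_behavior_features(frames):
    # Placeholder: in practice this would summarize facial action units,
    # head pose, etc., across the frames for one question.
    return np.asarray(frames, dtype=float).mean(axis=0)

# --- Elicitation phase: each question carries its intended emotion label ---
elicitation = [
    (np.random.rand(30, 8), "happy"),     # 30 frames x 8 features per question
    (np.random.rand(30, 8), "surprise"),
    (np.random.rand(30, 8), "neutral"),
]
X = np.stack([extract_behavior_features(frames) for frames, _ in elicitation])
y = [label for _, label in elicitation]

clf = LogisticRegression(max_iter=1000).fit(X, y)

# --- Assessment phase: classify behavior observed for an assessment item ---
assessment_frames = np.random.rand(30, 8)
features = extract_behavior_features(assessment_frames).reshape(1, -1)
print("Estimated emotional state:", clf.predict(features)[0])
```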
Abstract:
An apparatus, method, and non-transitory computer readable medium for mosaicking a plurality of images captured in a low-light condition are disclosed. For example, the apparatus includes an image capturing module, a flash module, a computer-readable medium, and a processor. The processor executes a plurality of instructions stored on the computer-readable medium to perform operations. The operations include determining a spacing between each one of the plurality of images, capturing the plurality of images in accordance with the spacing, aligning the plurality of images that are captured, and mosaicking the aligned images into a single image by, for each one of the plurality of images, replacing one or more pixels in the respective flash spot region with one or more pixels in a subsequent image that are in a same location as the respective flash spot region, such that the information obscured by the flash is recovered.
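A minimal sketch of the pixel-replacement step described above: for each aligned image, pixels inside its flash-spot mask are replaced with the co-located pixels of the next image in the sequence, recovering the content washed out by the flash. Alignment and flash-spot detection are assumed already done; the images and masks here are synthetic stand-ins.

```python
import numpy as np

def recover_flash_spots(images, masks):
    """images: list of aligned HxW arrays; masks: boolean HxW flash-spot masks."""
    out = [img.copy() for img in images]
    for i in range(len(images) - 1):
        spot = masks[i]
        # The same pixel location in the subsequent image is unaffected
        # by this image's flash, so copy it in.
        out[i][spot] = images[i + 1][spot]
    return out

imgs = [np.full((4, 4), v, dtype=float) for v in (10.0, 20.0, 30.0)]
msks = [np.zeros((4, 4), bool) for _ in imgs]
msks[0][1:3, 1:3] = True  # flash spot in the first image
print(recover_flash_spots(imgs, msks)[0])
```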
Abstract:
A mobile electronic device processes a sequence of images to identify and re-identify an object of interest in the sequence. An image sensor of the device receives a sequence of images. The device detects an object in a first image, as well as positional parameters of the device that correspond to the object in the first image. The device determines a range of positional parameters within which the object may appear in a field of view of the device. When the device detects that the object of interest has exited the field of view, it subsequently uses motion sensor data to determine whether the object of interest has likely re-entered the field of view; if so, it analyzes the current frame to confirm that the object of interest has re-entered the field of view.
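A hedged sketch of the re-identification gate: the device records the range of positional parameters (taken here as yaw/pitch from motion sensors) over which the object was visible, and full image analysis is re-run only when current readings fall back inside that range. The detector and the data structures are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class PoseRange:
    yaw_min: float; yaw_max: float
    pitch_min: float; pitch_max: float

    def contains(self, yaw, pitch):
        return (self.yaw_min <= yaw <= self.yaw_max and
                self.pitch_min <= pitch <= self.pitch_max)

def confirm_object_in_frame(frame):
    # Placeholder for the appearance-based check that confirms re-entry.
    return frame.get("has_object", False)

def track(frames, poses, seen_range):
    visible = True
    for frame, (yaw, pitch) in zip(frames, poses):
        if visible and not seen_range.contains(yaw, pitch):
            visible = False          # object likely exited the field of view
        elif not visible and seen_range.contains(yaw, pitch):
            # Motion sensors say the object may be back; confirm with vision.
            visible = confirm_object_in_frame(frame)
            print("re-entry confirmed" if visible else "re-entry not confirmed")

frames = [{"has_object": True}, {}, {"has_object": True}]
poses = [(0.0, 0.0), (45.0, 0.0), (2.0, 1.0)]
track(frames, poses, PoseRange(-10, 10, -5, 5))
```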
Abstract:
The embodiments include systems and methods for guiding a user to capture two flash images of a document page and selectively fusing the images to produce a binary image of high quality without loss of any content. Each individual image may have a flash-spot region (FSR) where the content is degraded or lost due to the flash light. The idea is to first guide the user to take two images such that the flash spots do not overlap in the document regions. The flash spots in both images are detected and assessed for quality and extent of degradation. The image with lower degradation is chosen as the primary image and the other as the secondary, to minimize fusing artifacts. The region in the secondary image corresponding to the FSR in the primary is aligned to the primary region using a multiscale alignment technique. The primary image and the aligned FSR are binarized and fused in the vicinity of the flash spot in the primary using an intelligent technique that minimizes fusion boundary artifacts such as cutting of characters and words.
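A minimal sketch of the selective-fusion idea under simplifying assumptions: the two images are already aligned, the flash-spot regions (FSRs) are given as masks, degradation is scored crudely as mask area, and binarization is a global threshold rather than the patent's multiscale, boundary-aware techniques.

```python
import numpy as np

def binarize(img, t=128):
    return (img > t).astype(np.uint8)

def fuse_flash_pair(img_a, mask_a, img_b, mask_b):
    # Choose the less-degraded image as primary to minimize fusing artifacts.
    if mask_a.sum() <= mask_b.sum():
        primary, p_mask, secondary = img_a, mask_a, img_b
    else:
        primary, p_mask, secondary = img_b, mask_b, img_a
    fused = binarize(primary)
    # Fill the primary's FSR from the co-located (clean) secondary region.
    fused[p_mask] = binarize(secondary)[p_mask]
    return fused

a = np.random.randint(0, 256, (6, 6))
b = np.random.randint(0, 256, (6, 6))
ma = np.zeros((6, 6), bool); ma[0:2, 0:2] = True   # small spot in A
mb = np.zeros((6, 6), bool); mb[2:6, 2:6] = True   # larger spot in B
print(fuse_flash_pair(a, ma, b, mb))
```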
Abstract:
A method and system for identifying content relevance comprise acquiring video data, mapping the acquired video data to a feature space to obtain a feature representation of the video data, assigning the acquired video data to at least one action class based on the feature representation, and determining a relevance of the acquired video data.
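An illustrative sketch of that pipeline: a video is mapped to a feature vector, assigned to the nearest action class by cosine similarity, and that similarity doubles as a relevance score. The feature map and the class prototypes are invented for the example.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def video_features(frames):
    return np.asarray(frames, float).mean(axis=0)  # stand-in feature map

action_classes = {
    "walking": np.array([1.0, 0.1, 0.0]),
    "running": np.array([0.2, 1.0, 0.1]),
}

frames = np.random.rand(16, 3)            # 16 frames, 3-dim features each
f = video_features(frames)
scores = {name: cosine(f, proto) for name, proto in action_classes.items()}
best = max(scores, key=scores.get)
print(f"class={best}, relevance={scores[best]:.2f}")
```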
Abstract:
A method and system for domain adaptation based on multi-layer fusion in a convolutional neural network architecture for feature extraction, together with a two-step training and fine-tuning scheme. The architecture concatenates features extracted at different depths of the network to form a fully connected layer before the classification step. First, the network is trained with a large set of images from a source domain as a feature extractor. Second, for each new domain (including the source domain), the classification step is fine-tuned with images collected from the corresponding site. The features from different depths are concatenated and fine-tuned with weights adjusted for the specific task. The architecture is used for classifying high-occupancy vehicle images.
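A hedged PyTorch sketch of the multi-layer fusion idea: pooled features from two depths of a small convolutional backbone are concatenated into a fully connected classification head, and the two-step scheme is mimicked by freezing the backbone and fine-tuning only the head for a new domain. The layer sizes and backbone are arbitrary choices, not from the patent.

```python
import torch
import torch.nn as nn

class MultiDepthFusionNet(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.block2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.pool = nn.AdaptiveAvgPool2d(1)      # pool each depth to a vector
        self.head = nn.Linear(16 + 32, num_classes)

    def forward(self, x):
        shallow = self.block1(x)
        deep = self.block2(shallow)
        fused = torch.cat([self.pool(shallow).flatten(1),
                           self.pool(deep).flatten(1)], dim=1)
        return self.head(fused)

net = MultiDepthFusionNet()
# Step 1: train everything on the source domain (omitted here). Step 2: for
# a new site, freeze the feature extractor and fine-tune only the head.
for p in net.block1.parameters(): p.requires_grad = False
for p in net.block2.parameters(): p.requires_grad = False
print(net(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 2])
```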
Abstract:
A method, computer readable medium, and apparatus for verifying an identity of an individual based upon facial expressions exhibited in a query video of the individual are disclosed. The method includes receiving a reference video for each one of a plurality of different individuals, wherein a plurality of facial gesture encoders is extracted from at least one frame of the reference video describing one or more facial expressions of each one of the plurality of different individuals; receiving the query video; and calculating a similarity score for the reference video of each one of the plurality of different individuals based on an analysis that compares the plurality of facial gesture encoders of the at least one frame of the reference video to a plurality of facial gesture encoders extracted from at least one frame of the query video.
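A sketch of the scoring step as described: each video is reduced to a set of facial-gesture-encoder vectors (one per frame), a similarity score is computed against each enrolled individual's reference, and the best-scoring reference can be taken as the matching identity. The encoders here are random stand-ins and the mean-of-best-matches score is an illustrative assumption.

```python
import numpy as np

def similarity(ref_encoders, query_encoders):
    # Mean cosine similarity between best-matching frame pairs (illustrative).
    ref = ref_encoders / np.linalg.norm(ref_encoders, axis=1, keepdims=True)
    qry = query_encoders / np.linalg.norm(query_encoders, axis=1, keepdims=True)
    return float((qry @ ref.T).max(axis=1).mean())

references = {name: np.random.rand(20, 12) for name in ("alice", "bob")}
query = np.random.rand(15, 12)                 # 15 frames, 12-dim encoders
scores = {name: similarity(enc, query) for name, enc in references.items()}
print(max(scores.items(), key=lambda kv: kv[1]))
```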
Abstract:
Methods, systems, and processor-readable media for training data augmentation. A source domain and a target domain are provided, and thereafter an operation is performed to augment data in the source domain with transformations utilizing characteristics learned from the target domain. The augmented data is then used to improve image classification accuracy in a new domain.
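A minimal sketch of the augmentation idea with a deliberately simple "learned characteristic": per-channel mean/std statistics of the target domain re-style source images, and the re-styled copies are added to the training set. The actual method may learn far richer transformations; this is a stand-in.

```python
import numpy as np

def match_channel_stats(src, tgt_mean, tgt_std):
    # Shift each source image's per-channel statistics toward the target's.
    mean = src.mean(axis=(0, 1)); std = src.std(axis=(0, 1)) + 1e-8
    return (src - mean) / std * tgt_std + tgt_mean

source = [np.random.rand(8, 8, 3) for _ in range(4)]
target = [np.random.rand(8, 8, 3) * 0.5 + 0.3 for _ in range(4)]

t_mean = np.mean([t.mean(axis=(0, 1)) for t in target], axis=0)
t_std = np.mean([t.std(axis=(0, 1)) for t in target], axis=0)

augmented = source + [match_channel_stats(s, t_mean, t_std) for s in source]
print(len(source), "->", len(augmented), "training images")
```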