Abstract:
A method, computer readable storage medium, and system are disclosed for improving communication productivity, comprising: capturing at least one three-dimensional (3D) stream of data on two or more subjects; extracting a time-series of skeletal data from the at least one 3D stream of data on the two or more subjects; and determining an engagement index between the two or more subjects by comparing the time-series of skeletal data on each of the two or more subjects over a time window.
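The engagement comparison could, for instance, be realized as a windowed correlation between the two subjects' motion signals. A minimal Python sketch, assuming each subject's skeletal time-series has been reduced to one scalar motion value per frame (the function name and the choice of Pearson correlation are illustrative assumptions, not the disclosed method):

```python
import math

def engagement_index(series_a, series_b, window):
    """Toy engagement index: Pearson correlation between two subjects'
    per-frame motion values over a sliding time window."""
    scores = []
    for start in range(0, len(series_a) - window + 1):
        a = series_a[start:start + window]
        b = series_b[start:start + window]
        mean_a = sum(a) / window
        mean_b = sum(b) / window
        cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
        sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
        sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
        # Correlation near +1 suggests synchronized (engaged) motion.
        scores.append(cov / (sd_a * sd_b) if sd_a and sd_b else 0.0)
    return scores
```

Perfectly mirrored motion yields an index of 1.0 per window; opposed motion yields -1.0.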
Abstract:
A method, a system, and a non-transitory computer readable medium for recognizing an object are disclosed, the method including: emitting an array of infrared rays from an infrared emitter towards a projection region, the projection region including a first object; generating a reference infrared image by recording an intensity of ray reflection from the projection region without the first object; generating a target infrared image by recording the intensity of ray reflection from the projection region with the first object; comparing the target infrared image to the reference infrared image to generate a predetermined intensity threshold; and extracting the first object from the target infrared image if the intensity of ray reflection of the first object in the target infrared image exceeds the predetermined intensity threshold.
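The reference/target comparison amounts to per-pixel background subtraction against a threshold. A minimal Python sketch, assuming the images are 2D lists of intensities and the threshold is given (real threshold selection would be data-driven, as the abstract's comparison step implies):

```python
def extract_object(reference, target, threshold):
    """Toy object extraction: mark pixels whose target IR intensity
    differs from the empty-scene reference by more than `threshold`."""
    return [
        [1 if abs(t - r) > threshold else 0 for t, r in zip(trow, rrow)]
        for trow, rrow in zip(target, reference)
    ]
```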
Abstract:
A method for 3D gesture behavior recognition is disclosed, which includes detecting a behavior change of one or more attendees at a meeting and/or conference; classifying the behavior change; and performing an action based on the behavior change of the one or more attendees. Another method, system, and computer readable medium for 3D gesture behavior recognition are disclosed, including: obtaining temporal segmentations of human motion sequences for one or more attendees; determining a probability density function of the temporal segmentations of the human motion sequences using a Parzen window density estimation model; computing a bandwidth based on a median absolute deviation; updating the Parzen window to adapt to changes in the motion sequences for the one or more attendees; and detecting actions based on the updated Parzen window.
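Parzen window (kernel) density estimation with a bandwidth derived from the median absolute deviation can be sketched in a few lines. This is an illustrative 1D version with a Gaussian kernel, not the disclosed multi-dimensional implementation; the 1.4826 scale factor (the standard normal-consistency constant for MAD) and the fallback bandwidth are assumptions:

```python
import math

def mad_bandwidth(samples):
    """Bandwidth from the median absolute deviation (a robust scale
    estimate); 1.4826 makes MAD consistent with the normal std. dev."""
    med = sorted(samples)[len(samples) // 2]
    mad = sorted(abs(x - med) for x in samples)[len(samples) // 2]
    return 1.4826 * mad or 1.0  # fall back to 1.0 when MAD is zero

def parzen_density(x, samples, h):
    """Parzen window density estimate at x with Gaussian kernel, width h."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
```

Adapting to new motion sequences then reduces to appending new samples and recomputing the bandwidth.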
Abstract:
An artificial neural network system for image classification, including multiple independent individual convolutional neural networks (CNNs) connected in multiple stages, each CNN configured to process an input image to calculate a pixelwise classification. The output of an earlier stage CNN, which is a class score image having the same height and width as its input image and a depth of N representing the probabilities of each pixel of the input image belonging to each of N classes, is input into the next stage CNN as its input image. When training the network system, the first stage CNN is trained using first training images and corresponding label data; then second training images are forward propagated by the trained first stage CNN to generate corresponding class score images, which are used along with label data corresponding to the second training images to train the second stage CNN.
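The key structural idea is that each stage maps an H x W x C image to an H x W x N class-score image, so stages compose. A toy Python sketch with per-pixel linear scoring standing in for a real CNN stage (the stand-in classifier and its weights are assumptions; only the cascading of class-score images reflects the abstract):

```python
import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def pixelwise_stage(image, weights):
    """One 'stage': per-pixel linear scoring + softmax, producing an
    H x W x N class-score image (toy stand-in for a CNN stage)."""
    return [[softmax([sum(w * c for w, c in zip(wrow, pixel))
                      for wrow in weights])
             for pixel in row]
            for row in image]

def cascade(image, stage_weights):
    """Feed each stage's class-score image into the next stage as input."""
    out = image
    for weights in stage_weights:
        out = pixelwise_stage(out, weights)
    return out
```

Because each stage's output has the same height and width as its input, any number of stages can be chained.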
Abstract:
A method, system, and non-transitory computer readable medium are disclosed for recognizing gestures, the method including: capturing at least one three-dimensional (3D) video stream of data on a subject; extracting a time-series of skeletal data from the at least one 3D video stream of data; isolating a plurality of points of abrupt content change called temporal cuts, the plurality of temporal cuts defining a set of non-overlapping adjacent segments partitioning the time-series of skeletal data; identifying, among the plurality of temporal cuts, temporal cuts of the time-series of skeletal data having a positive acceleration; and classifying each of the one or more pairs of consecutive cuts with the positive acceleration as a gesture boundary.
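A toy version of this segmentation, assuming the skeletal time-series has been reduced to a per-frame speed signal (the cut criterion, the discrete second-difference acceleration, and the threshold are illustrative assumptions, not the disclosed detector):

```python
def gesture_boundaries(speeds, cut_threshold):
    """Toy temporal segmentation: a 'cut' is a frame where speed changes
    abruptly; cuts with positive acceleration are kept, and pairs of
    consecutive kept cuts delimit candidate gestures."""
    cuts = []
    for i in range(1, len(speeds) - 1):
        # Discrete second difference as an acceleration estimate.
        accel = speeds[i + 1] - 2 * speeds[i] + speeds[i - 1]
        if abs(speeds[i] - speeds[i - 1]) > cut_threshold:
            cuts.append((i, accel))
    positive = [i for i, a in cuts if a > 0]
    # Consecutive positive-acceleration cuts bound a candidate gesture.
    return list(zip(positive, positive[1:]))
```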
Abstract:
A method and system are disclosed for recognizing an object, the method including emitting one or more arranged patterns of infrared (IR) rays from an infrared emitter towards a projection region, the one or more arranged patterns of infrared rays forming unique dot patterns; mapping the one or more arranged patterns of infrared rays on the projection region to generate a reference image; capturing an IR image and an RGB image of an object with a wearable device, the wearable device including an IR camera and an RGB camera; extracting IR dots from the IR image and determining a match between the extracted IR dots and the reference image; determining a position of the RGB image on the reference image; and mapping the position of the RGB image to a coordinate on the projection region.
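The dot-matching step can be sketched as nearest-neighbor pairing between extracted dots and the reference pattern. An illustrative Python sketch, assuming dots are (x, y) coordinates and matching is by Euclidean distance within a tolerance (real systems would exploit the uniqueness of the dot patterns rather than plain proximity):

```python
def match_dots(extracted, reference, tolerance):
    """Toy dot matching: pair each extracted IR dot with the nearest
    reference dot within `tolerance`; returns (extracted_idx, ref_idx)."""
    matches = []
    for i, (ex, ey) in enumerate(extracted):
        best, best_d = None, tolerance
        for j, (rx, ry) in enumerate(reference):
            d = ((ex - rx) ** 2 + (ey - ry) ** 2) ** 0.5
            if d <= best_d:
                best, best_d = j, d
        if best is not None:
            matches.append((i, best))
    return matches
```

The matched pairs localize the wearable camera's view on the reference image, from which the RGB position can be mapped to projection-region coordinates.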
Abstract:
A text recognition method and system involves computing a text matching score between an input text and an output candidate text. The text matching score is computed by evaluating respective N-grams of the input text and the output candidate text. The N-grams are compared in pairs for visual similarity by determining N-gram pair scores, which are used to compute the text matching score. The N-gram pair scores are determined using a set of probabilities of confusion between characters contained in the N-grams. The described approach can address inconsistent results that arise from conventional text similarity quantifiers.
Abstract:
A method and system for recognizing behavior are disclosed, the method including: capturing at least one video stream of data on one or more subjects; extracting body skeleton data from the at least one video stream of data; computing feature extractions on the extracted body skeleton data to generate a plurality of three-dimensional (3D) delta units for each frame of the extracted body skeleton data; generating a plurality of histogram sequences for each frame by projecting the plurality of 3D delta units for each frame onto a spherical coordinate system having a plurality of spherical bins; generating an energy map for each of the plurality of histogram sequences by mapping the plurality of spherical bins versus time; applying a Histogram of Oriented Gradients (HOG) algorithm to the plurality of energy maps to generate a single column vector; and classifying the single column vector as a behavior and/or emotion.
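The histogram step amounts to binning motion direction vectors on a sphere. A simplified Python sketch that bins (dx, dy, dz) delta units by azimuth only (the disclosed method bins the full spherical coordinates; the azimuth-only binning here is an assumption for brevity):

```python
import math

def spherical_histogram(deltas, n_bins):
    """Toy projection of 3D motion deltas into azimuthal bins: each
    (dx, dy, dz) vector votes for one of `n_bins` bins by its azimuth."""
    hist = [0] * n_bins
    for dx, dy, dz in deltas:
        azimuth = math.atan2(dy, dx) % (2 * math.pi)
        hist[min(int(azimuth / (2 * math.pi) * n_bins), n_bins - 1)] += 1
    return hist
```

Stacking one such histogram per frame over time yields the bins-versus-time energy map on which HOG is applied.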
Abstract:
An artificial neural network system for image classification, formed of multiple independent individual convolutional neural networks (CNNs), each CNN being configured to process an input image patch to calculate a classification for the center pixel of the patch. The multiple CNNs have different receptive fields of view for processing image patches of different sizes centered at the same pixel. A final classification for the center pixel is calculated by combining the classification results from the multiple CNNs. An image patch generator is provided to generate the multiple input image patches of different sizes by cropping them from the original input image. The multiple CNNs have similar configurations, and when training the artificial neural network system, one CNN is trained first; the learned parameters are then transferred to another CNN as initial parameters, and that CNN is further trained. The classification includes three classes, namely background, foreground, and edge.
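The patch generator's job can be sketched directly: crop square patches of several sizes, all centered on the same pixel. An illustrative Python version, assuming grayscale 2D-list images and zero-padding at the borders (the padding policy is an assumption; the abstract does not specify border handling):

```python
def center_patches(image, row, col, sizes):
    """Crop square patches of several odd sizes, all centered at the
    same pixel; out-of-range pixels are zero-padded."""
    h, w = len(image), len(image[0])
    patches = []
    for size in sizes:
        half = size // 2
        patch = [[image[r][c] if 0 <= r < h and 0 <= c < w else 0
                  for c in range(col - half, col + half + 1)]
                 for r in range(row - half, row + half + 1)]
        patches.append(patch)
    return patches
```

Each patch would then be fed to the CNN with the matching receptive field, and the per-CNN class results combined (e.g. by averaging class scores) into the final background/foreground/edge decision.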
Abstract:
A method, computer readable medium, and system are disclosed for enhancing cell images for analysis. The method includes performing a multi-thresholding process on a cell image to generate a plurality of images from the cell image; smoothing each component within each of the plurality of images; merging the smoothed components into a merged layer; classifying each of the components of the merged layer into convex cell regions and concave cell regions; combining the concave cell regions with a cell boundary for each of the corresponding concave cell regions to generate a smoothed shape profile for each of the concave cell regions; and generating an output image by combining the convex cell regions with the concave cell regions with smoothed shape profiles.
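The first step, multi-thresholding, simply produces one binary image per threshold level. A minimal Python sketch of that step only, assuming a grayscale 2D-list image (the later smoothing, merging, and convex/concave classification stages are not shown):

```python
def multi_threshold(image, thresholds):
    """Toy multi-thresholding: one binary image per threshold level."""
    return [
        [[1 if px >= t else 0 for px in row] for row in image]
        for t in thresholds
    ]
```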