Abstract:
Methods and apparatus for gesture-based control of a device in a multi-user environment are described. The methods prioritize users or gestures based on a predetermined priority ruleset. A first-user-in-time ruleset prioritizes gestures based on when in time they were begun by a user in the camera FOV. An action-hierarchy ruleset prioritizes gestures based on the actions they correspond to, and the relative positions of those actions within an action hierarchy. A designated-master-user ruleset prioritizes gestures performed by an explicitly designated master user. Methods for designating a new master user and for providing gesture-control-related user feedback in a multi-user environment are also described.
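The three rulesets named above can be sketched as simple selection functions over the gestures detected at the same time. This is a minimal illustration, not the claimed implementation: the `Gesture` record, the contents of `ACTION_HIERARCHY`, and all function names are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Gesture:
    user_id: str
    action: str
    start_time: float  # seconds; when the user began the gesture in the camera FOV


# Hypothetical action hierarchy: earlier entries outrank later ones.
ACTION_HIERARCHY = ["power_off", "mute", "volume_up", "channel_up"]


def first_user_in_time(gestures):
    """Prioritize the gesture that was begun earliest."""
    return min(gestures, key=lambda g: g.start_time)


def action_hierarchy(gestures):
    """Prioritize the gesture whose action sits highest in the hierarchy."""
    return min(gestures, key=lambda g: ACTION_HIERARCHY.index(g.action))


def designated_master_user(gestures, master_id):
    """Prioritize the master user's gesture; return None if the master made none."""
    for g in gestures:
        if g.user_id == master_id:
            return g
    return None
```

With two simultaneous gestures, one begun earlier by "alice" (`volume_up`) and one begun later by "bob" (`mute`), the first-user-in-time rule would select alice's gesture while the action-hierarchy rule would select bob's.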
Abstract:
Systems and methods for generating a visual vocabulary build a plurality of visual words via unsupervised learning on a set of features of a given type; decompose one or more visual words to a collection of lower-dimensional buckets; generate labeled image representations based on the collection of lower-dimensional buckets and labeled images, wherein labels associated with an image are associated with a respective representation of the image; and iteratively select a sub-collection of buckets from the collection of lower-dimensional buckets based on the labeled image representations, wherein bucket selection during any iteration after an initial iteration is based at least in part on feedback from previously selected buckets.
Abstract:
The present invention is directed to a computer-automated method of selectively identifying a user-specified behavior of a crowd. The method comprises receiving video data and can also include receiving audio data and sensor data. The video data contains images of a crowd. The video data is processed to extract hierarchical human and crowd features. The detected crowd features are processed to detect a selectable crowd behavior. The selected crowd behavior to be detected is specified by a configurable behavior rule. Human detection is provided by a hybrid human detector algorithm, which can include AdaBoost or a convolutional neural network. Crowd features are detected using textual analysis techniques. The configurable crowd behavior for detection can be defined by a crowd behavioral language.
Abstract:
Converting a digital image from color to gray-scale. In one example embodiment, a method for converting a digital image from color to gray-scale is disclosed. First, an unconverted pixel having red, green, and blue color channels is selected from the color digital image. Next, the red color channel of the pixel is multiplied by α. Then, the green color channel of the pixel is multiplied by β. Next, the blue color channel of the pixel is multiplied by γ. Then, the results of the three multiplication operations are added together to arrive at a gray-scale value for the pixel. Finally, these acts are repeated for each remaining unconverted pixel of the color digital image to arrive at a gray-scale digital image. In this example method, α+β+γ≈1 and α>β.
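The per-pixel weighted sum described above can be sketched as follows. The default weights (0.5, 0.3, 0.2) are illustrative assumptions chosen only to satisfy the stated constraints (weights summing to approximately 1, with the red weight larger than the green weight); the abstract does not give specific values. The function name and the tolerance check are likewise hypothetical.

```python
def to_grayscale(pixels, alpha=0.5, beta=0.3, gamma=0.2, tol=0.01):
    """Convert rows of (r, g, b) tuples (0-255) to rows of gray values.

    alpha, beta, gamma weight the red, green, and blue channels.
    The defaults are illustrative, not taken from the source.
    """
    if abs(alpha + beta + gamma - 1.0) > tol:
        raise ValueError("weights must sum to approximately 1")
    if not alpha > beta:
        raise ValueError("this example method requires alpha > beta")

    gray = []
    for row in pixels:
        gray_row = []
        for r, g, b in row:
            # Multiply each channel by its weight and add the three products.
            value = alpha * r + beta * g + gamma * b
            gray_row.append(min(255, max(0, round(value))))
        gray.append(gray_row)
    return gray
```

Because the weights sum to approximately 1, a white pixel stays at full intensity and a black pixel stays at 0, so the gray-scale image keeps the input's dynamic range.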
Abstract:
Methods, devices, and processor-readable media for adjusting the control-display gain of a gesture-controlled device are described. Adjusting the control-display gain may facilitate user interaction with content or UI elements rendered on a display screen of the gesture-controlled device. The control-display gain may be adjusted based on a property of how a mid-air dragging gesture is being performed by a user's hand. The property may be the location of the gesture, the orientation of the hand performing the gesture, or the velocity of the gesture. A hand that becomes stationary for a threshold time period while performing the dragging gesture may adjust the control-display gain to a different level. Control-display gain may be set to a different value based on the current velocity of the hand performing the gesture. The control-display gain levels may be selected from a continuous range of values or a set of discrete values. Devices for performing the methods are described.
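One of the options described above, selecting the control-display gain from a set of discrete values based on the current hand velocity, can be sketched as below. The thresholds, the gain values, and the choice to map slower movement to lower gain are all assumptions made for illustration; the abstract does not specify them.

```python
# Hypothetical (max_speed_m_per_s, gain) pairs, checked in order.
GAIN_LEVELS = [
    (0.05, 0.5),          # slow hand: low gain for fine positioning
    (0.30, 1.0),          # moderate speed: one-to-one mapping
    (float("inf"), 2.0),  # fast hand: high gain for coarse movement
]


def gain_for_velocity(speed):
    """Return the discrete gain level for a hand speed in m/s."""
    for max_speed, gain in GAIN_LEVELS:
        if speed < max_speed:
            return gain
    return GAIN_LEVELS[-1][1]


def cursor_delta(hand_delta, dt):
    """Scale a hand displacement (m) over dt seconds into an on-screen displacement."""
    speed = abs(hand_delta) / dt
    return gain_for_velocity(speed) * hand_delta
```

A continuous range of gain values could be obtained in the same structure by replacing the lookup table with an interpolating function of speed.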
Abstract:
Systems and methods for clustering descriptors augment visual descriptors in a space of visual descriptors to generate augmented visual descriptors in an augmented space that includes semantic information, wherein the augmented space of the augmented descriptors includes both visual descriptor-to-descriptor dissimilarities and semantic label-to-label dissimilarities; and cluster the augmented visual descriptors in the augmented space based at least in part on a dissimilarity measure between augmented visual descriptors in the augmented descriptor space.
Abstract:
Systems and methods for summarizing a video assign frames in a video to at least one of two or more groups based on a topic, generate a respective first similitude measurement for the frames in a group relative to the other frames in the group based on a feature, rank the frames in a group relative to one or more other frames in the group based on the respective first similitude measurement of the respective frames, and select a frame from each group as a most-representative frame based on the respective rank of the frames in a group relative to the other frames in the group.
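The group, score, rank, and select steps above can be sketched as follows. The abstract does not say which similitude measurement is used; this sketch assumes the mean cosine similarity between a frame's feature vector and those of the other frames in its group. The input layout and the function names are hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_representative(frames):
    """frames: iterable of (frame_id, topic, feature_vector).

    Returns {topic: frame_id} with the top-ranked frame of each group.
    """
    groups = defaultdict(list)
    for frame_id, topic, feature in frames:
        groups[topic].append((frame_id, feature))

    result = {}
    for topic, members in groups.items():
        def similitude(item):
            frame_id, feature = item
            others = [f for other_id, f in members if other_id != frame_id]
            if not others:
                return 0.0
            return sum(cosine(feature, f) for f in others) / len(others)

        # Rank frames within the group, most similar to its peers first.
        ranked = sorted(members, key=similitude, reverse=True)
        result[topic] = ranked[0][0]
    return result
```

Under this assumed measure, the selected frame is the one closest on average to the rest of its group, which is one plausible reading of "most-representative".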
Abstract:
A method and system for scanning a digital image for detecting the representation of an object, such as a face, and for reducing memory requirements of the computer system performing the image scan. One example method includes identifying an original digital image and downsampling the original digital image in an x-dimension and in a y-dimension to obtain a downsampled image that requires less storage space than the original digital image. A first scan is performed of the downsampled image to detect the representation of an object within the downsampled image. Then, the original digital image is divided into at least two image blocks, where each image block contains a portion of the original digital image. A second scan is then performed of each of the image blocks to detect the representation of the object within the image blocks.
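The two-scan flow above can be sketched as below, assuming the image is a list of pixel rows. The `detect` callable stands in for whatever object detector is used and is hypothetical, as are the subsampling strategy (keeping every `factor`-th pixel), the split into horizontal bands, and the function names; the abstract does not specify these details, nor how detections are mapped back to original coordinates.

```python
def downsample(image, factor=2):
    """Keep every factor-th pixel in both the x and y dimensions."""
    return [row[::factor] for row in image[::factor]]


def split_blocks(image, n_blocks=2):
    """Divide the image into n_blocks horizontal bands (last may be shorter)."""
    height = len(image)
    step = -(-height // n_blocks)  # ceiling division
    return [image[i:i + step] for i in range(0, height, step)]


def two_pass_scan(image, detect, factor=2, n_blocks=2):
    """Scan the downsampled image first, then each block of the original.

    detect: hypothetical callable taking an image and returning a list
    of detections; only one small image is passed to it at a time.
    """
    detections = list(detect(downsample(image, factor)))
    for block in split_blocks(image, n_blocks):
        detections.extend(detect(block))
    return detections
```

The memory saving in this sketch comes from the detector only ever seeing either the downsampled image or a single block, never the full-resolution image at once.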
Abstract:
Methods for estimating a point spread function of a blurred digital image. One example method includes capturing gyro data during an image exposure time, deriving gyro samples from the gyro data at predetermined gyro sampling times, calculating a motion vector field of the image at each gyro sampling time, approximating an overall image scene motion path by averaging motion paths of selected pixels in the image, and estimating the point spread function from the approximated overall image scene motion path.
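The last two steps above, averaging the motion paths of selected pixels and turning the averaged path into a point spread function, can be sketched as follows. The sketch assumes the per-pixel paths have already been derived from the gyro samples and are given as pixel displacements at each gyro sampling time; the kernel size, the nearest-cell rasterization, and the function names are assumptions rather than details from the source.

```python
def average_path(paths):
    """Average several per-pixel motion paths of equal length.

    paths: list of paths, each a list of (dx, dy) displacements in pixels,
    one entry per gyro sampling time.
    """
    n = len(paths)
    length = len(paths[0])
    return [
        (sum(p[i][0] for p in paths) / n, sum(p[i][1] for p in paths) / n)
        for i in range(length)
    ]


def psf_from_path(path, size=5):
    """Rasterize a motion path into a size x size kernel that sums to 1."""
    kernel = [[0.0] * size for _ in range(size)]
    center = size // 2
    for dx, dy in path:
        col = center + round(dx)
        row = center + round(dy)
        if 0 <= row < size and 0 <= col < size:
            # Each sample adds equal weight (uniform sampling assumed).
            kernel[row][col] += 1.0
    total = sum(sum(row) for row in kernel)
    if total == 0:
        raise ValueError("motion path falls entirely outside the kernel")
    return [[v / total for v in row] for row in kernel]
```

For a purely horizontal two-pixel motion sampled at three times, the resulting kernel has three equal entries along its middle row, which corresponds to a uniform horizontal blur.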