摘要:
Systems and methods for summarizing a video assign frames in a video to at least one of two or more groups based on a topic, generate a respective first similitude measurement for the frames in a group relative to the other frames in the group based on a feature, rank the frames in a group relative to one or more other frames in the group based on the respective first similitude measurement of the respective frames, and select a frame from each group as a most-representative frame based on the respective rank of the frames in a group relative to the other frames in the group.
摘要:
Converting a digital image from color to gray-scale. In one example embodiment, a method for converting a digital image from color to gray-scale is disclosed. First, an unconverted pixel having red, green, and blue color channels is selected from the color digital image. Next, the red color channel of the pixel is multiplied by α. Then, the green color channel of the pixel is multiplied by β. Next, the blue color channel of the pixel is multiplied by γ. Then, the results of the three multiplication operations are added together to arrive at a gray-scale value for the pixel. Finally, these acts are repeated for each remaining unconverted pixel of the color digital image to arrive at a gray-scale digital image. In this example method, α+β+≈1 and α>β.
摘要:
Systems and methods for training an AdaBoost based classifier for detecting symmetric objects, such as human faces, in a digital image. In one example embodiment, such a method includes first selecting a sub-window of a digital image. Next, the AdaBoost based classifier extracts multiple sets of two symmetric scalar features from the sub-window, one being in the right half side and one being in the left half side of the sub-window. Then, the AdaBoost based classifier minimizes the joint error of the two symmetric features for each set of two symmetric scalar features. Next, the AdaBoost based classifier selects one of the features from the set of two symmetric scalar features for each set of two symmetric scalar features. Finally, the AdaBoost based classifier linearly combines multiple weak classifiers, each of which corresponds to one of the selected features, into a stronger classifier.
摘要:
A method and system for efficiently detecting faces within a digital image. One example method includes identifying a digital image comprised of a plurality of sub-windows and performing a first scan of the digital image using a coarse detection level to eliminate the sub-windows that have a low likelihood of representing a face. The subset of the sub-windows that were not eliminated during the first scan are then scanned a second time using a fine detection level having a higher accuracy level than the coarse detection level used during the first scan to identify sub-windows having a high likelihood of representing a face.
摘要:
Methods and systems for gesture-based control of a device are described. A virtual gesture-space is determined in a received input frame. The virtual gesture-space is associated with a primary user from a ranked user list of users. The received input frame is processed in only the virtual gesture-space, to detect and track a hand. Using a hand bounding box generated by detecting and tracking the hand, gesture classification is performed to determine a gesture input associated with the hand. A command input associated with the determined gesture input is processed. The device may be a smart television, a smart phone, a tablet, etc.
摘要:
A method, device and computer-readable medium for generating a super-resolution version of a compressed video stream. By leveraging the motion information and residual information in compressed video streams, described examples are able to skip the time-consuming motion-estimation step for most frames and make the most use of the SR results of key frames. A key frame SR module generates SR versions of I-frames and other key frames of a compressed video stream using techniques similar to existing multi-frame approaches to VSR. A non-key frame SR module generates SR version of the non-key inter frames between these key frames by making use of motion information and residual information used to encode the inter frames in the compressed video stream.
摘要:
Systems and methods for multi-frame video frame interpolation. Higher-order motion modeling, such as cubic motion modeling, achieves predictions of intermediate optical flow between multiple interpolated frames, assisted by relaxation of the constraints imposed by the loss function used in initial optical flow estimation. A temporal pyramidal optical flow refinement module performs coarse-to-fine refinement of the optical flow maps used to generate the intermediate frames, focusing a proportionally greater amount of refinement attention to the optical flow maps for the high-error middle frames. A temporal pyramidal pixel refinement module performs coarse-to-fine refinement of the generated intermediate frames, focusing a proportionally greater amount of refinement attention to the high-error middle frames. A generative adversarial network (GAN) module calculates a loss function for training the neural networks used in the optical flow estimation module, temporal pyramidal optical flow refinement module, and/or temporal pyramidal pixel refinement module.
摘要:
Methods and apparatus for gesture-based control of a device in a multi-user environment are described. The methods prioritize users or gestures based on a predetermined priority ruleset. A first-user-in-time ruleset prioritizes gestures based on when in time they were begun by a user in the camera FOV. An action-hierarchy ruleset prioritizes gestures based on the actions they correspond to, and the relative positions of those actions within an action hierarchy. A designated-master-user ruleset prioritizes gestures performed by an explicitly designated master user. Methods for designating a new master user and for providing gesture-control-related user feedback in a multi-user environment are also described.
摘要:
Methods and apparatus for gesture-based control of a device in a multi-user environment are described. The methods prioritize users or gestures based on a predetermined priority ruleset. A first-user-in-time ruleset prioritizes gestures based on when in time they were begun by a user in the camera FOV. An action-hierarchy ruleset prioritizes gestures based on the actions they correspond to, and the relative positions of those actions within an action hierarchy. A designated-master-user ruleset prioritizes gestures performed by an explicitly designated master user. Methods for designating a new master user and for providing gesture-control-related user feedback in a multi-user environment are also described.
摘要:
Methods and systems for video segmentation and scene recognition are described. A video having a plurality of frames and a subtitle file associated with the video are received. Segmentation is performed on the video to generate a first set video frames comprising one or more video frames based on a frame-by-frame comparison of features in the frames of the video. Each video frame in the first includes a frame indicator which indicates at least a first start frame of the video frame. The subtitle file associated with the video is parsed to generate one or more subtitle segments based on a start and an end time of each dialogue in the subtitle file. A second set of video frames comprising one or more second video frames are generated based on the video frames of the first set of video frames and the e or more subtitle segments.