Abstract:
A method, non-transitory computer-readable medium, and apparatus for localizing a region of interest using a dynamic hand gesture are disclosed. For example, the method captures an ego-centric video containing the dynamic hand gesture, analyzes a frame of the ego-centric video to detect pixels that correspond to a fingertip using a hand segmentation algorithm, temporally analyzes one or more frames of the ego-centric video to compute a path of the fingertip in the dynamic hand gesture, localizes the region of interest based on the path of the fingertip in the dynamic hand gesture, and performs an action based on an object in the region of interest.
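The pipeline above (segment the hand, locate the fingertip per frame, accumulate the fingertip path over time, take the path's bounding region as the region of interest) can be sketched in a few lines of Python with OpenCV. The HSV skin thresholds and the topmost-contour-point fingertip heuristic are illustrative assumptions, not details taken from the disclosure.

```python
import cv2
import numpy as np

# Illustrative HSV skin range; a real hand segmentation step would be
# trained or calibrated rather than hard-coded.
SKIN_LO = np.array([0, 40, 60], dtype=np.uint8)
SKIN_HI = np.array([25, 180, 255], dtype=np.uint8)

def fingertip(frame):
    """Return (x, y) of an estimated fingertip, or None if no hand is found."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, SKIN_LO, SKIN_HI)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    hand = max(contours, key=cv2.contourArea)
    # Crude fingertip proxy: the topmost point of the hand contour.
    return tuple(hand[hand[:, :, 1].argmin()][0])

def localize_roi(video_path):
    """Track the fingertip across frames; return the bounding box of its path."""
    cap = cv2.VideoCapture(video_path)
    path = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        tip = fingertip(frame)
        if tip is not None:
            path.append(tip)
    cap.release()
    if not path:
        return None
    xs, ys = zip(*path)
    return min(xs), min(ys), max(xs), max(ys)  # ROI enclosing the gesture path
```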
Abstract:
A method, non-transitory computer-readable medium, and apparatus for localizing a region of interest using a hand gesture are disclosed. For example, the method acquires an image containing the hand gesture from an ego-centric video, detects pixels that correspond to one or more hands in the image using a hand segmentation algorithm, identifies a hand enclosure in the pixels that are detected within the image, localizes a region of interest based on the hand enclosure, and performs an action based on an object in the region of interest.
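A minimal sketch of the enclosure step, assuming a binary hand mask has already been produced by the segmentation stage: a "hand enclosure" is treated here as a hole in the hand mask, which OpenCV's contour hierarchy exposes directly. The mask-building details are assumed.

```python
import cv2

def enclosure_roi(hand_mask):
    """Find the largest hole enclosed by hand pixels; return its bounding box.

    hand_mask: uint8 binary image where hand pixels are 255 (output of an
    upstream hand segmentation step, assumed here).
    """
    contours, hierarchy = cv2.findContours(hand_mask, cv2.RETR_CCOMP,
                                           cv2.CHAIN_APPROX_SIMPLE)
    if hierarchy is None:
        return None
    # In RETR_CCOMP, contours with a parent (h[3] != -1) are interior holes,
    # i.e. regions fully enclosed by the hand.
    holes = [c for c, h in zip(contours, hierarchy[0]) if h[3] != -1]
    if not holes:
        return None
    return cv2.boundingRect(max(holes, key=cv2.contourArea))  # (x, y, w, h)
```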
Abstract:
A system captures or otherwise receives a video and uses the video to create an electronic file corresponding to a multi-faceted printed artifact, such as a multi-page document. When the system receives the video, it selects a set of some or all of the video's image frames, determines a frame quality for each frame in the set, and identifies a subset of the frames such that the frame quality of each frame in the subset satisfies one or more image quality criteria. The subset will include at least one frame for each facet of the multi-faceted printed artifact, such as a page of the document. The system then automatically combines the subset of frames into a single electronic file.
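The frame-selection and combining steps can be sketched in Python with OpenCV and Pillow. Laplacian variance stands in for the frame-quality measure, and a mean-frame-difference threshold stands in for page-change detection; both are assumptions, since the abstract fixes neither.

```python
import cv2
import numpy as np
from PIL import Image

def sharpness(frame):
    """Frame-quality proxy: variance of the Laplacian (higher = sharper)."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var()

def video_to_pdf(video_path, out_path, page_break=40.0):
    """Keep the sharpest frame per page segment; write one multi-page PDF.

    Pages are segmented by a crude mean-absolute-difference threshold
    (`page_break`), standing in for whatever page-change detection the
    system actually uses.
    """
    cap = cv2.VideoCapture(video_path)
    best, pages, prev = None, [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if prev is not None and np.abs(frame.astype(int) - prev.astype(int)).mean() > page_break:
            if best is not None:
                pages.append(best[1])   # close out the previous page
            best = None
        score = sharpness(frame)
        if best is None or score > best[0]:
            best = (score, frame)
        prev = frame
    cap.release()
    if best is not None:
        pages.append(best[1])
    imgs = [Image.fromarray(cv2.cvtColor(p, cv2.COLOR_BGR2RGB)) for p in pages]
    if imgs:
        imgs[0].save(out_path, save_all=True, append_images=imgs[1:])
```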
Abstract:
Methods and systems are disclosed for updating camera geometric calibration utilizing scene analysis. Geometric calibration parameters can be derived with respect to one or more cameras and selected reference points of interest identified from a scene acquired by one or more of the cameras. The camera geometric calibration parameters can be applied to image coordinates of the selected reference points of interest to provide real-world coordinates at a time of initial calibration of the camera(s). A subset of a video stream from the camera(s) can then be analyzed to identify features of a current scene captured by the camera(s) that match the selected reference points of interest and provide a current update of the camera geometric calibration parameters with respect to the current scene.
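A sketch of the update step, assuming the initial calibration is summarized as an image-to-world homography and that reference points are matched with ORB features; both choices are illustrative, not from the disclosure.

```python
import cv2
import numpy as np

def update_calibration(ref_img, cur_img, H_img2world):
    """Refresh an image-to-world homography after the camera has drifted.

    H_img2world: 3x3 homography from the initial geometric calibration,
    mapping reference-image coordinates to real-world coordinates.
    """
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(ref_img, None)
    k2, d2 = orb.detectAndCompute(cur_img, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    src = np.float32([k1[m.queryIdx].pt for m in matches])
    dst = np.float32([k2[m.trainIdx].pt for m in matches])
    # Robustly estimate how the scene moved in the image between
    # calibration time and now.
    H_ref2cur, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    # Current image -> reference image -> world.
    return H_img2world @ np.linalg.inv(H_ref2cur)
```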
Abstract:
A system and method crop a license plate image to facilitate license plate recognition by obtaining an image that includes the license plate image, dividing the image into multiple sub-blocks, computing an activity measure for each sub-block, determining an activity threshold, determining that a sub-block is an active sub-block by comparing the activity measure for the sub-block with the activity threshold, generating a second image of the license plate information, where the second image includes the active sub-block, and obtaining the license plate information based on the second image.
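The sub-block activity computation maps naturally to array operations. In the sketch below, mean gradient magnitude serves as the activity measure and the threshold is a fraction of the maximum block activity; the abstract fixes neither choice, so both are assumptions.

```python
import cv2
import numpy as np

def crop_plate(image, block=16, thresh_frac=0.35):
    """Crop an image to its high-activity region (plate-cropping sketch)."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    mag = cv2.magnitude(gx, gy)
    h, w = gray.shape
    rows, cols = h // block, w // block
    # Per-sub-block activity: mean gradient magnitude over each block.
    activity = (mag[:rows * block, :cols * block]
                .reshape(rows, block, cols, block)
                .mean(axis=(1, 3)))
    # Activity threshold: a fraction of the maximum block activity.
    active = activity > thresh_frac * activity.max()
    ys, xs = np.nonzero(active)
    if len(xs) == 0:
        return image
    # Second image: the region covering all active sub-blocks.
    return image[ys.min() * block:(ys.max() + 1) * block,
                 xs.min() * block:(xs.max() + 1) * block]
```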
Abstract:
Methods and systems are disclosed for continuously monitoring the gaze direction of a driver of a vehicle over time. Video is received that is captured by a camera associated with, for example, a mobile device within the vehicle, the camera and/or mobile device mounted facing the driver. Frames are extracted from the video, and a facial region corresponding to the face of the driver is detected within the extracted frames. Feature descriptors are then computed from the facial region. A gaze classifier, derived for the particular vehicle, driver, and camera, receives the feature descriptors as inputs and outputs at least one label corresponding to one or more of a predefined finite number of gaze classes, identifying the gaze direction of the driver of the vehicle.
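The per-frame path (detect face, compute descriptors, classify gaze) might look like the following, using a Haar cascade, HOG descriptors, and an SVM as illustrative stand-ins for the unspecified detector, descriptors, and classifier; the gaze label set is likewise hypothetical.

```python
import cv2
import numpy as np
from sklearn.svm import SVC

GAZE_CLASSES = ["road", "mirror", "dashboard", "phone"]  # illustrative label set

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
hog = cv2.HOGDescriptor((64, 64), (16, 16), (8, 8), (8, 8), 9)

def face_descriptor(frame):
    """Detect the driver's face and return a HOG descriptor for it, or None."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.1, 5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # largest detection
    crop = cv2.resize(gray[y:y + h, x:x + w], (64, 64))
    return hog.compute(crop).ravel()

def train_gaze_classifier(X, y):
    """Fit the classifier on descriptors labeled for this vehicle/driver/camera."""
    return SVC().fit(X, y)

# Usage sketch, assuming labeled training frames for this specific setup:
#   clf = train_gaze_classifier(X_train, y_train)
#   label = GAZE_CLASSES[clf.predict([face_descriptor(frame)])[0]]
```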
Abstract:
A method of labeling training data includes inputting a plurality of unlabeled input data samples into each of a plurality of pre-trained neural networks and extracting a set of feature embeddings from multiple layer depths of each of the plurality of pre-trained neural networks. The method also includes generating a plurality of clusterings from the set of feature embeddings. The method also includes analyzing, by a processing device, the plurality of clusterings to identify a subset of the plurality of unlabeled input data samples that belong to a same unknown class. The method also includes assigning pseudo-labels to the subset of the plurality of unlabeled input data samples.
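One way to realize the consensus step: cluster each set of embeddings independently and pseudo-label only the samples that every clustering groups together. The sketch below assumes the embeddings have already been extracted (e.g., via forward hooks on the pre-trained networks); KMeans and the full-agreement rule are illustrative choices.

```python
import numpy as np
from sklearn.cluster import KMeans

def consensus_pseudo_labels(embedding_sets, n_clusters=10):
    """Cluster each embedding set; pseudo-label samples that always co-cluster.

    embedding_sets: list of (n_samples, dim_i) arrays, one per network/layer
    depth. Returns an int array of pseudo-labels, -1 where clusterings disagree.
    """
    assignments = np.stack([
        KMeans(n_clusters=n_clusters, n_init=10).fit_predict(E)
        for E in embedding_sets
    ])  # shape: (n_clusterings, n_samples)
    n = assignments.shape[1]
    labels = -np.ones(n, dtype=int)
    # Samples share a pseudo-class only if every clustering put them in the
    # same cluster, i.e. their full assignment tuples are identical.
    groups = {}
    for i in range(n):
        groups.setdefault(tuple(assignments[:, i]), []).append(i)
    for label, members in enumerate(sorted(groups.values(), key=len, reverse=True)):
        if len(members) > 1:            # keep only groups with consensus support
            labels[members] = label
    return labels
```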
Abstract:
A method includes defining a model for a liquid while the liquid is positioned at least partially within a nozzle of a printer. The method also includes synthesizing video frames of the liquid using the model to produce synthetic video frames. The method also includes generating a labeled dataset that includes the synthetic video frames and corresponding model values. The method also includes receiving real video frames of the liquid while the liquid is positioned at least partially within the nozzle of the printer. The method also includes generating an inverse mapping from the real video frames to predicted model values using the labeled dataset. The method also includes reconstructing the liquid in the real video frames based at least partially upon the predicted model values.
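The synthesize-then-invert loop can be illustrated end to end with a toy model. Below, a filled disc parameterized by its radius stands in for the liquid model, and a ridge regressor stands in for the unspecified inverse mapping; everything except the overall structure is an assumption.

```python
import numpy as np
from sklearn.linear_model import Ridge

def render(radius, size=32):
    """Toy stand-in for the liquid model: a filled disc of the given radius,
    playing the role of one synthesized frame of liquid in the nozzle."""
    yy, xx = np.mgrid[:size, :size]
    return ((xx - size / 2) ** 2 + (yy - size / 2) ** 2 <= radius ** 2).astype(float)

# Labeled dataset: synthetic frames paired with the model values that made them.
radii = np.linspace(3, 14, 200)
frames = np.stack([render(r).ravel() for r in radii])

# Inverse mapping: frames -> predicted model values (a linear regressor here;
# the abstract leaves the estimator unspecified).
inverse = Ridge().fit(frames, radii)

# A noisy synthetic frame stands in for a real camera frame.
real_frame = render(9.3) + 0.05 * np.random.randn(32, 32)
predicted_radius = inverse.predict([real_frame.ravel()])[0]
# `predicted_radius` would then drive reconstruction of the liquid in the frame.
```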
Abstract:
A three-dimensional (3D) printer includes a nozzle and a camera configured to capture a real image or a real video of a liquid metal while the liquid metal is positioned at least partially within the nozzle. The 3D printer also includes a computing system configured to perform operations. The operations include generating a model of the liquid metal positioned at least partially within the nozzle. The operations also include generating a simulated image or a simulated video of the liquid metal positioned at least partially within the nozzle based at least partially upon the model. The operations also include generating a labeled dataset that comprises the simulated image or the simulated video and a first set of parameters. The operations also include reconstructing the liquid metal in the real image or the real video based at least partially upon the labeled dataset.
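Since this apparatus claim mirrors the preceding method, here is a complementary sketch: instead of a learned inverse mapping, the real image is matched against the labeled simulated dataset by nearest neighbour and the liquid is re-rendered from the recovered parameters. The ellipse simulator and the matching rule are toy assumptions.

```python
import numpy as np

def simulate(params, size=32):
    """Toy simulator: an ellipse with semi-axes (a, b), standing in for a
    physics model of liquid metal in the nozzle."""
    a, b = params
    yy, xx = np.mgrid[:size, :size]
    return (((xx - size / 2) / a) ** 2 + ((yy - size / 2) / b) ** 2 <= 1).astype(float)

# Labeled dataset: simulated images paired with the parameters that produced them.
grid = [(a, b) for a in np.linspace(3, 12, 10) for b in np.linspace(3, 12, 10)]
dataset = [(p, simulate(p)) for p in grid]

def reconstruct(real_image):
    """Recover model parameters for a camera image by nearest-neighbour
    matching against the simulated dataset, then re-render the liquid."""
    params, _ = min(dataset, key=lambda e: np.abs(e[1] - real_image).sum())
    return params, simulate(params)

# Usage sketch; a frame from the nozzle camera would replace the simulated one.
params, rendered = reconstruct(simulate((7.0, 5.0)))
```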