Abstract:
Methods of removing text from digital video or still images are disclosed. An image processing system receives an input image set defining a region of interest (ROI) that contains text. The system determines an input background color for the ROI. The system applies a text infilling function to remove text from the ROI to yield a preliminary output image set. The system may determine a residual corrective signal that corresponds to a measurement of background color error between the input set and the preliminary output set. The system may apply the residual corrective signal to the ROI in the preliminary output set to yield a final output set that does not contain the text. Alternatively, the system may remove background from the ROI of the input set before text infilling, then return background to the ROI after the text infilling.
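The residual correction step described above can be illustrated with a minimal sketch. It is not the disclosed implementation: it assumes the ROI is a flat list of RGB pixels, that the background color can be approximated by the per-channel median (reasonable only when text covers a minority of the ROI), and the function names `background_color` and `apply_residual` are hypothetical.

```python
from statistics import median
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def background_color(roi: List[Pixel]) -> Tuple[float, float, float]:
    """Estimate the ROI background as the per-channel median.

    Assumption for this sketch: text pixels are a minority of the ROI,
    so the median lands on the background color.
    """
    return tuple(median(p[c] for p in roi) for c in range(3))


def apply_residual(input_roi: List[Pixel], preliminary_roi: List[Pixel]) -> List[Pixel]:
    """Shift the infilled ROI so its background matches the input background."""
    in_bg = background_color(input_roi)
    out_bg = background_color(preliminary_roi)
    # Residual corrective signal: background color error per channel.
    residual = tuple(i - o for i, o in zip(in_bg, out_bg))
    # Add the residual to every pixel and clamp to the valid 8-bit range.
    return [
        tuple(min(255, max(0, round(v + r))) for v, r in zip(pixel, residual))
        for pixel in preliminary_roi
    ]
```

In this sketch a single constant offset is applied to the whole ROI; a spatially varying residual would be needed for gradients or textured backgrounds.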
Abstract:
Embodiments described herein provide a system for generating synthetic images with localized editing. During operation, the system obtains a source image and a target image for image synthesis and selects a semantic element from the source image. The semantic element indicates a semantically meaningful part of an object depicted in the source image. The system then determines the style information associated with the source and target images. Subsequently, the system generates a synthetic image by transferring the style of the semantic element from the source image to the target image based on the feature representations. In this way, the system can facilitate localized editing of the target image.
Abstract:
Interacting with the driver based on the driver's context can help keep the driver alert. The context can be determined by determining driver characteristics, including the driver's interests, and by monitoring the circumstances surrounding the driver, such as the state of the driver using sensors included in the vehicle, the state of the vehicle, and information about the driver's current locale. The characteristics and the monitored circumstances define the context of the driver. Information of interest to the driver is obtained and used to generate actions that can be recommended to the driver based on the driver's context. The actions are used to keep the driver alert.
Abstract:
Embodiments described herein provide a system for generating semantically accurate synthetic images. During operation, the system generates a first synthetic image using a first artificial intelligence (AI) model and presents the first synthetic image in a user interface. The user interface allows a user to identify image units of the first synthetic image that are semantically irregular. The system then obtains semantic information for the semantically irregular image units from the user via the user interface and generates a second synthetic image using a second AI model based on the semantic information. The second synthetic image can be an improved image compared to the first synthetic image.
Abstract:
A method of labeling a dataset of input samples for a machine learning task includes selecting a plurality of pre-trained machine learning models that are related to the machine learning task. The method further includes processing a plurality of input data samples through each of the pre-trained models to generate a set of embeddings. The method further includes generating a plurality of clusterings from the set of embeddings. The method further includes analyzing, by a processing device, the plurality of clusterings to extract superclusters. The method further includes assigning pseudo-labels to the input samples based on the analysis.
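The last two steps, extracting superclusters from several clusterings and assigning pseudo-labels, can be sketched as follows. The abstract does not define how superclusters are extracted, so this sketch makes an assumption: a supercluster is a group of samples that every clustering places together. The names `extract_superclusters`, `assign_pseudo_labels`, and `min_size` are hypothetical, and each clustering is taken as a list of cluster ids, one per sample.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


def extract_superclusters(
    clusterings: Sequence[Sequence[int]], min_size: int = 2
) -> List[List[int]]:
    """Group samples that share a cluster in every clustering.

    Assumption for this sketch: full agreement across clusterings
    defines a supercluster; groups smaller than min_size are dropped.
    """
    n_samples = len(clusterings[0])
    groups: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for i in range(n_samples):
        # The tuple of cluster ids across all clusterings is the sample's signature.
        signature = tuple(clustering[i] for clustering in clusterings)
        groups[signature].append(i)
    return [members for members in groups.values() if len(members) >= min_size]


def assign_pseudo_labels(
    clusterings: Sequence[Sequence[int]], min_size: int = 2
) -> List[Optional[int]]:
    """Give each supercluster member a pseudo-label; leave other samples as None."""
    labels: List[Optional[int]] = [None] * len(clusterings[0])
    for label, members in enumerate(extract_superclusters(clusterings, min_size)):
        for i in members:
            labels[i] = label
    return labels
```

Samples on which the clusterings disagree stay unlabeled in this sketch, which trades coverage for label consistency.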
Abstract:
Embodiments described herein provide a system for localized contextual video annotation. During operation, the system can segment a video into a plurality of segments based on a segmentation unit and parse a respective segment for generating multiple input modalities for the segment. A respective input modality can indicate a form of content in the segment. The system can then classify the segment into a set of semantic classes based on the input modalities and determine an annotation for the segment based on the set of semantic classes.
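The classification step can be illustrated with a small sketch. The abstract does not say how the input modalities are combined, so this example assumes simple late fusion: each modality (for instance video frames or audio) supplies a score per semantic class, the scores are averaged across modalities, and classes at or above a threshold are kept. The function name `classify_segment`, the modality and class names, and the threshold are all hypothetical.

```python
from typing import Dict, List


def classify_segment(
    modality_scores: Dict[str, Dict[str, float]], threshold: float = 0.5
) -> List[str]:
    """Fuse per-modality class scores by averaging (assumed late fusion).

    Returns the semantic classes whose fused score meets the threshold,
    ordered from highest to lowest fused score.
    """
    totals: Dict[str, float] = {}
    for scores in modality_scores.values():
        for cls, score in scores.items():
            totals[cls] = totals.get(cls, 0.0) + score
    n_modalities = len(modality_scores)
    # A class missing from a modality contributes a score of 0 for that modality.
    fused = {cls: total / n_modalities for cls, total in totals.items()}
    selected = [cls for cls, score in fused.items() if score >= threshold]
    return sorted(selected, key=lambda cls: -fused[cls])
```

An annotation for the segment could then be derived from the returned classes, for example by taking the first entry as the primary label.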
Abstract:
One embodiment can include a system for providing an image-capturing recommendation. During operation, the system receives, from a mobile computing device, one or more images. The one or more images are captured by one or more cameras associated with the mobile computing device. The system analyzes the received images to obtain image-capturing conditions for capturing images of a target within a physical space; determines, based on the obtained image-capturing conditions and a predetermined image-quality requirement, one or more image-capturing settings; and recommends the determined one or more image-capturing settings to a user.
Abstract:
A method operates a three-dimensional (3D) metal object manufacturing system to compensate for displacement errors that occur during object formation. In the method, image data of a metal object being formed by the 3D metal object manufacturing system is generated prior to completion of the metal object and compared to original 3D object design data of the object to identify one or more displacement errors. For the displacement errors outside a predetermined difference range, the method modifies machine-ready instructions for forming metal object layers not yet formed to compensate for the identified displacement errors and operates the 3D metal object manufacturing system using the modified machine-ready instructions.
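The compensation logic can be sketched in one dimension. This is an illustrative simplification, not the disclosed method: it assumes the displacement error is a single build-height difference, that the machine-ready instructions reduce to a list of per-layer thicknesses, and that the correction is spread evenly over the layers not yet formed. The function name `compensate_layers` and its parameters are hypothetical.

```python
from typing import List


def compensate_layers(
    design_height: float,
    measured_height: float,
    remaining_layers: List[float],
    tolerance: float,
) -> List[float]:
    """Adjust the thickness of unformed layers to absorb a height error.

    Assumptions for this sketch: the error is one scalar (measured minus
    design height), and it is distributed evenly over the remaining layers.
    """
    error = measured_height - design_height
    # Errors inside the predetermined range leave the instructions unchanged.
    if abs(error) <= tolerance or not remaining_layers:
        return list(remaining_layers)
    correction = error / len(remaining_layers)
    # A layer cannot have negative thickness, so clamp at zero.
    return [max(0.0, thickness - correction) for thickness in remaining_layers]
```

For example, a part measured 0.5 mm too tall with four layers remaining would have each remaining layer thinned by 0.125 mm, provided 0.5 mm exceeds the tolerance.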
Abstract:
A method for curvilinear object segmentation includes receiving at least one input image comprising curvilinear features. The at least one image is mapped, using a processor, to output segmentation maps using a deep network having a representation module and a task module. The mapping includes transforming the input image in the representation module using learnable filters trained to suppress noise in one or more of a domain and a task of the at least one input image. The segmentation maps are produced using the transformed input image in the task module.
Abstract:
A method for curvilinear object segmentation includes receiving at least one input image comprising curvilinear features. The at least one input image is mapped to segmentation maps of the curvilinear features using a deep network having a representation module and a task module. The mapping includes transforming the input image in the representation module using learnable filters configured to balance recognition of curvilinear geometry with reduction of training error. The segmentation maps are produced using the transformed input image in the task module.