Abstract:
A crowd motion summarization method provides a rich, real-time description of a crowd's characteristics from a video, such as speed, orientation, count, spatial locations, and time. A feature tracking module receives each video frame and detects features (feature points) in it. A crowd occupancy detection module receives the video frame, generates a binary crowd occupancy map whose pixel positions indicate human versus non-human locations, and generates a total count of the humans detected in the frame. The feature tracking module generates feature tracking information only for those features located at pixel positions that indicate a human location. In an example, the detected features are Kanade-Lucas-Tomasi (KLT) features. A feature-crowd matching module generates crowd motion data using the feature tracking information and the total human count. The method outputs the crowd motion data.
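The occupancy-gated tracking step described above can be illustrated with a minimal sketch. It assumes the feature correspondences (for example, from a KLT tracker) and the binary occupancy map are already available. The function name `summarize_crowd_motion`, the choice to test the feature's current position against the map, and the specific summary statistics are illustrative assumptions, not details taken from the abstract.

```python
import math

def summarize_crowd_motion(prev_pts, curr_pts, occupancy, human_count):
    """Summarize motion of tracked features that lie on human pixels.

    prev_pts, curr_pts: lists of (x, y) feature positions in consecutive frames.
    occupancy: 2-D list of 0/1 values, 1 marking a human pixel (assumed layout).
    human_count: total human count reported by the occupancy detector.
    """
    displacements = []
    for (x0, y0), (x1, y1) in zip(prev_pts, curr_pts):
        row, col = int(round(y1)), int(round(x1))
        inside = 0 <= row < len(occupancy) and 0 <= col < len(occupancy[0])
        # Assumption: a feature is kept when its current position is a human pixel.
        if inside and occupancy[row][col] == 1:
            displacements.append((x1 - x0, y1 - y0))

    if not displacements:
        return {"count": human_count, "features": 0,
                "speed": 0.0, "orientation_deg": None}

    n = len(displacements)
    mean_dx = sum(dx for dx, _ in displacements) / n
    mean_dy = sum(dy for _, dy in displacements) / n
    # Mean speed in pixels per frame; orientation of the mean displacement.
    speed = sum(math.hypot(dx, dy) for dx, dy in displacements) / n
    orientation = math.degrees(math.atan2(mean_dy, mean_dx))
    return {"count": human_count, "features": n,
            "speed": speed, "orientation_deg": orientation}
```

Features that fall on background pixels are discarded before any statistic is computed, so background motion does not contribute to the summary.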
Abstract:
Methods and systems for high-resolution image inpainting are disclosed. An original high-resolution image to be inpainted is obtained, as well as an inpainting mask indicating an inside-mask area to be inpainted. The original high-resolution image is down-sampled to obtain a low-resolution image to be inpainted. Using a trained inpainting generator, a low-resolution inpainted image and a set of attention scores are generated from the low-resolution image. The attention scores represent the similarity between inside-mask regions and outside-mask regions. A high-frequency residual image is computed from the original high-resolution image. An aggregated high-frequency residual image is generated using the attention scores, including high-frequency residual information for the inside-mask area. A high-resolution inpainted image is outputted by combining the aggregated high-frequency residual image and a low-frequency inpainted image generated from the low-resolution inpainted image.
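The residual aggregation pipeline above can be sketched on a one-dimensional signal. This is a toy illustration under several assumptions: average pooling for down-sampling, nearest-neighbour up-sampling, one patch per low-resolution sample, and attention scores supplied as a plain dictionary. The trained inpainting generator is not modelled; its low-resolution output and attention scores are passed in as inputs. All function and parameter names are hypothetical.

```python
def downsample(signal, factor):
    """Average-pool a 1-D signal (assumed down-sampling scheme)."""
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, len(signal), factor)]

def upsample(signal, factor):
    """Nearest-neighbour up-sampling (assumed up-sampling scheme)."""
    return [v for v in signal for _ in range(factor)]

def inpaint_high_res(hr, mask, lr_inpainted, attention, factor):
    """Combine a low-frequency inpainted signal with aggregated residuals.

    hr: original high-resolution signal.
    mask: per-patch flags at low resolution, 1 = inside-mask (to be inpainted).
    lr_inpainted: low-resolution output of the (unmodelled) generator.
    attention: {inside_patch: {outside_patch: weight}} similarity scores.
    """
    low = downsample(hr, factor)
    # High-frequency residual: what down-sampling followed by up-sampling loses.
    residual = [h - u for h, u in zip(hr, upsample(low, factor))]
    patches = [residual[i:i + factor] for i in range(0, len(residual), factor)]

    aggregated = []
    for idx, patch in enumerate(patches):
        if mask[idx] == 1:
            # Inside-mask patch: weighted sum of outside-mask residual patches.
            weights = attention[idx]
            patch = [sum(w * patches[j][k] for j, w in weights.items())
                     for k in range(factor)]
        aggregated.extend(patch)

    low_freq = upsample(lr_inpainted, factor)
    return [a + b for a, b in zip(low_freq, aggregated)]
```

Outside the mask, the output reproduces the original signal whenever the generator leaves those low-resolution samples unchanged, because the residual restores exactly what down-sampling removed.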
Abstract:
A method and apparatus for encoding a frame from a mixed content image sequence. In one embodiment, the method, executed under the control of a processor configured with computer executable instructions, comprises (i) generating, by an encoding processor, an image type mask that divides the frame into an unchanged portion, an object portion and a picture portion; (ii) producing lossless encoded content, by the encoding processor, from the object portion and the image type mask; (iii) generating, by the encoding processor, a filtered facsimile from the frame, the filtered facsimile generated by retaining the picture portion and filling the unchanged portion and the object portion with neutral image data; and (iv) producing, by the encoding processor, lossy encoded content from the filtered facsimile.
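Steps (ii) through (iv) can be illustrated by splitting a frame according to a given image type mask. The sketch below assumes the mask is already computed, uses integer labels and a mid-grey value of 128 as the neutral fill, and stops short of the actual lossless and lossy encoders. The constants and the function name are hypothetical.

```python
UNCHANGED, OBJECT, PICTURE = 0, 1, 2  # assumed mask labels
NEUTRAL = 128                         # assumed neutral fill value (mid-grey)

def split_for_encoding(frame, mask):
    """Separate a frame into lossless and lossy encoder inputs.

    frame, mask: 2-D lists of equal shape; mask holds one label per pixel.
    Returns (object_pixels, filtered_facsimile).
    """
    # Object pixels, in raster order, go to the lossless encoder with the mask.
    object_pixels = [px
                     for frow, mrow in zip(frame, mask)
                     for px, label in zip(frow, mrow)
                     if label == OBJECT]
    # The facsimile keeps picture pixels and neutralizes everything else,
    # so the lossy encoder spends no detail on unchanged or object regions.
    facsimile = [[px if label == PICTURE else NEUTRAL
                  for px, label in zip(frow, mrow)]
                 for frow, mrow in zip(frame, mask)]
    return object_pixels, facsimile
```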
Abstract:
Methods and systems for high-resolution image manipulation are disclosed. An original high-resolution image to be manipulated is obtained, as well as a driving signal indicating a manipulation result. The original high-resolution image is down-sampled to obtain a low-resolution image to be manipulated. Using a trained manipulation generator, a low-resolution manipulated image and a motion field are generated from the low-resolution image. The motion field represents the pixel displacements of the low-resolution image that produce the manipulation indicated by the driving signal. A high-frequency residual image is computed from the original high-resolution image. A high-frequency manipulated residual image is generated using the motion field. A high-resolution manipulated image is output by combining the high-frequency manipulated residual image with a low-frequency manipulated image generated by up-sampling the low-resolution manipulated image.
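How a low-resolution motion field might be applied to a high-resolution residual can be sketched in one dimension. The sketch assumes integer displacements, a backward-warping convention (each output sample reads from a displaced source position), displacements scaled by the down-sampling factor, and clamping at the borders. None of these choices is stated in the abstract, and the function names are hypothetical.

```python
def warp_residual(residual, motion, factor):
    """Warp a high-resolution residual with a low-resolution motion field.

    residual: high-frequency residual at full resolution (1-D list).
    motion: one integer displacement per low-resolution sample.
    factor: down-sampling factor between the two resolutions.
    """
    n = len(residual)
    warped = []
    for p in range(n):
        # Scale the low-resolution displacement up to high-resolution units.
        shift = motion[p // factor] * factor
        src = min(max(p + shift, 0), n - 1)  # clamp to the signal bounds
        warped.append(residual[src])
    return warped

def combine(low_freq, warped_residual):
    """Add the warped residual to the up-sampled manipulated signal."""
    return [a + b for a, b in zip(low_freq, warped_residual)]
```

Because the residual is moved with the same displacements as the low-resolution content, fine detail stays attached to the structure it originally belonged to.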
Abstract:
Methods and systems for fully automatic image processing to detect and remove unwanted people from a digital image of a photograph. The system includes the following modules: 1) a deep neural network (DNN)-based module for object segmentation and head pose estimation; 2) a module for classification (or grouping) of wanted versus unwanted people based on information collected in the first module; and 3) a module for image inpainting of the unwanted people in the digital image. In an example, the classification module is rules-based. In an example, the DNN-based module generates, from the digital image: (1) a list of object category labels, (2) a list of object scores, (3) a list of binary masks, (4) a list of object bounding boxes, (5) a list of crowd instances, (6) a list of human head bounding boxes, and (7) a list of head poses (e.g., yaws, pitches, and rolls).
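A rules-based classifier over the DNN outputs could take a form like the sketch below. The abstract does not state the rules, so the three used here (a minimum detection score, a minimum box area relative to the largest person, and a maximum head yaw) and all thresholds are purely hypothetical, as are the `Person` record and the function names.

```python
from dataclasses import dataclass

@dataclass
class Person:
    score: float         # object score from the DNN-based module
    box: tuple           # bounding box as (x0, y0, x1, y1)
    head_yaw_deg: float  # head yaw; 0 is assumed to mean facing the camera

def box_area(box):
    x0, y0, x1, y1 = box
    return max(0, x1 - x0) * max(0, y1 - y0)

def classify_people(people, min_score=0.5, min_rel_area=0.25, max_abs_yaw=45.0):
    """Split detections into (wanted, unwanted) using hypothetical rules."""
    if not people:
        return [], []
    largest = max(box_area(p.box) for p in people)
    wanted, unwanted = [], []
    for p in people:
        big_enough = largest > 0 and box_area(p.box) / largest >= min_rel_area
        facing_camera = abs(p.head_yaw_deg) <= max_abs_yaw
        if p.score >= min_score and big_enough and facing_camera:
            wanted.append(p)
        else:
            unwanted.append(p)  # candidates for the inpainting module
    return wanted, unwanted
```

The binary masks of the people returned as unwanted would then define the regions passed to the inpainting module.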
Abstract:
A method and system for communicating a computer rendered image sequence from a host computer to a remote computer. The method comprises determining, at the host computer, while performing a progressive encoding of an image portion of the computer rendered image sequence, motion of the image portion, wherein the progressive encoding comprises generating a lossy encoding of a frequency transform of the image portion and a first refinement encoding of the frequency transform; generating, at the host computer, a motion vector representing the motion; and communicating, from the host computer to the remote computer, the lossy encoding, the first refinement encoding, and the motion vector.
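The relationship between the lossy encoding and the first refinement encoding can be illustrated with two-stage quantization of transform coefficients. The sketch assumes the frequency transform has already been applied and uses coarse and fine quantization steps as a stand-in for the progressive scheme, which the abstract does not specify. The step sizes, the packet layout, and the function names are hypothetical.

```python
def encode_progressive(coeffs, motion_vector, coarse_step=16, fine_step=4):
    """Build a packet with a lossy pass, one refinement pass, and a motion vector."""
    # Lossy pass: coarse quantization of each transform coefficient.
    lossy = [round(c / coarse_step) for c in coeffs]
    # Refinement pass: finer quantization of the error left by the lossy pass.
    refinement = [round((c - q * coarse_step) / fine_step)
                  for c, q in zip(coeffs, lossy)]
    return {"lossy": lossy, "refinement": refinement,
            "motion_vector": motion_vector}

def decode(packet, coarse_step=16, fine_step=4, use_refinement=True):
    """Reconstruct coefficients from the lossy pass, optionally refined."""
    out = [q * coarse_step for q in packet["lossy"]]
    if use_refinement:
        out = [v + r * fine_step for v, r in zip(out, packet["refinement"])]
    return out
```

A receiver can display the coarse reconstruction as soon as the lossy pass arrives and sharpen it when the refinement follows, while the motion vector lets it reposition the image portion if it moved during encoding.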
Abstract:
Systems, methods, and computer-readable media for identifying a main group of people in an image via social relation recognition. The main group of people is identified within an image by identifying social relationships between people visible in the image. The identification of social relationships is performed by a Social Relation Recognition Network (SRRN) trained using deep learning. The SRRN combines two techniques for group identification, First Glance and Graph Reasoning, and fuses their outputs to generate a prediction of group membership. A group refinement module improves and filters the group membership after identification of an initial main group.