Abstract:
A human tracking apparatus and method capable of highly accurately tracking the movement of persons photographed in moving images includes: an image memory 107 that stores an inputted frame image; a human detecting unit 101 that detects persons photographed in the inputted frame image; a candidate registering unit 106 that registers already detected persons as candidates; a similarity index calculating unit 102 that calculates similarity indices indicating the similarity between the persons detected in the inputted frame image and the registered candidates for two or more types of parameters based on the stored frame images in relation to all combinations of the persons and the candidates; a normalizing unit 103 that normalizes the similarity indices; an integrating unit 104 that integrates the normalized indices for each combination of the detected persons and the candidates; and a tracking unit 105 that identifies a person the same as an arbitrary candidate based on the similarity indices.
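The normalize, integrate, and identify steps can be illustrated with a minimal sketch. The abstract does not say which normalization or integration rule is used, so min-max scaling, a weighted sum, and a per-person argmax are assumptions made here for illustration, as are the function names and the sample similarity values.

```python
def min_max_normalize(matrix):
    """Rescale one similarity-index matrix to [0, 1] (assumed normalization)."""
    values = [v for row in matrix for v in row]
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A constant matrix carries no ranking information.
        return [[0.0 for _ in row] for row in matrix]
    return [[(v - lo) / span for v in row] for row in matrix]


def integrate(matrices, weights):
    """Combine the normalized indices per (person, candidate) pair.

    A weighted sum is an assumption; the abstract only says "integrates".
    """
    normalized = [min_max_normalize(m) for m in matrices]
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [
        [sum(w * n[i][j] for w, n in zip(weights, normalized)) for j in range(cols)]
        for i in range(rows)
    ]


def best_candidates(integrated):
    """For each detected person (row), return the index of the best candidate."""
    return [max(range(len(row)), key=row.__getitem__) for row in integrated]


# Rows are detected persons, columns are registered candidates.
# Two hypothetical parameter types on very different scales.
color_similarity = [[0.9, 0.2], [0.1, 0.8]]
position_similarity = [[50.0, 10.0], [5.0, 40.0]]

integrated = integrate([color_similarity, position_similarity], weights=[0.5, 0.5])
matches = best_candidates(integrated)
```

Normalizing before integrating keeps a parameter with a large numeric range, such as the position score above, from dominating the combined index.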
Abstract:
Systems and methods are disclosed to perform multi-human 3D tracking with a plurality of cameras. At each view, a module receives each camera output and provides 2D human detection candidates. A plurality of 2D tracking modules are connected to the CNNs, each 2D tracking module managing 2D tracking independently. A 3D tracking module is connected to the 2D tracking modules to receive promising 2D tracking hypotheses. The 3D tracking module selects trajectories from the 2D tracking modules to generate 3D tracking hypotheses.
Abstract:
A method and system for training a neural network of a visual recognition computer system extract at least one feature of an image or video frame with a feature extractor; approximate the at least one feature of the image or video frame with an auxiliary output provided in the neural network; and measure a feature difference between the extracted feature and the approximated feature with an auxiliary error calculator. A joint learner of the method and system adjusts at least one parameter of the neural network to minimize the measured feature difference.
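A toy sketch of joint learning with an auxiliary output follows. Everything concrete in it is an assumption for illustration: the three-parameter scalar "network", the squared-error losses, the added main task loss, the weighting factor `lam`, and plain gradient descent. The abstract specifies only that a parameter is adjusted to minimize the measured feature difference.

```python
def joint_loss(params, x, label, feature, lam):
    """Task error on the main output plus weighted feature difference."""
    w, a, b = params
    h = w * x                    # shared hidden activation
    main_err = a * h - label     # main output vs. task label (assumed task loss)
    aux_err = b * h - feature    # auxiliary output vs. extracted feature
    return main_err ** 2 + lam * aux_err ** 2


def gradient_step(params, x, label, feature, lam, lr):
    """One gradient-descent update of all three parameters."""
    w, a, b = params
    h = w * x
    main_err = a * h - label
    aux_err = b * h - feature
    # The shared weight w receives gradient from both the main and auxiliary errors.
    grad_w = 2 * main_err * a * x + 2 * lam * aux_err * b * x
    grad_a = 2 * main_err * h
    grad_b = 2 * lam * aux_err * h
    return (w - lr * grad_w, a - lr * grad_a, b - lr * grad_b)


params = (0.5, 0.5, 0.5)
x, label, feature = 1.0, 1.0, 2.0  # hypothetical input, label, extracted feature

loss_before = joint_loss(params, x, label, feature, lam=0.5)
for _ in range(200):
    params = gradient_step(params, x, label, feature, lam=0.5, lr=0.05)
loss_after = joint_loss(params, x, label, feature, lam=0.5)
```

Because the auxiliary error flows back into the shared weight, the extracted feature acts as an extra training signal for the layers that feed the main output.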
Abstract:
A video super-resolution method combines information from cameras of different spatial-temporal resolutions by constructing a personalized dictionary from a high-resolution image of a scene, resulting in a domain-specific prior that performs better than a general dictionary built from images.
Abstract:
Systems and methods create high-quality audio-centric, image-centric, and integrated audio-visual summaries by seamlessly integrating image, audio, and text features extracted from input video. Integrated summarization may be employed when strict synchronization of audio and image content is not required. Video programming which requires synchronization of the audio content and the image content may be summarized using either an audio-centric or an image-centric approach. Both a machine learning-based approach and an alternative, heuristics-based approach are disclosed. Numerous probabilistic methods may be employed with the machine learning-based approach, such as naïve Bayes, decision tree, neural networks, and maximum entropy. To create an integrated audio-visual summary using the alternative, heuristics-based approach, a maximum-bipartite-matching approach is disclosed by way of example.
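Maximum bipartite matching itself can be sketched with the standard augmenting-path algorithm. How the patent builds the graph is not stated in the abstract, so treating left nodes as selected audio segments, right nodes as image segments, and edges as "may be shown together" is a hypothetical reading, and the sample edges are invented.

```python
def maximum_bipartite_matching(edges, n_right):
    """Augmenting-path (Kuhn's) algorithm.

    edges[u] lists the right-side nodes that left node u may be paired with.
    Returns the matching size and, for each right node, its matched left node
    (or -1 if unmatched).
    """
    match_right = [-1] * n_right

    def try_augment(u, visited):
        for v in edges[u]:
            if v in visited:
                continue
            visited.add(v)
            # Take v if it is free, or if its current partner can move elsewhere.
            if match_right[v] == -1 or try_augment(match_right[v], visited):
                match_right[v] = u
                return True
        return False

    size = sum(1 for u in range(len(edges)) if try_augment(u, set()))
    return size, match_right


# Hypothetical compatibility lists: audio segment index -> image segment indices.
compatible = [[0, 1], [0], [1, 2]]
size, match_right = maximum_bipartite_matching(compatible, n_right=3)
```

In this example audio segment 1 can only use image segment 0, so the algorithm moves audio segment 0 to image segment 1 to make room, and all three segments end up paired.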
Abstract:
A video surveillance system uses rule-based reasoning and multiple-hypothesis scoring to detect predefined behaviors based on movement through zone patterns. Trajectory hypothesis spawning allows for trajectory splitting and/or merging and includes local pruning to manage hypothesis growth. Hypotheses are scored based on a number of criteria, illustratively including at least one non-spatial parameter. Connection probabilities computed during the hypothesis spawning process are based on a number of criteria, illustratively including object size. Object detection and probability scoring are illustratively based on object class.
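Hypothesis spawning with local pruning might look like the sketch below. The abstract names object size as one criterion for connection probabilities but gives no formula, so the Gaussian distance and log-size-ratio terms, the `sigma` values, the `keep` limit, and the dictionary layout are all assumptions for illustration.

```python
import math


def connection_probability(track_end, detection, sigma_pos=20.0, sigma_size=0.5):
    """Score how plausibly a detection continues a track (assumed model)."""
    dx = detection["x"] - track_end["x"]
    dy = detection["y"] - track_end["y"]
    # Spatial term: nearby detections score higher.
    dist_term = math.exp(-(dx * dx + dy * dy) / (2 * sigma_pos ** 2))
    # Size term: penalize large changes in object size.
    log_ratio = math.log(detection["size"] / track_end["size"])
    size_term = math.exp(-(log_ratio ** 2) / (2 * sigma_size ** 2))
    return dist_term * size_term


def spawn_and_prune(track_end, detections, keep=2):
    """Spawn one hypothesis per detection, then keep only the best `keep`."""
    scored = [
        (connection_probability(track_end, d), i) for i, d in enumerate(detections)
    ]
    scored.sort(reverse=True)
    return scored[:keep]


track_end = {"x": 0.0, "y": 0.0, "size": 100.0}
detections = [
    {"x": 5.0, "y": 0.0, "size": 100.0},    # close, same size
    {"x": 5.0, "y": 0.0, "size": 400.0},    # close, four times larger
    {"x": 100.0, "y": 0.0, "size": 100.0},  # far away, same size
]
kept = spawn_and_prune(track_end, detections, keep=2)
kept_indices = [i for _, i in kept]
```

Capping the number of hypotheses kept per track is one simple way to stop the hypothesis tree from growing without bound as splits and merges accumulate.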
Abstract:
An automatic video content summarization system creates a personalized multimedia summary based on a user-specified theme. The invention employs both natural language processing and video analysis techniques to extract important keywords from the closed caption text as well as prominent visual features from the video footage. The invention uses a Bayesian statistical framework that naturally integrates the user theme, the heuristics, and the theme-relevant video characteristics within a unified platform.