Abstract:
A tracker component for a computer vision engine of a machine-learning based behavior-recognition system is disclosed. The behavior-recognition system may be configured to learn, identify, and recognize patterns of behavior by observing a video stream (i.e., a sequence of individual video frames). The tracker component may be configured to track objects depicted in the sequence of video frames and to generate, search, match, and update computational models of such objects.
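The generate, search, match, and update cycle can be illustrated with a minimal sketch. The abstract does not specify the object models or the matching rule, so everything here is an assumption made for illustration: the `ObjectModel` and `CentroidTracker` names, the centroid-only model, the greedy nearest-neighbour matching, and the `max_distance` and `smoothing` parameters.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class ObjectModel:
    """Hypothetical per-object model: an ID plus a smoothed centroid."""
    object_id: int
    x: float
    y: float


class CentroidTracker:
    """Illustrative tracker: generate, search, match, and update object models."""

    def __init__(self, max_distance=50.0, smoothing=0.5):
        self.max_distance = max_distance  # assumed gating radius in pixels
        self.smoothing = smoothing        # assumed weight of the new observation
        self.models = []
        self._next_id = 0

    def _search(self, x, y, taken):
        """Return the closest unmatched model within the gating radius, or None."""
        best, best_dist = None, self.max_distance
        for model in self.models:
            if model.object_id in taken:
                continue
            dist = hypot(model.x - x, model.y - y)
            if dist <= best_dist:
                best, best_dist = model, dist
        return best

    def update(self, detections):
        """Match each (x, y) detection in one frame; return the assigned IDs."""
        taken, ids = set(), []
        for x, y in detections:
            model = self._search(x, y, taken)
            if model is None:
                # Generate a new model for an object not seen before.
                model = ObjectModel(self._next_id, x, y)
                self._next_id += 1
                self.models.append(model)
            else:
                # Update the matched model toward the new observation.
                a = self.smoothing
                model.x = (1 - a) * model.x + a * x
                model.y = (1 - a) * model.y + a * y
            taken.add(model.object_id)
            ids.append(model.object_id)
        return ids
```

Calling `update` once per frame with that frame's detected centroids returns one ID per detection. An object keeps its ID as long as it stays within the gating radius of its model from frame to frame.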
Abstract:
A machine-learning engine is disclosed that is configured to recognize and learn behaviors, as well as to identify and distinguish between normal and abnormal behavior within a scene, by analyzing movements and/or activities (or absence of such) over time. The machine-learning engine may be configured to evaluate a sequence of primitive events and associated kinematic data generated for an object depicted in a sequence of video frames and a related vector representation. The vector representation is generated from a primitive event symbol stream and a phase space symbol stream, and the streams describe actions of the objects depicted in the sequence of video frames.
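One simple way to picture a vector built from two symbol streams is to count symbol occurrences per stream and concatenate the counts. This is only a sketch under stated assumptions: the abstract does not say how the vector is constructed, and the symbol vocabularies and function names below are hypothetical.

```python
from collections import Counter

# Hypothetical symbol vocabularies; the source does not enumerate them.
PRIMITIVE_EVENTS = ("appear", "move", "stop", "disappear")
PHASE_SPACE = ("slow", "fast", "turn")


def stream_to_counts(stream, vocabulary):
    """Count how often each vocabulary symbol occurs in a symbol stream."""
    counts = Counter(stream)
    return [counts[symbol] for symbol in vocabulary]


def build_vector(primitive_stream, phase_stream):
    """Concatenate per-stream symbol counts into one fixed-length vector."""
    return (stream_to_counts(primitive_stream, PRIMITIVE_EVENTS)
            + stream_to_counts(phase_stream, PHASE_SPACE))
```

For example, `build_vector(["appear", "move", "move", "stop"], ["slow", "fast", "fast"])` yields a seven-element vector: four counts for the primitive event symbols followed by three for the phase space symbols.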
Abstract:
Embodiments of the present invention provide a method and a system for analyzing and learning behavior based on an acquired stream of video frames. Objects depicted in the stream are determined based on an analysis of the video frames. Each object may have a corresponding search model used to track the object's motion from frame to frame. Classes of the objects are determined and semantic representations of the objects are generated. The semantic representations are used to determine the objects' behaviors and to learn about behaviors occurring in the environment depicted by the acquired video stream. In this way, the system rapidly learns, in real time, normal and abnormal behaviors for any environment by analyzing movements or activities, or the absence of such, in the environment, and it identifies and predicts abnormal and suspicious behavior based on what has been learned.
Abstract:
Embodiments of the present invention provide a method and a system for mapping a scene depicted in an acquired stream of video frames that may be used by a machine-learning behavior-recognition system. A background image of the scene is segmented into a plurality of regions representing various objects of the background image. Statistically similar regions may be merged and associated. The regions are analyzed to determine their z-depth order relative to the video capturing device providing the stream of video frames and to other regions, using occlusions between the regions and data about foreground objects in the scene. An annotated map describing the identified regions and their properties is created and updated.
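Deriving a z-depth order from occlusions can be sketched as a topological sort over "occludes" relations. This is an illustrative assumption, not the method the abstract describes in detail: the `depth_order` function and the `(front, back)` pair format are hypothetical, and the sketch relies on Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def depth_order(occlusions):
    """Order regions from nearest to farthest from the camera.

    `occlusions` is an iterable of (front, back) pairs, each meaning that
    region `front` was observed occluding region `back`.
    """
    sorter = TopologicalSorter()
    for front, back in occlusions:
        # graphlib emits predecessors first, so register `front` as a
        # predecessor of `back` to get a near-to-far ordering.
        sorter.add(back, front)
    # Raises graphlib.CycleError if the observations are contradictory.
    return list(sorter.static_order())
```

Given the pairs `("tree", "road")` and `("road", "building")`, the tree is placed nearest the camera and the building farthest. Regions with no occlusion relation between them may appear in either relative order, so a real system would need additional cues to rank them.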
Abstract:
A long-term memory used to store and retrieve information learned while a video analysis system observes a stream of video frames is disclosed. The long-term memory provides a memory with a capacity that grows in size gracefully, as events are observed over time. Additionally, the long-term memory may encode events, represented by sub-graphs of a neural network. Further, rather than predefining a number of patterns recognized and manipulated by the long-term memory, embodiments of the invention provide a long-term memory where the size of a feature dimension (used to determine the similarity between different observed events) may grow dynamically as necessary, depending on the actual events observed in a sequence of video frames.
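The idea of a feature dimension that grows with the observed events can be illustrated with a small sketch. It substitutes sparse feature dictionaries and cosine similarity for the neural-network sub-graph encoding mentioned above, purely for illustration. The `GrowingMemory` class and its methods are hypothetical names.

```python
from math import sqrt


class GrowingMemory:
    """Illustrative store whose feature dimension grows with observed events."""

    def __init__(self):
        self.feature_index = {}  # feature name -> dimension index
        self.events = []         # stored events as sparse {index: weight} dicts

    def _encode(self, features, grow):
        """Map feature names to indices, adding dimensions when `grow` is set."""
        encoded = {}
        for name, weight in features.items():
            if name not in self.feature_index:
                if not grow:
                    continue  # unseen feature cannot match any stored event
                self.feature_index[name] = len(self.feature_index)
            encoded[self.feature_index[name]] = weight
        return encoded

    @property
    def dimension(self):
        """Current number of feature dimensions."""
        return len(self.feature_index)

    def store(self, features):
        """Store an event; unseen feature names extend the feature space."""
        self.events.append(self._encode(features, grow=True))

    def most_similar(self, features):
        """Return (index, cosine similarity) of the closest stored event."""
        query = self._encode(features, grow=False)
        # Use all query features for the norm, including unseen ones.
        query_norm = sqrt(sum(w * w for w in features.values()))
        best = (None, 0.0)
        for i, event in enumerate(self.events):
            dot = sum(w * event.get(k, 0.0) for k, w in query.items())
            norm = query_norm * sqrt(sum(w * w for w in event.values()))
            score = dot / norm if norm else 0.0
            if score > best[1]:
                best = (i, score)
        return best
```

Storing `{"car": 1.0, "stops": 1.0}` and then `{"person": 1.0, "runs": 1.0}` grows `dimension` from 2 to 4 without any predefined feature set. A query that shares no features with the stored events returns `(None, 0.0)`.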