Multi-model techniques to generate video metadata
Abstract:
A metadata generation system utilizes machine learning techniques to accurately describe content of videos based on multi-model predictions. In some embodiments, multiple feature sets are extracted from a video, including feature sets showing correlations between additional features of the video. The feature sets are provided to a learnable pooling layer with multiple modeling techniques, which generates, for each of the feature sets, a multi-model content prediction. In some cases, the multi-model predictions are consolidated into a combined prediction. Keywords describing the content of the video are determined based on the multi-model predictions (or combined prediction). An augmented video is generated with metadata that is based on the keywords.
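The consolidation and keyword steps described in the abstract can be illustrated with a minimal sketch. It assumes each multi-model prediction is already available as a mapping from candidate label to score; feature extraction and the learnable pooling layer are not modeled here. The averaging rule, the 0.5 threshold, the function names, and the sample scores are all hypothetical choices made for illustration, not details taken from the patent.

```python
from statistics import mean


def combine_predictions(predictions):
    """Consolidate per-feature-set predictions by averaging each label's score.

    Averaging is an assumed consolidation rule; a label missing from one
    prediction is treated as having score 0.0 in that prediction.
    """
    labels = set().union(*(p.keys() for p in predictions))
    return {label: mean(p.get(label, 0.0) for p in predictions) for label in labels}


def select_keywords(combined, threshold=0.5):
    """Return labels whose combined score meets the (assumed) threshold,
    ordered by descending score, with ties broken alphabetically."""
    ranked = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
    return [label for label, score in ranked if score >= threshold]


def augment_video(video, keywords):
    """Return a copy of the video record with keyword metadata attached."""
    return {**video, "metadata": {"keywords": keywords}}


# Hypothetical multi-model predictions, one per extracted feature set.
predictions = [
    {"basketball": 0.9, "crowd": 0.6, "outdoor": 0.2},
    {"basketball": 0.8, "crowd": 0.5, "outdoor": 0.1},
    {"basketball": 0.7, "crowd": 0.7, "outdoor": 0.4},
]

combined = combine_predictions(predictions)
keywords = select_keywords(combined)
augmented = augment_video({"id": "example-video"}, keywords)
```

In this sketch the keywords are derived from the combined prediction; the abstract also allows deriving them directly from the individual multi-model predictions, which would amount to calling `select_keywords` on each prediction separately.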