Abstract:
Keyframes of video are arranged on a display based on characteristics of the keyframes, such as their content similarity and temporal relation to one another. Input comprising one or more keyframes from video data is received, and the position of the keyframe or keyframes along a first axis of the display is determined based on a time associated with the keyframe or keyframes. The position along a second axis is determined based on the content of the keyframe or keyframes.
Abstract:
Caption boxes embedded in video content can be located and the text within them decoded. Real-time processing is enhanced by locating caption box regions in the compressed video domain and performing pixel-based processing operations within the region of the video frame in which a caption box is located. The caption boxes are further refined by identifying word regions within them and then applying character and word recognition processing to the identified word regions. Domain-based models are used to improve text recognition results. The extracted caption box text can be used to detect events of interest in the video content, and a semantic model can be applied to extract the video segment containing the event of interest.
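The word-region step can be approximated with a vertical projection profile, a common text-segmentation technique substituted here because the abstract does not say how word regions are found. The sketch assumes the caption box has already been located and binarized into a grid of 0/1 pixels (1 = text pixel); `find_word_regions` and its `min_gap` parameter are hypothetical names.

```python
def find_word_regions(binary_box, min_gap=2):
    """Split a binarized caption box into word regions by vertical projection.

    binary_box: list of equal-length rows of 0/1 values (1 = text pixel).
    min_gap: number of consecutive empty columns that separates two words.
    Returns a list of (start_col, end_col) pairs, end_col exclusive.
    """
    if not binary_box:
        return []
    n_cols = len(binary_box[0])
    # Column profile: count of text pixels in each column.
    profile = [sum(row[c] for row in binary_box) for c in range(n_cols)]
    regions = []
    start = None  # first column of the word currently being scanned
    gap = 0       # consecutive empty columns seen since the last text column
    for c, count in enumerate(profile):
        if count > 0:
            if start is None:
                start = c
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # Last text column was c - gap, so the exclusive end is c - gap + 1.
                regions.append((start, c - gap + 1))
                start = None
                gap = 0
    if start is not None:
        # Word runs to the right edge (possibly followed by a short gap).
        regions.append((start, n_cols - gap))
    return regions
```

Each returned column range would then be handed to the character and word recognition stage; gaps shorter than `min_gap` are treated as spacing between characters of the same word.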
Abstract:
This application provides a video retrieval method performed by a computer device. The method includes: performing feature extraction on an image feature of a query video to obtain a first quantization feature; obtaining, based on the first quantization feature, second candidate videos with high category similarity to the query video; and finally taking, as the target video, a second candidate video with high content similarity to the query video. The quantization control parameters are adjusted according to the texture feature loss value corresponding to each training sample, so that the target quantization processing sub-model learns the ranking ability of the target texture feature sub-model and the ranking effects of the two sub-models tend to be consistent. An end-to-end model architecture enables the target quantization processing sub-model to obtain the corresponding quantization feature from the image feature.
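A minimal two-stage sketch of this retrieval flow follows: coarse recall by quantization code, then fine ranking by content similarity. A simple sign-based binary code stands in for the learned quantization sub-model, and a Hamming distance of at most `max_hamming` stands in for "high category similarity"; these choices and all function names are assumptions, not the method's actual trained models.

```python
import math


def quantize(feature):
    """Sign-based binary code: a stand-in for the learned quantization sub-model."""
    return tuple(1 if v > 0 else 0 for v in feature)


def hamming(a, b):
    """Number of positions at which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_feature, library, max_hamming=1, top_k=5):
    """Return up to top_k video ids from library (id -> image feature vector)."""
    q_code = quantize(query_feature)
    # Stage 1 (coarse): keep videos whose quantization code is close to the
    # query's code, standing in for "high category similarity".
    candidates = [
        vid for vid, feat in library.items()
        if hamming(quantize(feat), q_code) <= max_hamming
    ]
    # Stage 2 (fine): rank the survivors by content similarity of full features.
    candidates.sort(key=lambda vid: cosine(query_feature, library[vid]), reverse=True)
    return candidates[:top_k]
```

The coarse stage keeps the expensive similarity computation limited to a small candidate set, which is the usual motivation for quantization-based recall.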
Abstract:
An embodiment of the present invention relates to combining multiple semantic analyses of audio-visual data to resolve a higher-fidelity description of the semantic content. More specifically, it relates to a method for applying semantic concept detection over multiple related audio-video sources, scoring the sources on the basis of the presence or absence of specific semantics, and aggregating the scores using combination functions to achieve semantic super-resolution.
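Score aggregation across related sources might look like the following sketch. Each source contributes a dictionary of concept scores in [0, 1]; the three combination functions (max, mean, and noisy-OR) are illustrative choices rather than ones named in the abstract, and treating a concept a source does not report as a score of 0 is likewise an assumption.

```python
import math
from statistics import mean

# Illustrative combination functions; none of these are named in the abstract.
COMBINERS = {
    "max": max,
    "mean": mean,
    # Noisy-OR: chance that at least one source detects the concept,
    # assuming the sources are independent.
    "noisy_or": lambda scores: 1.0 - math.prod(1.0 - s for s in scores),
}


def aggregate(source_scores, combiner="mean"):
    """Combine per-source concept scores into one score per concept.

    source_scores: list of dicts mapping concept -> score in [0, 1],
    one dict per related audio-video source. A concept missing from a
    source is treated as absent there (score 0.0).
    """
    if not source_scores:
        return {}
    fn = COMBINERS[combiner]
    concepts = set().union(*source_scores)
    return {c: fn([s.get(c, 0.0) for s in source_scores]) for c in concepts}
```

Noisy-OR rewards agreement between sources (two moderate detections give a higher combined score than either alone), while max and mean are more conservative.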
Abstract:
An apparatus for retrieving a video picture includes a decoder section for decoding a coded bit stream of video picture data representing an arbitrary shape object and including shape information and texture information, a retrieval condition input section for inputting a retrieval condition for retrieval of a desired picture, a retrieval section for retrieving a picture meeting the retrieval condition by using shape information of the object decoded by the decoder section, and a display section for outputting the retrieved result obtained by the retrieval section.
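For the retrieval section, the sketch below assumes the decoded shape information of each object is available as a binary mask (1 inside the object) and that the retrieval condition is a minimum area plus a bounding-box aspect-ratio range. Both the shape descriptors and the function names are hypothetical; the abstract does not specify which shape properties are compared.

```python
def shape_features(mask):
    """Area and bounding-box aspect ratio (width / height) of a binary shape mask.

    mask: list of rows of 0/1 values, e.g. a decoded binary alpha plane.
    """
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        return {"area": 0, "aspect": 0.0}
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    h = max(rows) - min(rows) + 1
    w = max(cols) - min(cols) + 1
    return {"area": len(coords), "aspect": w / h}


def retrieve_by_shape(pictures, min_area=0, aspect_range=(0.0, float("inf"))):
    """Return ids of pictures whose object shape meets the retrieval condition.

    pictures: dict picture_id -> decoded shape mask.
    """
    lo, hi = aspect_range
    hits = []
    for pid, mask in pictures.items():
        f = shape_features(mask)
        if f["area"] >= min_area and lo <= f["aspect"] <= hi:
            hits.append(pid)
    return hits
```

Because only the shape masks are inspected, a picture can be tested against the condition without examining its texture information.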
Abstract:
Techniques are provided for developing and/or using poster thumbnails and/or animated thumbnails to navigate effectively among, and potentially select from, a plurality of images, programs/video files, or video segments. The poster and animated thumbnail images are presented in a GUI on adapted apparatus, providing an efficient system for navigating, browsing, and/or selecting the images, programs, or video segments a user wants to view. The poster and animated thumbnails may be produced automatically, without requiring human editing, and may also have various associated data (such as text overlay, image overlay, cropping, text or image deletion or replacement, and/or associated audio).
Abstract:
Caption boxes embedded in video content can be located and the text within them decoded. Real-time processing is enhanced by locating caption box regions in the compressed video domain (210) and performing pixel-based processing operations within the region of the video frame in which a caption box is located. The caption boxes are further refined by identifying word regions (240) within them and then applying character and word recognition processing (250) to the identified word regions. Domain-based models are used to improve text recognition results. The extracted caption box text can be used to detect events of interest in the video content, and a semantic model can be applied to extract the video segment containing the event of interest.