Systems and methods for multimodal multilabel tagging of video

    Publication No.: US10965999B2

    Publication Date: 2021-03-30

    Application No.: US16806544

    Application Date: 2020-03-02

    Applicant: Oath Inc.

    Abstract: Multimodal multilabel tagging of video content may include labeling the video content with topical tags that are identified based on extracted features from two or more modalities of the video content. The two or more modalities may include (i) a video modality for the object, images, and/or visual elements of the video content, (ii) a text modality for the speech, dialog, and/or text of the video content, and/or (iii) an audio modality for non-speech sounds and/or sound characteristics of the video content. Combinational multimodal multilabel tagging may include combining two or more features from the same or different modality in order to increase the contextual understanding of the features and generate contextually relevant tags. Video content may be labeled with global tags relating to overall topics of the video content, and different sets of local tags relating to topics at different segments of the video content.
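The combinational tagging described in the abstract above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the `RULES` table, the `modality:feature` string format, the function names, and the choice to promote a local tag to a global tag when it appears in at least `min_share` of the segments. The abstract does not state how global tags are derived.

```python
from collections import Counter

# Hypothetical rule table: a combination of features, possibly from
# different modalities, that together imply a topical tag.
RULES = {
    frozenset({"video:ball", "audio:crowd_cheer"}): "sports",
    frozenset({"video:ball", "text:goal"}): "soccer",
    frozenset({"text:recipe", "video:pan"}): "cooking",
}


def local_tags(segment_features):
    """Return the tags whose full feature combination occurs in one segment."""
    feats = set(segment_features)
    return {tag for combo, tag in RULES.items() if combo <= feats}


def tag_video(segments, min_share=0.5):
    """Return (global_tags, per_segment_tags) for a list of segments.

    Assumption: a tag is global when it is a local tag in at least
    `min_share` of the segments.
    """
    per_segment = [local_tags(s) for s in segments]
    counts = Counter(t for tags in per_segment for t in tags)
    needed = min_share * len(segments)
    global_tags = {t for t, n in counts.items() if n >= needed}
    return global_tags, per_segment


segments = [
    ["video:ball", "audio:crowd_cheer"],
    ["video:ball", "text:goal", "audio:crowd_cheer"],
    ["text:recipe", "video:pan"],
    ["video:ball", "audio:crowd_cheer"],
]
global_tags, per_segment = tag_video(segments)
```

In this sketch a single feature such as `video:ball` produces no tag on its own; only its combination with a second feature does, which mirrors the abstract's point that combining features adds context.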

    SYSTEM AND METHOD FOR LEARNING SCENE EMBEDDINGS VIA VISUAL SEMANTICS AND APPLICATION THEREOF

    Publication No.: US20200097764A1

    Publication Date: 2020-03-26

    Application No.: US16142155

    Application Date: 2018-09-26

    Applicant: OATH INC.

    Abstract: The present teaching relates to a method, system, and programming for responding to an image-related query. Information related to each of a plurality of images is received, wherein the information represents concepts co-existing in the image. Visual semantics for each of the plurality of images are created based on the information related thereto. Representations of scenes of the plurality of images are obtained via machine learning, based on the visual semantics of the plurality of images, wherein the representations capture concepts associated with the scenes.

    Pointer activity as an indicator of interestingness in video

    Publication No.: US10560742B2

    Publication Date: 2020-02-11

    Application No.: US15008711

    Application Date: 2016-01-28

    Applicant: Oath Inc.

    Abstract: A method is provided that begins by providing a video over a network to a plurality of client devices, wherein each client device is configured to render the video and track movements of a pointer during the rendering of the video. Movement data that is indicative of the tracked movements of the pointer is received over the network from each client device. The movement data from the plurality of client devices is processed to determine aggregate pointer movement versus elapsed time of the video. The aggregate pointer movement is analyzed to identify a region of interest of the video. A preview of the video is generated based on the identified region of interest.
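The aggregation step in the abstract above can be sketched roughly as follows. The event tuple layout `(client_id, t_seconds, dx, dy)`, the one-second bins, the function names, and the rule that the window with the most pointer movement is the region of interest are all assumptions made for illustration. The abstract does not say whether high or low movement signals interest, nor how the region is selected.

```python
from collections import defaultdict


def aggregate_pointer_movement(events, bin_seconds=1.0):
    """Sum pointer travel distance across all clients per time bin.

    `events` is an iterable of (client_id, t_seconds, dx, dy) tuples,
    a hypothetical format chosen for this sketch.
    """
    totals = defaultdict(float)
    for _client, t, dx, dy in events:
        distance = (dx * dx + dy * dy) ** 0.5
        totals[int(t // bin_seconds)] += distance
    return dict(totals)


def region_of_interest(totals, window_bins=3):
    """Return (start_bin, end_bin) of the window with the most movement.

    The end bin is exclusive. Assumes more movement means more interest.
    """
    if not totals:
        return None
    last_start = max(max(totals) - window_bins + 1, 0)
    best_start, best_sum = 0, -1.0
    for start in range(last_start + 1):
        window = range(start, start + window_bins)
        s = sum(totals.get(b, 0.0) for b in window)
        if s > best_sum:
            best_start, best_sum = start, s
    return best_start, best_start + window_bins


events = [
    ("a", 0.5, 1, 0),
    ("b", 5.2, 3, 4),
    ("a", 6.1, 6, 8),
    ("c", 6.9, 0, 2),
]
totals = aggregate_pointer_movement(events)
roi = region_of_interest(totals, window_bins=3)
```

A preview generator could then cut the video between `roi[0] * bin_seconds` and `roi[1] * bin_seconds`; that final step is left out because the abstract gives no detail on how the preview is built.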

    Systems and Methods for Multimodal Multilabel Tagging of Video

    Publication No.: US20200084519A1

    Publication Date: 2020-03-12

    Application No.: US16124840

    Application Date: 2018-09-07

    Applicant: Oath Inc.

    Abstract: Multimodal multilabel tagging of video content may include labeling the video content with topical tags that are identified based on extracted features from two or more modalities of the video content. The two or more modalities may include (i) a video modality for the object, images, and/or visual elements of the video content, (ii) a text modality for the speech, dialog, and/or text of the video content, and/or (iii) an audio modality for non-speech sounds and/or sound characteristics of the video content. Combinational multimodal multilabel tagging may include combining two or more features from the same or different modality in order to increase the contextual understanding of the features and generate contextually relevant tags. Video content may be labeled with global tags relating to overall topics of the video content, and different sets of local tags relating to topics at different segments of the video content.
