TEXT-BASED FRAMEWORK FOR VIDEO OBJECT SELECTION

    Publication No.: US20230162502A1

    Publication Date: 2023-05-25

    Application No.: US17531568

    Application Date: 2021-11-19

    Applicant: Adobe Inc.

    Abstract: Embodiments are disclosed for receiving a user input and an input video comprising multiple frames. A method may include extracting a text feature from the user input. The method may further include extracting a plurality of image features from the frames. The method may further include identifying one or more keyframes from the frames that include an object. The method may further include clustering one or more groups of the one or more keyframes. The method may further include generating a plurality of segmentation masks for each group. The method may further include determining a set of reference masks corresponding to the user input and the object. The method may further include generating a set of fusion masks by combining the plurality of segmentation masks and the set of reference masks. The method may further include propagating the set of fusion masks and outputting a final set of masks.
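
Three of the steps named in the abstract (keyframe identification, keyframe grouping, and mask fusion) can be illustrated with a minimal Python sketch. The abstract does not say how any of these steps is computed, so everything concrete here is an assumption made for illustration: cosine similarity with a fixed threshold stands in for keyframe identification, grouping by temporal adjacency stands in for clustering, and a pixelwise intersection stands in for mask fusion. The function names (select_keyframes, group_consecutive, fuse_masks) and parameters (threshold, max_gap) are hypothetical and do not come from the patent. Feature extraction, segmentation, and propagation are omitted because they would require trained models.

```python
import numpy as np


def select_keyframes(text_feat, frame_feats, threshold=0.5):
    """Return indices of frames whose cosine similarity to the text
    feature is at least `threshold` (assumed criterion, not the patent's).
    Assumes all feature vectors are nonzero."""
    t = text_feat / np.linalg.norm(text_feat)
    f = frame_feats / np.linalg.norm(frame_feats, axis=1, keepdims=True)
    sims = f @ t  # one similarity score per frame
    return [int(i) for i in np.flatnonzero(sims >= threshold)]


def group_consecutive(indices, max_gap=1):
    """Group sorted keyframe indices so that neighbours within `max_gap`
    frames share a group (a simple stand-in for clustering)."""
    groups = []
    for i in indices:
        if groups and i - groups[-1][-1] <= max_gap:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


def fuse_masks(seg_mask, ref_mask):
    """Combine a segmentation mask and a reference mask by pixelwise
    intersection (one possible fusion rule among many)."""
    return np.logical_and(seg_mask, ref_mask)


# Toy data: a 2-D text feature and four 2-D frame features.
text_feat = np.array([1.0, 0.0])
frame_feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.1]])

keyframes = select_keyframes(text_feat, frame_feats)
groups = group_consecutive(keyframes)

seg = np.array([[True, True], [False, True]])
ref = np.array([[True, False], [False, True]])
fused = fuse_masks(seg, ref)
```

With this toy data, frame 2 is orthogonal to the text feature and is dropped, so the remaining keyframes split into two temporal groups. A real system would replace each stand-in with a learned component.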
