IMPLEMENTING VIDEO SEGMENTATION
    Patent application

    Publication No.: US20250113087A1

    Publication date: 2025-04-03

    Application No.: US18395356

    Filing date: 2023-12-22

    Applicant: Lemon Inc.

    Abstract: The present disclosure describes techniques for implementing video segmentation. A video is divided into a plurality of clips. Each of the plurality of clips comprises several frames. Axial-trajectory attention is applied to each of the plurality of clips by a first sub-model. Clip features corresponding to each of the plurality of clips are generated by the first sub-model. A set of object queries corresponding to each of the plurality of clips is generated based on the clip features by a transformer decoder. Trajectory attention is applied to refine sets of object queries corresponding to the plurality of clips by a second sub-model. Video-level segmentation results are generated based on the refined object queries.
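
The first step of the abstract, dividing a video into clips of several frames each, can be sketched as follows. This is only an illustration: the function name `split_into_clips`, the `clip_len` parameter, and the choice of non-overlapping clips are assumptions, since the abstract does not say how the clips are formed.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_clips(frames: Sequence[T], clip_len: int) -> List[List[T]]:
    """Split a frame sequence into consecutive, non-overlapping clips.

    Hypothetical helper: the abstract only states that a video is divided
    into clips of several frames; clip length and overlap are assumptions.
    The last clip may be shorter when len(frames) is not a multiple of
    clip_len.
    """
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    return [list(frames[i:i + clip_len]) for i in range(0, len(frames), clip_len)]


# Stand-in "frames": integers in place of image arrays.
clips = split_into_clips(list(range(10)), clip_len=4)
```

In the pipeline the abstract describes, each such clip would then go to the first sub-model for clip features, and the per-clip object queries would be refined across clips by the second sub-model; neither model is sketched here.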

    SINGLE-STAGE OPEN-VOCABULARY PANOPTIC SEGMENTATION

    Publication No.: US20250045929A1

    Publication date: 2025-02-06

    Application No.: US18365060

    Filing date: 2023-08-03

    Applicant: Lemon Inc.

    Abstract: Single-stage frameworks for open-vocabulary panoptic segmentation are provided. One aspect provides a computing system comprising a processor and memory storing instructions that, when executed by the processor, cause the processor to: receive an image; extract a plurality of feature maps from the image using a convolutional neural network-based vision-language model; generate a plurality of pixel features from the plurality of feature maps; generate a plurality of mask predictions from the plurality of pixel features; generate a plurality of in-vocabulary class predictions corresponding to the plurality of mask predictions using the plurality of pixel features; generate a plurality of out-of-vocabulary class predictions using the plurality of feature maps; perform geometric ensembling on the plurality of in-vocabulary class predictions and the plurality of out-of-vocabulary class predictions to generate a plurality of final class predictions; and output the plurality of mask predictions and the plurality of final class predictions.
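
The abstract names geometric ensembling of the in-vocabulary and out-of-vocabulary class predictions but does not give a formula. One common reading is a weighted geometric mean of the two probability vectors, sketched below for a single mask. The function name, the `alpha` weight, and the renormalization step are assumptions for illustration, not details taken from the application.

```python
from typing import List


def geometric_ensemble(p_in: List[float], p_out: List[float], alpha: float = 0.5) -> List[float]:
    """Fuse two per-class probability vectors with a weighted geometric mean.

    Assumed form: fused_i = p_in_i ** (1 - alpha) * p_out_i ** alpha,
    then renormalized to sum to 1. alpha is a hypothetical mixing weight.
    """
    if len(p_in) != len(p_out):
        raise ValueError("prediction vectors must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    fused = [(a ** (1.0 - alpha)) * (b ** alpha) for a, b in zip(p_in, p_out)]
    total = sum(fused)
    # Renormalize so the fused scores form a probability distribution.
    return [f / total for f in fused] if total > 0 else fused


# Example for one mask with three candidate classes (made-up numbers).
final = geometric_ensemble([0.7, 0.2, 0.1], [0.4, 0.5, 0.1], alpha=0.5)
```

With `alpha=0` the result reduces to the in-vocabulary prediction, and with `alpha=1` to the out-of-vocabulary prediction; values in between require both classifiers to agree before a class scores highly.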
