-
Publication No.: US20250113087A1
Publication Date: 2025-04-03
Application No.: US18395356
Application Date: 2023-12-22
Applicant: Lemon Inc.
Inventor: Ju He, Qihang Yu, Inkyu Shin, Xueqing Deng, Xiaohui Shen, Liang-Chieh Chen
IPC: H04N21/845, H04N21/44
Abstract: The present disclosure describes techniques for implementing video segmentation. A video is divided into a plurality of clips. Each of the plurality of clips comprises several frames. Axial-trajectory attention is applied to each of the plurality of clips by a first sub-model. Clip features corresponding to each of the plurality of clips are generated by the first sub-model. A set of object queries corresponding to each of the plurality of clips is generated based on the clip features by a transformer decoder. Trajectory attention is applied to refine sets of object queries corresponding to the plurality of clips by a second sub-model. Video-level segmentation results are generated based on the refined object queries.
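The clip-based pipeline in this abstract can be illustrated with a minimal pure-Python sketch. It is not the applicant's implementation. It assumes non-overlapping clips, single-head attention with identity query/key/value projections, and no residual connections, normalization, or feed-forward layers. All function names (`split_into_clips`, `attend`, `axial_trajectory_attention`, `cross_clip_trajectory_attention`) are hypothetical. The transformer decoder that turns clip features into object queries is not shown. The axial-trajectory step follows the idea of decomposing attention over time and space into a height-axis pass and a width-axis pass.

```python
import math
from typing import List, Sequence

Vector = List[float]


def split_into_clips(frames: Sequence, clip_len: int) -> List[list]:
    """Divide a video into consecutive, non-overlapping clips of clip_len frames.

    Assumption: a trailing clip shorter than clip_len is kept as-is.
    """
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    return [list(frames[i:i + clip_len]) for i in range(0, len(frames), clip_len)]


def attend(query: Vector, tokens: List[Vector]) -> Vector:
    """Scaled dot-product attention of one query over tokens (identity Q/K/V)."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(a * b for a, b in zip(query, k)) for k in tokens]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    # Output is the attention-weighted average of the token vectors.
    return [sum(w * k[d] for w, k in zip(weights, tokens)) for d in range(len(query))]


def axial_trajectory_attention(clip):
    """Hypothetical first sub-model step on clip[t][h][w] feature vectors.

    Height-axis pass: each position attends to all (t, h) positions in its column.
    Width-axis pass: each position then attends to all (t, w) positions in its row.
    """
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    col = [[[None] * W for _ in range(H)] for _ in range(T)]
    for w in range(W):
        tokens = [clip[t][h][w] for t in range(T) for h in range(H)]
        for t in range(T):
            for h in range(H):
                col[t][h][w] = attend(clip[t][h][w], tokens)
    out = [[[None] * W for _ in range(H)] for _ in range(T)]
    for h in range(H):
        tokens = [col[t][h][w] for t in range(T) for w in range(W)]
        for t in range(T):
            for w in range(W):
                out[t][h][w] = attend(col[t][h][w], tokens)
    return out


def cross_clip_trajectory_attention(queries_per_clip: List[List[Vector]]) -> List[List[Vector]]:
    """Hypothetical second sub-model step: refine every clip's object queries by
    attending over the object queries of all clips (no learned weights)."""
    tokens = [q for clip in queries_per_clip for q in clip]
    return [[attend(q, tokens) for q in clip] for clip in queries_per_clip]


if __name__ == "__main__":
    clips = split_into_clips(list(range(10)), clip_len=4)
    print([len(c) for c in clips])  # [4, 4, 2]
    # Toy clip: T=2 frames, H=2 rows, W=3 columns, 2-dim features.
    clip = [[[[float(t + h), float(w)] for w in range(3)] for h in range(2)] for t in range(2)]
    feats = axial_trajectory_attention(clip)
    print(len(feats), len(feats[0]), len(feats[0][0]))  # 2 2 3
    queries = [[[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]]]
    print(cross_clip_trajectory_attention(queries))
```

Splitting attention into two axial passes keeps each attention call over T·H or T·W tokens instead of T·H·W, which is the usual motivation for axial decompositions.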
-
Publication No.: US20250045929A1
Publication Date: 2025-02-06
Application No.: US18365060
Application Date: 2023-08-03
Applicant: Lemon Inc.
Inventor: Qihang Yu, Ju He, Xueqing Deng, Xiaohui Shen, Liang-Chieh Chen
IPC: G06T7/12, G06T3/40, G06V10/44, G06V10/764, G06V10/771
Abstract: Single-stage frameworks for open-vocabulary panoptic segmentation are provided. One aspect provides a computing system comprising a processor and memory storing instructions that, when executed by the processor, cause the processor to: receive an image; extract a plurality of feature maps from the image using a convolutional neural network-based vision-language model; generate a plurality of pixel features from the plurality of feature maps; generate a plurality of mask predictions from the plurality of pixel features; generate a plurality of in-vocabulary class predictions corresponding to the plurality of mask predictions using the plurality of pixel features; generate a plurality of out-of-vocabulary class predictions using the plurality of feature maps; perform geometric ensembling on the plurality of in-vocabulary class predictions and the plurality of out-of-vocabulary class predictions to generate a plurality of final class predictions; and output the plurality of mask predictions and the plurality of final class predictions.
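The geometric ensembling step can be sketched as a per-class weighted geometric mean of the in-vocabulary and out-of-vocabulary probabilities for each mask. The abstract does not give the exact fusion rule. The exponents `alpha` and `beta`, their default values, the split into seen and unseen classes, and the function names `geometric_ensemble` and `final_class_predictions` are all assumptions for illustration.

```python
from typing import List, Sequence, Set


def geometric_ensemble(
    in_vocab: Sequence[float],
    out_vocab: Sequence[float],
    seen: Set[int],
    alpha: float = 0.4,  # hypothetical weight for seen classes
    beta: float = 0.8,   # hypothetical weight for unseen classes
) -> List[float]:
    """Fuse one mask's in-vocabulary and out-of-vocabulary class probabilities.

    Assumed form (weighted geometric mean per class c):
      seen c:   p_in[c] ** (1 - alpha) * p_out[c] ** alpha
      unseen c: p_in[c] ** (1 - beta)  * p_out[c] ** beta
    Scores are not renormalized; dividing by their sum would not change the argmax.
    """
    if len(in_vocab) != len(out_vocab):
        raise ValueError("both predictions must cover the same class list")
    final = []
    for c, (p_in, p_out) in enumerate(zip(in_vocab, out_vocab)):
        w = alpha if c in seen else beta  # weight on the out-of-vocabulary branch
        final.append(p_in ** (1.0 - w) * p_out ** w)
    return final


def final_class_predictions(in_preds, out_preds, seen, alpha=0.4, beta=0.8) -> List[int]:
    """Apply the ensemble to every mask and pick the top-scoring class per mask."""
    labels = []
    for p_in, p_out in zip(in_preds, out_preds):
        scores = geometric_ensemble(p_in, p_out, seen, alpha, beta)
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels


if __name__ == "__main__":
    in_preds = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]   # one row per mask
    out_preds = [[0.5, 0.1, 0.4], [0.2, 0.3, 0.5]]
    print(final_class_predictions(in_preds, out_preds, seen={0, 1}))  # [0, 2]
```

Giving unseen classes a larger out-of-vocabulary weight (`beta > alpha`) reflects the intuition that the vision-language branch generalizes better to categories absent from training, while the in-vocabulary branch is more reliable on categories it was trained on.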
-