Patent search: ap:("Lemon Inc.") AND inv:"Liang-Chieh Chen"

1.

Patent application
IMPLEMENTING VIDEO SEGMENTATION (status: in force)

Publication No.: US20250113087A1

Publication date: 2025-04-03

Application No.: US18395356

Filing date: 2023-12-22

Applicant: Lemon Inc.

Inventor: Ju He, Qihang Yu, Inkyu Shin, Xueqing Deng, Xiaohui Shen, Liang-Chieh Chen

IPC: H04N21/845, H04N21/44

Abstract: The present disclosure describes techniques for implementing video segmentation. A video is divided into a plurality of clips. Each of the plurality of clips comprises several frames. Axial-trajectory attention is applied to each of the plurality of clips by a first sub-model. Clip features corresponding to each of the plurality of clips are generated by the first sub-model. A set of object queries corresponding to each of the plurality of clips is generated based on the clip features by a transformer decoder. Trajectory attention is applied to refine sets of object queries corresponding to the plurality of clips by a second sub-model. Video-level segmentation results are generated based on the refined object queries.
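The data flow described in this abstract can be sketched as follows. This is only an illustrative outline, not the implementation claimed in the application: the function names (split_into_clips, segment_video), the fixed non-overlapping clip length, and the four stage callables (clip_encoder, query_decoder, query_refiner, mask_head) are all hypothetical placeholders standing in for the first sub-model, the transformer decoder, the second sub-model, and the final prediction step.

```python
from typing import Any, Callable, List, Sequence


def split_into_clips(frames: Sequence[Any], clip_len: int) -> List[List[Any]]:
    """Divide a video (a sequence of frames) into consecutive clips.

    Assumption: clips are non-overlapping and the last clip may be shorter.
    The abstract only says each clip comprises several frames.
    """
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    return [list(frames[s:s + clip_len]) for s in range(0, len(frames), clip_len)]


def segment_video(
    frames: Sequence[Any],
    clip_len: int,
    clip_encoder: Callable[[List[Any]], Any],    # hypothetical: first sub-model (axial-trajectory attention)
    query_decoder: Callable[[Any], Any],         # hypothetical: transformer decoder
    query_refiner: Callable[[List[Any]], Any],   # hypothetical: second sub-model (trajectory attention)
    mask_head: Callable[[Any], Any],             # hypothetical: video-level prediction step
) -> Any:
    """Wire the stages in the order the abstract lists them."""
    clips = split_into_clips(frames, clip_len)
    # Per-clip stage: one set of clip features for each clip.
    clip_features = [clip_encoder(clip) for clip in clips]
    # One set of object queries per clip, derived from that clip's features.
    clip_queries = [query_decoder(feat) for feat in clip_features]
    # Cross-clip stage: refine all per-clip query sets jointly.
    refined_queries = query_refiner(clip_queries)
    # Video-level segmentation results from the refined queries.
    return mask_head(refined_queries)
```

The stages are passed in as callables so that the sketch stays agnostic about how each attention module is built; only the ordering and the per-clip versus cross-clip split are taken from the abstract.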

2.

Patent application
SINGLE-STAGE OPEN-VOCABULARY PANOPTIC SEGMENTATION (status: in force)

Publication No.: US20250045929A1

Publication date: 2025-02-06

Application No.: US18365060

Filing date: 2023-08-03

Applicant: Lemon Inc.

Inventor: Qihang Yu, Ju He, Xueqing Deng, Xiaohui Shen, Liang-Chieh Chen

IPC: G06T7/12, G06T3/40, G06V10/44, G06V10/764, G06V10/771

Abstract: Single-stage frameworks for open-vocabulary panoptic segmentation are provided. One aspect provides a computing system comprising a processor and memory storing instructions that, when executed by the processor, cause the processor to: receive an image; extract a plurality of feature maps from the image using a convolutional neural network-based vision-language model; generate a plurality of pixel features from the plurality of feature maps; generate a plurality of mask predictions from the plurality of pixel features; generate a plurality of in-vocabulary class predictions corresponding to the plurality of mask predictions using the plurality of pixel features; generate a plurality of out-of-vocabulary class predictions using the plurality of feature maps; perform geometric ensembling on the plurality of in-vocabulary class predictions and the plurality of out-of-vocabulary class predictions to generate a plurality of final class predictions; and output the plurality of mask predictions and the plurality of final class predictions.
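The geometric ensembling step can be illustrated with a small sketch. The abstract does not give a formula, so the version below assumes a common form: a weighted geometric mean of the two class-probability vectors for one mask, followed by renormalization. The function name geometric_ensemble, the weight alpha, and the smoothing constant eps are hypothetical and are not taken from the application.

```python
import math
from typing import List, Sequence


def geometric_ensemble(
    p_in: Sequence[float],
    p_out: Sequence[float],
    alpha: float = 0.5,   # assumed weight given to the out-of-vocabulary prediction
    eps: float = 1e-8,    # assumed smoothing constant to avoid log(0)
) -> List[float]:
    """Fuse two class-probability vectors with a weighted geometric mean.

    fused[i] is proportional to p_in[i]**(1 - alpha) * p_out[i]**alpha.
    The result is renormalized so that it sums to 1.
    """
    if len(p_in) != len(p_out):
        raise ValueError("both predictions must cover the same classes")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Work in log space: a weighted sum of logs is the log of the weighted geometric mean.
    fused = [
        math.exp((1.0 - alpha) * math.log(a + eps) + alpha * math.log(b + eps))
        for a, b in zip(p_in, p_out)
    ]
    total = sum(fused)
    return [f / total for f in fused]


# Example: the two classifiers disagree; the fused prediction balances both.
final = geometric_ensemble([0.7, 0.2, 0.1], [0.1, 0.8, 0.1], alpha=0.5)
best_class = max(range(len(final)), key=final.__getitem__)
```

Compared with an arithmetic average, a geometric mean penalizes a class that either classifier considers very unlikely, which is one plausible reason to combine the two predictions this way.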
