Invention Publication
- Patent Title: MULTIMODAL DATA PROCESSING
- Application No.: US18393238
- Application Date: 2023-12-21
- Publication No.: US20240144664A1
- Publication Date: 2024-05-02
- Inventor: Song Bai, Rui Yan, Heng Wang, Junhao Zhang, Chuhui Xue, Wenqing Zhang
- Applicant: Lemon Inc., Beijing Youzhuju Network Technology Co., Ltd.
- Applicant Address: KY Grand Cayman
- Assignee: Lemon Inc., Beijing Youzhuju Network Technology Co., Ltd.
- Current Assignee: Lemon Inc., Beijing Youzhuju Network Technology Co., Ltd.
- Current Assignee Address: KY Grand Cayman
- Priority: CN 23100097400 2023.01.04
- Main IPC: G06V10/82
- IPC: G06V10/82; G06V10/46

Abstract:
Embodiments of the present disclosure provide a solution for multimodal data processing. A method comprises: obtaining image data and text data; and extracting a target visual feature of the image data and a target textual feature of the text data using a feature extraction model. The feature extraction model comprises alternately deployed cross-modal encoding parts and visual encoding parts. The extracting comprises: performing, using a first cross-modal encoding part of the feature extraction model, cross-modal feature encoding on a first intermediate visual feature of the image data and a first intermediate textual feature of the text data, to obtain a second intermediate visual feature and a second intermediate textual feature; and performing, using a first visual encoding part of the feature extraction model, visual modal feature encoding on the second intermediate visual feature, to obtain a third intermediate visual feature.
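
The data flow described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function names (`cross_modal_part`, `visual_part`, `extract_features`), the `num_stages` parameter, and the toy arithmetic inside each part are all hypothetical stand-ins chosen for illustration. Only the ordering is taken from the abstract: a cross-modal encoding part updates both the visual and the textual feature, and a visual encoding part then updates the visual feature alone.

```python
from typing import List, Tuple

Feature = List[float]  # hypothetical toy representation of a feature vector


def cross_modal_part(visual: Feature, textual: Feature) -> Tuple[Feature, Feature]:
    """Toy stand-in for a cross-modal encoding part (assumption, not the patent's math).

    Each modality is shifted by the mean of the other, so both outputs
    depend on both inputs.
    """
    visual_mean = sum(visual) / len(visual)
    textual_mean = sum(textual) / len(textual)
    new_visual = [v + textual_mean for v in visual]
    new_textual = [t + visual_mean for t in textual]
    return new_visual, new_textual


def visual_part(visual: Feature) -> Feature:
    """Toy stand-in for a visual encoding part: touches the visual feature only."""
    return [2.0 * v for v in visual]


def extract_features(
    image_feature: Feature, text_feature: Feature, num_stages: int = 1
) -> Tuple[Feature, Feature]:
    """Alternate cross-modal and visual encoding parts for num_stages rounds."""
    visual, textual = image_feature, text_feature
    for _ in range(num_stages):
        # First intermediate features -> second intermediate features.
        visual, textual = cross_modal_part(visual, textual)
        # Second intermediate visual feature -> third intermediate visual feature.
        visual = visual_part(visual)
    return visual, textual


target_visual, target_textual = extract_features([1.0, 3.0], [2.0, 4.0])
print(target_visual, target_textual)
```

In this sketch the textual feature passes through the visual encoding part unchanged, which matches the abstract's wording that visual modal feature encoding is performed on the visual feature. How the final target features are derived after the last stage is not stated in the abstract, so returning the last intermediate features is an assumption.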
Public/Granted literature
- US2134375A Package Public/Granted day:1938-10-25