MULTIMODAL DATA PROCESSING

Invention Publication

US20240144664A1 MULTIMODAL DATA PROCESSING (Pending, Published)

Patent Title: MULTIMODAL DATA PROCESSING
Application No.: US18393238

Application Date: 2023-12-21
Publication No.: US20240144664A1

Publication Date: 2024-05-02
Inventor: Song Bai, Rui Yan, Heng Wang, Junhao Zhang, Chuhui Xue, Wenqing Zhang
Applicant: Lemon Inc., Beijing Youzhuju Network Technology Co., Ltd.
Applicant Address: KY Grand Cayman
Assignee: Lemon Inc., Beijing Youzhuju Network Technology Co., Ltd.
Current Assignee: Lemon Inc., Beijing Youzhuju Network Technology Co., Ltd.
Current Assignee Address: KY Grand Cayman
Priority: CN 23100097400 2023.01.04
Main IPC: G06V10/82
IPC: G06V10/82 ; G06V10/46

Abstract:

Embodiments of the present disclosure provide a solution for multimodal data processing. A method comprises: obtaining image data and text data; and extracting a target visual feature of the image data and a target textual feature of the text data using a feature extraction model. The feature extraction model comprises alternately deployed cross-modal encoding parts and visual encoding parts. The extracting comprises: performing, using a first cross-modal encoding part of the feature extraction model, cross-modal feature encoding on a first intermediate visual feature of the image data and a first intermediate textual feature of the text data, to obtain a second intermediate visual feature and a second intermediate textual feature; and performing, using a first visual encoding part of the feature extraction model, visual modal feature encoding on the second intermediate visual feature, to obtain a third intermediate visual feature.
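
The data flow described in the abstract can be sketched as follows. This is only an illustrative toy under stated assumptions, not the claimed implementation: the abstract does not say how each encoding part works internally, so the use of residual dot-product attention, the number of stages, and every name here (`attend`, `cross_modal_part`, `visual_part`, `extract_features`, `num_stages`) are hypothetical. What the sketch does take from the abstract is the ordering: a cross-modal encoding part updates both the visual and the textual feature, and the visual encoding part that follows updates only the visual feature.

```python
import math
from typing import List, Tuple

Vector = List[float]
Tokens = List[Vector]  # one feature vector per image patch or text token


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: Tokens, keys: Tokens) -> Tokens:
    """Assumed building block: residual scaled dot-product attention.

    Each query vector is returned with an attention-weighted mean of the
    key vectors added to it. Inputs are not modified.
    """
    scale = math.sqrt(len(keys[0]))
    out: Tokens = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        mixed = [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(len(q))]
        out.append([a + b for a, b in zip(q, mixed)])
    return out


def cross_modal_part(visual: Tokens, textual: Tokens) -> Tuple[Tokens, Tokens]:
    """Hypothetical cross-modal encoding: both modalities attend over the joint sequence."""
    joint = visual + textual
    return attend(visual, joint), attend(textual, joint)


def visual_part(visual: Tokens) -> Tokens:
    """Hypothetical visual encoding: visual tokens attend only to visual tokens."""
    return attend(visual, visual)


def extract_features(visual: Tokens, textual: Tokens, num_stages: int = 2) -> Tuple[Tokens, Tokens]:
    """Alternate cross-modal and visual encoding parts (num_stages is an assumption)."""
    for _ in range(num_stages):
        visual, textual = cross_modal_part(visual, textual)  # updates both modalities
        visual = visual_part(visual)                         # updates the visual feature only
    return visual, textual
```

With `num_stages=1`, the three values in the abstract map directly onto the sketch: the inputs play the role of the first intermediate features, the outputs of `cross_modal_part` are the second intermediate visual and textual features, and the output of `visual_part` is the third intermediate visual feature.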

Public/Granted literature

US2134375A Package Public/Granted date: 1938-10-25

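IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06V	Image or video recognition or understanding
G06V10/00	Arrangements for image or video recognition or understanding (character recognition in images or video G06V30/10)
G06V10/70	.using pattern recognition or machine learning (optical pattern recognition or electronic computing G06V10/88)
G06V10/82	..using neural networks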