Multimodal fine-grained mixing method and system, device, and storage medium
Abstract:
The present disclosure provides a multimodal fine-grained mixing method and system, a device, and a storage medium. The method includes: extracting data features from multimodal image-text data and obtaining each component of the data features, the data features including visual region features and text word features; performing fine-grained classification on the modal information of each component to obtain classification results; and performing inter-modal and intra-modal information fusion on each component according to the classification results to obtain a fusion feature. The method enables a multimodal model to exploit the complementary characteristics of multimodal data without being influenced by irrelevant information.
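The three steps named in the abstract (feature extraction, per-element classification, classification-guided fusion) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the disclosure: the function names `attend` and `fine_grained_mix`, the use of a thresholded scaled dot-product score as the "classification", the attention-style fusion, and the mean pooling at the end. Feature extraction is assumed to have already produced region and word feature matrices of equal width.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend(queries, keys, key_mask):
    """Attention-weighted sum of `keys` for each query, over allowed keys only.

    Returns zeros when no key is allowed (hypothetical helper).
    """
    if not key_mask.any():
        return np.zeros_like(queries)
    scores = queries @ keys.T / np.sqrt(queries.shape[1])
    scores = np.where(key_mask[None, :], scores, -np.inf)
    return softmax(scores, axis=-1) @ keys


def fine_grained_mix(regions, words, threshold=0.0):
    """Sketch of classify-then-fuse for region features (R, d) and word features (W, d).

    Assumed classification rule: an element is "cross-modally relevant" if its
    best scaled dot-product match in the other modality exceeds `threshold`.
    """
    n_regions, dim = regions.shape
    n_words = words.shape[0]

    # Step 2 (assumed form): classify each region and each word individually.
    cross = regions @ words.T / np.sqrt(dim)          # (R, W) similarity
    region_relevant = cross.max(axis=1) > threshold   # (R,) booleans
    word_relevant = cross.max(axis=0) > threshold     # (W,) booleans

    # Step 3a: intra-modal fusion, applied to every element.
    intra_r = attend(regions, regions, np.ones(n_regions, dtype=bool))
    intra_w = attend(words, words, np.ones(n_words, dtype=bool))

    # Step 3b: inter-modal fusion, only between elements classified as relevant.
    inter_r = attend(regions, words, word_relevant) * region_relevant[:, None]
    inter_w = attend(words, regions, region_relevant) * word_relevant[:, None]

    fused_r = regions + intra_r + inter_r
    fused_w = words + intra_w + inter_w

    # Pool each modality and concatenate into one fusion feature of size 2 * d.
    return np.concatenate([fused_r.mean(axis=0), fused_w.mean(axis=0)])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    regions = rng.normal(size=(5, 8))   # stand-in for visual region features
    words = rng.normal(size=(7, 8))     # stand-in for text word features
    print(fine_grained_mix(regions, words).shape)
```

In this sketch, elements classified as irrelevant still take part in intra-modal fusion but neither send nor receive information across modalities, which is one simple way to keep unrelated content from leaking into the other modality's representation.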