MULTIMODAL DATA PROCESSING
    1.
    发明申请

    公开(公告)号:US20230010160A1

    公开(公告)日:2023-01-12

    申请号:US17945415

    申请日:2022-09-15

    Abstract: Disclosed are a method for processing multimodal data using a neural network, a device, and a medium, and relates to the field of artificial intelligence and, in particular to multimodal data processing, video classification, and deep learning. The neural network includes: an input subnetwork configured to receive the multimodal data to output respective first features of a plurality of modalities; a plurality of cross-modal feature subnetworks, each of which is configured to receive respective first features of two corresponding modalities to output a cross-modal feature corresponding to the two modalities; a plurality of cross-modal fusion subnetworks, each of which is configured to receive at least one cross-modal feature corresponding to a corresponding target modality and other modalities to output a second feature of the target modality; and an output subnetwork configured to receive respective second features of the plurality of modalities to output a processing result of the multimodal data.

    METHOD OF PROCESSING MULTIMEDIA DATA, DEVICE AND MEDIUM

    公开(公告)号:US20230115737A1

    公开(公告)日:2023-04-13

    申请号:US18080432

    申请日:2022-12-13

    Abstract: A method of processing multimedia data, a device, and a medium, which relates to a field of an artificial intelligence technology, in particular to fields of knowledge graph and deep learning. The method of processing the multimedia data includes: recognizing the multimedia data so as to obtain at least one key information of the multimedia data; querying a predetermined knowledge base according to the at least one key information, so as to determine a multimedia name associated with the at least one key information and an association degree between the multimedia name and the at least one key information; and determining, in the multimedia name, a name of the multimedia data based on a similarity between alternative multimedia data for the multimedia name and the multimedia data, in response to the association degree being less than a first threshold value.

    METHOD FOR TRAINING IMAGE-TEXT MATCHING MODEL, COMPUTING DEVICE, AND STORAGE MEDIUM

    公开(公告)号:US20230005284A1

    公开(公告)日:2023-01-05

    申请号:US17943458

    申请日:2022-09-13

    Abstract: A computer-implemented method is provided. The method includes: obtaining a sample text and a sample image corresponding to the sample text; labeling a true semantic tag for the sample text according to a first preset rule; obtaining a text feature representation of the sample text and a predicted semantic tag output by a text coding sub-model; obtaining an image feature representation of the sample image output by an image coding sub-model; calculating a first loss based on the true semantic tag and the predicted semantic tag; calculating a contrast loss based on the text feature representation of the sample text and the image feature representation of the sample image; adjusting parameters of the text coding sub-model based on the first loss and the contrast loss; and adjusting parameters of the image coding sub-model based on the contrast loss.

Patent Agency Ranking