Learning representations of generalized cross-modal entailment tasks
Abstract:
A method is provided for determining entailment between an input premise and an input hypothesis of different modalities. The method includes extracting features from the input hypothesis and an entirety of and regions of interest in the input premise. The method further includes deriving intra-modal relevant information while suppressing intra-modal irrelevant information, based on intra-modal interactions between elementary ones of the features of the input hypothesis and between elementary ones of the features of the input premise. The method also includes attaching cross-modal relevant information from the features from the input premise to the features from the input hypothesis to form a cross-modal representation, based on cross-modal interactions between pairs of different elementary features from different modalities. The method additionally includes classifying a relationship between the input premise and the input hypothesis using a label selected from the group consisting of entailment, neutral, and contradiction based on the cross-modal representation.
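The data flow recited in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the claimed method: features are assumed to be already extracted as fixed-length vectors, scaled dot-product attention stands in for both the intra-modal and the cross-modal interactions, concatenation stands in for "attaching", and the classifier is a single linear layer with untrained random weights. The names `attend`, `mean_pool`, and `classify` are hypothetical.

```python
import math
import random

LABELS = ("entailment", "neutral", "contradiction")


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(queries, keys):
    # Scaled dot-product attention (assumed stand-in); values equal keys.
    # High weights keep relevant features, low weights suppress the rest.
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        w = softmax([dot(q, k) / scale for k in keys])
        out.append([sum(wi * k[i] for wi, k in zip(w, keys))
                    for i in range(len(keys[0]))])
    return out


def mean_pool(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def classify(premise_feats, hypothesis_feats, weights):
    # Intra-modal step: each modality attends over its own features.
    p = attend(premise_feats, premise_feats)
    h = attend(hypothesis_feats, hypothesis_feats)
    # Cross-modal step: each hypothesis feature gathers premise information.
    cross = attend(h, p)
    # "Attach" by concatenation (an assumption) to form the joint representation.
    joint = [hv + cv for hv, cv in zip(h, cross)]
    probs = softmax([dot(row, mean_pool(joint)) for row in weights])
    return LABELS[probs.index(max(probs))], probs


if __name__ == "__main__":
    random.seed(0)
    d = 4
    # Premise: one whole-image vector plus two region-of-interest vectors.
    premise = [[random.gauss(0, 1) for _ in range(d)] for _ in range(3)]
    # Hypothesis: five token vectors.
    hypothesis = [[random.gauss(0, 1) for _ in range(d)] for _ in range(5)]
    # Untrained classifier weights: 3 labels x 2d inputs.
    weights = [[random.gauss(0, 1) for _ in range(2 * d)] for _ in range(3)]
    print(classify(premise, hypothesis, weights))
```

Because the weights are random, the predicted label carries no meaning; the sketch only shows how premise and hypothesis features would pass through the intra-modal, cross-modal, and classification stages in the order the abstract describes.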