ADAPTIVE CYCLE CONSISTENCY MULTIMODAL IMAGE CAPTIONING

    Publication No.: US20220012919A1

    Publication date: 2022-01-13

    Application No.: US16923142

    Filing date: 2020-07-08

    IPC classification: G06T9/00 G06N3/08 G06N3/04

    Abstract: In an approach to improving the image captioning performance of low-resource languages by leveraging multimodal inputs, one or more computer processors encode an image utilizing an image encoder, wherein the image is contained within a triplet comprising the image, one or more high-resource captions, and one or more low-resource captions. The one or more computer processors generate one or more high-resource captions utilizing the encoded image and the triplet inputted into a high-resource decoder. The one or more computer processors encode the one or more generated high-resource captions utilizing a high-resource encoder. The one or more computer processors add adaptive cycle consistency constraints on a set of calculated attention weights associated with the triplet. The one or more computer processors generate one or more low-resource captions by simultaneously inputting the encoded image, the encoded high-resource caption, and the triplet into a trained low-resource decoder.
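
The abstract does not say how the cycle consistency constraint on the attention weights is formulated. As a rough illustration only, one common reading is that attending from one caption to the other and back again should return to the starting token. The sketch below assumes two row-stochastic attention matrices and penalizes the squared distance between their product and the identity. The function names and the fixed `weight` parameter are hypothetical and are not taken from the patent; in particular, the "adaptive" part of the constraint is not modeled here.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def cycle_consistency_penalty(attn_fwd, attn_bwd, weight=1.0):
    """Weighted squared distance between attn_fwd @ attn_bwd and the identity.

    attn_fwd: m x n attention, e.g. low-resource tokens over high-resource tokens.
    attn_bwd: n x m attention in the opposite direction.
    Both shapes and the direction labels are assumptions made for illustration.
    """
    round_trip = matmul(attn_fwd, attn_bwd)
    penalty = 0.0
    for i, row in enumerate(round_trip):
        for j, value in enumerate(row):
            target = 1.0 if i == j else 0.0
            penalty += (value - target) ** 2
    return weight * penalty


# A perfect one-to-one alignment makes the round trip an identity map.
aligned = [[1.0, 0.0], [0.0, 1.0]]
# Uniform attention carries no alignment information.
uniform = [[0.5, 0.5], [0.5, 0.5]]

print(cycle_consistency_penalty(aligned, aligned))
print(cycle_consistency_penalty(uniform, uniform))
```

In a training setup, a term of this kind would typically be added to the captioning loss, so that sharper and mutually consistent alignments are rewarded.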

    Mixup image captioning
    Granted patent

    Publication No.: US11334769B2

    Publication date: 2022-05-17

    Application No.: US16922367

    Filing date: 2020-07-07

    IPC classification: G06K9/62 G06F16/55 G06N3/08

    Abstract: In an approach to augmenting caption datasets, one or more computer processors sample a ratio lambda from a probability distribution based on a pair of datapoints contained in a dataset, wherein each datapoint in the pair of datapoints comprises an image and an associated caption; extend the dataset by generating one or more new datapoints based on the sampled ratio lambda for each pair of datapoints in the dataset, wherein the sampled ratio lambda incorporates an interpolation of features associated with the pair of datapoints into the generated one or more new datapoints; identify one or more objects contained within a subsequent image utilizing an image model trained utilizing the extended dataset; and generate a subsequent caption for one or more identified objects contained within the subsequent image utilizing a language generating model trained utilizing the extended dataset.
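
The sampling and interpolation steps in the abstract can be illustrated with a minimal sketch. It assumes that lambda is drawn from a Beta(alpha, alpha) distribution, as in the general mixup technique, and that both the image and the caption are already represented as fixed-length feature vectors. The abstract states neither assumption, and the names `mixup_pair` and `extend_dataset` are hypothetical.

```python
import random


def mixup_pair(feat_a, feat_b, lam):
    """Interpolate two equal-length feature vectors with ratio lam."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(feat_a, feat_b)]


def extend_dataset(dataset, alpha=0.4, seed=0):
    """Return the dataset plus one mixed datapoint per unordered pair.

    dataset: list of (image_features, caption_features) tuples.
    alpha:   Beta distribution parameter (an assumed choice of distribution).
    """
    rng = random.Random(seed)
    extended = list(dataset)
    for i in range(len(dataset)):
        for j in range(i + 1, len(dataset)):
            lam = rng.betavariate(alpha, alpha)  # sampled ratio in [0, 1]
            img_a, cap_a = dataset[i]
            img_b, cap_b = dataset[j]
            extended.append(
                (mixup_pair(img_a, img_b, lam), mixup_pair(cap_a, cap_b, lam))
            )
    return extended


# Toy data: 2-dimensional image features and 3-dimensional caption features.
toy = [
    ([0.0, 2.0], [1.0, 0.0, 0.0]),
    ([2.0, 0.0], [0.0, 1.0, 0.0]),
    ([1.0, 1.0], [0.0, 0.0, 1.0]),
]
bigger = extend_dataset(toy)
print(len(toy), "->", len(bigger))
```

The same lambda is applied to the image and to the caption of each pair, so each new datapoint keeps its two modalities mixed in the same proportion.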

    MIXUP IMAGE CAPTIONING
    Patent application

    Publication No.: US20220012544A1

    Publication date: 2022-01-13

    Application No.: US16922367

    Filing date: 2020-07-07

    IPC classification: G06K9/62 G06F16/55 G06N3/08

    Abstract: In an approach to augmenting caption datasets, one or more computer processors sample a ratio lambda from a probability distribution based on a pair of datapoints contained in a dataset, wherein each datapoint in the pair of datapoints comprises an image and an associated caption; extend the dataset by generating one or more new datapoints based on the sampled ratio lambda for each pair of datapoints in the dataset, wherein the sampled ratio lambda incorporates an interpolation of features associated with the pair of datapoints into the generated one or more new datapoints; identify one or more objects contained within a subsequent image utilizing an image model trained utilizing the extended dataset; and generate a subsequent caption for one or more identified objects contained within the subsequent image utilizing a language generating model trained utilizing the extended dataset.

    Adaptive cycle consistency multimodal image captioning

    Publication No.: US11651522B2

    Publication date: 2023-05-16

    Application No.: US16923142

    Filing date: 2020-07-08

    IPC classification: G06T9/00 G06N3/08 G06N3/045

    CPC classification: G06T9/002 G06N3/045 G06N3/08

    Abstract: In an approach to improving the image captioning performance of low-resource languages by leveraging multimodal inputs, one or more computer processors encode an image utilizing an image encoder, wherein the image is contained within a triplet comprising the image, one or more high-resource captions, and one or more low-resource captions. The one or more computer processors generate one or more high-resource captions utilizing the encoded image and the triplet inputted into a high-resource decoder. The one or more computer processors encode the one or more generated high-resource captions utilizing a high-resource encoder. The one or more computer processors add adaptive cycle consistency constraints on a set of calculated attention weights associated with the triplet. The one or more computer processors generate one or more low-resource captions by simultaneously inputting the encoded image, the encoded high-resource caption, and the triplet into a trained low-resource decoder.

    DENOISING AUTOENCODER IMAGE CAPTIONING

    Publication No.: US20220012534A1

    Publication date: 2022-01-13

    Application No.: US16922155

    Filing date: 2020-07-07

    IPC classification: G06K9/62 G06N20/00 G06K9/40

    Abstract: In an approach to augmenting a caption dataset by leveraging a denoising autoencoder to sample and generate additional captions from the ground truth captions, one or more computer processors generate a plurality of new captions utilizing an autoencoder fed with one or more noisy captions, wherein the autoencoder is trained with a dataset comprising a plurality of ground truth captions. The one or more computer processors calculate an importance weight for each new caption in the plurality of generated new captions as compared to a plurality of associated ground truth captions based on a consensus metric. The one or more computer processors train a caption model with the generated plurality of new captions and associated calculated weights.
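
Two steps of the abstract lend themselves to a small sketch: corrupting a caption before it is fed to the autoencoder, and weighting a generated caption by how well it agrees with the ground truth captions. The abstract does not specify the noise model or the consensus metric, so the code below uses assumed stand-ins: random token dropping plus one adjacent swap for the noise, and mean unigram Jaccard overlap for the consensus score. The autoencoder itself is not modeled, and the function names are hypothetical.

```python
import random


def add_noise(tokens, drop_prob=0.1, rng=None):
    """Corrupt a caption: drop tokens at random, then swap one adjacent pair."""
    rng = rng or random.Random(0)
    # Fall back to the full caption if every token happened to be dropped.
    kept = [t for t in tokens if rng.random() >= drop_prob] or list(tokens)
    if len(kept) > 1:
        i = rng.randrange(len(kept) - 1)
        kept[i], kept[i + 1] = kept[i + 1], kept[i]
    return kept


def consensus_weight(candidate, references):
    """Mean unigram Jaccard overlap between a candidate and reference captions.

    This is an assumed stand-in for the consensus metric named in the abstract.
    """
    cand = set(candidate)
    scores = []
    for ref in references:
        ref_set = set(ref)
        union = cand | ref_set
        scores.append(len(cand & ref_set) / len(union) if union else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


ground_truth = [["a", "dog", "runs"], ["a", "cat", "runs"]]
noisy = add_noise("a dog runs on grass".split(), drop_prob=0.2, rng=random.Random(1))
weight = consensus_weight(["a", "dog", "runs"], ground_truth)
print(noisy, weight)
```

A weight of this kind could then scale each generated caption's contribution to the training loss, so that captions far from the ground truth consensus count for less.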