-
Publication No.: US11763544B2
Publication date: 2023-09-19
Application No.: US16922155
Filing date: 2020-07-07
Inventors: Shiwan Zhao, Hao Kai Zhang, Yi Ke Wu, Zhong Su
IPC classification: G06V10/30, G06N20/00, G06F18/214, G06F18/22, G06F18/21, G06F18/2113, G06V10/74, G06V10/764, G06V10/771, G06V10/774, G06V10/776
CPC classification: G06V10/30, G06F18/217, G06F18/2113, G06F18/2148, G06F18/22, G06N20/00, G06V10/761, G06V10/764, G06V10/771, G06V10/776, G06V10/7747
Abstract: In an approach to augmenting a caption dataset by leveraging a denoising autoencoder to sample and generate additional captions from the ground truth captions, one or more computer processors generate a plurality of new captions utilizing an autoencoder fed with one or more noisy captions, wherein the autoencoder is trained with a dataset comprising a plurality of ground truth captions. The one or more computer processors calculate an importance weight for each new caption in the plurality of generated new captions as compared to a plurality of associated ground truth captions based on a consensus metric. The one or more computer processors train a caption model with the generated plurality of new captions and associated calculated weights.
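The importance-weighting step in this abstract can be illustrated with a minimal sketch. The abstract does not name the consensus metric, so the averaged n-gram F1 below is only a simple stand-in chosen for illustration, and the function names (`overlap_score`, `importance_weights`) are hypothetical rather than taken from the patent.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def overlap_score(candidate, reference, max_n=2):
    """Average n-gram F1 between two captions.

    Assumption: this is a simple stand-in for a consensus metric,
    not the metric used in the patent.
    """
    cand, ref = candidate.lower().split(), reference.lower().split()
    scores = []
    for n in range(1, max_n + 1):
        c, r = ngram_counts(cand, n), ngram_counts(ref, n)
        matched = sum((c & r).values())  # clipped n-gram matches
        total_c, total_r = sum(c.values()), sum(r.values())
        if matched == 0 or total_c == 0 or total_r == 0:
            scores.append(0.0)
            continue
        precision, recall = matched / total_c, matched / total_r
        scores.append(2 * precision * recall / (precision + recall))
    return sum(scores) / len(scores)


def importance_weights(new_captions, ground_truths):
    """Score each generated caption against all ground truths, then normalize."""
    raw = [
        sum(overlap_score(c, g) for g in ground_truths) / len(ground_truths)
        for c in new_captions
    ]
    total = sum(raw)
    if total == 0:
        # No overlap at all: fall back to uniform weights.
        return [1.0 / len(raw)] * len(raw)
    return [s / total for s in raw]


ground_truths = ["a dog runs on the grass", "a dog is running across a field"]
new_captions = ["a dog runs on the grass", "a cat sleeps indoors"]
weights = importance_weights(new_captions, ground_truths)
```

In a training loop, each generated caption's loss term would typically be scaled by its weight, so captions that agree more closely with the ground truths contribute more to the caption model.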
-
Publication No.: US20220012919A1
Publication date: 2022-01-13
Application No.: US16923142
Filing date: 2020-07-08
Inventors: Shiwan Zhao, Yi Ke Wu, Hao Kai Zhang, Zhong Su
Abstract: In an approach to improving the image captioning performance of low-resource languages by leveraging multimodal inputs, one or more computer processors encode an image utilizing an image encoder, wherein the image is contained within a triplet comprising the image, one or more high-resource captions, and one or more low-resource captions. The one or more computer processors generate one or more high-resource captions utilizing the encoded image and the triplet inputted into a high-resource decoder. The one or more computer processors encode the one or more generated high-resource captions utilizing a high-resource encoder. The one or more computer processors add adaptive cycle consistency constraints on a set of calculated attention weights associated with the triplet. The one or more computer processors generate one or more low-resource captions by simultaneously inputting the encoded image, the encoded high-resource caption, and the triplet into a trained low-resource decoder.
-
Publication No.: US11334769B2
Publication date: 2022-05-17
Application No.: US16922367
Filing date: 2020-07-07
Inventors: Shiwan Zhao, Yi Ke Wu, Hao Kai Zhang, Zhong Su
Abstract: In an approach to augmenting caption datasets, one or more computer processors sample a ratio lambda from a probability distribution based on a pair of datapoints contained in a dataset, wherein each datapoint in the pair of datapoints comprises an image and an associated caption; extend the dataset by generating one or more new datapoints based on the sampled ratio lambda for each pair of datapoints in the dataset, wherein the sampled ratio lambda incorporates an interpolation of features associated with the pair of datapoints into the generated one or more new datapoints; identify one or more objects contained within a subsequent image utilizing an image model trained utilizing the extended dataset; and generate a subsequent caption for one or more identified objects contained within the subsequent image utilizing a language generating model trained utilizing the extended dataset.
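The interpolation step in this abstract can be sketched as follows. The abstract says only that lambda is sampled from "a probability distribution", so the symmetric Beta distribution used here is an assumption, as are the feature-vector representation of images and captions and the names `mix_pair` and `augment_pair`.

```python
import random


def mix_pair(features_a, features_b, lam):
    """Linearly interpolate two equal-length feature vectors with ratio lam."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(features_a, features_b)]


def augment_pair(point_a, point_b, alpha=1.0, rng=random):
    """Build one new datapoint from a pair of (image, caption) datapoints.

    Assumptions: each datapoint is a dict holding numeric feature vectors
    under "image" and "caption", and lam ~ Beta(alpha, alpha).
    """
    lam = rng.betavariate(alpha, alpha)
    return {
        "image": mix_pair(point_a["image"], point_b["image"], lam),
        "caption": mix_pair(point_a["caption"], point_b["caption"], lam),
        "lam": lam,
    }


rng = random.Random(0)  # seeded only to make the example repeatable
a = {"image": [1.0, 2.0], "caption": [1.0, 0.0]}
b = {"image": [3.0, 4.0], "caption": [0.0, 1.0]}
new_point = augment_pair(a, b, alpha=1.0, rng=rng)
```

The same lambda is applied to both the image features and the caption features, so the new datapoint stays internally consistent; extending the dataset amounts to calling `augment_pair` over the chosen pairs and appending the results.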
-
Publication No.: US20220012544A1
Publication date: 2022-01-13
Application No.: US16922367
Filing date: 2020-07-07
Inventors: Shiwan Zhao, Yi Ke Wu, Hao Kai Zhang, Zhong Su
Abstract: In an approach to augmenting caption datasets, one or more computer processors sample a ratio lambda from a probability distribution based on a pair of datapoints contained in a dataset, wherein each datapoint in the pair of datapoints comprises an image and an associated caption; extend the dataset by generating one or more new datapoints based on the sampled ratio lambda for each pair of datapoints in the dataset, wherein the sampled ratio lambda incorporates an interpolation of features associated with the pair of datapoints into the generated one or more new datapoints; identify one or more objects contained within a subsequent image utilizing an image model trained utilizing the extended dataset; and generate a subsequent caption for one or more identified objects contained within the subsequent image utilizing a language generating model trained utilizing the extended dataset.
-
Publication No.: US11651522B2
Publication date: 2023-05-16
Application No.: US16923142
Filing date: 2020-07-08
Inventors: Shiwan Zhao, Yi Ke Wu, Hao Kai Zhang, Zhong Su
Abstract: In an approach to improving the image captioning performance of low-resource languages by leveraging multimodal inputs, one or more computer processors encode an image utilizing an image encoder, wherein the image is contained within a triplet comprising the image, one or more high-resource captions, and one or more low-resource captions. The one or more computer processors generate one or more high-resource captions utilizing the encoded image and the triplet inputted into a high-resource decoder. The one or more computer processors encode the one or more generated high-resource captions utilizing a high-resource encoder. The one or more computer processors add adaptive cycle consistency constraints on a set of calculated attention weights associated with the triplet. The one or more computer processors generate one or more low-resource captions by simultaneously inputting the encoded image, the encoded high-resource caption, and the triplet into a trained low-resource decoder.
-
Publication No.: US20220012534A1
Publication date: 2022-01-13
Application No.: US16922155
Filing date: 2020-07-07
Inventors: Shiwan Zhao, Hao Kai Zhang, Yi Ke Wu, Zhong Su
Abstract: In an approach to augmenting a caption dataset by leveraging a denoising autoencoder to sample and generate additional captions from the ground truth captions, one or more computer processors generate a plurality of new captions utilizing an autoencoder fed with one or more noisy captions, wherein the autoencoder is trained with a dataset comprising a plurality of ground truth captions. The one or more computer processors calculate an importance weight for each new caption in the plurality of generated new captions as compared to a plurality of associated ground truth captions based on a consensus metric. The one or more computer processors train a caption model with the generated plurality of new captions and associated calculated weights.