Patent search ap:("International Business Machines Corporation") AND inv:"Hao Kai Zhang", page 1

1.

Granted invention patent
Denoising autoencoder image captioning (status: active)

Publication number: US11763544B2

Publication date: 2023-09-19

Application number: US16922155

Filing date: 2020-07-07

Applicant: International Business Machines Corporation

Inventors: Shiwan Zhao, Hao Kai Zhang, Yi Ke Wu, Zhong Su

IPC classification: G06V10/30, G06N20/00, G06F18/214, G06F18/22, G06F18/21, G06F18/2113, G06V10/74, G06V10/764, G06V10/771, G06V10/774, G06V10/776

CPC classification: G06V10/30, G06F18/217, G06F18/2113, G06F18/2148, G06F18/22, G06N20/00, G06V10/761, G06V10/764, G06V10/771, G06V10/776, G06V10/7747

Abstract: In an approach to augmenting a caption dataset by leveraging a denoising autoencoder to sample and generate additional captions from the ground truth captions, one or more computer processors generate a plurality of new captions utilizing an autoencoder fed with one or more noisy captions, wherein the autoencoder is trained with a dataset comprising a plurality of ground truth captions. The one or more computer processors calculate an importance weight for each new caption in the plurality of generated new captions as compared to a plurality of associated ground truth captions based on a consensus metric. The one or more computer processors train a caption model with the generated plurality of new captions and associated calculated weights.
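The weighting step in this abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the noise model (random token drops and neighbour swaps), the consensus metric (a simple n-gram F1 overlap standing in for whatever metric the patent uses), and all function names. The autoencoder itself is not modelled; the sketch only shows how noisy inputs could be produced and how generated captions could be weighted against the ground truth captions.

```python
import random
from collections import Counter


def add_noise(tokens, drop_prob=0.1, swap_prob=0.1, rng=None):
    """Corrupt a caption by randomly dropping tokens and swapping neighbours."""
    rng = rng or random.Random()
    kept = [t for t in tokens if rng.random() >= drop_prob] or tokens[:1]
    noisy = list(kept)
    for i in range(len(noisy) - 1):
        if rng.random() < swap_prob:
            noisy[i], noisy[i + 1] = noisy[i + 1], noisy[i]
    return noisy


def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def consensus_score(candidate, references, max_n=2):
    """Mean n-gram F1 overlap between a candidate and each reference caption.

    Hypothetical stand-in for the patent's unspecified consensus metric.
    """
    scores = []
    for ref in references:
        f1s = []
        for n in range(1, max_n + 1):
            c, r = ngram_counts(candidate, n), ngram_counts(ref, n)
            overlap = sum((c & r).values())
            if overlap == 0:
                f1s.append(0.0)
                continue
            precision = overlap / sum(c.values())
            recall = overlap / sum(r.values())
            f1s.append(2 * precision * recall / (precision + recall))
        scores.append(sum(f1s) / len(f1s))
    return sum(scores) / len(scores)


def importance_weights(new_captions, references):
    """Normalise consensus scores so the weights sum to 1."""
    raw = [consensus_score(c, references) for c in new_captions]
    total = sum(raw)
    if total == 0:
        return [1.0 / len(raw)] * len(raw)
    return [s / total for s in raw]


def weighted_loss(per_caption_losses, weights):
    """Combine per-caption training losses using the importance weights."""
    return sum(w * l for w, l in zip(weights, per_caption_losses))
```

In use, each ground truth caption would be passed through `add_noise`, the autoencoder would decode the noisy input into a new caption, and `importance_weights` would decide how much each new caption contributes to the caption model's training loss.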

2.

Invention patent application
ADAPTIVE CYCLE CONSISTENCY MULTIMODAL IMAGE CAPTIONING (status: active)

Publication number: US20220012919A1

Publication date: 2022-01-13

Application number: US16923142

Filing date: 2020-07-08

Applicant: International Business Machines Corporation

Inventors: Shiwan Zhao, Yi Ke Wu, Hao Kai Zhang, Zhong Su

IPC classification: G06T9/00, G06N3/08, G06N3/04

Abstract: In an approach to improving the image captioning performance of low-resource languages by leveraging multimodal inputs, one or more computer processors encode an image utilizing an image encoder, wherein the image is contained within a triplet comprising the image, one or more high-resource captions, and one or more low-resource captions. The one or more computer processors generate one or more high-resource captions utilizing the encoded image and the triplet inputted into a high-resource decoder. The one or more computer processors encode the one or more generated high-resource captions utilizing a high-resource encoder. The one or more computer processors add adaptive cycle consistency constraints on a set of calculated attention weights associated with the triplet. The one or more computer processors generate one or more low-resource captions by simultaneously inputting the encoded image, the encoded high-resource caption, and the triplet into a trained low-resource decoder.
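The abstract does not say how the cycle consistency constraint on the attention weights is formulated. The Python sketch below shows one possible reading, offered purely as an assumption: the attention from low-resource tokens to image regions should agree with the two-hop path that goes through the high-resource caption. The three attention matrices, the per-token `gates` (a stand-in for the "adaptive" part), and all names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def cycle_consistency_penalty(attn_low_to_high, attn_high_to_image,
                              attn_low_to_image, gates=None):
    """Gated mean squared gap between two-hop and direct attention maps.

    attn_low_to_high:   rows = low-resource tokens, cols = high-resource tokens
    attn_high_to_image: rows = high-resource tokens, cols = image regions
    attn_low_to_image:  rows = low-resource tokens, cols = image regions
    gates:              optional per-token weights in [0, 1] (assumed form
                        of adaptivity; defaults to 1.0 for every token)
    """
    two_hop = matmul(attn_low_to_high, attn_high_to_image)
    gates = gates or [1.0] * len(two_hop)
    total = 0.0
    count = 0
    for gate, hop_row, direct_row in zip(gates, two_hop, attn_low_to_image):
        for hop, direct in zip(hop_row, direct_row):
            total += gate * (hop - direct) ** 2
            count += 1
    return total / count
```

Under this reading, the penalty would be added to the captioning loss during training, and a gate near zero would relax the constraint for low-resource tokens that have no counterpart in the high-resource caption.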

3.

Granted invention patent
Mixup image captioning (status: active)

Publication number: US11334769B2

Publication date: 2022-05-17

Application number: US16922367

Filing date: 2020-07-07

Applicant: International Business Machines Corporation

Inventors: Shiwan Zhao, Yi Ke Wu, Hao Kai Zhang, Zhong Su

IPC classification: G06K9/62, G06F16/55, G06N3/08

Abstract: In an approach to augmenting caption datasets, one or more computer processors sample a ratio lambda from a probability distribution based on a pair of datapoints contained in a dataset, wherein each datapoint in the pair of datapoints comprises an image and an associated caption; extend the dataset by generating one or more new datapoints based on the sampled ratio lambda for each pair of datapoints in the dataset, wherein the sampled ratio lambda incorporates an interpolation of features associated with the pair of datapoints into the generated one or more new datapoints; identify one or more objects contained within a subsequent image utilizing an image model trained utilizing the extended dataset; and generate a subsequent caption for one or more identified objects contained within the subsequent image utilizing a language generating model trained utilizing the extended dataset.
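The dataset extension step described in this abstract can be sketched in Python as follows. The abstract only says that lambda is sampled from "a probability distribution"; the Beta(alpha, alpha) distribution used here is the usual choice in the mixup literature and is an assumption, as are the feature-vector representation of images and captions and every function name.

```python
import random


def sample_lambda(alpha=1.0, rng=None):
    """Sample the mixing ratio from Beta(alpha, alpha) (assumed distribution)."""
    rng = rng or random.Random()
    return rng.betavariate(alpha, alpha)


def interpolate(a, b, lam):
    """Linear interpolation of two equal-length feature vectors."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def mixup_pair(point_a, point_b, lam):
    """Build one new datapoint from a pair using the same ratio for both parts.

    Each datapoint is a dict with 'image' and 'caption' feature vectors
    (a hypothetical representation chosen for this sketch).
    """
    return {
        "image": interpolate(point_a["image"], point_b["image"], lam),
        "caption": interpolate(point_a["caption"], point_b["caption"], lam),
    }


def extend_dataset(dataset, alpha=1.0, seed=0):
    """Append one mixed datapoint for every unordered pair in the dataset."""
    rng = random.Random(seed)
    extended = list(dataset)
    for i in range(len(dataset)):
        for j in range(i + 1, len(dataset)):
            lam = sample_lambda(alpha, rng)
            extended.append(mixup_pair(dataset[i], dataset[j], lam))
    return extended
```

The image model and the language generating model mentioned in the abstract would then be trained on the output of `extend_dataset` instead of the original dataset. Mixing every pair grows the dataset quadratically, so a real pipeline would more likely sample pairs per batch.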

4.

Invention patent application
MIXUP IMAGE CAPTIONING (status: active)

Publication number: US20220012544A1

Publication date: 2022-01-13

Application number: US16922367

Filing date: 2020-07-07

Applicant: International Business Machines Corporation

Inventors: Shiwan Zhao, Yi Ke Wu, Hao Kai Zhang, Zhong Su

IPC classification: G06K9/62, G06F16/55, G06N3/08

Abstract: In an approach to augmenting caption datasets, one or more computer processors sample a ratio lambda from a probability distribution based on a pair of datapoints contained in a dataset, wherein each datapoint in the pair of datapoints comprises an image and an associated caption; extend the dataset by generating one or more new datapoints based on the sampled ratio lambda for each pair of datapoints in the dataset, wherein the sampled ratio lambda incorporates an interpolation of features associated with the pair of datapoints into the generated one or more new datapoints; identify one or more objects contained within a subsequent image utilizing an image model trained utilizing the extended dataset; and generate a subsequent caption for one or more identified objects contained within the subsequent image utilizing a language generating model trained utilizing the extended dataset.

5.

Granted invention patent
Adaptive cycle consistency multimodal image captioning (status: active)

Publication number: US11651522B2

Publication date: 2023-05-16

Application number: US16923142

Filing date: 2020-07-08

Applicant: International Business Machines Corporation

Inventors: Shiwan Zhao, Yi Ke Wu, Hao Kai Zhang, Zhong Su

IPC classification: G06T9/00, G06N3/08, G06N3/045

CPC classification: G06T9/002, G06N3/045, G06N3/08

Abstract: In an approach to improving the image captioning performance of low-resource languages by leveraging multimodal inputs, one or more computer processors encode an image utilizing an image encoder, wherein the image is contained within a triplet comprising the image, one or more high-resource captions, and one or more low-resource captions. The one or more computer processors generate one or more high-resource captions utilizing the encoded image and the triplet inputted into a high-resource decoder. The one or more computer processors encode the one or more generated high-resource captions utilizing a high-resource encoder. The one or more computer processors add adaptive cycle consistency constraints on a set of calculated attention weights associated with the triplet. The one or more computer processors generate one or more low-resource captions by simultaneously inputting the encoded image, the encoded high-resource caption, and the triplet into a trained low-resource decoder.

6.

Invention patent application
DENOISING AUTOENCODER IMAGE CAPTIONING (status: active)

Publication number: US20220012534A1

Publication date: 2022-01-13

Application number: US16922155

Filing date: 2020-07-07

Applicant: International Business Machines Corporation

Inventors: Shiwan Zhao, Hao Kai Zhang, Yi Ke Wu, Zhong Su

IPC classification: G06K9/62, G06N20/00, G06K9/40

Abstract: In an approach to augmenting a caption dataset by leveraging a denoising autoencoder to sample and generate additional captions from the ground truth captions, one or more computer processors generate a plurality of new captions utilizing an autoencoder fed with one or more noisy captions, wherein the autoencoder is trained with a dataset comprising a plurality of ground truth captions. The one or more computer processors calculate an importance weight for each new caption in the plurality of generated new captions as compared to a plurality of associated ground truth captions based on a consensus metric. The one or more computer processors train a caption model with the generated plurality of new captions and associated calculated weights.
