Automatic generation of training data for scientific paper summarization using videos

Granted invention patent

US11270061B2 Automatic generation of training data for scientific paper summarization using videos (in force)


Patent title: Automatic generation of training data for scientific paper summarization using videos
Application number: US16799865
Filing date: 2020-02-25
Publication (grant) number: US11270061B2
Publication (grant) date: 2022-03-08
Inventors: Jonathan Herzig, Achiya Jerbi, David Konopnicki, Guy Lev, Michal Shmueli-Scheuer
Applicant: International Business Machines Corporation
Applicant address: US NY Armonk
Patentee: International Business Machines Corporation
Current patentee: International Business Machines Corporation
Current patentee address: US NY Armonk
Agent: Gregory J Kirsch
Main classification: G06F17/00
IPC classifications: G06F17/00; G06F40/12; G06F40/274; G06N20/10; G06N7/00; G06F40/30


Abstract:

Embodiments may provide techniques to generate training data for summarization of complex documents, such as scientific papers, articles, etc., that scale to provide large-scale training data. For example, in an embodiment, a method may be implemented in a computer system and may comprise collecting a plurality of video and audio recordings of presentations of documents, collecting a plurality of documents corresponding to the video and audio recordings, converting the plurality of video and audio recordings of presentations of documents into transcripts of the plurality of presentations, generating a summary of each document by selecting a plurality of sentences from each document using the transcript of that document, generating a dataset comprising a plurality of the generated summaries, and training a machine learning model using the generated dataset.
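The sentence-selection and dataset-generation steps in the abstract can be illustrated with a minimal sketch. It assumes each presentation has already been converted to a transcript split into sentences, and it swaps in a simple lexical-overlap heuristic (Jaccard similarity between word sets) to decide which document sentences the talk covered. This heuristic is an illustrative assumption and is not necessarily the method claimed in the patent; the names `select_summary_sentences`, `build_dataset`, and the parameter `k` are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase alphanumeric word tokens; a deliberately simple tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def select_summary_sentences(doc_sentences: list[str],
                             transcript_sentences: list[str],
                             k: int = 2) -> list[str]:
    """Hypothetical selector: score each document sentence by its best
    overlap with any transcript sentence, keep the top k, and return them
    in original document order."""
    transcript_tokens = [tokenize(s) for s in transcript_sentences]
    scored = []
    for idx, sent in enumerate(doc_sentences):
        tokens = tokenize(sent)
        best = max((jaccard(tokens, t) for t in transcript_tokens), default=0.0)
        scored.append((best, idx))
    # Highest score first; ties broken by earlier position in the document.
    top = sorted(scored, key=lambda p: (-p[0], p[1]))[:k]
    return [doc_sentences[i] for _, i in sorted(top, key=lambda p: p[1])]


def build_dataset(pairs: list[tuple[list[str], list[str]]],
                  k: int = 2) -> list[dict]:
    """Hypothetical dataset builder: one (document, summary) record per
    (document sentences, transcript sentences) pair."""
    return [
        {"document": " ".join(doc), "summary": select_summary_sentences(doc, tr, k)}
        for doc, tr in pairs
    ]


if __name__ == "__main__":
    doc = [
        "We propose a new summarization model.",
        "Related work is extensive.",
        "Our model improves ROUGE scores on two datasets.",
        "We thank the reviewers.",
    ]
    transcript = [
        "so in this talk we propose a new model for summarization",
        "and we show it improves ROUGE on two datasets",
    ]
    # Prints the first and third document sentences.
    print(select_summary_sentences(doc, transcript, k=2))
```

The resulting records could then serve as (input, target) pairs for training an extractive or abstractive summarizer; the speech-to-text conversion and the model-training step are outside the scope of this sketch.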

Published/granted documents

US20210264097A1 AUTOMATIC GENERATION OF TRAINING DATA FOR SCIENTIFIC PAPER SUMMARIZATION USING VIDEOS, publication date: 2021-08-26


IPC classification:

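G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models, see G06N)
G06F17/00	Digital computing or data processing equipment or methods, specially adapted for specific functions (information retrieval, database structures or file system structures, see G06F 16/00)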