Granted invention patent
- Patent title: Automatic generation of training data for scientific paper summarization using videos
- Application number: US16799865
- Filing date: 2020-02-25
- Publication number: US11270061B2
- Publication date: 2022-03-08
- Inventors: Jonathan Herzig, Achiya Jerbi, David Konopnicki, Guy Lev, Michal Shmueli-Scheuer
- Applicant: International Business Machines Corporation
- Applicant address: Armonk, NY, US
- Patentee: International Business Machines Corporation
- Current patentee: International Business Machines Corporation
- Current patentee address: Armonk, NY, US
- Agent: Gregory J Kirsch
- Main classification: G06F17/00
- IPC classifications: G06F17/00; G06F40/12; G06F40/274; G06N20/10; G06N7/00; G06F40/30
Abstract:
Embodiments may provide techniques for generating training data for the summarization of complex documents, such as scientific papers and articles, that scale to large training datasets. For example, in an embodiment, a method may be implemented in a computer system and may comprise: collecting a plurality of video and audio recordings of presentations of documents; collecting a plurality of documents corresponding to the video and audio recordings; converting the plurality of video and audio recordings into transcripts of the plurality of presentations; generating a summary of each document by selecting a plurality of sentences from that document using the transcript of its presentation; generating a dataset comprising a plurality of the generated summaries; and training a machine learning model using the generated dataset.
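
The sentence-selection step in the abstract can be illustrated with a minimal sketch. The abstract does not say how sentences are chosen, so the scoring rule here (the fraction of a sentence's content words that also occur in the talk transcript) is an assumption made purely for illustration and is not the claimed method. The names `STOPWORDS`, `tokenize`, and `select_summary_sentences` are hypothetical.

```python
import re

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {
    "the", "a", "an", "of", "to", "and", "in", "is", "on",
    "we", "that", "for", "with", "this", "our",
}


def tokenize(text):
    """Lowercase the text and return its alphanumeric words minus stopwords."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOPWORDS]


def select_summary_sentences(doc_sentences, transcript, k=2):
    """Pick the k document sentences that overlap most with the transcript.

    Assumption: a sentence's score is the share of its content words that
    also appear in the transcript. The selected sentences are returned in
    their original document order.
    """
    transcript_vocab = set(tokenize(transcript))
    scored = []
    for idx, sentence in enumerate(doc_sentences):
        tokens = tokenize(sentence)
        if not tokens:
            continue  # nothing to score in an empty or stopword-only sentence
        overlap = sum(1 for t in tokens if t in transcript_vocab)
        scored.append((overlap / len(tokens), idx))
    # Highest score first; ties go to the earlier sentence.
    top = sorted(scored, key=lambda s: (-s[0], s[1]))[:k]
    return [doc_sentences[i] for i in sorted(i for _, i in top)]
```

Each (document, selected sentences) pair produced this way would form one record of the generated dataset. A real system would also need speech-to-text for the recordings and a more robust alignment between spoken and written text than plain word overlap.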