1.
Publication No.: US20240126791A1
Publication date: 2024-04-18
Application No.: US18470657
Filing date: 2023-09-20
Inventors: ANUMITA DASGUPTABANDYOPADHYAY, PRABIR MALLICK, TAPAS NAYAK, INDRAJIT BHATTACHARYA, SANGAMESHWAR SURYAKANT PATIL
IPC classification: G06F16/31, G06F16/332
CPC classification: G06F16/31, G06F16/3329
Abstract: This disclosure relates generally to long-form answer extraction and, more particularly, to long-form answer extraction based on a combination of sentence index generation techniques. Existing answer extraction techniques have achieved significant progress for extractive short answers; however, less progress has been made for long-form questions that require explanations. Further, state-of-the-art long-answer extraction techniques either yield poorer long-form answers or do not address sparsity, which becomes an issue in longer contexts. Additionally, pre-trained generative sequence-to-sequence models are gaining popularity for factoid answer extraction tasks. Hence, the disclosure proposes long-form answer extraction based on several steps, including training a set of generative sequence-to-sequence models comprising a sentence indices generation model and a sentence index spans generation model. The trained set of generative sequence-to-sequence models is further utilized for long-form answer extraction based on a union of several sentence index generation techniques comprising sentence indices and sentence index spans.
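The union step described in the abstract can be illustrated with a minimal sketch. It does not reproduce the disclosed models: it assumes, hypothetically, that one model has already emitted a whitespace-separated string of sentence indices (e.g. "1 3") and the other a string of inclusive index spans (e.g. "3-4"). These output formats, the 0-based indexing, and the function names `parse_indices`, `parse_spans`, and `extract_long_answer` are assumptions made for illustration only.

```python
def parse_indices(text):
    """Parse an assumed output format of space-separated indices, e.g. "1 3"."""
    return {int(tok) for tok in text.split()}


def parse_spans(text):
    """Expand an assumed output format of inclusive spans, e.g. "3-4 7-7"."""
    indices = set()
    for tok in text.split():
        start, end = (int(part) for part in tok.split("-"))
        indices.update(range(start, end + 1))
    return indices


def extract_long_answer(sentences, indices_output, spans_output):
    """Join the context sentences selected by the union of both outputs."""
    # Union of the two index sets, as the abstract describes.
    selected = parse_indices(indices_output) | parse_spans(spans_output)
    # Drop indices that fall outside the context, then restore document order.
    valid = sorted(i for i in selected if 0 <= i < len(sentences))
    return " ".join(sentences[i] for i in valid)


if __name__ == "__main__":
    context = ["S0.", "S1.", "S2.", "S3.", "S4."]
    print(extract_long_answer(context, "1 3", "3-4"))
```

Taking the union means a sentence picked by either model is kept, so the span output can fill in contiguous runs that the index output misses, and the index output can add isolated sentences that no span covers.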