-
1.
Publication No.: US20240126791A1
Publication date: 2024-04-18
Application No.: US18470657
Application date: 2023-09-20
Applicant: Tata Consultancy Services Limited
Inventor: ANUMITA DASGUPTABANDYOPADHYAY , PRABIR MALLICK , TAPAS NAYAK , INDRAJIT BHATTACHARYA , SANGAMESHWAR SURYAKANT PATIL
IPC: G06F16/31 , G06F16/332
CPC classification number: G06F16/31 , G06F16/3329
Abstract: This disclosure relates generally to long-form answer extraction and, more particularly, to long-form answer extraction based on a combination of sentence index generation techniques. Existing answer extraction techniques have achieved significant progress for extractive short answers; however, less progress has been made for long-form questions that require explanations. Further, state-of-the-art long-answer extraction techniques either produce poorer long-form answers or do not address sparsity, which becomes an issue in longer contexts. Additionally, pre-trained generative sequence-to-sequence models are gaining popularity for factoid answer extraction tasks. Hence, the disclosure proposes long-form answer extraction based on several steps, including training a set of generative sequence-to-sequence models comprising a sentence indices generation model and a sentence index spans generation model. The trained set of generative sequence-to-sequence models is further utilized for long-form answer extraction based on a union of several sentence index generation techniques comprising sentence indices and sentence index spans.
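
The union step described in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that one model emits sentence indices as a string such as "1 4" and the other emits inclusive spans such as "1-2"; these output formats and the function names `parse_indices`, `parse_spans`, and `extract_long_answer` are hypothetical and are not taken from the disclosure.

```python
import re


def parse_indices(text):
    """Parse a generated string such as "1 4" into a set of sentence indices."""
    return {int(tok) for tok in re.findall(r"\d+", text)}


def parse_spans(text):
    """Parse a generated string such as "1-2 6-8" into the set of covered indices.

    Spans are assumed to be inclusive on both ends (an assumption of this sketch).
    """
    covered = set()
    for start, end in re.findall(r"(\d+)\s*-\s*(\d+)", text):
        covered.update(range(int(start), int(end) + 1))
    return covered


def extract_long_answer(sentences, indices_output, spans_output):
    """Build a long-form answer from the union of both models' selections."""
    selected = parse_indices(indices_output) | parse_spans(spans_output)
    # Drop indices that fall outside the context and restore document order.
    valid = sorted(i for i in selected if 0 <= i < len(sentences))
    return " ".join(sentences[i] for i in valid)


context = ["S0.", "S1.", "S2.", "S3.", "S4."]
answer = extract_long_answer(context, "1 4", "1-2")
```

Taking the union rather than the intersection keeps any sentence that either model selected, which is one plausible way to counter the sparsity issue the abstract mentions for longer contexts.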
-
2.
Publication No.: US20240119075A1
Publication date: 2024-04-11
Application No.: US18479646
Application date: 2023-10-02
Applicant: Tata Consultancy Services Limited
Inventor: PRABIR MALLICK , SAMIRAN PAL , AVINASH KUMAR SINGH , ANUMITA DASGUPTA , SOHAM DATTA , KAAMRAAN KHAN , TAPAS NAYAK , INDRAJIT BHATTACHARYA , GIRISH KESHAV PALSHIKAR
IPC: G06F16/332 , G06F16/33 , G06F40/186 , G06F40/284 , G06F40/289 , G06F40/30 , G06F40/40
CPC classification number: G06F16/3329 , G06F16/3344 , G06F40/186 , G06F40/284 , G06F40/289 , G06F40/30 , G06F40/40
Abstract: Conventional Question and Answer (QA) datasets are created for generating factoid questions only, whereas the present disclosure generates a longform technical QA dataset from textbooks. Initially, the system receives a technical textbook document and extracts a plurality of contexts. Further, a first plurality of questions is generated based on the plurality of contexts. A plurality of answerable questions is further generated based on the plurality of contexts using an unsupervised template-based matching technique. Further, a combined plurality of questions is generated by combining the first plurality of questions and the plurality of answerable questions. Further, an answer for the combined plurality of questions is generated using an autoregressive language model, and a mapping score is computed. Further, a plurality of optimal answers is selected based on the corresponding mapping score. Finally, a longform technical question and answer dataset is generated based on the combined plurality of questions and the optimal answers.
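
The answer-selection step can be sketched as follows. The abstract does not say how the mapping score is computed, so this sketch substitutes a simple token-overlap ratio as a stand-in; the scoring function, the 0.5 threshold, and the names `token_overlap_score` and `select_optimal_answers` are all assumptions made for illustration.

```python
def token_overlap_score(answer, context):
    """Stand-in mapping score: fraction of answer tokens that occur in the context."""
    answer_tokens = answer.lower().split()
    if not answer_tokens:
        return 0.0
    context_tokens = set(context.lower().split())
    hits = sum(1 for tok in answer_tokens if tok in context_tokens)
    return hits / len(answer_tokens)


def select_optimal_answers(candidates, threshold=0.5):
    """Keep, per question, the best-scoring answer whose score meets the threshold.

    `candidates` is a list of dicts with "question", "answer" and "context" keys.
    Returns a dict mapping each question to an (answer, score) pair.
    """
    best = {}
    for cand in candidates:
        score = token_overlap_score(cand["answer"], cand["context"])
        previous_score = best.get(cand["question"], ("", -1.0))[1]
        if score >= threshold and score > previous_score:
            best[cand["question"]] = (cand["answer"], score)
    return best


context = "a stack is a lifo structure used in parsing"
candidates = [
    {"question": "What is a stack?", "answer": "a stack is a lifo structure", "context": context},
    {"question": "What is a stack?", "answer": "bananas are yellow", "context": context},
]
dataset = select_optimal_answers(candidates)
```

Here the second candidate shares no tokens with the context, so it falls below the threshold and only the grounded answer is retained for the dataset.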
-
3.
Publication No.: US20240095466A1
Publication date: 2024-03-21
Application No.: US18450588
Application date: 2023-08-16
Applicant: Tata Consultancy Services Limited
Inventor: SUBHASISH GHOSH , ARPITA KUNDU , INDRAJIT BHATTACHARYA , PRATIK SAINI , TAPAS NAYAK
IPC: G06F40/40 , G06F40/137 , G06F40/205 , G06Q50/20 , G06V30/413
CPC classification number: G06F40/40 , G06F40/137 , G06F40/205 , G06Q50/20 , G06V30/413 , G06V2201/10
Abstract: The present disclosure provides a method for document structure based unsupervised long-form technical question generation. Initially, the system receives a textbook document. Further, PDF metadata is extracted from the textbook document using a Natural Language Processing (NLP) technique. Further, a plurality of structures is extracted from the textbook document based on the PDF metadata using an NLP based filtering technique. Further, a plurality of index based question templates and Table of Contents (TOC) based question templates are obtained from a plurality of predefined question templates using the plurality of structures. Further, a plurality of long-form technical questions is generated using the obtained index and TOC based question templates. The plurality of long-form technical questions is further evaluated by the system using a plurality of metrics. Further, the generated plurality of long-form technical questions is used to finetune a supervised question generation model for generating optimal questions from document structure.
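
Template-based question generation from document structure can be sketched as below. The templates, the heading-cleaning rule, and the names `TOC_TEMPLATES`, `INDEX_TEMPLATES`, `clean_heading`, and `generate_questions` are hypothetical; the disclosure's actual predefined templates and filtering technique are not given in the abstract.

```python
import re

# Hypothetical templates; the disclosure's predefined templates are not listed.
TOC_TEMPLATES = ["Explain the concept of {topic}.", "Describe {topic} in detail."]
INDEX_TEMPLATES = ["Define {topic}."]


def clean_heading(heading):
    """Strip a leading section number such as "3.2 " from a TOC entry."""
    return re.sub(r"^\s*\d+(\.\d+)*\.?\s+", "", heading).strip()


def generate_questions(toc_entries, index_terms):
    """Fill TOC based and index based templates with the extracted structures."""
    questions = []
    for entry in toc_entries:
        topic = clean_heading(entry)
        questions.extend(t.format(topic=topic) for t in TOC_TEMPLATES)
    for term in index_terms:
        questions.extend(t.format(topic=term.strip()) for t in INDEX_TEMPLATES)
    return questions


questions = generate_questions(["3.2 Binary Search Trees"], ["hash table"])
```

Questions produced this way need no labelled data, which matches the unsupervised framing in the abstract; they could then serve as training pairs for the supervised question generation model it mentions.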
-
-