Publication No.: US20240346061A1
Publication Date: 2024-10-17
Application No.: US18635152
Filing Date: 2024-04-15
Applicant: En HONG
Inventor: En HONG
IPC Classification: G06F16/383, G06F16/33
CPC Classification: G06F16/383, G06F16/3331
Abstract: Provided are a system and a computer-implemented method for generating metadata for a document and for generating documents therefrom. A document is received for analysis via a communication interface. Text is extracted from the document and, using a structural analyser model, a plurality of segment titles therein are identified and regular expressions are derived therefrom. A segmented document is generated, comprising the extracted text in logical segments with corresponding segment titles, by analysing the generated regular expressions. For at least some of the plurality of logical segments of the segmented document, structured metadata summaries of that segment are generated using a metadata creator model. Document metadata is also generated using a metadata creator model. When a new document is requested, that document may be generated using a library of documents and corresponding metadata.
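
As an illustration only, the segmentation step described in the abstract (segment titles identified, regular expressions derived from them, text split into titled logical segments) could be sketched in Python roughly as follows. The function names `title_to_regex` and `segment_document` are hypothetical, the titles are supplied by hand in place of the structural analyser model, and the whitespace-tolerant pattern is an assumption; none of these details is taken from the application itself.

```python
import re


def title_to_regex(title: str) -> re.Pattern:
    """Derive a tolerant pattern from a segment title (hypothetical helper).

    Assumption: a title occupies its own line, and spacing between its
    words may vary after text extraction.
    """
    words = [re.escape(word) for word in title.split()]
    body = r"[ \t]+".join(words)
    return re.compile(r"^[ \t]*" + body + r"[ \t]*$", re.MULTILINE | re.IGNORECASE)


def segment_document(text: str, titles: list[str]) -> list[dict]:
    """Split extracted text into logical segments keyed by the titles found."""
    hits = []
    for title in titles:
        match = title_to_regex(title).search(text)
        if match:
            hits.append((match.start(), match.end(), title))
    hits.sort()  # order segments by where their titles appear in the text

    segments = []
    for i, (_, end, title) in enumerate(hits):
        # A segment runs from the end of its title to the start of the next title.
        next_start = hits[i + 1][0] if i + 1 < len(hits) else len(text)
        segments.append({"title": title, "text": text[end:next_start].strip()})
    return segments


# Stand-in for titles that a structural analyser model would identify.
sample_text = "1. Introduction\nThis is intro.\n\n2.  Scope\nScope text.\n"
sample_segments = segment_document(sample_text, ["2. Scope", "1. Introduction"])
```

Each resulting segment is a plain dictionary with `title` and `text` keys, which could then be passed to whatever component produces the structured metadata summaries; that component is outside the scope of this sketch.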