SYSTEM AND METHOD FOR DOCUMENT METADATA ANALYSIS AND GENERATION

    Publication No.: US20240346061A1

    Publication Date: 2024-10-17

    Application No.: US18635152

    Filing Date: 2024-04-15

    Applicant: En HONG

    Inventor: En HONG

    IPC Classification: G06F16/383 G06F16/33

    CPC Classification: G06F16/383 G06F16/3331

    Abstract: Provided are a system and a computer-implemented method for generating metadata for a document and for generating documents therefrom. A document is received for analysis via a communication interface. Text is extracted from the document and, using a structural analyser model, a plurality of segment titles in the text are identified and regular expressions are derived from them. A segmented document, comprising the extracted text in logical segments with corresponding segment titles, is generated by analysing the generated regular expressions. For at least some of the logical segments of the segmented document, a structured metadata summary of that segment is generated using a metadata creator model. Document metadata for the document is also generated using the metadata creator model. When a new document is requested, it may be generated using a library of documents and their corresponding metadata.
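
The segmentation step described in the abstract (segment titles identified, regular expressions derived from them, text split into titled logical segments) can be illustrated with a minimal Python sketch. It is not the claimed implementation: the structural analyser model is assumed to have already returned a list of title strings, and the function names `title_pattern` and `segment_document` and the dictionary layout of each segment are hypothetical choices made for illustration only.

```python
import re


def title_pattern(titles):
    """Derive one regular expression from a list of segment titles.

    Assumption: a title occupies a line of its own. Each word is escaped
    and words may be separated by any run of spaces or tabs.
    """
    alternatives = [
        r"[ \t]+".join(re.escape(word) for word in title.split())
        for title in titles
    ]
    return re.compile(
        r"^[ \t]*(" + "|".join(alternatives) + r")[ \t]*$",
        re.MULTILINE | re.IGNORECASE,
    )


def segment_document(text, titles):
    """Split extracted text into logical segments keyed by their titles.

    `titles` stands in for the output of the structural analyser model
    (hypothetical here; any source of title strings would do).
    """
    if not titles:
        return []
    matches = list(title_pattern(titles).finditer(text))
    segments = []
    for i, match in enumerate(matches):
        # A segment runs from the end of its title to the next title.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        segments.append(
            {"title": match.group(1).strip(), "text": text[match.end():end].strip()}
        )
    return segments
```

Calling `segment_document(text, ["1. Introduction", "2. Scope of Work"])` on extracted text returns one dictionary per matched title, in document order. Each segment's text could then be passed to a metadata creator model to produce the structured summary the abstract describes.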