- Patent title: Generating a structure of a PDF-document
- Application number: US17649597
- Filing date: 2022-02-01
- Publication number: US11687700B1
- Publication date: 2023-06-27
- Inventors: Birgit Monika Pfitzmann, Christoph Auer, Michele Dolfi, Peter Willem Jan Staar, Ahmed Samy Nassar
- Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Applicant address: US NY Armonk
- Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Current assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Current assignee address: US NY Armonk
- Agent: Robert R. Aragona
- Main classification: G06F40/00
- IPC classifications: G06F40/00; G06F40/103; G06N3/08; G06V30/412; G06V30/414
Abstract:
The present disclosure relates to a method for generating a structure of a PDF-document, wherein the PDF-document comprises elements. The method comprises detecting document cells of the PDF-document dependent on commands of a page description language for printing the elements of the PDF-document. The method comprises determining parts of the PDF-document dependent on the PDF-document by a machine learning module. The determining of the respective part comprises associating a respective portion of the elements of the PDF-document with the respective part. Furthermore, a respective label may be assigned to the respective part. The method may further comprise using a symbolic artificial intelligence module, wherein rules of the symbolic AI-module for reconciling the document cells with the parts may be applied. The elements of the structure of the PDF-document may be generated and labelled dependent on a result of the reconciling and dependent on the respective label to the respective part.
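
The reconciliation step described in the abstract can be illustrated with a small sketch. It is not the patented method: every name in it (`Box`, `Cell`, `Part`, `reconcile`, `min_overlap`, `default_label`) is hypothetical, and so is the single overlap rule. The sketch assumes that the document cells have already been extracted from the page description commands, that the machine learning module has already produced labelled parts as bounding boxes, and that coordinates increase downward on the page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box; y is assumed to grow downward."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)

    def intersection(self, other: "Box") -> float:
        w = min(self.x1, other.x1) - max(self.x0, other.x0)
        h = min(self.y1, other.y1) - max(self.y0, other.y0)
        return max(0.0, w) * max(0.0, h)


@dataclass(frozen=True)
class Cell:
    """A document cell, assumed to come from parsing the print commands."""
    text: str
    box: Box


@dataclass(frozen=True)
class Part:
    """A labelled region, assumed to come from the machine learning module."""
    label: str
    box: Box


def reconcile(cells, parts, min_overlap=0.5, default_label="text"):
    """Apply one hypothetical rule: a cell joins the part that covers the
    largest share of its area, provided that share is at least min_overlap.
    Cells that match no part become their own element with default_label."""
    grouped = [[] for _ in parts]
    elements = []  # tuples of (top coordinate, label, text)
    for cell in cells:
        area = cell.box.area()
        shares = [
            cell.box.intersection(p.box) / area if area > 0 else 0.0
            for p in parts
        ]
        best = max(range(len(parts)), key=shares.__getitem__, default=None)
        if best is not None and shares[best] >= min_overlap:
            grouped[best].append(cell)
        else:
            elements.append((cell.box.y0, default_label, cell.text))
    for part, members in zip(parts, grouped):
        if members:
            # Reading order inside a part: top to bottom, then left to right.
            members.sort(key=lambda c: (c.box.y0, c.box.x0))
            text = " ".join(c.text for c in members)
            elements.append((part.box.y0, part.label, text))
    elements.sort(key=lambda e: e[0])
    return [{"label": label, "text": text} for _, label, text in elements]


if __name__ == "__main__":
    cells = [
        Cell("Annual Report", Box(10, 10, 200, 30)),
        Cell("Revenue grew", Box(10, 50, 120, 65)),
        Cell("in 2021.", Box(125, 50, 200, 65)),
    ]
    parts = [
        Part("title", Box(5, 5, 210, 35)),
        Part("paragraph", Box(5, 45, 210, 70)),
    ]
    for element in reconcile(cells, parts):
        print(element)
```

A real rule set would likely need more than one rule, for example to handle cells that straddle two parts or parts that contain no cells. The single overlap threshold here only shows the general shape of combining low-level cells with model-predicted labels.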