Granted invention patent
US08645819B2 Detection and extraction of elements constituting images in unstructured document files (Active)

  • Patent title: Detection and extraction of elements constituting images in unstructured document files
  • Application number: US13162858
    Filing date: 2011-06-17
  • Publication number: US08645819B2
    Publication date: 2014-02-04
  • Inventor: Hervé Déjean
  • Applicant: Hervé Déjean
  • Applicant address: US CT Norwalk
  • Assignee: Xerox Corporation
  • Current assignee: Xerox Corporation
  • Current assignee address: US CT Norwalk
  • Agent: Fay Sharpe LLP
  • Main classification: G06F17/00
  • IPC classification: G06F17/00
Abstract:
A method and a system for detecting and extracting images in an electronic document are disclosed. The method includes receiving an electronic document and identifying elements of a page. The identified elements include a set of graphical elements and a set of text elements. The method may include identifying and excluding elements which serve as graphical page constructs and/or text formatting elements. The page can then be segmented, based on (remaining) graphical elements and identified white spaces, to generate a set of image blocks. Text elements that are associated with a respective image block are identified as captions. Overlapping candidate images are then grouped to form a new image. The new image can thus include candidate images which would, without the identification of their caption(s), each be treated as a respective image.
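The grouping step described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the `Box` type, the `candidate_box` and `group_overlapping` helpers, and the use of axis-aligned bounding boxes are all assumptions made for illustration. The idea shown is that a candidate image is an image block extended by its associated caption(s), and that candidates whose extents overlap are merged into one new image.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned bounding box (page coordinates)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def overlaps(self, other: "Box") -> bool:
        # True only when the two boxes share a region of non-zero area.
        return (self.x0 < other.x1 and other.x0 < self.x1
                and self.y0 < other.y1 and other.y0 < self.y1)

    def union(self, other: "Box") -> "Box":
        # Smallest box that contains both boxes.
        return Box(min(self.x0, other.x0), min(self.y0, other.y0),
                   max(self.x1, other.x1), max(self.y1, other.y1))


def candidate_box(block: Box, captions: List[Box]) -> Box:
    """Extend an image block so that it also covers its captions (assumed rule)."""
    box = block
    for caption in captions:
        box = box.union(caption)
    return box


def group_overlapping(candidates: List[Box]) -> List[Box]:
    """Merge candidate boxes repeatedly until no two remaining boxes overlap."""
    merged = list(candidates)
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if merged[i].overlaps(merged[j]):
                    merged[i] = merged[i].union(merged[j])
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return merged


# Two side-by-side image blocks that share one caption running beneath both.
left = Box(0, 0, 40, 50)
right = Box(60, 0, 100, 50)
shared_caption = Box(0, 55, 100, 65)

without_caption = group_overlapping([left, right])
with_caption = group_overlapping([
    candidate_box(left, [shared_caption]),
    candidate_box(right, [shared_caption]),
])
print(len(without_caption), len(with_caption))
```

In this example the two blocks do not overlap on their own, so they would be treated as two images. Once each is extended by the shared caption, the candidates overlap and collapse into a single image, which matches the behaviour the abstract describes.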