Granted invention patent
- Patent title: Segmenting printed media pages into articles
- Patent title (Chinese): 将印刷媒体页面分割成文章
- Application number: US13612072
- Filing date: 2012-09-12
- Publication number: US08693779B1
- Publication date: 2014-04-08
- Inventors: Ankur Jain, Vivek Sahasranaman, Shobhit Saxena, Krishnendu Chaudhury
- Applicants: Ankur Jain, Vivek Sahasranaman, Shobhit Saxena, Krishnendu Chaudhury
- Applicant address: Mountain View, CA, US
- Assignee: Google Inc.
- Current assignee: Google Inc.
- Current assignee address: Mountain View, CA, US
- Agent: Sterne, Kessler, Goldstein & Fox P.L.L.C.
- Main classification: G06K9/34
- IPC classification: G06K9/34; G06K9/46; G06K9/66
Abstract:
Methods and systems are presented for segmenting printed media pages into individual articles quickly and efficiently. A printed media based image, which may include a variety of columns, headlines, images, and text, is input into a system comprising a block segmenter and an article segmenter system. The block segmenter identifies and produces blocks of textual content from the printed media image, while the article segmenter system uses a classifier algorithm to determine which blocks of textual content belong to one or more articles in the printed media image.
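The two-stage structure described in the abstract can be illustrated with a minimal sketch. It is not the patented method: the abstract does not say which features or classifier are used, so `same_article` below is a hypothetical geometric rule standing in for the classifier, and `Block`, `group_articles`, and `max_gap` are names invented for this illustration. The block segmenter stage is assumed to have already run, so the text blocks arrive as rectangles.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Block:
    """A rectangular block of text on the page (assumed output of a block segmenter)."""
    x: int
    y: int
    w: int
    h: int
    text: str


def same_article(a: Block, b: Block, max_gap: int = 20) -> bool:
    """Hypothetical stand-in for the classifier (an assumption, not from the patent).

    Treats two blocks as part of the same article when they overlap
    horizontally and the vertical gap between them is at most max_gap.
    """
    h_overlap = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    v_gap = max(a.y, b.y) - min(a.y + a.h, b.y + b.h)
    return h_overlap > 0 and v_gap <= max_gap


def group_articles(
    blocks: List[Block],
    classifier: Callable[[Block, Block], bool] = same_article,
) -> List[List[Block]]:
    """Group blocks into articles by merging every pair the classifier accepts."""
    parent = list(range(len(blocks)))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(blocks)):
        for j in range(i + 1, len(blocks)):
            if classifier(blocks[i], blocks[j]):
                parent[find(j)] = find(i)

    groups: dict = {}
    for i, block in enumerate(blocks):
        groups.setdefault(find(i), []).append(block)
    return list(groups.values())
```

Any pairwise decision function with the same signature could be passed as `classifier`, which keeps the grouping step independent of how the same-article decision is actually made.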