Patent Application
- Title: SELECTIVE CONTENT EXTRACTION
- Title (Chinese): 选择性内容提取
- Application number: US13378153
- Filing date: 2009-06-30
- Publication number: US20120089903A1
- Publication date: 2012-04-12
- Inventors: Sam Liu, Parag Joshi, Yuhong Xiong, Clayton Atkins, Jerry Liu
- Applicants: Sam Liu, Parag Joshi, Yuhong Xiong, Clayton Atkins, Jerry Liu
- Applicant address: Houston, TX, US
- Assignee: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.
- Current assignee: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.
- Current assignee address: Houston, TX, US
- International application: PCT/US2009/049298 WO 20090630
- Main classification: G06F17/00
- IPC classification: G06F17/00
Abstract:
A method for extracting web content includes detecting, within a web page, a hierarchical structure that includes a plurality of nodes. Potential article nodes from the plurality of nodes are identified. The identified potential article node with a highest rank in the hierarchical structure is identified as an article node. Content is extracted from the article node.
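
The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not say how potential article nodes are recognized or how rank is measured, so the sketch assumes (hypothetically) that a node is a potential article node when its direct `<p>` children hold at least `min_chars` characters of text, and that "highest rank" means the node closest to the root of the tree. The names `Node`, `TreeBuilder`, `is_potential_article`, and `extract_article` are illustrative only, and the parser does not handle implicitly closed tags.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not become "current".
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
SKIP_TEXT_TAGS = {"script", "style"}


class Node:
    """One node of the hierarchical structure detected in the page."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.items = []  # child Nodes and text strings, in document order
        self.depth = 0 if parent is None else parent.depth + 1

    def children(self):
        return [i for i in self.items if isinstance(i, Node)]

    def text(self):
        if self.tag in SKIP_TEXT_TAGS:
            return ""
        parts = [i.text() if isinstance(i, Node) else i for i in self.items]
        return " ".join(p for p in parts if p)


class TreeBuilder(HTMLParser):
    """Builds a simple node tree from well-formed HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.items.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        data = data.strip()
        if data:
            self.current.items.append(data)


def walk(node):
    """Yield nodes in document (pre-order) order."""
    yield node
    for child in node.children():
        yield from walk(child)


def is_potential_article(node, min_chars=40):
    # Assumed heuristic: enough text sits in the node's direct <p> children.
    chars = sum(len(c.text()) for c in node.children() if c.tag == "p")
    return chars >= min_chars


def extract_article(html, min_chars=40):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    candidates = [n for n in walk(builder.root)
                  if is_potential_article(n, min_chars)]
    if not candidates:
        return None
    # Assumption: "highest rank" = smallest depth; ties go to document order.
    article = min(candidates, key=lambda n: n.depth)
    return article.text()
```

Calling `extract_article(page_html)` returns the text of the chosen node, or `None` when no node passes the assumed threshold. Picking the shallowest candidate means a text-heavy box nested inside the main content is absorbed into the article rather than returned on its own.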
Published/granted documents
- US09032285B2 Selective content extraction (publication/grant date: 2015-05-12)