- Patent title: Extraction of information from structured documents
- Application number: US10463521
- Filing date: 2003-06-18
- Publication (grant) number: US07685157B2
- Publication (grant) date: 2010-03-23
- Inventors: Tadasu Uchiyama, Masaru Miyamoto
- Applicants: Tadasu Uchiyama, Masaru Miyamoto
- Applicant address: JP Tokyo
- Patentee: Nippon Telegraph and Telephone Corporation
- Current patentee: Nippon Telegraph and Telephone Corporation
- Current patentee address: JP Tokyo
- Agent: Oblon, Spivak, McClelland, Maier & Neustadt, L.L.P.
- Priority: JP2002-190621 (2002-06-28); JP2002-204641 (2002-07-12)
- Main classification: G06F7/00
- IPC classifications: G06F7/00; G06F17/00; G06F17/20
Abstract:
A method of extracting information from a structured document includes the steps of assigning a partial tree identifier inclusive of a tag identifier to a selected partial tree wherein the tag identifier includes a name of a tag corresponding to a root of the selected partial tree, a name of at least one format attribute of the tag, and a value of the at least one format attribute, arranging names of format attributes in a predetermined order in the tag identifier if the at least one format attribute of the tag includes two or more format attributes, and identifying a partial tree having a partial tree identifier identical to the partial tree identifier of the selected partial tree from a list of partial tree identifiers of partial trees that exist in the structured document after updating thereof.
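The identify-after-update step in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the set of "format attributes" (`FORMAT_ATTRS`), alphabetical order as the "predetermined order", building the partial tree identifier from the tag identifiers on the path from the document root, and all function names are hypothetical. The document is assumed to be well-formed XML parsed with the standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Hypothetical assumption: which attributes count as "format attributes".
FORMAT_ATTRS = {"align", "bgcolor", "border", "class", "width"}

def tag_identifier(elem):
    """Tag name plus format attribute names and values.

    Attribute names are sorted alphabetically; alphabetical order is an
    assumed stand-in for the abstract's "predetermined order".
    """
    fmt = sorted((k, v) for k, v in elem.attrib.items() if k in FORMAT_ATTRS)
    return elem.tag + "".join(f"[{k}={v}]" for k, v in fmt)

def partial_tree_ids(root):
    """List (identifier, element) for every partial tree in the document.

    Assumption: a partial tree identifier is the chain of tag identifiers
    from the document root down to the partial tree's root element.
    """
    ids = []

    def walk(elem, prefix):
        pid = prefix + "/" + tag_identifier(elem)
        ids.append((pid, elem))
        for child in elem:
            walk(child, pid)

    walk(root, "")
    return ids

def find_matching(updated_root, selected_id):
    """Partial trees in the updated document whose identifier equals the
    identifier of the partial tree selected in the original document."""
    return [e for pid, e in partial_tree_ids(updated_root) if pid == selected_id]

# Usage: select a table in the old page, then locate it in the updated page.
old = ET.fromstring(
    '<html><body><table width="80%" align="center">'
    '<tr><td>Old price</td></tr></table></body></html>'
)
new = ET.fromstring(
    '<html><body><p>News</p><table align="center" width="80%">'
    '<tr><td>New price</td></tr></table></body></html>'
)

selected = old.find(".//table")
sel_id = next(pid for pid, e in partial_tree_ids(old) if e is selected)
matches = find_matching(new, sel_id)

print(sel_id)                        # /html/body/table[align=center][width=80%]
print(matches[0].find(".//td").text) # New price
```

Because attribute names are sorted before the identifier is built, the reordered `width`/`align` attributes in the updated page still produce the same identifier. The inserted `<p>` element does not break the match either, since this sketch uses no positional index.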
Related publications:
- US20040044963A1 Extraction of information from structured documents (publication date: 2004-03-04)