Extraction of information from structured documents
Abstract:
A method of extracting information from a structured document includes assigning a partial tree identifier, which includes a tag identifier, to a selected partial tree. The tag identifier includes the name of the tag corresponding to the root of the selected partial tree, the name of at least one format attribute of the tag, and the value of that format attribute. If the tag has two or more format attributes, the names of the format attributes are arranged in a predetermined order in the tag identifier. After the structured document is updated, a partial tree whose partial tree identifier is identical to that of the selected partial tree is identified from a list of partial tree identifiers of the partial trees that exist in the updated document.
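The steps in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation and rests on several assumptions: the set `FORMAT_ATTRS`, the `|` and `=` separators, alphabetical order as the "predetermined order", and the helper names `tag_identifier`, `partial_tree_ids`, and `find_matching` are all hypothetical. For simplicity, the partial tree identifier here consists only of the tag identifier of the subtree root.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of attributes treated as "format attributes".
FORMAT_ATTRS = {"align", "bgcolor", "class", "width"}


def tag_identifier(elem):
    """Build a tag identifier: tag name plus format attribute name/value pairs.

    Attribute names are sorted alphabetically (the assumed "predetermined
    order") so the identifier does not depend on the order in the source.
    """
    parts = [elem.tag]
    for name in sorted(a for a in elem.attrib if a in FORMAT_ATTRS):
        parts.append(f"{name}={elem.attrib[name]}")
    return "|".join(parts)


def partial_tree_ids(root):
    """List (identifier, element) pairs for every partial tree in the document."""
    return [(tag_identifier(e), e) for e in root.iter()]


def find_matching(selected_id, updated_root):
    """Return partial trees in the updated document with an identical identifier."""
    return [e for ident, e in partial_tree_ids(updated_root) if ident == selected_id]


# Original document: the user selects the <table> partial tree.
old_doc = ET.fromstring(
    '<html><body><table width="80" class="price">'
    "<tr><td>10</td></tr></table></body></html>"
)
# Updated document: a paragraph was inserted and the attribute order changed.
new_doc = ET.fromstring(
    '<html><body><p>ad</p><table class="price" width="80">'
    "<tr><td>12</td></tr></table></body></html>"
)

selected_id = tag_identifier(old_doc.find(".//table"))
matches = find_matching(selected_id, new_doc)
extracted = [m.find(".//td").text for m in matches]
```

Because the attribute names are sorted before the identifier is built, `width="80" class="price"` and `class="price" width="80"` yield the same identifier, so the table is still found after the update even though its position and attribute order changed.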