Granted invention patent
- Patent title: Unstructured and semistructured document processing and searching
- Patent title (Chinese): 非结构化和半结构化文档处理和搜索
- Application number: US11737660
- Filing date: 2007-04-19
- Publication number: US08504553B2
- Publication date: 2013-08-06
- Inventors: Aditya Vailaya, Jiang Wu, Manish Rathi
- Applicants: Aditya Vailaya, Jiang Wu, Manish Rathi
- Applicant address: US NY New York
- Assignee: barnesandnoble.com llc
- Current assignee: barnesandnoble.com llc
- Current assignee address: US NY New York
- Agent: Finch & Maloney PLLC
- Main classification: G06F7/00
- IPC classification: G06F7/00
Abstract:
A method for analyzing and indexing an unstructured or semistructured document according to one embodiment includes receiving an unstructured or semistructured document; converting the document to one or more text streams; analyzing the one or more text streams for identifying textual contents of the document; analyzing the one or more text streams for identifying logical sections of the document; associating the textual contents with the logical sections; indexing the textual contents and their association with the logical sections; and saving a result of the indexing in a data storage device.
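The steps named in the abstract (convert to a text stream, identify logical sections, associate text with sections, index, save) can be illustrated with a minimal sketch. This is not the patented implementation: the heading heuristic, the function names (`to_text_stream`, `split_sections`, `build_index`, `index_document`), the `"preamble"` default section, and the JSON output format are all assumptions chosen only to make the pipeline concrete.

```python
import json
import re
from collections import defaultdict

# Assumed heuristic (not from the patent): a line that is all capitals,
# or that ends with a colon, starts a new logical section.
HEADING = re.compile(r"^(?:[A-Z][A-Z ]+|.+:)$")


def to_text_stream(raw):
    """Convert raw document bytes to a text stream of non-empty lines."""
    text = raw.decode("utf-8", errors="replace")
    return [line.strip() for line in text.splitlines() if line.strip()]


def split_sections(lines):
    """Associate each content line with the logical section it falls under."""
    sections = defaultdict(list)
    current = "preamble"  # hypothetical name for text before any heading
    for line in lines:
        if HEADING.match(line):
            current = line.rstrip(":").lower()
        else:
            sections[current].append(line)
    return dict(sections)


def build_index(sections):
    """Build an inverted index: term -> sorted list of sections containing it."""
    index = defaultdict(set)
    for name, lines in sections.items():
        for line in lines:
            for term in re.findall(r"[a-z0-9]+", line.lower()):
                index[term].add(name)
    return {term: sorted(names) for term, names in index.items()}


def index_document(raw, path):
    """Run the whole pipeline and save the index to a JSON file at `path`."""
    index = build_index(split_sections(to_text_stream(raw)))
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(index, fh)
    return index
```

Because the index keeps the section name alongside each term, a query can be restricted to a logical section (for example, only matches of "search" under "skills"), which is the practical benefit of indexing the association and not just the text.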