Granted invention patent
US07984076B2 Document processing apparatus, document processing method, document processing program and recording medium (In force)

Abstract:
The text format of the input data is checked, and the data is converted into a format that the system can manipulate. Tags, heading information, and the like are then used to determine whether the input data is in HTML or e-mail format. The converted data is divided into blocks in a simple manner, based on repetition of predetermined character patterns, so that the elements in each block can be checked. Each block section is marked with a tag indicating a block. The data divided into blocks is parsed based on tags, character patterns, etc., and is structured. A table in the text is also parsed and segmented into cells. Finally, tree-structured data having a hierarchical structure is generated based on the sentence-structured data. A sentence-extraction template paired with the tree-structured data is used to extract sentences.
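The first stages described in the abstract (format detection, block division on repeated character patterns, and block tagging) might be sketched as follows. This is only an illustrative sketch, not the patented method: the regular expressions, the `<block>` tag name, and the function names `detect_format`, `split_blocks`, and `tag_blocks` are all assumptions, since the abstract does not specify them.

```python
from __future__ import annotations

import re

# Hypothetical heuristics; the abstract does not specify these patterns.
HTML_TAG = re.compile(r"<\s*(html|body|p|div|table|br)\b", re.IGNORECASE)
MAIL_HEADER = re.compile(r"^(From|To|Subject|Date):\s", re.MULTILINE)
# A separator line: one punctuation character repeated at least 4 times.
SEPARATOR = re.compile(r"^\s*([-=*_#~])\1{3,}\s*$")


def detect_format(text: str) -> str:
    """Guess whether the text is HTML, e-mail, or plain text."""
    if HTML_TAG.search(text):
        return "html"
    # Require at least two header fields to reduce false positives.
    if len(MAIL_HEADER.findall(text)) >= 2:
        return "email"
    return "plain"


def split_blocks(text: str) -> list[str]:
    """Split text into blocks at blank lines or repeated-character lines."""
    blocks: list[str] = []
    current: list[str] = []
    for line in text.splitlines():
        if not line.strip() or SEPARATOR.match(line):
            # A boundary closes the block collected so far, if any.
            if current:
                blocks.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        blocks.append("\n".join(current))
    return blocks


def tag_blocks(blocks: list[str]) -> str:
    """Wrap each block in a <block> tag (the tag name is an assumption)."""
    return "\n".join(f"<block>{b}</block>" for b in blocks)
```

For example, `tag_blocks(split_blocks(text))` would turn a plain-text document with ruled separator lines into a sequence of tagged blocks that a later parsing stage could structure into a tree.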