Granted invention patent
US08886660B2 Method and apparatus for tracking a change in a collection of web documents (status: in force)

Abstract:
A method and an apparatus for tracking changes in a collection of web documents, such as those provided by a web site. The web documents are retrieved at a first assigned point in time and at a second assigned point in time. A similarity measure is then calculated for combinations of a web document retrieved at the first point in time and a web document retrieved at the second point in time, in order to determine pairs of corresponding web documents. By comparing the calculated similarity measure of a pair of corresponding web documents with predetermined thresholds, a change in the content of that web document between the first and the second point in time is detected. Instead of relying on identifiers such as URLs to match web pages, the method considers the similarity of their content. The proposed strategy facilitates the work of marketing analysts.
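The abstract does not name a particular similarity measure or threshold values, so the following is only a minimal sketch of the general idea under stated assumptions, not the patented implementation. It assumes word-shingle Jaccard similarity as the similarity measure, a greedy one-to-one pairing of documents by descending similarity, and two illustrative thresholds: `same_threshold` (at or above it a pair counts as unchanged) and `match_threshold` (below it two documents are not considered to correspond at all). All function names, parameters, and values here are hypothetical. Document identifiers such as URLs serve only as labels; pairing is decided purely on content.

```python
from itertools import product


def shingles(text, k=3):
    """Return the set of k-word shingles of a document (assumed representation)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two shingle sets; defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def track_changes(snapshot_t1, snapshot_t2, same_threshold=0.9, match_threshold=0.3):
    """Compare two snapshots {doc_id: text} taken at two points in time.

    Hypothetical helper. Returns (unchanged, changed, removed, added), where
    unchanged/changed are lists of (id_t1, id_t2, score) tuples.
    """
    sh1 = {d: shingles(t) for d, t in snapshot_t1.items()}
    sh2 = {d: shingles(t) for d, t in snapshot_t2.items()}

    # Similarity for every combination of a t1 document and a t2 document,
    # highest score first (sort is stable, so ties keep insertion order).
    scores = sorted(
        ((jaccard(sh1[a], sh2[b]), a, b) for a, b in product(sh1, sh2)),
        key=lambda s: s[0],
        reverse=True,
    )

    unchanged, changed = [], []
    used1, used2 = set(), set()
    # Greedy one-to-one pairing: accept the best-scoring unused combinations.
    for score, a, b in scores:
        if score < match_threshold:
            break  # remaining combinations are too dissimilar to correspond
        if a in used1 or b in used2:
            continue
        used1.add(a)
        used2.add(b)
        # Compare the pair's similarity with the thresholds to detect a change.
        (unchanged if score >= same_threshold else changed).append((a, b, score))

    removed = [a for a in sh1 if a not in used1]  # no counterpart at t2
    added = [b for b in sh2 if b not in used2]    # no counterpart at t1
    return unchanged, changed, removed, added


if __name__ == "__main__":
    t1 = {
        "/a": "spring sale on all garden tools this week only",
        "/b": "contact us by phone or email for support",
    }
    t2 = {
        "/offers": "spring sale on all garden tools this week only",  # moved URL
        "/b": "contact us by phone or email for customer support today",
        "/c": "new product line launched",
    }
    unchanged, changed, removed, added = track_changes(t1, t2)
    print([(a, b) for a, b, _ in unchanged])  # [('/a', '/offers')]
    print([(a, b) for a, b, _ in changed])    # [('/b', '/b')]
    print(removed, added)                     # [] ['/c']
```

In this example the page at `/a` is recognized as the same content now served at `/offers`, even though its URL changed, which illustrates why matching on content rather than on identifiers can help. The pair `/b`/`/b` has a Jaccard score of 5/9 (about 0.56), which falls between the two assumed thresholds and is therefore reported as changed.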