System for determining layouts of webpages
Abstract:
Described are techniques for automatically grouping webpages based on common layout characteristics without requiring the webpages to be presented. The relationships between HTML elements or other types of nodes in a webpage may be used to generate aliases for each node. Each alias represents a particular relationship, or combination of relationships, between the node and one or more other nodes. This process may be repeated for each webpage in a website, after which the aliases that match between webpages may be determined. Webpages that share a large number of common aliases may then be grouped into a cluster of webpages having common layout characteristics.
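The alias-and-cluster idea in the abstract can be illustrated with a small sketch. The abstract does not say how aliases are encoded or how many shared aliases count as "a large number". This Python example therefore makes several assumptions. It uses three hypothetical alias kinds built from tag names: parent>child, previous-sibling+node, and a combined parent>sibling+node string. It measures overlap with Jaccard similarity and clusters greedily with an assumed threshold of 0.6. The names `AliasBuilder`, `aliases_for`, `jaccard`, and `cluster_pages` are illustrative only and do not come from the source.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class AliasBuilder(HTMLParser):
    """Collects hypothetical aliases describing each element's relationships.

    Three alias kinds are assumed for illustration:
      parent>tag        the node's relationship to its parent
      prev+tag          the node's relationship to its preceding sibling
      parent>prev+tag   a combination of both ("^" means no preceding sibling)
    Text content is ignored, so pages with the same markup skeleton match.
    """

    def __init__(self):
        super().__init__()
        self.stack = []            # tags of currently open elements
        self.last_child = [None]   # last child tag seen at each open depth
        self.aliases = set()

    def handle_starttag(self, tag, attrs):
        parent = self.stack[-1] if self.stack else "#root"
        prev = self.last_child[-1]
        self.aliases.add(f"{parent}>{tag}")
        if prev is not None:
            self.aliases.add(f"{prev}+{tag}")
        self.aliases.add(f"{parent}>{prev or '^'}+{tag}")
        self.last_child[-1] = tag
        if tag not in VOID_TAGS:
            self.stack.append(tag)
            self.last_child.append(None)

    def handle_endtag(self, tag):
        # Ignore void elements and stray closing tags; otherwise pop back
        # to the matching open element (tolerates unclosed children).
        if tag in VOID_TAGS or tag not in self.stack:
            return
        while self.stack:
            self.last_child.pop()
            if self.stack.pop() == tag:
                break


def aliases_for(html):
    builder = AliasBuilder()
    builder.feed(html)
    builder.close()
    return builder.aliases


def jaccard(a, b):
    """Jaccard similarity: shared aliases divided by all distinct aliases."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_pages(pages, threshold=0.6):
    """Greedy clustering: each page joins the first cluster whose first
    member has Jaccard similarity >= threshold with it, else starts a new one."""
    clusters = []  # list of (representative alias set, member URLs)
    for url, html in pages.items():
        aliases = aliases_for(html)
        for representative, members in clusters:
            if jaccard(representative, aliases) >= threshold:
                members.append(url)
                break
        else:
            clusters.append((aliases, [url]))
    return [members for _, members in clusters]


ARTICLE = "<html><body><div><h1>T</h1><p>a</p><p>b</p></div></body></html>"
ARTICLE2 = ("<html><body><div><h1>Other</h1>"
            "<p>c</p><p>d</p><p>e</p></div></body></html>")
LISTING = "<html><body><ul><li><a>x</a></li><li><a>y</a></li></ul></body></html>"

if __name__ == "__main__":
    print(cluster_pages({"/a1": ARTICLE, "/list": LISTING, "/a2": ARTICLE2}))
    # [['/a1', '/a2'], ['/list']]
```

The two article pages differ only in their text and in the number of paragraphs. They produce identical alias sets and fall into one cluster. The list page shares only 4 of 21 distinct aliases with them, so it forms its own cluster. Comparing each page against a single representative per cluster keeps the sketch simple. A real system might instead weight aliases by frequency or use a different clustering method.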