摘要:
A method and system for web mining and clustering is described. The method includes receiving and dividing input data into a plurality of primitive datasets. Additionally, one or more combinations of the plurality of primitive datasets may be created. Further, a model for each primitive dataset in the plurality of primitive datasets and each of the one or more combinations of the plurality of primitive datasets may be generated. Subsequently, a cost associated with a model corresponding to each primitive dataset in the plurality of primitive datasets, and each of the one or more combinations of the plurality of primitive datasets may be computed. Further, a sum of the costs associated with the models corresponding to each primitive dataset in the plurality of primitive datasets may be compared with the cost associated with each model corresponding to each of the one or more combinations of the plurality of primitive datasets. Finally, the plurality of primitive datasets may be partitioned into one or more clusters based on the comparison of the costs such that each primitive dataset is a part of a cluster in the one or more clusters or a stand-alone primitive dataset.
摘要:
A method and system for web mining and clustering is described. The method includes receiving and dividing input data into a plurality of primitive datasets. Additionally, one or more combinations of the plurality of primitive datasets may be created. Further, a model for each primitive dataset in the plurality of primitive datasets and each of the one or more combinations of the plurality of primitive datasets may be generated. Subsequently, a cost associated with a model corresponding to each primitive dataset in the plurality of primitive datasets, and each of the one or more combinations of the plurality of primitive datasets may be computed. Further, a sum of the costs associated with the models corresponding to each primitive dataset in the plurality of primitive datasets may be compared with the cost associated with each model corresponding to each of the one or more combinations of the plurality of primitive datasets. Finally, the plurality of primitive datasets may be partitioned into one or more clusters based on the comparison of the costs such that each primitive dataset is a part of a cluster in the one or more clusters or a stand-alone primitive dataset.
摘要:
Systems and methods (e.g., utilities) for use in providing automated, lightweight collection of online, open source data which may be content-based to reduce website source bias. In one aspect, a utility is disclosed for use in extracting content of interest from at least one website or other online data source (e.g., where the extracted content can be used in a subsequent search query). In other aspects, utilities are disclosed that are operable to perform various types of analyses on such extracted content and present graphical representations of such analyses on a display of a client device.
摘要:
Mining of websites that in one embodiment includes obtaining web usage data of user sessions of a website, wherein the website has a hierarchical structure with granular levels and has mapping from each webpage of the website into the hierarchical structure, mapping the user sessions to the hierarchical structure of the website resulting in hierarchical user sessions, initiating an edit distance metrics to determine similarity in the hierarchical user sessions, and clustering similar hierarchical user sessions into groups.
摘要:
Mining of websites that in one embodiment includes obtaining web usage data of user sessions of a website, wherein the website has a hierarchical structure with granular levels and has mapping from each webpage of the website into the hierarchical structure, mapping the user sessions to the hierarchical structure of the website resulting in hierarchical user sessions, initiating an edit distance metrics to determine similarity in the hierarchical user sessions, and clustering similar hierarchical user sessions into groups.
摘要:
Systems and methods (e.g., utilities) for use in providing automated, lightweight collection of online, open source data which may be content-based to reduce website source bias. In one aspect, a utility is disclosed for use in extracting content of interest from at least one website or other online data source (e.g., where the extracted content can be used in a subsequent search query). In other aspects, utilities are disclosed that are operable to perform various types of analyses on such extracted content and present graphical representations of such analyses on a display of a client device.
摘要:
Systems and methods (e.g., utilities) for use in providing automated, lightweight collection of online, open source data which may be content-based to reduce website source bias. In one aspect, a utility is disclosed for use in extracting content of interest from at least one website or other online data source (e.g., where the extracted content can be used in a subsequent search query). In other aspects, utilities are disclosed that are operable to perform various types of analyses on such extracted content and present graphical representations of such analyses on a display of a client device.
摘要:
One website mining embodiment is for characterizing first time users of a website, collecting user session data of the users visiting the website and identifying first time visitors, determining features of the first time visitors utilizing the user session data, determining rules utilizing the features of the first time visitors, monitoring actions of the first time visitors on the website, updating the rules utilizing the monitored actions of the first time visitors and recommending web content utilizing the rules to the first time visitor.
摘要:
One website mining embodiment is for characterizing first time users of a website, collecting user session data of the users visiting the website and identifying first time visitors, determining features of the first time visitors utilizing the user session data, determining rules utilizing the features of the first time visitors, monitoring actions of the first time visitors on the website, updating the rules utilizing the monitored actions of the first time visitors and recommending web content utilizing the rules to the first time visitor.
摘要:
Systems and methods (e.g., utilities) for use in providing automated, lightweight collection of online, open source data which may be content-based to reduce website source bias. In one aspect, a utility is disclosed for use in extracting content of interest from at least one website or other online data source (e.g., where the extracted content can be used in a subsequent search query). In other aspects, utilities are disclosed that are operable to perform various types of analyses on such extracted content and present graphical representations of such analyses on a display of a client device.