SYSTEM AND METHOD FOR WEB MINING AND CLUSTERING
    1.
    Patent application
    SYSTEM AND METHOD FOR WEB MINING AND CLUSTERING (status: in force)

    Publication No.: US20110295892A1

    Publication date: 2011-12-01

    Application No.: US12787114

    Filing date: 2010-05-25

    IPC classification: G06F17/30

    CPC classification: G06F17/30864 G06K9/6224

    Abstract: A method and system for web mining and clustering is described. The method includes receiving and dividing input data into a plurality of primitive datasets. Additionally, one or more combinations of the plurality of primitive datasets may be created. Further, a model for each primitive dataset in the plurality of primitive datasets and each of the one or more combinations of the plurality of primitive datasets may be generated. Subsequently, a cost associated with a model corresponding to each primitive dataset in the plurality of primitive datasets, and each of the one or more combinations of the plurality of primitive datasets may be computed. Further, a sum of the costs associated with the models corresponding to each primitive dataset in the plurality of primitive datasets may be compared with the cost associated with each model corresponding to each of the one or more combinations of the plurality of primitive datasets. Finally, the plurality of primitive datasets may be partitioned into one or more clusters based on the comparison of the costs such that each primitive dataset is a part of a cluster in the one or more clusters or a stand-alone primitive dataset.
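
The cost comparison described in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the patent's method: each primitive dataset is taken to be a list of numbers, the "model" is a fitted Gaussian, the "cost" is its negative log-likelihood plus a fixed per-parameter penalty, only pairwise combinations are tried, and merging is greedy, so after the first merge the comparison runs between current groups rather than strictly between primitives. The names `model_cost` and `cluster_primitives` are hypothetical.

```python
import math
from itertools import combinations


def model_cost(values):
    """Assumed cost: Gaussian negative log-likelihood plus a parameter penalty."""
    n = len(values)
    mean = sum(values) / n
    # Floor the variance so that log() stays finite for constant data.
    var = max(sum((v - mean) ** 2 for v in values) / n, 1e-6)
    nll = 0.5 * n * (math.log(2 * math.pi * var) + 1.0)
    return nll + 2.0  # assumed penalty of 1.0 for each of mean and variance


def cluster_primitives(primitives):
    """Greedily merge the pair of groups whose combined model saves the most cost.

    `primitives` maps a dataset name to its list of numeric values. A pair is
    merged only when the combined model costs less than the sum of the two
    separate models; groups that never merge remain stand-alone.
    """
    clusters = [([name], list(vals)) for name, vals in primitives.items()]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            separate = model_cost(clusters[i][1]) + model_cost(clusters[j][1])
            combined = model_cost(clusters[i][1] + clusters[j][1])
            saving = separate - combined
            if saving > 0 and (best is None or saving > best[0]):
                best = (saving, i, j)
        if best is None:
            break  # no combination is cheaper than keeping the groups apart
        _, i, j = best
        merged = (clusters[i][0] + clusters[j][0], clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(sorted(names) for names, _ in clusters)
```

With two primitive datasets drawn from similar values and a third from very different values, the first two would be expected to form one cluster while the third stays stand-alone.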

    System and method for web mining and clustering
    2.
    Granted patent
    System and method for web mining and clustering (status: in force)

    Publication No.: US08521773B2

    Publication date: 2013-08-27

    Application No.: US12787114

    Filing date: 2010-05-25

    IPC classification: G06F17/30

    CPC classification: G06F17/30864 G06K9/6224

    Abstract: A method and system for web mining and clustering is described. The method includes receiving and dividing input data into a plurality of primitive datasets. Additionally, one or more combinations of the plurality of primitive datasets may be created. Further, a model for each primitive dataset in the plurality of primitive datasets and each of the one or more combinations of the plurality of primitive datasets may be generated. Subsequently, a cost associated with a model corresponding to each primitive dataset in the plurality of primitive datasets, and each of the one or more combinations of the plurality of primitive datasets may be computed. Further, a sum of the costs associated with the models corresponding to each primitive dataset in the plurality of primitive datasets may be compared with the cost associated with each model corresponding to each of the one or more combinations of the plurality of primitive datasets. Finally, the plurality of primitive datasets may be partitioned into one or more clusters based on the comparison of the costs such that each primitive dataset is a part of a cluster in the one or more clusters or a stand-alone primitive dataset.

    Methods and systems for mining websites
    4.
    Granted patent
    Methods and systems for mining websites (status: in force)

    Publication No.: US08219583B2

    Publication date: 2012-07-10

    Application No.: US12267778

    Filing date: 2008-11-10

    IPC classification: G06F17/30

    CPC classification: G06F17/30861 G06Q30/02

    Abstract: Mining of websites that in one embodiment includes obtaining web usage data of user sessions of a website, wherein the website has a hierarchical structure with granular levels and has mapping from each webpage of the website into the hierarchical structure, mapping the user sessions to the hierarchical structure of the website resulting in hierarchical user sessions, initiating an edit distance metrics to determine similarity in the hierarchical user sessions, and clustering similar hierarchical user sessions into groups.
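
The steps in this abstract (map sessions onto the site hierarchy, compare them by edit distance, group similar ones) can be sketched as follows. This is only an illustration under stated assumptions: the page-to-hierarchy mapping is a plain dictionary, the granularity level is a truncation depth, the metric is unit-cost Levenshtein distance over sequences of hierarchy nodes, and grouping is single-link with a fixed threshold. The abstract does not specify any of these choices, and all function names are hypothetical.

```python
def to_hierarchical(session, page_to_path, level):
    """Map each page in a session to its hierarchy path truncated to `level`."""
    return [tuple(page_to_path[page][:level]) for page in session]


def edit_distance(a, b):
    """Levenshtein distance between two sequences (unit-cost edits)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]


def cluster_sessions(sessions, threshold):
    """Single-link grouping: sessions within `threshold` edits share a group.

    Returns sorted lists of session indices.
    """
    groups = []
    for idx, s in enumerate(sessions):
        # Every existing group with at least one member close to this session.
        linked = [g for g in groups
                  if any(edit_distance(s, sessions[k]) <= threshold for k in g)]
        merged = [idx] + [k for g in linked for k in g]
        groups = [g for g in groups if g not in linked] + [sorted(merged)]
    return sorted(groups)
```

Choosing a coarser `level` makes sessions that visit different pages of the same site section compare as identical, which is one plausible reading of the "granular levels" in the abstract.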

    METHODS AND SYSTEMS FOR MINING WEBSITES
    5.
    Patent application
    METHODS AND SYSTEMS FOR MINING WEBSITES (status: in force)

    Publication No.: US20100121850A1

    Publication date: 2010-05-13

    Application No.: US12267778

    Filing date: 2008-11-10

    IPC classification: G06F17/30 G06F7/00

    CPC classification: G06F17/30861 G06Q30/02

    Abstract: Mining of websites that in one embodiment includes obtaining web usage data of user sessions of a website, wherein the website has a hierarchical structure with granular levels and has mapping from each webpage of the website into the hierarchical structure, mapping the user sessions to the hierarchical structure of the website resulting in hierarchical user sessions, initiating an edit distance metrics to determine similarity in the hierarchical user sessions, and clustering similar hierarchical user sessions into groups.

    SYSTEMS AND METHODS FOR FACILITATING THE GATHERING OF OPEN SOURCE INTELLIGENCE
    6.
    Patent application
    SYSTEMS AND METHODS FOR FACILITATING THE GATHERING OF OPEN SOURCE INTELLIGENCE (status: in force)

    Publication No.: US20130046771A1

    Publication date: 2013-02-21

    Application No.: US13210116

    Filing date: 2011-08-15

    IPC classification: G06F17/30

    Abstract: Systems and methods (e.g., utilities) for use in providing automated, lightweight collection of online, open source data which may be content-based to reduce website source bias. In one aspect, a utility is disclosed for use in extracting content of interest from at least one website or other online data source (e.g., where the extracted content can be used in a subsequent search query). In other aspects, utilities are disclosed that are operable to perform various types of analyses on such extracted content and present graphical representations of such analyses on a display of a client device.

    SYSTEMS AND METHODS FOR FACILITATING OPEN SOURCE INTELLIGENCE GATHERING
    7.
    Patent application
    SYSTEMS AND METHODS FOR FACILITATING OPEN SOURCE INTELLIGENCE GATHERING (status: in force)

    Publication No.: US20110225115A1

    Publication date: 2011-09-15

    Application No.: US13045128

    Filing date: 2011-03-10

    IPC classification: G06F17/30 G06N5/02

    Abstract: Systems and methods (e.g., utilities) for use in providing automated, lightweight collection of online, open source data which may be content-based to reduce website source bias. In one aspect, a utility is disclosed for use in extracting content of interest from at least one website or other online data source (e.g., where the extracted content can be used in a subsequent search query). In other aspects, utilities are disclosed that are operable to perform various types of analyses on such extracted content and present graphical representations of such analyses on a display of a client device.

    METHOD AND SYSTEM FOR MINING WEBSITES
    8.
    Patent application
    METHOD AND SYSTEM FOR MINING WEBSITES (status: in force)

    Publication No.: US20100161785A1

    Publication date: 2010-06-24

    Application No.: US12340935

    Filing date: 2008-12-22

    IPC classification: G06F15/173

    CPC classification: G06F17/30867 H04L67/22

    Abstract: One website mining embodiment is for characterizing first time users of a website, collecting user session data of the users visiting the website and identifying first time visitors, determining features of the first time visitors utilizing the user session data, determining rules utilizing the features of the first time visitors, monitoring actions of the first time visitors on the website, updating the rules utilizing the monitored actions of the first time visitors and recommending web content utilizing the rules to the first time visitor.
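
The loop in this abstract (identify first-time visitors, derive features, keep rules, update them from monitored actions, recommend content) can be sketched roughly as below. The abstract does not define the features or the rule form, so this sketch assumes a session is a dictionary with hypothetical `referrer` and `device` fields, and that a "rule" is simply the most-clicked content among earlier first-time visitors with the same features. The class and method names are illustrative only.

```python
from collections import Counter, defaultdict


class FirstTimeVisitorRecommender:
    """Toy rule store keyed by session features (an assumed design)."""

    def __init__(self):
        self.seen = set()                  # visitor IDs observed so far
        self.rules = defaultdict(Counter)  # features -> clicks per content item

    def is_first_time(self, visitor_id):
        """Return True on a visitor's first session, then remember the visitor."""
        first = visitor_id not in self.seen
        self.seen.add(visitor_id)
        return first

    @staticmethod
    def features(session):
        """Assumed feature set: where the visitor came from and on what device."""
        return (session["referrer"], session["device"])

    def update(self, session, clicked_content):
        """Update the rules from a monitored action of a first-time visitor."""
        self.rules[self.features(session)][clicked_content] += 1

    def recommend(self, session, default="home"):
        """Recommend the content most clicked by similar first-time visitors."""
        counts = self.rules.get(self.features(session))
        if not counts:
            return default  # no rule yet for these features
        return counts.most_common(1)[0][0]
```

A caller would check `is_first_time` when a session starts, call `recommend` to pick content for a first-time visitor, and feed each monitored click back through `update`.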

    Method and system for mining websites
    9.
    Granted patent
    Method and system for mining websites (status: in force)

    Publication No.: US09230030B2

    Publication date: 2016-01-05

    Application No.: US12340935

    Filing date: 2008-12-22

    CPC classification: G06F17/30867 H04L67/22

    Abstract: One website mining embodiment is for characterizing first time users of a website, collecting user session data of the users visiting the website and identifying first time visitors, determining features of the first time visitors utilizing the user session data, determining rules utilizing the features of the first time visitors, monitoring actions of the first time visitors on the website, updating the rules utilizing the monitored actions of the first time visitors and recommending web content utilizing the rules to the first time visitor.

    Systems and methods for facilitating the gathering of open source intelligence
    10.
    Granted patent
    Systems and methods for facilitating the gathering of open source intelligence (status: in force)

    Publication No.: US08650198B2

    Publication date: 2014-02-11

    Application No.: US13210116

    Filing date: 2011-08-15

    IPC classification: G06F17/30

    Abstract: Systems and methods (e.g., utilities) for use in providing automated, lightweight collection of online, open source data which may be content-based to reduce website source bias. In one aspect, a utility is disclosed for use in extracting content of interest from at least one website or other online data source (e.g., where the extracted content can be used in a subsequent search query). In other aspects, utilities are disclosed that are operable to perform various types of analyses on such extracted content and present graphical representations of such analyses on a display of a client device.
