Method and system using machine learning to automatically discover home pages on the internet
    1.
    Granted patent
    Method and system using machine learning to automatically discover home pages on the internet (status: active)

    Publication No.: US08583639B2

    Publication date: 2013-11-12

    Application No.: US12033160

    Filing date: 2008-02-19

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30864

    Abstract: A method for automatically determining an Internet home page corresponding to a named entity identified by a specified descriptor, including building a trained machine-learning model, generating candidate matches from the specified descriptor, wherein each candidate match includes an Internet address, extracting content-based features from websites associated with the Internet addresses of the candidate matches, determining a model score for each candidate match based on the content-based features using the trained machine-learning model, and determining a match from among the candidate matches according to the scores, wherein the match is returned as the Internet home page corresponding to the named entity.
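
The pipeline in this abstract (candidate generation, content-based feature extraction, model scoring, selection of the best match) can be sketched roughly as follows. This is an illustration only, not the patented method: the feature names, the fixed logistic weights standing in for a trained model, and the in-memory `candidates` dictionary (used instead of fetching real pages) are all assumptions introduced here.

```python
import math
import re

# Hypothetical weights standing in for a trained model; a real system
# would learn these from labeled (descriptor, home page) examples.
WEIGHTS = {"name_in_title": 2.0, "name_in_body": 1.0, "name_in_host": 1.5}
BIAS = -2.0


def tokens(text):
    """Lowercase alphanumeric tokens of a string."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_features(descriptor, url, page):
    """Content-based features: share of descriptor tokens found in each field."""
    name = tokens(descriptor)
    n = max(len(name), 1)  # avoid division by zero for an empty descriptor
    title = set(tokens(page["title"]))
    body = set(tokens(page["body"]))
    host = re.sub(r"^https?://", "", url.lower()).split("/")[0]
    return {
        "name_in_title": sum(t in title for t in name) / n,
        "name_in_body": sum(t in body for t in name) / n,
        "name_in_host": sum(t in host for t in name) / n,
    }


def model_score(features):
    """Logistic score in (0, 1) computed from the weighted features."""
    z = BIAS + sum(WEIGHTS[k] * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def find_home_page(descriptor, candidates):
    """Score every candidate address and return the best one plus all scores."""
    scored = [
        (model_score(extract_features(descriptor, url, page)), url)
        for url, page in candidates.items()
    ]
    return max(scored)[1], scored


# Hypothetical candidate matches, e.g. as returned by a search step.
candidates = {
    "https://www.acme-widgets.example/": {
        "title": "Acme Widgets - Home",
        "body": "Acme Widgets makes widgets for every need.",
    },
    "https://news.example/acme-story": {
        "title": "Local news",
        "body": "Acme announced a new product.",
    },
}
best, scored = find_home_page("Acme Widgets", candidates)
print(best)
```

Keeping feature extraction separate from scoring means the hand-set weights could be swapped for any trained classifier without touching the candidate-generation step.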

    METHOD AND SYSTEM USING MACHINE LEARNING TO AUTOMATICALLY DISCOVER HOME PAGES ON THE INTERNET
    2.
    Patent application
    METHOD AND SYSTEM USING MACHINE LEARNING TO AUTOMATICALLY DISCOVER HOME PAGES ON THE INTERNET (status: active)

    Publication No.: US20090210419A1

    Publication date: 2009-08-20

    Application No.: US12033160

    Filing date: 2008-02-19

    IPC classification: G06F17/30

    CPC classification: G06F17/30864

    Abstract: A method for automatically determining an Internet home page corresponding to a named entity identified by a specified descriptor, including building a trained machine-learning model, generating candidate matches from the specified descriptor, wherein each candidate match includes an Internet address, extracting content-based features from websites associated with the Internet addresses of the candidate matches, determining a model score for each candidate match based on the content-based features using the trained machine-learning model, and determining a match from among the candidate matches according to the scores, wherein the match is returned as the Internet home page corresponding to the named entity.

    METHOD AND SYSTEM FOR IDENTIFYING COMPANIES WITH SPECIFIC BUSINESS OBJECTIVES
    5.
    Patent application
    METHOD AND SYSTEM FOR IDENTIFYING COMPANIES WITH SPECIFIC BUSINESS OBJECTIVES (status: active)

    Publication No.: US20090204569A1

    Publication date: 2009-08-13

    Application No.: US12028877

    Filing date: 2008-02-11

    IPC classification: G06F17/30 G06F17/00

    CPC classification: G06F17/30864

    Abstract: A method for identifying companies with specific business objectives that includes using existing sources of company firmographic data to identify a broad set of companies and associated websites, crawling the websites associated with the identified companies and indexing website content for each of the identified companies with the specific business objective to realize indexed web content. The method further includes joining the company firmographic data with the indexed web content using a business objective common identifier to generate a store of joined structured firmographic data and indexed web content and presenting a display image representation of the store of joined structured firmographic data and indexed web content for user review. The display image further receives user input to score each of said companies identified therein, and using a search interface, querying the store of scored, joined structured firmographic data and indexed web content. The method further includes augmenting the search interface, or search results from a query, with predictive, machine-learning processes that allow rapid identification of companies possibly missed in the query.
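
The join-score-query flow described in this abstract might look roughly like the sketch below. It is a simplification under stated assumptions: `company_id` plays the role of the common identifier, crawled and indexed content is reduced to one text string per company, and all function names, field names, and sample records are hypothetical. The predictive step that surfaces missed companies is not shown.

```python
def join_records(firmographics, web_index):
    """Join firmographic rows with indexed web content on a shared identifier."""
    joined = {}
    for company_id, firm in firmographics.items():
        if company_id in web_index:  # keep only companies present in both sources
            joined[company_id] = {
                **firm,
                "web_text": web_index[company_id],
                "user_score": None,  # filled in later during user review
            }
    return joined


def record_user_score(store, company_id, score):
    """Attach a reviewer's score to one company in the joined store."""
    store[company_id]["user_score"] = score


def query(store, keyword, min_score=None):
    """Return ids whose web text contains the keyword, optionally filtered by score."""
    needle = keyword.lower()
    hits = []
    for company_id, record in store.items():
        if needle not in record["web_text"].lower():
            continue
        score = record["user_score"]
        if min_score is not None and (score is None or score < min_score):
            continue
        hits.append(company_id)
    return sorted(hits)


# Hypothetical sample data.
firmographics = {
    "c1": {"name": "Sunrise Energy", "employees": 120},
    "c2": {"name": "Harbor Logistics", "employees": 800},
    "c3": {"name": "Quiet Books", "employees": 12},
}
web_index = {
    "c1": "We install solar panels and plan to expand into battery storage.",
    "c2": "Our fleet is moving to solar powered warehouses.",
}

store = join_records(firmographics, web_index)
record_user_score(store, "c1", 5)
record_user_score(store, "c2", 2)
print(query(store, "solar"))
print(query(store, "solar", min_score=4))
```

An inner join is used here, so a company without crawled content (`c3`) drops out of the store; a real system might prefer to keep such rows and flag them for a later crawl.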

    Method and system for identifying companies with specific business objectives
    6.
    Granted patent
    Method and system for identifying companies with specific business objectives (status: active)

    Publication No.: US08145619B2

    Publication date: 2012-03-27

    Application No.: US12028877

    Filing date: 2008-02-11

    IPC classification: G06F7/00 G06F17/30 G06F13/14

    CPC classification: G06F17/30864

    Abstract: A method for identifying companies with specific business objectives that includes using existing sources of company firmographic data to identify a broad set of companies and associated websites, crawling the websites associated with the identified companies and indexing website content for each of the identified companies with the specific business objective to realize indexed web content. The method further includes joining the company firmographic data with the indexed web content using a business objective common identifier to generate a store of joined structured firmographic data and indexed web content and presenting a display image representation of the store of joined structured firmographic data and indexed web content for user review. The display image further receives user input to score each of said companies identified therein, and using a search interface, querying the store of scored, joined structured firmographic data and indexed web content. The method further includes augmenting the search interface, or search results from a query, with predictive, machine-learning processes that allow rapid identification of companies possibly missed in the query.

    INFERRING EMERGING AND EVOLVING TOPICS IN STREAMING TEXT
    7.
    Patent application
    INFERRING EMERGING AND EVOLVING TOPICS IN STREAMING TEXT (status: active)

    Publication No.: US20130151520A1

    Publication date: 2013-06-13

    Application No.: US13315798

    Filing date: 2011-12-09

    IPC classification: G06F17/30

    CPC classification: G06F17/2785 G06F17/30619

    Abstract: A method, system and computer program product for inferring topic evolution and emergence in a set of documents. In one embodiment, the method comprises forming a group of matrices using text in the documents, and analyzing these matrices to identify a first group of topics as evolving topics and a second group of topics as emerging topics. The matrices include a first matrix X identifying a multitude of words in each of the documents, a second matrix W identifying a multitude of topics in each of the documents, and a third matrix H identifying a multitude of words for each of the multitude of topics. These matrices are analyzed to identify the evolving and emerging topics. In an embodiment, the documents form a streaming dataset, and two forms of temporal regularizers are used to help identify the evolving topics and the emerging topics in the streaming dataset.
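
The three matrices named in this abstract fit the shape of a factorization X ≈ WH, with X as documents × words, W as documents × topics, and H as topics × words. The abstract does not name the algorithm, so the sketch below assumes plain non-negative matrix factorization with the standard multiplicative updates. The temporal regularizers used to separate evolving from emerging topics are not specified in the abstract and are omitted here; the toy matrix and all function names are illustrative.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(x, w, h):
    """Squared Frobenius norm of X - WH."""
    wh = matmul(w, h)
    return sum((xv - av) ** 2 for xr, ar in zip(x, wh) for xv, av in zip(xr, ar))


def nmf(x, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative X (docs x words) into W (docs x topics), H (topics x words)."""
    rng = random.Random(seed)
    n_docs, n_words = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(n_topics)] for _ in range(n_docs)]
    h = [[rng.random() + 0.1 for _ in range(n_words)] for _ in range(n_topics)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, x)
        den = matmul(matmul(wt, w), h)
        h = [[hv * nv / (dv + eps) for hv, nv, dv in zip(hr, nr, dr)]
             for hr, nr, dr in zip(h, num, den)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(x, ht)
        den = matmul(w, matmul(h, ht))
        w = [[wv * nv / (dv + eps) for wv, nv, dv in zip(wr, nr, dr)]
             for wr, nr, dr in zip(w, num, den)]
    return w, h


# Toy document-word counts: documents 0-1 use words 0-1, documents 2-3 use words 2-3.
X = [
    [2.0, 1.0, 0.0, 0.0],
    [4.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 2.0, 6.0],
]
W, H = nmf(X, n_topics=2)
print(frobenius_error(X, W, H))
```

In a streaming setting, one plausible extension would be to refit on each new batch while penalizing changes to the rows of H that describe existing topics, but that goes beyond what the abstract states.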

    Inferring emerging and evolving topics in streaming text
    9.
    Granted patent
    Inferring emerging and evolving topics in streaming text (status: active)

    Publication No.: US08909643B2

    Publication date: 2014-12-09

    Application No.: US13315798

    Filing date: 2011-12-09

    IPC classification: G06F17/30

    CPC classification: G06F17/2785 G06F17/30619

    Abstract: A method, system and computer program product for inferring topic evolution and emergence in a set of documents. In one embodiment, the method comprises forming a group of matrices using text in the documents, and analyzing these matrices to identify a first group of topics as evolving topics and a second group of topics as emerging topics. The matrices include a first matrix X identifying a multitude of words in each of the documents, a second matrix W identifying a multitude of topics in each of the documents, and a third matrix H identifying a multitude of words for each of the multitude of topics. These matrices are analyzed to identify the evolving and emerging topics. In an embodiment, the documents form a streaming dataset, and two forms of temporal regularizers are used to help identify the evolving topics and the emerging topics in the streaming dataset.
