Automatic product categorization
    1.
    Granted invention patent
    Legal status: Active

    Publication No.: US07870039B1

    Publication date: 2011-01-11

    Application No.: US10920588

    Filing date: 2004-08-17

    IPC classification: G06F17/30

    Abstract: Techniques are provided for automatic product categorization. In one aspect, the categorization is based on text and one or more other values associated with a product offering. In another aspect, a first categorization of a product offering is performed and, if the chosen product category is in a set of co-refinable product categories, a second (or third, fourth and so on) categorization is performed among that set of co-refinable product categories. In a third aspect, products are categorized based on cost. In another aspect, after products are categorized, they are flagged for further categorization processing if the cost of categorizing a product exceeds a predefined threshold.
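    The abstract does not fix the classifier, the extra offering values, or how "cost" is measured. A minimal sketch, assuming a hypothetical keyword scorer, one hypothetical co-refinable pair (laptops vs. laptop bags) that a second pass separates using price as the additional value, and a cost proxy of one minus the chosen category's share of the total score; all names and thresholds are illustrative:

```python
from dataclasses import dataclass

# Hypothetical category vocabularies (illustrative only, not from the patent).
CATEGORY_KEYWORDS = {
    "laptops": {"laptop", "notebook", "ultrabook"},
    "laptop_bags": {"bag", "sleeve", "case"},
    "phones": {"phone", "smartphone"},
}
# Hypothetical set of easily confused ("co-refinable") categories.
CO_REFINABLE = ("laptops", "laptop_bags")


@dataclass
class Offering:
    text: str
    price: float


def keyword_scores(offering, categories):
    """First-pass score: number of category keywords found in the offering text."""
    words = set(offering.text.lower().split())
    return {c: len(words & CATEGORY_KEYWORDS[c]) for c in categories}


def refine_scores(offering, categories):
    """Second pass over the co-refinable set, using price as the extra value.

    The price cut-offs are assumptions chosen for illustration.
    """
    scores = keyword_scores(offering, categories)
    if offering.price >= 200:
        scores["laptops"] += 1
    if offering.price < 100:
        scores["laptop_bags"] += 1
    return scores


def categorize(offering, cost_threshold=0.4):
    """Return (category, needs_further_processing)."""
    scores = keyword_scores(offering, CATEGORY_KEYWORDS)
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        return None, True  # no textual evidence at all: flag it
    if best in CO_REFINABLE:
        scores = refine_scores(offering, CO_REFINABLE)
        best = max(scores, key=scores.get)
    # Assumed cost proxy: 1 - share of the total score won by the chosen category.
    cost = 1.0 - scores[best] / sum(scores.values())
    return best, cost > cost_threshold
```

    Restricting the second pass to the co-refinable set keeps the extra work on offerings the first pass is likely to confuse; a real system would replace the keyword counts with a trained classifier.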

    System and method for focussed web crawling
    2.
    Granted invention patent
    Legal status: Inactive

    Publication No.: US06418433B1

    Publication date: 2002-07-09

    Application No.: US09239921

    Filing date: 1999-01-28

    IPC classification: G06F17/30

    Abstract: A focussed Web crawler learns, from a set of examples provided by one or more users, to recognize Web pages that are relevant to those users' interests. It then explores the Web starting from the example set, using statistics collected from the examples and further analysis of the link graph of the growing crawl database to steer itself towards relevant, valuable resources and away from irrelevant and/or low-quality material on the Web. In this way, the crawler builds a comprehensive topic-specific library for the benefit of specific users.
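    One way to picture the crawl loop is best-first search over a frontier. A minimal sketch, assuming a bag-of-words profile built from the users' example texts, cosine similarity as the relevance statistic, and a hypothetical in-memory `web` dict (URL to (text, outlinks)) standing in for real fetching; the link-graph analysis in the abstract is reduced here to ranking each outlink by the relevance of the page that cites it, and only relevant pages are expanded:

```python
import heapq
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def focused_crawl(web, examples, seeds, budget=10, min_relevance=0.2):
    """Best-first crawl; returns the relevant URLs in the order they were fetched."""
    profile = Counter()
    for text in examples:
        profile.update(vectorize(text))  # topic statistics learned from the examples
    frontier = [(-1.0, url) for url in seeds]  # seeds get top priority
    heapq.heapify(frontier)
    seen = set(seeds)
    relevant, fetched = [], 0
    while frontier and fetched < budget:
        _, url = heapq.heappop(frontier)
        text, outlinks = web[url]  # stand-in for an HTTP fetch
        fetched += 1
        score = cosine(vectorize(text), profile)
        if score < min_relevance:
            continue  # prune: do not follow links out of off-topic pages
        relevant.append(url)
        for link in outlinks:
            if link in web and link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, link))  # inherit parent's relevance
    return relevant
```

    For example, in a toy graph where an on-topic page is linked only from a cooking page, that page is never fetched: this strict pruning saves crawl budget at some cost in recall.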

    Method for interactively creating an information database including preferred information elements, such as preferred-authority, world wide web pages
    4.
    Granted invention patent
    Legal status: Active

    Publication No.: US06356899B1

    Publication date: 2002-03-12

    Application No.: US09261926

    Filing date: 1999-03-03

    IPC classification: G06F17/30

    Abstract: A method for identifying, filtering, ranking and cataloging information elements, for example World Wide Web pages of the Internet, in whole, in part, or in combination. The method is preferably implemented in computer software and features steps for enabling a user to interactively create an information database including preferred information elements, such as preferred World Wide Web pages, in whole, in part, or in combination. The method includes steps for enabling a user to interactively create a frame-based, hierarchical organizational structure for the information elements, and steps for identifying information elements, such as World Wide Web pages, for populating the structure and automatically filtering and ranking them by relevance, to form, for example, a searchable World Wide Web page database. Additionally, the method features steps for enabling a user to interactively define a frame-based, hierarchical information structure for cataloging information; identifying a preliminary population of information elements for a particular hierarchical category arranged as a frame, based upon the respective frame attributes; thereafter, expanding the information population to include related information; subsequently, automatically filtering and ranking the information based upon relevance; and then populating the hierarchical structure with a definable portion of the filtered, ranked information elements.
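    The frame-based hierarchy and the preliminary-population and expansion steps can be sketched as follows, assuming each frame's attributes are a set of lower-case keywords, a hypothetical `corpus` dict maps URL to (text, outlinks), and "related information" means pages one link away from the preliminary set; the names are illustrative, not taken from the patent:

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """One category in the user-defined hierarchy."""
    name: str
    attributes: set  # assumed: lower-case keywords describing the category
    children: list = field(default_factory=list)
    pages: list = field(default_factory=list)


def preliminary_population(frame, corpus):
    """Pages whose text mentions at least one of the frame's attributes."""
    return [url for url, (text, _) in corpus.items()
            if frame.attributes & set(text.lower().split())]


def expand(urls, corpus):
    """Add pages one outlink away from the preliminary set."""
    expanded = list(urls)
    for url in urls:
        for link in corpus[url][1]:
            if link in corpus and link not in expanded:
                expanded.append(link)
    return expanded


def populate(frame, corpus):
    """Fill every frame in the hierarchy, top-down."""
    frame.pages = expand(preliminary_population(frame, corpus), corpus)
    for child in frame.children:
        populate(child, corpus)
```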

    Method for cataloging, filtering, and relevance ranking frame-based hierarchical information structures
    5.
    Granted invention patent
    Legal status: Active

    Publication No.: US06334131B2

    Publication date: 2001-12-25

    Application No.: US09143733

    Filing date: 1998-08-29

    IPC classification: G06F17/30

    Abstract: A method for cataloging, filtering and ranking information, such as World Wide Web pages of the Internet. The method is preferably implemented in computer software and features steps for enabling a user to interactively create an information database including preferred information elements, such as preferred-authority World Wide Web pages. The method includes steps for enabling a user to interactively create a frame-based, hierarchical organizational structure for the information elements, and steps for identifying information elements, such as World Wide Web pages, for populating the structure and automatically filtering and ranking them by relevance, to form, for example, a searchable World Wide Web page database. Additionally, the method features steps for enabling a user to interactively define a frame-based, hierarchical information structure for cataloging information; identifying a preliminary population of information elements for a particular hierarchical category arranged as a frame, based upon the respective frame attributes; thereafter, expanding the information population to include related information; subsequently, automatically filtering and ranking the information based upon relevance; and then populating the hierarchical structure with a definable portion of the filtered, ranked information elements.
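    The filtering, ranking and "definable portion" steps can be sketched as follows, assuming relevance is the fraction of a frame's attribute keywords that appear in a page, a hypothetical relevance threshold for filtering, and a user-chosen `keep` count standing in for the definable portion:

```python
def relevance(text, attributes):
    """Fraction of the frame's attribute keywords that occur in the page text."""
    words = set(text.lower().split())
    return len(words & attributes) / len(attributes)


def filter_and_rank(candidates, attributes, threshold=0.3, keep=2):
    """Drop pages below the threshold, rank the rest, keep the top `keep`."""
    scored = [(relevance(text, attributes), url) for url, text in candidates.items()]
    passed = [(score, url) for score, url in scored if score >= threshold]
    passed.sort(key=lambda pair: (-pair[0], pair[1]))  # best first, ties by URL
    return [url for _, url in passed[:keep]]
```

    Exposing `threshold` and `keep` separately lets the user tighten the filter and size the populated frame independently.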

    Feature diffusion across hyperlinks
    7.
    Granted invention patent
    Legal status: Inactive

    Publication No.: US6125361A

    Publication date: 2000-09-26

    Application No.: US58635

    Filing date: 1998-04-10

    IPC classification: G06F17/30

    Abstract: A system and method for ranking wide area computer network (e.g., Web) pages by popularity in response to a query. Further, using a query and a search engine's response to it, the system and method find additional keywords that might be good extended search terms, essentially generating a local thesaurus on the fly at query time.
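    Both parts of the abstract can be sketched on a result set, assuming popularity means in-links from other pages in the same result set and candidate expansion terms are the non-query words occurring in the most result documents; these are simplified stand-ins, not the patented scoring:

```python
from collections import Counter


def rank_by_popularity(results, links):
    """Order results by in-links from other pages in the same result set."""
    members = set(results)
    indegree = Counter(dst for src, dst in links
                       if src in members and dst in members and src != dst)
    return sorted(results, key=lambda url: (-indegree[url], url))


def expansion_terms(query, result_texts, n=3):
    """Non-query words found in the most result documents (a tiny local thesaurus)."""
    query_terms = set(query.lower().split())
    doc_freq = Counter()
    for text in result_texts:
        doc_freq.update(set(text.lower().split()) - query_terms)
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]
```

    For the query "jaguar" over two results about cars and one about the cat, "car" surfaces first because it appears in two result documents.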

    System and method for determining web page quality using collective inference based on local and global information
    8.
    Granted invention patent
    Legal status: Active

    Publication No.: US07809705B2

    Publication date: 2010-10-05

    Application No.: US11706025

    Filing date: 2007-02-13

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30873

    Abstract: An improved system and method is provided for determining web page quality using collective inference based on local and global web page information. A classification engine may be provided for classifying a web page using local features of a seed set of web pages and global web graph information about that seed set. In an embodiment, a dual algorithm based on graph regularization, formulated as a well-formed optimization solution, may be used to apply collective inference for binary classification of the web page using: the local web page information and global web graph information of the web page; the local web page information and global web graph information of an authoritative set of web pages; and the local web page information and global web graph information of a non-authoritative set of web pages.
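    The abstract names a dual algorithm for a graph-regularized optimization. As a simpler stand-in, the sketch below solves a common primal form of that kind of objective, minimize sum_i c_i (f_i - s_i)^2 + lambda * sum_(i,j) w_ij (f_i - f_j)^2, where s_i is +1/-1 for authoritative/non-authoritative seed pages and a local-feature classifier score for other pages, and c_i weights seeds more heavily. Setting the gradient to zero gives the Jacobi update f_i = (c_i s_i + lambda * sum_j w_ij f_j) / (c_i + lambda * sum_j w_ij). The local scores, weights and parameter values are assumptions:

```python
def collective_inference(local_score, edges, seeds, lam=1.0, seed_weight=10.0, iters=100):
    """Graph-regularized binary labelling of pages (+1 authoritative, -1 not).

    local_score: page -> score from a local-feature classifier (assumed given)
    edges: (page, page, weight) hyperlinks, treated as undirected
    seeds: page -> +1 / -1 for the authoritative / non-authoritative seed sets
    """
    nbrs = {u: [] for u in local_score}
    for u, v, w in edges:
        nbrs[u].append((v, w))
        nbrs[v].append((u, w))
    target = {u: float(seeds.get(u, local_score[u])) for u in local_score}
    conf = {u: seed_weight if u in seeds else 1.0 for u in local_score}
    f = dict(target)
    # Jacobi iteration; converges because conf > 0 makes the system diagonally dominant.
    for _ in range(iters):
        f = {
            u: (conf[u] * target[u] + lam * sum(w * f[v] for v, w in nbrs[u]))
            / (conf[u] + lam * sum(w for _, w in nbrs[u]))
            for u in local_score
        }
    labels = {u: 1 if f[u] >= 0 else -1 for u in f}
    return labels, f
```

    A page whose local features lean slightly negative can still be labelled +1 when it is strongly linked to authoritative seeds, which is the point of combining local and global information.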

    Method and system for filtering of information entities
    9.
    Granted invention patent
    Legal status: Inactive

    Publication No.: US06996572B1

    Publication date: 2006-02-07

    Application No.: US08947221

    Filing date: 1997-10-08

    IPC classification: G06F17/00

    Abstract: A system and method are provided for eliciting interesting structure from a collection of entities or resources with explicit and/or implicit, static and/or dynamic relations between them, called "affinities." Interesting structure includes (1) notions of quality, authority, or definitiveness of information, (2) notions of relevance to a user's information need, (3) notions of similarity among the plurality of resources retrieved from a universe of resources by a query process, and (4) notions of similarity among the usages of resources by different users/servers. Similarities between entities are computed based on similarities between the affinity values for the entities. That is, where the affinity values for two entities resemble each other, the two entities have a high degree of similarity. Using the similarities, the entities are ranked, clustered, etc., based on a significance derived from the similarities. The ranking, clustering, etc., make up the interesting structure that is sought.
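    The core computation, similarity between entities derived from their affinity values, can be sketched as follows, assuming affinities are sparse numeric vectors (for example, which users accessed a document), cosine similarity as the comparison, and total similarity to the other entities as a stand-in significance measure for ranking:

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse affinity vectors (dicts)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def entity_similarities(affinities):
    """Similarity of every ordered pair of distinct entities, from their affinity rows."""
    names = sorted(affinities)
    return {(p, q): cosine(affinities[p], affinities[q])
            for p in names for q in names if p != q}


def rank_by_significance(affinities):
    """Rank entities by total similarity to all others (an assumed significance measure)."""
    significance = {p: 0.0 for p in affinities}
    for (p, _), s in entity_similarities(affinities).items():
        significance[p] += s
    return sorted(significance, key=lambda p: (-significance[p], p))
```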

    System and method for boosting support vector machines
    10.
    Granted invention patent
    Legal status: Active

    Publication No.: US06662170B1

    Publication date: 2003-12-09

    Application No.: US09643590

    Filing date: 2000-08-22

    IPC classification: G06N5/02

    CPC classification: G06K9/6269 G06K9/6256

    Abstract: A system and method for training an SVM in a scalable manner includes boosting the SVM during training. Specifically, individual SVMs in an ensemble of SVMs are trained using small subsets of a training data set, with data that earlier classifiers in the ensemble misclassified being overrepresented in succeeding subsets. In this way, the overall SVM is trained faster and its memory requirements are reduced, even for relatively large training data sets.
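    The abstract describes boosting at the level of training subsets. A minimal sketch, assuming AdaBoost-style weighted resampling (the abstract does not name the weighting rule) and substituting a toy Pegasos-style linear SVM trained by subgradient descent for a production SVM solver; all function names are hypothetical:

```python
import math
import random


def train_linear_svm(X, y, lam=0.01, epochs=20, rng=None):
    """Pegasos-style subgradient training of a linear SVM (bias via an appended 1)."""
    rng = rng or random.Random(0)
    w = [0.0] * (len(X[0]) + 1)
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            xi = list(X[i]) + [1.0]
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1:  # hinge-loss subgradient step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, xi)]
    return w


def svm_predict(w, x):
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


def boost_svms(X, y, rounds=5, subset_size=10, seed=0):
    """Each SVM sees a small subset drawn with boosting weights, so points
    misclassified by earlier members are overrepresented in later subsets."""
    rng = random.Random(seed)
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        idx = rng.choices(range(n), weights=weights, k=subset_size)
        w = train_linear_svm([X[i] for i in idx], [y[i] for i in idx], rng=rng)
        preds = [svm_predict(w, x) for x in X]
        err = sum(wt for wt, p, t in zip(weights, preds, y) if p != t)
        if err >= 0.5:
            continue  # no better than chance on the weighted data: discard
        err = max(err, 1e-10)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, w))
        # Up-weight misclassified points, down-weight correct ones, renormalize.
        weights = [wt * math.exp(-alpha * t * p) for wt, t, p in zip(weights, y, preds)]
        total = sum(weights)
        weights = [wt / total for wt in weights]
    return ensemble


def ensemble_predict(ensemble, x):
    vote = sum(alpha * svm_predict(w, x) for alpha, w in ensemble)
    return 1 if vote >= 0 else -1
```

    Because each member trains on only `subset_size` points, memory per round stays small; the boosting weights are what push earlier mistakes into later subsets.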
