UTILIZATION OF FEATURES EXTRACTED FROM STRUCTURED DOCUMENTS TO IMPROVE SEARCH RELEVANCE
    1.
    Patent application (legal status: in force)

    Publication No.: US20130031032A1

    Publication date: 2013-01-31

    Application No.: US13191486

    Filing date: 2011-07-27

    IPC classification: G06F17/30 G06F15/18

    Abstract: Features automatically extracted from semi-structured web pages are utilized by a search engine to rank documents that include semi-structured web pages. These features include, but are not limited to, a number of reviews, a number of positive reviews, and/or a number of negative reviews from a web page that includes user reviews. These features also include a number of views of a video that is viewable by way of a semi-structured web page. The features also include a number of subscribers to broadcasts of an individual from a social networking web page and a number of contacts of an individual listed on a social networking web page.
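
The abstract does not say how the search engine combines these features, so the following is only a minimal sketch under stated assumptions: a linear score with log-damped counts added to an existing relevance score. The class `PageFeatures`, the field names, the `WEIGHTS` values, and the functions `score` and `rank` are all hypothetical and are not taken from the patent.

```python
import math
from dataclasses import dataclass


@dataclass
class PageFeatures:
    """Counts extracted from one semi-structured page (field names are hypothetical)."""
    url: str
    base_relevance: float          # score from some existing ranker (assumed to exist)
    num_reviews: int = 0
    num_positive_reviews: int = 0
    num_negative_reviews: int = 0
    video_views: int = 0
    subscribers: int = 0
    contacts: int = 0


# Illustrative weights only; they are made up for this sketch.
WEIGHTS = {
    "num_reviews": 0.10,
    "num_positive_reviews": 0.20,
    "num_negative_reviews": -0.20,
    "video_views": 0.05,
    "subscribers": 0.05,
    "contacts": 0.05,
}


def score(page: PageFeatures) -> float:
    """Add a log-damped contribution of each extracted count to the base relevance."""
    total = page.base_relevance
    for name, weight in WEIGHTS.items():
        total += weight * math.log1p(getattr(page, name))
    return total


def rank(pages: list[PageFeatures]) -> list[PageFeatures]:
    """Return the pages sorted from highest to lowest score."""
    return sorted(pages, key=score, reverse=True)
```

The log damping is a design choice of this sketch: it keeps a page with millions of video views from swamping the base relevance score.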

    METHOD AND SYSTEM FOR WEB INFORMATION EXTRACTION
    2.
    Patent application (legal status: in force)

    Publication No.: US20120084636A1

    Publication date: 2012-04-05

    Application No.: US12896942

    Filing date: 2010-10-04

    IPC classification: G06F17/00

    Abstract: An example of a method includes determining features of a first type for a first web page of a plurality of web pages. The method also includes electronically determining a plurality of rules for an attribute of the first web page, wherein the plurality of rules are determined based on features of the first type. The method also includes electronically identifying a first rule, from the plurality of rules, which satisfies a first predefined criterion. The first predefined criterion includes at least one of a first threshold for a precision parameter, a second threshold for a support parameter, a third threshold for a distance parameter, and a fourth threshold for a recall parameter. The method further includes storing the first rule to enable extraction of a value of the attribute from a second web page.
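
The rule-selection step described in this abstract can be sketched as follows. This is an illustration under assumptions, not the patented implementation: the names `Rule`, `Criterion`, `first_satisfying`, and `store_rule` are hypothetical, the rule is represented as an opaque pattern string, and the sketch assumes that precision, support, and recall must reach a minimum while distance must stay under a maximum (the abstract does not state the direction of any threshold).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """A candidate extraction rule with its measured statistics (hypothetical shape)."""
    pattern: str        # opaque rule text; the real rule format is not given in the abstract
    precision: float
    support: int
    distance: float
    recall: float


@dataclass(frozen=True)
class Criterion:
    """Thresholds for the four parameters; None means the parameter is not checked."""
    min_precision: Optional[float] = None
    min_support: Optional[int] = None
    max_distance: Optional[float] = None   # assumption: a smaller distance is better
    min_recall: Optional[float] = None

    def accepts(self, rule: Rule) -> bool:
        """Return True when the rule meets every threshold that is configured."""
        return all([
            self.min_precision is None or rule.precision >= self.min_precision,
            self.min_support is None or rule.support >= self.min_support,
            self.max_distance is None or rule.distance <= self.max_distance,
            self.min_recall is None or rule.recall >= self.min_recall,
        ])


def first_satisfying(rules: list[Rule], criterion: Criterion) -> Optional[Rule]:
    """Return the first candidate rule that satisfies the criterion, or None."""
    for rule in rules:
        if criterion.accepts(rule):
            return rule
    return None


def store_rule(store: dict[str, Rule], attribute: str, rule: Rule) -> None:
    """Keep the chosen rule under its attribute name for later use on other pages."""
    store[attribute] = rule
```

Because each threshold is optional, a `Criterion` can check any non-empty subset of the four parameters, which is one way to read "at least one of" in the abstract.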

    Utilization of features extracted from structured documents to improve search relevance
    4.
    Granted patent (legal status: in force)

    Publication No.: US08788436B2

    Publication date: 2014-07-22

    Application No.: US13191486

    Filing date: 2011-07-27

    IPC classification: G06K9/32 G06F17/30

    Abstract: Features automatically extracted from semi-structured web pages are utilized by a search engine to rank documents that include semi-structured web pages. These features include, but are not limited to, a number of reviews, a number of positive reviews, and/or a number of negative reviews from a web page that includes user reviews. These features also include a number of views of a video that is viewable by way of a semi-structured web page. The features also include a number of subscribers to broadcasts of an individual from a social networking web page and a number of contacts of an individual listed on a social networking web page.
