Citation record extraction system and method
    1.
    Granted invention patent
    Citation record extraction system and method (in force)

    Publication number: US08429520B2

    Publication date: 2013-04-23

    Application number: US12834757

    Filing date: 2010-07-12

    CPC classification number: G06F17/2241

    Abstract: A citation record extraction system is provided for extracting citation records from publication list pages having different layouts and contents. An HTML rendering engine receives a publication list web page and parses it to obtain layout information of the web page. A web page sequence builder generates a web page characteristic sequence for the web page according to the layout information. A web page repeated pattern analyzer analyzes repeated patterns presented in the web page characteristic sequence, screens out non-citation records therefrom, and obtains a citation record of the publication list web page.
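
The three stages named in the abstract (layout parsing, characteristic sequence building, repeated pattern analysis with screening) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the patent: the path of open block-level tags stands in for rendered layout information, a "repeated pattern" is reduced to the single most frequent tag path, and a four-digit year is used as a crude test for screening out non-citation records. The names `SequenceBuilder`, `extract_citations`, `BLOCK_TAGS`, and `YEAR` are hypothetical, and the parser assumes well-formed markup.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Assumption: these tags delimit layout blocks; inline tags are ignored.
BLOCK_TAGS = {"ul", "ol", "li", "p", "div", "tr", "table", "h1", "h2", "h3"}
# Assumption: a citation record mentions a publication year.
YEAR = re.compile(r"\b(19|20)\d{2}\b")


class SequenceBuilder(HTMLParser):
    """Builds a sequence of (tag-path symbol, text) pairs from a page."""

    def __init__(self):
        super().__init__()
        self.path = []      # currently open block-level tags
        self.buffers = []   # one text buffer per open block-level tag
        self.sequence = []  # (symbol, text) pairs, in closing order

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.path.append(tag)
            self.buffers.append([])

    def handle_data(self, data):
        # Text goes to the innermost open block, including text in inline tags.
        if self.buffers:
            self.buffers[-1].append(data)

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS and self.path and self.path[-1] == tag:
            text = " ".join("".join(self.buffers.pop()).split())
            symbol = "/".join(self.path)
            self.path.pop()
            if text:
                self.sequence.append((symbol, text))


def extract_citations(html):
    """Returns texts of the most repeated block type that look like citations."""
    builder = SequenceBuilder()
    builder.feed(html)
    builder.close()
    counts = Counter(symbol for symbol, _ in builder.sequence)
    if not counts:
        return []
    symbol, freq = counts.most_common(1)[0]
    if freq < 2:  # nothing repeats, so there is no pattern to analyze
        return []
    candidates = [text for s, text in builder.sequence if s == symbol]
    # Screening step: drop repeated blocks that do not look like citations.
    return [text for text in candidates if YEAR.search(text)]


PAGE = """
<div>
  <h1>Publications</h1>
  <ul>
    <li>A. Author. <i>Title one</i>. 2009.</li>
    <li>B. Author. <i>Title two</i>. 2010.</li>
    <li>Back to home page</li>
  </ul>
</div>
"""

if __name__ == "__main__":
    for record in extract_citations(PAGE):
        print(record)
```

On the sample page, the three `li` items share the tag path `div/ul/li` and form the repeated pattern; the navigation item is then screened out because it contains no year. A real system would presumably rely on rendered layout features and richer pattern mining than this single-symbol count.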

    CITATION RECORD EXTRACTION SYSTEM AND METHOD, AND PROGRAM PRODUCT
    2.
    Invention patent application
    CITATION RECORD EXTRACTION SYSTEM AND METHOD, AND PROGRAM PRODUCT (in force)

    Publication number: US20110029528A1

    Publication date: 2011-02-03

    Application number: US12834757

    Filing date: 2010-07-12

    CPC classification number: G06F17/2241

    Abstract: A citation record extraction system is provided. An HTML rendering engine receives a publication list web page and parses it to obtain layout information of the web page. A web page sequence builder generates a web page characteristic sequence for the web page according to the layout information. A web page repeated pattern analyzer analyzes repeated patterns presented in the web page characteristic sequence, screens out non-citation records therefrom, and obtains a citation record of the publication list web page.
