SYSTEM AND METHOD FOR DOWNLOADING HYPERTEXT MARKUP LANGUAGE FORMATTED WEB PAGES
    Document type: Invention application
    Legal status: Lapsed

    Publication number: US20080046449A1

    Publication date: 2008-02-21

    Application number: US11756593

    Filing date: 2007-05-31

    IPC classification: G06F17/30

    CPC classification: G06F17/30902

    Abstract: A method for downloading HTML-formatted Web pages is provided. The method includes the steps of: writing the URL of a Web page to be downloaded to an XQuery script; analyzing the XQuery script to obtain the URL of the HTML Web page, and saving the downloaded Web page in a database as a local Web page; analyzing the contents of the local Web page to obtain target contents; converting the relative URLs of all image files to absolute URLs; downloading all the image files according to the absolute URLs; replacing the absolute URLs of the image files with a local image file path; converting the relative URLs of the embedded links to absolute URLs; saving all the converted absolute URLs in the database and creating identifiers; and replacing the converted absolute URLs of the embedded links with an embedded link local path. A related system is also disclosed.

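The URL-rewriting steps named in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation, only an illustration under stated assumptions: the function name `localize_page`, the `images/` and `pages/` directory names, and the sequential integer identifiers are all hypothetical; plain dictionaries stand in for the database; and the XQuery parsing and the actual network download are left out. Regular expressions are used for brevity where a real system would likely use an HTML parser.

```python
import posixpath
import re
from urllib.parse import urljoin, urlparse

# Simplified patterns: they only handle double-quoted src/href attributes.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]*)(")', re.IGNORECASE)
A_HREF = re.compile(r'(<a\b[^>]*?\bhref=")([^"]*)(")', re.IGNORECASE)


def localize_page(html, base_url, image_dir="images", page_dir="pages"):
    """Rewrite image and link URLs in `html` to local paths.

    Returns the rewritten HTML plus two dictionaries that stand in for
    the database described in the abstract (an assumption of this sketch).
    """
    images = {}  # absolute image URL -> local image file path
    links = {}   # absolute link URL -> (identifier, local page path)

    def fix_image(match):
        # Relative URL -> absolute URL, resolved against the page URL.
        absolute = urljoin(base_url, match.group(2))
        # Hypothetical naming scheme: reuse the file name from the URL.
        # Images sharing a file name would collide; a real system needs
        # a collision-free scheme.
        name = posixpath.basename(urlparse(absolute).path) or "image"
        local = images.setdefault(absolute, f"{image_dir}/{name}")
        return match.group(1) + local + match.group(3)

    def fix_link(match):
        absolute = urljoin(base_url, match.group(2))
        if absolute not in links:
            # Hypothetical identifier: a running counter per saved link.
            ident = len(links) + 1
            links[absolute] = (ident, f"{page_dir}/{ident}.html")
        return match.group(1) + links[absolute][1] + match.group(3)

    html = IMG_SRC.sub(fix_image, html)
    html = A_HREF.sub(fix_link, html)
    return html, images, links


if __name__ == "__main__":
    page = '<p><img src="img/logo.png"><a href="../news/today.html">News</a></p>'
    rewritten, images, links = localize_page(
        page, "http://example.com/docs/index.html"
    )
    print(rewritten)
    print(images)
    print(links)
```

The caller would then fetch each key of `images` (for example with `urllib.request.urlretrieve`) and store it at the mapped local path, which corresponds to the image download step in the abstract.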