Grouping of documents that contain markup language code
    Granted invention patent (in force)

    Publication number: US07788576B1

    Publication date: 2010-08-31

    Application number: US11542820

    Filing date: 2006-10-04

    IPC classification: G06F17/00, G06F3/00

    Abstract: In one embodiment, a fingerprint is generated for each document (e.g., e-mail, web page) containing markup language (e.g., HTML) code. The fingerprint is indicative of the structure of the markup language code in the document. The fingerprint may be formed by extracting markup language tags from the document and then linking together the extracted tags to form a single string. The fingerprint may be hashed through a hashing function to generate a signature key that may be used to create a directory for the document and other documents having the same fingerprint. The grouping of documents with the same fingerprint facilitates creation of anti-spam rules or identification of web pages from particular websites, for example.

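The steps described in the abstract (extract the tags, link them into one string, hash that string into a key, group documents by key) can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the names `TagCollector`, `fingerprint`, `signature_key`, and `group_by_fingerprint` are hypothetical, the choice of SHA-1 is an assumption (the abstract names no particular hashing function), and so is the decision to keep closing tags while discarding attributes and text.

```python
import hashlib
from collections import defaultdict
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects tag names in document order, ignoring text and attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # Attributes are dropped (an assumption); only the tag name is kept.
        self.tags.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.tags.append(f"</{tag}>")


def fingerprint(document: str) -> str:
    """Link the extracted tags together into a single string."""
    collector = TagCollector()
    collector.feed(document)
    collector.close()
    return "".join(collector.tags)


def signature_key(fp: str) -> str:
    """Hash the fingerprint; SHA-1 is an assumed, illustrative choice."""
    return hashlib.sha1(fp.encode("utf-8")).hexdigest()


def group_by_fingerprint(documents: dict) -> dict:
    """Map each signature key to the names of the documents that share it."""
    groups = defaultdict(list)
    for name, body in documents.items():
        groups[signature_key(fingerprint(body))].append(name)
    return dict(groups)


docs = {
    "a.html": "<html><body><p>Buy now</p></body></html>",
    "b.html": "<html><body><p>Limited offer</p></body></html>",
    "c.html": "<html><body><div>Hello</div></body></html>",
}
groups = group_by_fingerprint(docs)
```

Here `a.html` and `b.html` differ only in their text, so they share a fingerprint and land in the same group, while `c.html` has a different tag structure and gets its own key. In a file-based setup, each key could serve as the name of the directory that holds the matching documents.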