Publication No.: US07788576B1
Publication date: 2010-08-31
Application No.: US11542820
Filing date: 2006-10-04
Applicant: Hsin-Yi Chen, Dung-Jou Tsai, Guan-Liang Chen, Cheng-Hsin Hsu
Inventor: Hsin-Yi Chen, Dung-Jou Tsai, Guan-Liang Chen, Cheng-Hsin Hsu
CPC classification: G06Q10/107, G06F21/51, G06F2221/2119, H04L51/12
Abstract: In one embodiment, a fingerprint is generated for each document (e.g., e-mail, web page) containing markup language (e.g., HTML) code. The fingerprint is indicative of the structure of the markup language code in the document. The fingerprint may be formed by extracting markup language tags from the document and then concatenating the extracted tags into a single string. The fingerprint may be passed through a hashing function to generate a signature key, which may be used to create a directory for the document and for other documents having the same fingerprint. Grouping documents with the same fingerprint facilitates, for example, the creation of anti-spam rules or the identification of web pages from particular websites.
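The steps named in the abstract (extract the tags, join them into one string, hash that string, group documents by the resulting key) can be illustrated with a minimal Python sketch. This is not the patented implementation: the names `TagCollector`, `fingerprint`, `signature_key`, and `group_by_signature` are hypothetical, SHA-256 is an assumed choice because the abstract does not name a hashing function, and an in-memory dictionary stands in for the per-fingerprint directory.

```python
import hashlib
from collections import defaultdict
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records tag names in document order; text and attributes are ignored."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.tags.append(f"</{tag}>")


def fingerprint(markup: str) -> str:
    """Join the extracted tags into a single string describing the structure."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.tags)


def signature_key(fp: str) -> str:
    """Hash the fingerprint; SHA-256 is an assumption, not taken from the source."""
    return hashlib.sha256(fp.encode("utf-8")).hexdigest()


def group_by_signature(documents: dict[str, str]) -> dict[str, list[str]]:
    """Map each signature key to the names of documents sharing that fingerprint.

    A real system might create one directory per key; a dict stands in here.
    """
    groups = defaultdict(list)
    for name, markup in documents.items():
        groups[signature_key(fingerprint(markup))].append(name)
    return dict(groups)


if __name__ == "__main__":
    docs = {
        "mail_a": "<html><body><p>Buy now</p></body></html>",
        "mail_b": "<HTML><BODY><P>Cheap pills</P></BODY></HTML>",
        "mail_c": "<html><body><div>Hello</div></body></html>",
    }
    for key, names in group_by_signature(docs).items():
        print(key[:12], names)
```

In this sketch `mail_a` and `mail_b` differ in text and tag case but share the fingerprint `<html><body><p></p></body></html>`, so they fall into the same group, while `mail_c` has a different structure and gets its own key. Ignoring attributes and text is a design choice of the sketch; the abstract only says the fingerprint reflects the structure of the markup.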