• Patent title: System and method for identifying and categorizing messages extracted from archived message stores
  • Application number: US11410370
    Filing date: 2006-04-24
  • Publication number: US20060190493A1
    Publication date: 2006-08-24
  • Inventors: Kenji Kawai, David McDonald
  • Applicants: Kenji Kawai, David McDonald
  • Main classification: G06F17/00
  • IPC classification: G06F17/00
Abstract:
A system and method for identifying messages in a message store is provided. At least part of metadata associated with and at least part of content contained in each of a plurality of messages in a message store are encoded by generating a metadata sequence and a content sequence for each message. The messages are grouped into sets by similar metadata sequences and similar content sequences. The messages in each set are compared. Each such message not matching any other such message in the set is marked as a unique message. Each such message matching at least one other such message in the set is marked as an exact duplicate message. Each such message including a subset of at least one other such message in the set is marked as a near duplicate message.
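The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Message` fields, the use of SHA-256 digests as the "sequences", the grouping by a normalized subject line, and the reading of "near duplicate" as a message whose body is wholly contained in another message of the same set are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    # Hypothetical message record; the field names are illustrative only.
    msg_id: str
    sender: str
    subject: str
    body: str


def _normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial formatting differences vanish.
    return " ".join(text.lower().split())


def _digest(text: str) -> str:
    return hashlib.sha256(_normalize(text).encode("utf-8")).hexdigest()


def metadata_sequence(msg: Message) -> str:
    # Assumption: the subject with reply/forward prefixes removed stands in
    # for "at least part of the metadata".
    subject = _normalize(msg.subject)
    while subject.startswith(("re:", "fw:", "fwd:")):
        subject = subject.split(":", 1)[1].strip()
    return _digest(subject)


def content_sequence(msg: Message) -> str:
    # Assumption: the whole normalized body stands in for the encoded content.
    return _digest(msg.body)


def categorize(messages: list[Message]) -> dict[str, str]:
    # Group messages into sets by metadata sequence (a simplification of
    # "similar metadata sequences and similar content sequences").
    groups: dict[str, list[Message]] = defaultdict(list)
    for msg in messages:
        groups[metadata_sequence(msg)].append(msg)

    labels: dict[str, str] = {}
    for group in groups.values():
        for msg in group:
            others = [m for m in group if m.msg_id != msg.msg_id]
            body = _normalize(msg.body)
            if any(content_sequence(m) == content_sequence(msg) for m in others):
                labels[msg.msg_id] = "exact duplicate"
            elif body and any(body in _normalize(m.body) for m in others):
                labels[msg.msg_id] = "near duplicate"
            else:
                labels[msg.msg_id] = "unique"
    return labels
```

Calling `categorize` on a list of `Message` objects returns a mapping from `msg_id` to one of the three labels named in the abstract. Under these assumptions, a reply that quotes an earlier message in full stays unique, while the quoted original is marked as a near duplicate.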