Granted invention patent
US07792846B1 Training procedure for N-gram-based statistical content classification
Legal status: In force
- 专利标题: Training procedure for N-gram-based statistical content classification
- Application number: US11881770; filing date: 2007-07-27
- Publication number: US07792846B1; publication date: 2010-09-07
- Inventors: Thomas E. Raffill, Shunhui Zhu, Roman Yanovsky, Boris Yanovsky, John Gmuender
- Applicants: Thomas E. Raffill, Shunhui Zhu, Roman Yanovsky, Boris Yanovsky, John Gmuender
- Applicant address: San Jose, CA, US
- Assignee: SonicWall, Inc.
- Current assignee: SonicWall, Inc.
- Current assignee address: San Jose, CA, US
- Attorney/agent: Blakely, Sokoloff, Taylor & Zafman LLP
- Primary classification: G06F7/00
- IPC classifications: G06F7/00; G06F17/30
Abstract:
A training procedure for N-gram-based statistical document classification has been disclosed. In one embodiment, a set of N-grams is selected out of a second set of N-grams, each of the N-grams having a sequence of N bytes, where N is an integer. Then a statistical content classification model is generated based on occurrences of the N-grams, if any, in a set of training documents and a set of validation documents. The statistical content classification model is provided to content filters to classify content.
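The procedure in the abstract (select byte N-grams from a larger candidate set, then build a statistical model from their occurrences in training and validation documents) can be illustrated with a minimal Python sketch. The abstract does not say which selection criterion or model the patent uses, so this sketch assumes two stand-ins: N-grams are ranked by document frequency in the training set, and the model is a multinomial naive Bayes classifier with Laplace smoothing. The validation documents are used here, as an assumption, to pick how many N-grams to keep. All function names (`byte_ngrams`, `select_ngrams`, `train_model`, `classify`, `build_classifier`) are hypothetical.

```python
from collections import Counter
import math


def byte_ngrams(data: bytes, n: int) -> Counter:
    """Count every contiguous sequence of n bytes in data."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def select_ngrams(docs, n, k):
    """Assumed selection rule: keep the k N-grams found in the most training
    documents (document frequency). Not necessarily the patent's criterion."""
    df = Counter()
    for data, _label in docs:
        df.update(byte_ngrams(data, n).keys())
    # Descending frequency, ties broken by byte order for determinism.
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    return {gram for gram, _ in ranked[:k]}


def train_model(docs, vocab, n, alpha=1.0):
    """Assumed model: multinomial naive Bayes over the selected N-grams."""
    class_docs = Counter(label for _, label in docs)
    counts = {c: Counter() for c in class_docs}
    for data, label in docs:
        grams = byte_ngrams(data, n)
        # Only occurrences of selected N-grams contribute to the model.
        counts[label].update({g: grams[g] for g in grams if g in vocab})
    total_docs = sum(class_docs.values())
    model = {}
    for c, cnt in counts.items():
        denom = sum(cnt.values()) + alpha * len(vocab)
        log_prior = math.log(class_docs[c] / total_docs)
        log_probs = {g: math.log((cnt[g] + alpha) / denom) for g in vocab}
        model[c] = (log_prior, log_probs)
    return model


def classify(model, data, vocab, n):
    """Return the class with the highest log-posterior for data."""
    grams = byte_ngrams(data, n)

    def score(c):
        log_prior, log_probs = model[c]
        return log_prior + sum(
            k * log_probs[g] for g, k in grams.items() if g in vocab
        )

    return max(model, key=score)


def accuracy(model, docs, vocab, n):
    hits = sum(classify(model, data, vocab, n) == label for data, label in docs)
    return hits / len(docs)


def build_classifier(train_docs, valid_docs, n=3, candidate_ks=(50, 200, 1000)):
    """Train one model per vocabulary size and keep the one that scores best
    on the validation documents. Returns (accuracy, vocab, model)."""
    best = None
    for k in candidate_ks:
        vocab = select_ngrams(train_docs, n, k)
        model = train_model(train_docs, vocab, n)
        acc = accuracy(model, valid_docs, vocab, n)
        if best is None or acc > best[0]:
            best = (acc, vocab, model)
    return best
```

Working on raw bytes rather than decoded text means the same code applies to any content a filter sees (HTML, binaries, mixed encodings), which is consistent with the abstract's definition of an N-gram as a sequence of N bytes. A content filter would then call `classify(model, payload, vocab, n)` on incoming data.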