Training procedure for N-gram-based statistical content classification

Granted invention patent

US07792846B1 Training procedure for N-gram-based statistical content classification (in force)



Patent title: Training procedure for N-gram-based statistical content classification
Application number: US11881770

Filing date: 2007-07-27
Publication number: US07792846B1

Publication date: 2010-09-07
Inventors: Thomas E. Raffill, Shunhui Zhu, Roman Yanovsky, Boris Yanovsky, John Gmuender
Applicants: Thomas E. Raffill, Shunhui Zhu, Roman Yanovsky, Boris Yanovsky, John Gmuender
Applicant address: San Jose, CA, US
Assignee: SonicWall, Inc.
Current assignee: SonicWall, Inc.
Current assignee address: San Jose, CA, US
Agent: Blakely, Sokoloff, Taylor & Zafman LLP
Main classification: G06F7/00
IPC classification: G06F7/00; G06F17/30


Abstract:

A training procedure for N-gram based statistical document classification has been disclosed. In one embodiment, a set of N-grams is selected out of a second set of N-grams, each of the N-grams having a sequence of N bytes, where N is an integer. Then a statistical content classification model is generated based on occurrences of the N-grams, if any, in a set of training documents and a set of validation documents. The statistical content classification model is provided to content filters to classify content.
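The steps named in the abstract (select a subset of byte N-grams, build a statistical model from their occurrences in training documents, and use validation documents) can be illustrated with a minimal sketch. The abstract does not state the selection criterion, the form of the model, or how the validation set is used, so everything specific here is an assumption made for illustration: the two-class setup, the document-frequency gap used for selection, the smoothed log-odds weights, the threshold tuned on the validation set, and the names `byte_ngrams`, `select_ngrams`, `train`, and `classify`. It is not the patented procedure.

```python
import math
from collections import Counter


def byte_ngrams(data: bytes, n: int) -> set:
    """Return the distinct sequences of n consecutive bytes found in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def select_ngrams(docs, labels, n, k):
    """Select k N-grams out of the full candidate set.

    Assumed criterion (not from the patent): largest gap in document
    frequency between the positive and the negative class.
    """
    pos, neg = Counter(), Counter()
    for doc, label in zip(docs, labels):
        (pos if label else neg).update(byte_ngrams(doc, n))
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos

    def gap(g):
        return abs(pos[g] / max(n_pos, 1) - neg[g] / max(n_neg, 1))

    candidates = set(pos) | set(neg)
    # Sort by gap, breaking ties on the bytes so the result is deterministic.
    ranked = sorted(candidates, key=lambda g: (-gap(g), g))
    return ranked[:k], pos, neg, n_pos, n_neg


def score(weights, n, doc):
    """Sum the weights of the selected N-grams that occur in doc."""
    return sum(weights[g] for g in byte_ngrams(doc, n) if g in weights)


def train(train_docs, train_labels, val_docs, val_labels, n=3, k=50):
    """Build a model from training documents and tune it on validation ones.

    Assumes both sets are non-empty and labels are 0/1.
    """
    selected, pos, neg, n_pos, n_neg = select_ngrams(train_docs, train_labels, n, k)
    # Assumed model form: add-one smoothed log-odds weight per selected N-gram.
    weights = {
        g: math.log((pos[g] + 1) / (n_pos + 2)) - math.log((neg[g] + 1) / (n_neg + 2))
        for g in selected
    }
    # Assumed use of the validation set: pick the decision threshold with the
    # best validation accuracy among 0.0 and midpoints between adjacent scores.
    val_scores = [score(weights, n, d) for d in val_docs]
    distinct = sorted(set(val_scores))
    thresholds = [0.0] + [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    best_t, best_acc = 0.0, -1.0
    for t in thresholds:
        hits = sum((s >= t) == bool(y) for s, y in zip(val_scores, val_labels))
        acc = hits / len(val_labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return {"n": n, "weights": weights, "threshold": best_t}


def classify(model, doc: bytes) -> bool:
    """Return True if doc scores at or above the tuned threshold."""
    return score(model["weights"], model["n"], doc) >= model["threshold"]
```

A content filter in this sketch would only need the returned dictionary (N, the selected N-grams with their weights, and the threshold), which matches the abstract's idea of shipping a trained model to filters rather than the training data.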



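IPC classification:

G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models, see G06N)
G06F7/00	Methods or arrangements for processing data by operating upon the order or content of the data handled (logic circuits, see H03K19/00)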