On-the-fly pattern recognition with configurable bounds
    1.
    Granted invention patent
    On-the-fly pattern recognition with configurable bounds (in force)

    Publication No.: US08370374B1

    Publication date: 2013-02-05

    Application No.: US13196480

    Filing date: 2011-08-02

    IPC classification: G06F7/00 G06F17/30

    Abstract: Some embodiments of on-the-fly pattern recognition with configurable bounds have been presented. In one embodiment, a pattern matching engine is configured based on user input, which may include values of one or more user configurable bounds on searching. Then the configured pattern matching engine is used to search for a set of features in an incoming string. A set of scores is updated based on the presence of any of the features in the string while searching for the features. Each score may indicate a likelihood of the content of the string being in a category. The search is terminated if the end of the string is reached or if the user configurable bounds are met. After terminating the search, the scores are output.
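The search loop described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Bounds` fields (`max_bytes`, `max_matches`, `score_threshold`), the `scan` function, and the additive per-category weights are all hypothetical choices made for illustration, since the abstract does not say which bounds are configurable or how scores are computed.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Bounds:
    """Hypothetical user-configurable bounds; None disables a bound."""
    max_bytes: Optional[int] = None          # stop after scanning this many bytes
    max_matches: Optional[int] = None        # stop after this many feature hits
    score_threshold: Optional[float] = None  # stop once any score reaches this


def scan(data: bytes,
         features: Dict[bytes, Dict[str, float]],
         bounds: Bounds) -> Tuple[Dict[str, float], int]:
    """Scan `data` left to right, updating category scores on every hit.

    `features` maps each feature to the weight it adds per category
    (an assumed scoring scheme). Returns (scores, bytes_scanned).
    """
    scores: Dict[str, float] = {}
    matches = 0
    pos = 0
    while pos < len(data):
        if bounds.max_bytes is not None and pos >= bounds.max_bytes:
            break  # byte budget exhausted
        for feature, weights in features.items():
            if data.startswith(feature, pos):
                matches += 1
                for category, weight in weights.items():
                    scores[category] = scores.get(category, 0.0) + weight
        pos += 1
        if bounds.max_matches is not None and matches >= bounds.max_matches:
            break  # enough evidence collected
        if bounds.score_threshold is not None and any(
                s >= bounds.score_threshold for s in scores.values()):
            break  # one category is already decisive
    return scores, pos


FEATURES = {
    b"casino": {"gambling": 2.0},
    b"poker": {"gambling": 1.5},
    b"news": {"news": 1.0},
}
DATA = b"poker news casino poker"

full_scores, full_pos = scan(DATA, FEATURES, Bounds())
early_scores, early_pos = scan(DATA, FEATURES, Bounds(score_threshold=3.0))
```

With no bounds set, the scan runs to the end of the string; with `score_threshold=3.0` it stops just after the "casino" hit pushes the gambling score past the threshold, and the scores accumulated so far are returned.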


    Efficient string search
    2.
    Granted invention patent
    Efficient string search (in force)

    Publication No.: US08086441B1

    Publication date: 2011-12-27

    Application No.: US11881556

    Filing date: 2007-07-27

    IPC classification: G06F17/28

    Abstract: Some embodiments of an efficient string search have been presented. In one embodiment, a string of bytes representing content written in a non-delimited language is received, wherein the content has been classified into a predetermined category. In a single pass through the string of bytes, a set of N-grams is searched for simultaneously. Statistical information on occurrences of the N-grams, if any, in the string of bytes is collected. In some embodiments, a model is generated based on the statistical information, where the model is usable by a content filter to classify content.
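As a rough illustration of searching for a set of N-grams in one pass, the Python sketch below slides over the byte string once and looks up each window in a hash set. This is a swapped-in technique chosen for brevity, not the method claimed in the patent, which the abstract does not detail. The function name `count_ngrams` and the sample text are assumptions.

```python
from collections import Counter
from typing import Iterable


def count_ngrams(data: bytes, ngrams: Iterable[bytes]) -> Counter:
    """Count occurrences of every target N-gram in one left-to-right pass.

    Matches are byte-level and may overlap. N-grams of different lengths
    are handled by testing one window per distinct length at each offset.
    """
    targets = set(ngrams)
    lengths = sorted({len(g) for g in targets if g})
    counts: Counter = Counter()
    for i in range(len(data)):
        for n in lengths:
            window = data[i:i + n]
            if len(window) == n and window in targets:
                counts[window] += 1
    return counts


# Chinese has no word delimiters, so the text is treated as raw UTF-8 bytes.
text = "北京大学在北京".encode("utf-8")
beijing = "北京".encode("utf-8")
university = "大学".encode("utf-8")
shanghai = "上海".encode("utf-8")

counts = count_ngrams(text, [beijing, university, shanghai])
```

The resulting `Counter` is the kind of occurrence statistic a later model-building step could consume. For very large N-gram sets, a multi-pattern automaton such as Aho-Corasick would avoid the per-length slicing.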


    Training procedure for N-gram-based statistical content classification
    3.
    Granted invention patent
    Training procedure for N-gram-based statistical content classification (in force)

    Publication No.: US07792846B1

    Publication date: 2010-09-07

    Application No.: US11881770

    Filing date: 2007-07-27

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30705

    Abstract: A training procedure for N-gram based statistical document classification has been disclosed. In one embodiment, a set of N-grams is selected out of a second set of N-grams, each of the N-grams having a sequence of N bytes, where N is an integer. Then a statistical content classification model is generated based on occurrences of the N-grams, if any, in a set of training documents and a set of validation documents. The statistical content classification model is provided to content filters to classify content.
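One way to picture such a training procedure is the Python sketch below. Everything specific in it is an assumption made for illustration: the abstract does not say how N-grams are selected or how the model is built. Here the candidate set is every N-gram seen in the training documents, selection ranks candidates by the difference in document frequency between in-category and out-of-category documents, weights are smoothed log-odds, and the validation documents are used only to pick a decision threshold.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple


def doc_ngrams(doc: bytes, n: int) -> Set[bytes]:
    """Return the distinct byte N-grams that occur in `doc`."""
    return {doc[i:i + n] for i in range(len(doc) - n + 1)}


def score(model: Dict, doc: bytes) -> float:
    """Sum the weights of the selected N-grams present in `doc`."""
    present = doc_ngrams(doc, model["n"])
    return sum(w for g, w in model["weights"].items() if g in present)


def classify(model: Dict, doc: bytes) -> bool:
    """True if `doc` scores at or above the validated threshold."""
    return score(model, doc) >= model["threshold"]


def train(pos_docs: List[bytes], neg_docs: List[bytes],
          val_docs: List[Tuple[bytes, bool]], n: int = 3, k: int = 8) -> Dict:
    """Select k N-grams from the candidate set and fit a simple model."""
    pos_df = Counter(g for d in pos_docs for g in doc_ngrams(d, n))
    neg_df = Counter(g for d in neg_docs for g in doc_ngrams(d, n))
    candidates = set(pos_df) | set(neg_df)  # the larger, "second" set
    # Keep the k N-grams whose document frequency differs most between
    # the two classes; ties are broken by byte order for determinism.
    selected = sorted(
        candidates, key=lambda g: (-abs(pos_df[g] - neg_df[g]), g))[:k]
    weights = {}
    for g in selected:
        p = (pos_df[g] + 1) / (len(pos_docs) + 2)  # add-one smoothing
        q = (neg_df[g] + 1) / (len(neg_docs) + 2)
        weights[g] = math.log(p / q)
    model = {"n": n, "weights": weights, "threshold": 0.0}
    # Pick the threshold that classifies the most validation docs correctly.
    best_correct = -1
    for t in sorted({score(model, d) for d, _ in val_docs}):
        correct = sum((score(model, d) >= t) == label for d, label in val_docs)
        if correct > best_correct:
            best_correct, model["threshold"] = correct, t
    return model


model = train(
    pos_docs=[b"poker chips", b"poker night"],
    neg_docs=[b"garden tools", b"garden party"],
    val_docs=[(b"poker game", True), (b"garden shed", False)],
    k=6,
)
```

The returned dictionary stands in for the "statistical content classification model" that would be handed to a content filter; `classify(model, b"poker table")` then applies it to unseen content.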


    Efficient string search
    4.
    Granted invention patent
    Efficient string search (in force)

    Publication No.: US08577669B1

    Publication date: 2013-11-05

    Application No.: US13335743

    Filing date: 2011-12-22

    IPC classification: G06F17/28

    Abstract: Some embodiments of an efficient string search have been presented. In one embodiment, a string of bytes representing content written in a non-delimited language is received, wherein the content has been classified into a predetermined category. In a single pass through the string of bytes, a set of N-grams is searched for simultaneously. Statistical information on occurrences of the N-grams, if any, in the string of bytes is collected. In some embodiments, a model is generated based on the statistical information, where the model is usable by a content filter to classify content.


    Training procedure for N-gram-based statistical content classification
    5.
    Granted invention patent
    Training procedure for N-gram-based statistical content classification (in force)

    Publication No.: US07917522B1

    Publication date: 2011-03-29

    Application No.: US12822439

    Filing date: 2010-06-24

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30705

    Abstract: A training procedure for N-gram based statistical document classification has been disclosed. In one embodiment, a set of N-grams is selected out of a second set of N-grams, each of the N-grams having a sequence of N bytes, where N is an integer. Then a statistical content classification model is generated based on occurrences of the N-grams, if any, in a set of training documents and a set of validation documents. The statistical content classification model is provided to content filters to classify content.


    On-the-fly pattern recognition with configurable bounds
    7.
    Granted invention patent
    On-the-fly pattern recognition with configurable bounds (in force)

    Publication No.: US07996415B1

    Publication date: 2011-08-09

    Application No.: US12846102

    Filing date: 2010-07-29

    IPC classification: G06F7/00 G06F17/30

    Abstract: Some embodiments of on-the-fly pattern recognition with configurable bounds have been presented. In one embodiment, a pattern matching engine is configured based on user input, which may include values of one or more user configurable bounds on searching. Then the configured pattern matching engine is used to search for a set of features in an incoming string. A set of scores is updated based on the presence of any of the features in the string while searching for the features. Each score may indicate a likelihood of the content of the string being in a category. The search is terminated if the end of the string is reached or if the user configurable bounds are met. After terminating the search, the scores are output.


    On-the-fly pattern recognition with configurable bounds
    8.
    Granted invention patent
    On-the-fly pattern recognition with configurable bounds (in force)

    Publication No.: US07792850B1

    Publication date: 2010-09-07

    Application No.: US11881530

    Filing date: 2007-07-27

    IPC classification: G06F7/00 G06F17/30

    Abstract: Some embodiments of on-the-fly pattern recognition with configurable bounds have been presented. In one embodiment, a pattern matching engine is configured based on user input, which may include values of one or more user configurable bounds on searching. Then the configured pattern matching engine is used to search for a set of features in an incoming string. A set of scores is updated based on the presence of any of the features in the string while searching for the features. Each score may indicate a likelihood of the content of the string being in a category. The search is terminated if the end of the string is reached or if the user configurable bounds are met. After terminating the search, the scores are output.


    Link-based content ratings of pages
    9.
    Granted invention patent
    Link-based content ratings of pages (lapsed)

    Publication No.: US07739253B1

    Publication date: 2010-06-15

    Application No.: US11112505

    Filing date: 2005-04-21

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30867

    Abstract: Methods and apparatuses for link-based content ratings for pages are described herein. According to one embodiment, statistics for each of multiple pages is determined with respect to one or more predetermined categories based on the content rating of each of the pages. For each of the categories, a set of primary pages having relationships (e.g., links) with one or more secondary pages is selected, where the selected pages probabilistically distinguish from relationships with other pages. Other methods and apparatuses are also described.


    Method and apparatus for identifying data patterns in a file
    10.
    Granted invention patent
    Method and apparatus for identifying data patterns in a file (in force)

    Publication No.: US07835361B1

    Publication date: 2010-11-16

    Application No.: US11112252

    Filing date: 2005-04-21

    Abstract: A method and apparatus for identifying data patterns of a file are described herein. In one embodiment, an exemplary process includes, but is not limited to, receiving a data packet of a data stream containing a file segment of a file originated from an external host and destined to a protected host of a local area network (LAN), the file being transmitted via multiple file segments contained in multiple data packets of the data stream, and performing a data pattern analysis on the received data packet to determine whether the received data packet contains a predetermined data pattern, without waiting for a remainder of the data stream to arrive. Other methods and apparatuses are also described.
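The idea of analyzing each packet as it arrives, rather than reassembling the whole file first, can be sketched in Python as follows. The `StreamScanner` class and its carry-over buffer are assumptions made for illustration: the abstract does not describe how patterns that straddle a packet boundary are handled, and this sketch simply keeps the last `len(pattern) - 1` bytes between calls.

```python
class StreamScanner:
    """Detect a fixed byte pattern in a stream, one packet at a time."""

    def __init__(self, pattern: bytes) -> None:
        if not pattern:
            raise ValueError("pattern must be non-empty")
        self.pattern = pattern
        self._tail = b""    # trailing bytes carried over from earlier packets
        self.found = False  # sticky: stays True once the pattern is seen

    def feed(self, packet: bytes) -> bool:
        """Inspect one packet payload; return True once the pattern is seen.

        Only the new packet plus a short tail is examined, so no decision
        has to wait for the remainder of the stream.
        """
        window = self._tail + packet
        if self.pattern in window:
            self.found = True
        keep = len(self.pattern) - 1
        self._tail = window[-keep:] if keep else b""
        return self.found


scanner = StreamScanner(b"EICAR")
first = scanner.feed(b"xxxEI")    # pattern only partially visible so far
second = scanner.feed(b"CARyyy")  # completes the pattern across the boundary
```

A caller such as a gateway could act on the first `True` result, for example by dropping the remaining packets of that file, without buffering the complete transfer.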
