摘要:
Some embodiments of an efficient string search have been presented. In one embodiment, a string of bytes representing content written in a non-delimited language is received, wherein the content has been classified into a predetermined category. In a single pass through the string of bytes, a set of N-grams is searched for simultaneously. Statistical information on occurrences of the N-grams, if any, in the string of bytes is collected. In some embodiments, a model is generated based on the statistical information, where the model is usable by a content filter to classify content.
摘要:
A training procedure for N-gram based statistical document classification has been disclosed. In one embodiment, a set of N-grams is selected out of a second set of N-grams, each of the N-grams having a sequence of N bytes, where N is an integer. Then a statistical content classification model is generated based on occurrences of the N-grams, if any, in a set of training documents and a set of validation documents. The statistical content classification model is provided to content filters to classify content.
摘要:
Some embodiments of on-the-fly pattern recognition with configurable bounds have been presented. In one embodiment, a pattern matching engine is configured based on user input, which may include values of one or more user configurable bounds on searching. Then the configured pattern matching engine is used to search for a set of features in an incoming string. A set of scores is updated based on the presence of any of the features in the string while searching for the features. Each score may indicate a likelihood of the content of the string being in a category. The search is terminated if the end of the string is reached or if the user configurable bounds are met. After terminating the search, the scores are output.
摘要:
Some embodiments of an efficient string search have been presented. In one embodiment, a string of bytes representing content written in a non-delimited language is received, wherein the content has been classified into a predetermined category. In a single pass through the string of bytes, a set of N-grams is searched for simultaneously. Statistical information on occurrences of the N-grams, if any, in the string of bytes is collected. In some embodiments, a model is generated based on the statistical information, where the model is usable by a content filter to classify content.
摘要:
A training procedure for N-gram based statistical document classification has been disclosed. In one embodiment, a set of N-grams is selected out of a second set of N-grams, each of the N-grams having a sequence of N bytes, where N is an integer. Then a statistical content classification model is generated based on occurrences of the N-grams, if any, in a set of training documents and a set of validation documents. The statistical content classification model is provided to content filters to classify content.
摘要:
Some embodiments of on-the-fly pattern recognition with configurable bounds have been presented. In one embodiment, a pattern matching engine is configured based on user input, which may include values of one or more user configurable bounds on searching. Then the configured pattern matching engine is used to search for a set of features in an incoming string. A set of scores is updated based on the presence of any of the features in the string while searching for the features. Each score may indicate a likelihood of the content of the string being in a category. The search is terminated if the end of the string is reached or if the user configurable bounds are met. After terminating the search, the scores are output.
摘要:
Some embodiments of on-the-fly pattern recognition with configurable bounds have been presented. In one embodiment, a pattern matching engine is configured based on user input, which may include values of one or more user configurable bounds on searching. Then the configured pattern matching engine is used to search for a set of features in an incoming string. A set of scores is updated based on the presence of any of the features in the string while searching for the features. Each score may indicate a likelihood of the content of the string being in a category. The search is terminated if the end of the string is reached or if the user configurable bounds are met. After terminating the search, the scores are output.
摘要:
Some embodiments of on-the-fly pattern recognition with configurable bounds have been presented. In one embodiment, a pattern matching engine is configured based on user input, which may include values of one or more user configurable bounds on searching. Then the configured pattern matching engine is used to search for a set of features in an incoming string. A set of scores is updated based on the presence of any of the features in the string while searching for the features. Each score may indicate a likelihood of the content of the string being in a category. The search is terminated if the end of the string is reached or if the user configurable bounds are met. After terminating the search, the scores are output.
摘要:
Method and apparatus for multimedia content filtering are described herein. In one embodiment, an example of a network access device, in response to multimedia content transmitted from a source over a first network and destined to a destination over a second network, opens the multimedia content within the network access device interfacing the first and second networks. A content rating operation is performed on the opened multimedia content to determine whether the multimedia content should be transmitted to the destination over the second network. Other methods and apparatuses are also described.
摘要:
A local gateway device receives email across the internet from a sender of the email and forwards it across the internet to an email filtering system. The email filtering system analyzes the email to determine whether it is spam, phishing or contains a virus and sends it back to the local gateway device along with the filtered determination. The local gateway device forwards the received email and the filtered determination to a local junk store which handles the email appropriately. For example, if the email has been determined to be spam, phishing or containing a virus, the junk store can quarantine the email and if the email has been determined to be non-spun and/or not phishing and/or not containing a virus, the junk store can forward the email to a local mail server for delivery.