Identification of normal scripts in computer systems
    1.
    Granted patent (status: in force)

    Publication number: US08838992B1

    Publication date: 2014-09-16

    Application number: US13096453

    Filing date: 2011-04-28

    IPC classification: G06F21/00, G06F21/56

    Abstract: A machine learning model is used to identify normal scripts in a client computer. The machine learning model may be built by training using samples of known normal scripts and samples of known potentially malicious scripts and may take into account lexical and semantic characteristics of the sample scripts. The machine learning model and a feature set may be provided to the client computer by a server computer. In the client computer, the machine learning model may be used to classify a target script. The target script does not have to be evaluated for malicious content when classified as a normal script. Otherwise, when the target script is classified as a potentially malicious script, the target script may have to be further evaluated by an anti-malware or sent to a back-end system.

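As a rough illustration of the client-side classification step described in this abstract, the Python sketch below counts lexical tokens in a target script and scores them with a linear model. The feature set, weights, bias, and function names are hypothetical placeholders, not values from the patent. The abstract does not name a model type, and the semantic characteristics it mentions are left out here.

```python
import re

# Hypothetical feature set and linear model, standing in for what the
# server computer would provide to the client computer.
FEATURE_SET = ["eval", "unescape", "fromCharCode", "document.write", "function", "var"]
WEIGHTS = [2.0, 2.0, 1.5, 1.0, -0.5, -0.5]
BIAS = -1.0


def extract_features(script):
    """Count how often each token of the feature set occurs in the script."""
    return [len(re.findall(re.escape(token), script)) for token in FEATURE_SET]


def classify(script):
    """Score the token counts with the linear model and label the script."""
    counts = extract_features(script)
    score = BIAS + sum(w * c for w, c in zip(WEIGHTS, counts))
    return "potentially malicious" if score > 0 else "normal"
```

A script labelled "normal" would skip further evaluation; any other script would be passed on to a malware scanner or back-end system, as the abstract describes.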

    METHOD AND ARRANGEMENT FOR AUTOMATIC CHARSET DETECTION
    2.
    Patent application (status: lapsed)

    Publication number: US20110213736A1

    Publication date: 2011-09-01

    Application number: US12714378

    Filing date: 2010-02-26

    IPC classification: G06F15/18

    CPC classification: G06N99/005

    Abstract: The invention relates, in an embodiment, to a method for handling a received document. The method includes receiving a plurality of text document samples. The method includes training, using a plurality of text document samples, to obtain a set of machine learning models. Training includes generating fundamental units from the plurality of text document samples for charsets of the plurality of text document samples. Training includes extracting a subset of said fundamental units as feature lists and converting the feature lists into a set of feature vectors. Training further includes generating the set of machine learning models from the set of feature vectors. The method includes applying the set of machine learning models against a set of target document feature vectors converted from the received document. The method includes decoding the received document to obtain decoded content of the received document based on at least the first encoding scheme.

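The pipeline in this abstract (fundamental units, feature list, feature vectors, models, decoding) can be sketched in Python as follows. This is only a toy under stated assumptions: byte bigrams stand in for the "fundamental units", a per-charset mean frequency profile stands in for each machine learning model, and a dot product stands in for applying the models. None of these choices, nor the function names, come from the patent.

```python
from collections import Counter


def fundamental_units(data):
    """Byte bigrams, used here as hypothetical fundamental units."""
    return [data[i:i + 2] for i in range(len(data) - 1)]


def train(samples, top_k=256):
    """samples maps a charset name to a list of sample texts (str)."""
    counts = {charset: Counter() for charset in samples}
    for charset, texts in samples.items():
        for text in texts:
            counts[charset].update(fundamental_units(text.encode(charset)))
    # Feature list: the most frequent units over all charsets.
    total = Counter()
    for counter in counts.values():
        total.update(counter)
    feature_list = [unit for unit, _ in total.most_common(top_k)]
    # One frequency profile per charset plays the role of a model.
    models = {}
    for charset, counter in counts.items():
        n = sum(counter.values()) or 1
        models[charset] = [counter[unit] / n for unit in feature_list]
    return feature_list, models


def to_vector(data, feature_list):
    """Convert raw document bytes into a feature vector."""
    counter = Counter(fundamental_units(data))
    n = sum(counter.values()) or 1
    return [counter[unit] / n for unit in feature_list]


def detect_and_decode(data, feature_list, models):
    """Pick the best-scoring charset, then decode the document with it."""
    vector = to_vector(data, feature_list)
    charset = max(models, key=lambda c: sum(a * b for a, b in zip(vector, models[c])))
    return charset, data.decode(charset)


TEXTS = ["привет мир", "добрый день"]
FEATURES, MODELS = train({"utf-8": TEXTS, "koi8-r": TEXTS})
```

With these two sample texts, a KOI8-R encoded document shares bigrams only with the KOI8-R profile, so it is detected and decoded with that charset.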

    METHODS FOR MATCHING IMAGE-BASED TEXUAL INFORMATION WITH REGULAR EXPRESSIONS
    3.
    Patent application (status: lapsed)

    Publication number: US20100074534A1

    Publication date: 2010-03-25

    Application number: US12235543

    Filing date: 2008-09-22

    IPC classification: G06K9/62

    Abstract: A method for matching an image-form textual string in an image to a regular expression is disclosed. The method includes constructing a representation of the regular expression and generating a candidate string of characters from the image-form textual string. The method further includes ascertaining whether there exists a match between the image-form textual string and the regular expression, the match is deemed achieved if a probability value associated with the match is above a predetermined matching threshold.

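One way to picture the threshold test in this abstract is the Python sketch below. It assumes a recognizer has already produced, for each character position, a list of candidate characters with probabilities. It then searches for the most probable candidate string that matches the regular expression and compares that probability with a threshold. Python's `re` module stands in for the patent's "representation of the regular expression", and the exhaustive search, function name, and data layout are illustrative assumptions only.

```python
import itertools
import math
import re


def match_image_text(char_candidates, pattern, threshold):
    """Return (best matching string, its probability, threshold passed?).

    char_candidates holds one list per character position, each list
    containing (character, probability) pairs from a hypothetical recognizer.
    """
    regex = re.compile(pattern)
    best_text, best_prob = None, 0.0
    # Exhaustive enumeration: fine for a sketch, exponential in string length.
    for combo in itertools.product(*char_candidates):
        text = "".join(ch for ch, _ in combo)
        prob = math.prod(p for _, p in combo)
        if prob > best_prob and regex.fullmatch(text):
            best_text, best_prob = text, prob
    return best_text, best_prob, best_prob > threshold


CANDIDATES = [
    [("5", 0.6), ("S", 0.4)],
    [("0", 0.7), ("O", 0.3)],
    [("%", 0.9)],
]
```

For example, `match_image_text(CANDIDATES, r"\d+%", 0.3)` finds the string "50%" and reports that its probability clears the threshold.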

    Method and arrangement for automatic charset detection
    4.
    Granted patent (status: lapsed)

    Publication number: US08560466B2

    Publication date: 2013-10-15

    Application number: US12714378

    Filing date: 2010-02-26

    IPC classification: G06F15/18

    CPC classification: G06N99/005

    Abstract: The invention relates, in an embodiment, to a method for handling a received document. The method includes receiving a plurality of text document samples. The method includes training, using a plurality of text document samples, to obtain a set of machine learning models. Training includes generating fundamental units from the plurality of text document samples for charsets of the plurality of text document samples. Training includes extracting a subset of said fundamental units as feature lists and converting the feature lists into a set of feature vectors. Training further includes generating the set of machine learning models from the set of feature vectors. The method includes applying the set of machine learning models against a set of target document feature vectors converted from the received document. The method includes decoding the received document to obtain decoded content of the received document based on at least the first encoding scheme.

    Zero day malware scanner
    5.
    Granted patent (status: in force)

    Publication number: US08375450B1

    Publication date: 2013-02-12

    Application number: US12573300

    Filing date: 2009-10-05

    IPC classification: G06F21/00

    Abstract: A training model for malware detection is developed using common substrings extracted from known malware samples. The probability of each substring occurring within a malware family is determined and a decision tree is constructed using the substrings. An enterprise server receives indications from client machines that a particular file is suspected of being malware. The suspect file is retrieved and the decision tree is walked using the suspect file. A leaf node is reached that identifies a particular common substring, a byte offset within the suspect file at which it is likely that the common substring begins, and a probability distribution that the common substring appears in a number of malware families. A hash value of the common substring is compared (exact or approximate) against the corresponding substring in the suspect file. If positive, a result is returned to the enterprise server indicating the probability that the suspect file is a member of a particular malware family.

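The tree walk and hash check in this abstract might look roughly like the Python sketch below. The node layout, the "does this substring occur in the file" test at internal nodes, the use of SHA-256, and all names are assumptions made for illustration. Only the exact hash comparison is shown; the approximate comparison the abstract also allows is not.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass
class Leaf:
    offset: int                     # byte offset where the substring likely begins
    length: int                     # length of the common substring
    digest: str                     # SHA-256 hex digest of the common substring
    family_probs: Dict[str, float]  # probability distribution over malware families


@dataclass
class Node:
    probe: bytes                    # hypothetical test: does this substring occur?
    present: Union["Node", Leaf]
    absent: Union["Node", Leaf]


def walk(tree, data):
    """Walk the decision tree with the suspect file until a leaf is reached."""
    node = tree
    while isinstance(node, Node):
        node = node.present if node.probe in data else node.absent
    return node


def scan(tree, data) -> Optional[Dict[str, float]]:
    """Return the family distribution on a hash match, otherwise None."""
    leaf = walk(tree, data)
    candidate = data[leaf.offset:leaf.offset + leaf.length]
    if hashlib.sha256(candidate).hexdigest() == leaf.digest:
        return leaf.family_probs
    return None


def make_leaf(substring, offset, family_probs):
    """Helper that builds a leaf from a known common substring."""
    return Leaf(offset, len(substring), hashlib.sha256(substring).hexdigest(), family_probs)


# Toy tree with made-up substrings and family names.
TREE = Node(
    probe=b"EVIL",
    present=make_leaf(b"EVILPAYLOAD", 4, {"FamilyA": 0.9, "FamilyB": 0.1}),
    absent=make_leaf(b"DROPPER", 0, {"FamilyC": 1.0}),
)
```

Storing only a hash in the leaf means the tree does not need to carry the malware substrings themselves, which is one plausible reason for the comparison step the abstract describes.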

    Automatic charset detection using support vector machines with charset grouping
    6.
    Granted patent (status: lapsed)

    Publication number: US07689531B1

    Publication date: 2010-03-30

    Application number: US11238478

    Filing date: 2005-09-28

    IPC classification: G06F15/18, G06F15/00

    CPC classification: G06N99/005

    Abstract: The invention relates, in an embodiment, to a computer-implemented method for automatic charset detection, which includes detecting an encoding scheme of a target document. The method includes training, using a plurality of text document samples, to obtain a set of machine learning models. Training includes using a SVM (Support Vector Machine) technique to generate the set of machine learning models from feature vectors obtained from the plurality of text document samples. The method also includes applying the set of machine learning models against a set of target document feature vectors converted from the target document to detect the encoding scheme.

    Identifying sensitive expressions in images for languages with large alphabets
    7.
    Granted patent (status: in force)

    Publication number: US08699796B1

    Publication date: 2014-04-15

    Application number: US12268714

    Filing date: 2008-11-11

    IPC classification: G06K9/00, G06F17/24

    Abstract: One embodiment relates to a method of identifying sensitive expressions in images for a language with a large alphabet. The method is performed using a computer and includes (i) extracting an image from a message, (ii) extracting image character-blocks (i.e. normalized pixel graphs) from the image, and (iii) predicting characters to which the character-blocks correspond using a multi-class learning model, wherein the multi-class learning model is trained using a derived list of sensitive characters which is a subset of the large alphabet. In addition, (iv) the characters may be combined into string text, and (v) the string text may be searched for matches with a predefined list of sensitive expressions. Another embodiment relates to a method of training a multi-class learning model so that the model predicts characters to which image character-blocks correspond. Other embodiments, aspects and features are also disclosed herein.

    Methods for matching image-based texual information with regular expressions
    8.
    Granted patent (status: lapsed)

    Publication number: US08260054B2

    Publication date: 2012-09-04

    Application number: US12235543

    Filing date: 2008-09-22

    IPC classification: G06K9/62

    Abstract: A method for matching an image-form textual string in an image to a regular expression is disclosed. The method includes constructing a representation of the regular expression and generating a candidate string of characters from the image-form textual string. The method further includes ascertaining whether there exists a match between the image-form textual string and the regular expression, the match is deemed achieved if a probability value associated with the match is above a predetermined matching threshold.

    Lightweight SVM-based content filtering system for mobile phones
    9.
    Granted patent (status: in force)

    Publication number: US08023974B1

    Publication date: 2011-09-20

    Application number: US11706539

    Filing date: 2007-02-15

    IPC classification: H04W4/00

    Abstract: In one embodiment, a content filtering system generates a support vector machine (SVM) learning model in a server computer and provides the SVM learning model to a mobile phone for use in classifying text messages. The SVM learning model may be generated in the server computer by training a support vector machine with sample text messages that include spam and legitimate text messages. A resulting intermediate SVM learning model from the support vector machine may include a threshold value, support vectors and alpha values. The SVM learning model in the mobile phone may include the threshold value, the features, and the weights of the features. An incoming text message may be parsed for the features. The weights of features found in the incoming text message may be added and compared to the threshold value to determine whether or not the incoming text message is spam.

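The two model forms in this abstract can be sketched in Python as below. The first function collapses support vectors and alpha values into one weight per feature, which is valid for a linear kernel with signed alpha values; the abstract does not state the kernel, so that is an assumption. The second function is the phone-side check: add the weights of the features found in a message and compare the sum with the threshold. All feature words, weights, and names are made up for illustration.

```python
def collapse_to_weights(support_vectors, signed_alphas, features):
    """Fold support vectors into per-feature weights (linear kernel assumed).

    Each weight is the sum over support vectors of alpha * feature value,
    where alpha already carries the class sign.
    """
    weights = {feature: 0.0 for feature in features}
    for vector, alpha in zip(support_vectors, signed_alphas):
        for feature, value in zip(features, vector):
            weights[feature] += alpha * value
    return weights


# Hypothetical phone-side model: features, their weights, and a threshold.
FEATURE_WEIGHTS = {"free": 1.2, "winner": 1.5, "prize": 0.9, "meeting": -1.1, "lunch": -0.8}
THRESHOLD = 1.0


def is_spam(message, weights=FEATURE_WEIGHTS, threshold=THRESHOLD):
    """Add the weights of features found in the message; compare to threshold."""
    tokens = set(message.lower().split())
    score = sum(weight for feature, weight in weights.items() if feature in tokens)
    return score > threshold
```

Shipping only weights and a threshold keeps the phone-side check to a lookup and a sum, which fits the "lightweight" goal in the title.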

    Lightweight content filtering system for mobile phones
    10.
    Granted patent (status: in force)

    Publication number: US07756535B1

    Publication date: 2010-07-13

    Application number: US11483073

    Filing date: 2006-07-07

    IPC classification: H04W4/00, G06F15/16, H04L12/58

    CPC classification: H04L51/12, H04L51/38, H04W4/12

    Abstract: In one embodiment, a content filtering system includes a feature list and a learning model. The feature list may be a subset of a dictionary that was used to train the content filtering system to identify classification (e.g., spam, phishing, porn, legitimate text messages, etc.) of text messages during a training stage. The learning model may include representative vectors, each of which represents a particular class of text messages. The learning model and the feature list may be generated in a server computer during the training stage and then subsequently provided to the mobile phone. An incoming text message in the mobile phone may be parsed for occurrences of feature words included in the feature list and then converted to an input vector. The input vector may be compared to the learning model to determine the classification of the incoming text message.

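A minimal Python sketch of the comparison described in this abstract follows. A message is turned into an input vector of feature-word counts and assigned to the class whose representative vector is most similar. Cosine similarity is an assumed choice, since the abstract only says the input vector is "compared" to the learning model. The feature list, representative vectors, and names are hypothetical.

```python
import math

# Hypothetical feature list and one representative vector per class.
FEATURE_LIST = ["free", "prize", "password", "verify", "lunch", "meeting"]
REPRESENTATIVES = {
    "spam": [1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    "phishing": [0.0, 0.0, 1.0, 1.0, 0.0, 0.0],
    "legitimate": [0.0, 0.0, 0.0, 0.0, 1.0, 1.0],
}


def to_input_vector(message, feature_list=FEATURE_LIST):
    """Count occurrences of each feature word in the message."""
    tokens = message.lower().split()
    return [float(tokens.count(word)) for word in feature_list]


def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(message):
    """Return the class whose representative vector is closest to the input."""
    vector = to_input_vector(message)
    return max(REPRESENTATIVES, key=lambda label: cosine(vector, REPRESENTATIVES[label]))
```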