-
Publication No.: US11762990B2
Publication date: 2023-09-19
Application No.: US16917626
Filing date: 2020-06-30
IPC classification: G06F21/55, G06F16/955, G06N5/04, G06N20/00
CPC classification: G06F21/554, G06F16/9566, G06N5/04, G06N20/00, G06F2221/034
Abstract: The technology described herein identifies malicious URLs using a classifier that is both accurate and fast. Aspects of the technology are particularly well adapted for use as a real-time URL security analysis tool because the technology is able to quickly process a URL and produce a warning when a malicious URL is identified. The rapid processing speed of the technology described herein is produced, in part, by use of only a single input signal, which is the URL itself. The high accuracy produced by the technology described herein is achieved by analyzing the unstructured text on both a character-by-character level and a word-by-word level. The technology described herein uses both character-level and word-level information from the incoming URL.
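As a rough illustration of the two views of a URL that this abstract mentions, the following minimal sketch splits a URL into character-level and word-level tokens. The function names, the lowercasing step, and the rule of splitting words on non-alphanumeric characters are assumptions made for this example; the abstract does not specify how the patent derives either view.

```python
import re

def char_tokens(url):
    """Character-level view: one token per character (lowercased by assumption)."""
    return list(url.lower())

def word_tokens(url):
    """Word-level view: split on runs of non-alphanumeric characters (an assumed rule)."""
    return [w for w in re.split(r"[^a-z0-9]+", url.lower()) if w]

url = "http://login.example.com/secure-update"
chars = char_tokens(url)   # ['h', 't', 't', 'p', ':', ...]
words = word_tokens(url)   # ['http', 'login', 'example', 'com', 'secure', 'update']
```

Both token sequences come from the URL string alone, which matches the single-input-signal property described in the abstract; how a classifier would combine them is not shown here.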
-
Publication No.: US12003535B2
Publication date: 2024-06-04
Application No.: US17246352
Filing date: 2021-04-30
Inventors: Jack Wilson Stokes, III, Pranav Ravindra Maneriker, Arunkumar Gururajan, Diana Anca Carutasu, Edir Vinicio Garcia Lazo
CPC classification: H04L63/1483, G06F40/284, G06N3/045, G06N3/08
Abstract: The technology described herein can identify phishing URLs using transformers. The technology tokenizes useful features from the subject URL. The useful features can include the text of the URL and other data associated with the URL, such as certificate data for the subject URL, a referrer URL, an IP address, etc. The technology may build a joint Byte Pair Encoding for the features. The token encoding may be processed through a transformer, resulting in a transformer output. The transformer output, which may be described as a token embedding, may be input to a classifier to determine whether the URL is a phishing URL. Additional or improved URL training data may be generated by permuting token order, by simulating a homoglyph attack, and by simulating a compound word attack.
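The three training-data augmentations named at the end of this abstract can be sketched roughly as follows. Everything here is hypothetical: the `HOMOGLYPHS` table, the function names, and in particular the reading of a "compound word attack" as joining a known word with an extra word are assumptions for illustration, not details taken from the patent.

```python
import random

# Hypothetical look-alike substitutions; a realistic table would be much larger.
HOMOGLYPHS = {"o": "0", "l": "1", "e": "3"}

def permute_tokens(tokens, rng):
    """Return a copy of the tokens in shuffled order."""
    shuffled = list(tokens)
    rng.shuffle(shuffled)
    return shuffled

def homoglyph_variant(word):
    """Replace every character that has a look-alike in HOMOGLYPHS."""
    return "".join(HOMOGLYPHS.get(c, c) for c in word)

def compound_variant(word, extra):
    """Join a known word with an extra word (assumed reading of the attack)."""
    return word + extra

rng = random.Random(0)  # seeded so the sketch is repeatable
tokens = ["login", "example", "com"]
permuted = permute_tokens(tokens, rng)
spoofed = homoglyph_variant("google")             # "g00g13"
compound = compound_variant("example", "secure")  # "examplesecure"
```

Each function produces a perturbed variant of an existing token or token list, which could then be added to a training set alongside the original.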
-
Publication No.: US20170220545A1
Publication date: 2017-08-03
Application No.: US15349821
Filing date: 2016-11-11
Inventors: Arunkumar Gururajan, Mihai Aldea, Theodor J. Scott, Kamal Choudhary, Eugene Chudin, Si-Qing Chen, Daniel R. Snyder, Michelle Keslin, Jeff D. Jarrard, Sanjeev Bagaria, John Hoegger, Cynthia Guo, Tony Y. Tzeng, Jin Hee Lim
IPC classification: G06F17/24, G06F3/0482, G06F17/21
CPC classification: G06F17/248, G06F3/0482, G06F17/212, G06F17/2211
Abstract: Automatic generation of document templates based on recognized composition element patterns in a group of clustered documents is provided. Composition elements used in documents are typically unique to a particular user or to a group of users. An automated template generation system detects composition element patterns in documents associated with a given user. Sequences of composition elements from one document are aligned with sequences of composition elements of one or more other documents. The aligned sequences are scored to generate a document distance matrix. The documents are clustered together based on the alignment scores and a document template is generated for each corresponding cluster of documents. In one or more aspects, selecting a document template and updating it results in a modified document template or, in certain cases, a new document template. The generated document templates are displayed in a user interface for selection by a user.
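The align, score, and cluster pipeline described in this abstract might look roughly like the sketch below. It swaps in Python's `difflib.SequenceMatcher` as a stand-in alignment scorer and a simple threshold-based single-link merge as the clustering step; the abstract names neither technique, and the element labels, function names, and `threshold` value are assumptions.

```python
from difflib import SequenceMatcher

def distance(a, b):
    """1 minus the similarity ratio of two element sequences (0.0 means identical)."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()

def distance_matrix(docs):
    """Pairwise distances between all documents."""
    return [[distance(a, b) for b in docs] for a in docs]

def cluster(docs, threshold=0.3):
    """Single-link merge: documents within `threshold` of each other share a label."""
    matrix = distance_matrix(docs)
    labels = list(range(len(docs)))
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if matrix[i][j] <= threshold:
                old, new = labels[j], labels[i]
                labels = [new if label == old else label for label in labels]
    return labels

# Each document is a sequence of hypothetical composition-element names.
docs = [
    ["title", "heading", "paragraph", "table"],
    ["title", "heading", "paragraph", "image"],
    ["bullet", "bullet", "chart"],
]
labels = cluster(docs)  # the first two documents end up in the same cluster
```

A template could then be derived per cluster, for example from the elements its documents share; that step is left out of the sketch.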
-
-