Distributed random forest training with a predictor trained to balance tasks

    Publication No.: US11625640B2

    Publication date: 2023-04-11

    Application No.: US16152578

    Filing date: 2018-10-05

    Abstract: In one embodiment, a device distributes sets of training records from a training dataset for a random forest-based classifier among a plurality of workers of a computing cluster. Each worker determines whether it can perform a node split operation locally on the random forest by comparing a number of training records at the worker to a predefined threshold. The device determines, for each of the split operations, a data size and entropy measure of the training records to be used for the split operation. The device applies a machine learning-based predictor to the determined data size and entropy measure of the training records to be used for the split operation, to predict its completion time. The device coordinates the workers of the computing cluster to perform the node split operations in parallel such that the node split operations in a given batch are grouped based on their predicted completion times.
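The scheduling idea in this abstract can be illustrated with a minimal sketch. It assumes Shannon entropy of the class labels as the entropy measure and uses a hand-weighted linear function as a stand-in for the trained predictor. The names `label_entropy`, `predict_completion_time`, `group_into_batches`, and all weights are hypothetical and are not taken from the patent.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (in bits) of the class labels at a node."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def predict_completion_time(data_size, entropy,
                            w_size=1e-3, w_entropy=0.5, bias=0.1):
    """Hypothetical stand-in for the trained predictor.

    A real system would fit this model on measured split times; the
    weights here are illustrative assumptions only.
    """
    return bias + w_size * data_size + w_entropy * entropy


def group_into_batches(splits, batch_size):
    """Sort pending splits by predicted time, then chunk them so that each
    batch holds splits with similar predicted completion times."""
    scored = sorted(
        splits,
        key=lambda s: predict_completion_time(
            len(s["labels"]), label_entropy(s["labels"])
        ),
    )
    return [scored[i:i + batch_size] for i in range(0, len(scored), batch_size)]


# Four pending node splits: two small, pure nodes and two large, mixed nodes.
pending = [
    {"id": "a", "labels": [0] * 10},
    {"id": "b", "labels": [0, 1] * 500},
    {"id": "c", "labels": [0] * 20},
    {"id": "d", "labels": [0, 1] * 400},
]
batches = group_into_batches(pending, batch_size=2)
batch_ids = [[s["id"] for s in batch] for batch in batches]
```

Grouping splits with similar predicted times keeps fast workers from idling while they wait for one slow split in the same batch, which is one plausible reading of the balancing goal named in the title.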

    Multi-Modal Models for Detecting Malicious Emails

    Publication No.: US20240333733A1

    Publication date: 2024-10-03

    Application No.: US18127501

    Filing date: 2023-03-28

    CPC classification number: H04L63/1425 G06V10/82 H04L63/1416 H04L63/1441

    Abstract: In some aspects, the techniques described herein relate to a method for detecting malicious emails, the method including: receiving an email, wherein the email is associated with a markup payload; determining, based on the markup payload, text data associated with the email; determining, using the text data and a first machine learning model, a first representation of the email representing text associated with the email; rendering the email to generate image data that represents a rendering of the email; determining, using the image data and a second machine learning model, a second representation of the email that represents at least the rendering of the email; and determining a prediction for the email based on the first representation and the second representation, wherein the prediction represents whether the email is predicted to be malicious based on the first representation and the second representation.
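The text branch and the final fusion step of this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: text is pulled from the markup payload with the standard-library `HTMLParser`, the two learned representations are passed in as plain lists of floats (the encoders and the rendering step are not shown), and fusion is assumed to be concatenation followed by a logistic score. The names `extract_text` and `fuse_and_score` and the weights are hypothetical; the abstract does not specify how the two representations are combined.

```python
import math
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the non-empty text nodes found in a markup payload."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.chunks.append(stripped)


def extract_text(markup):
    """Return the text content of an HTML payload, joined by spaces."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join(parser.chunks)


def fuse_and_score(text_repr, image_repr, weights, bias=0.0):
    """Concatenate both representations and map them to a probability.

    Assumption: a logistic layer over the concatenated vector stands in
    for whatever prediction head a real system would train.
    """
    fused = list(text_repr) + list(image_repr)
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused representation length")
    z = bias + sum(w * x for w, x in zip(weights, fused))
    return 1.0 / (1.0 + math.exp(-z))


text = extract_text("<p>Hello <b>world</b></p>")
score = fuse_and_score([0.2, -0.1], [0.7], weights=[1.0, 1.0, 2.0])
```

Using both views matters because a rendered email can look very different from its raw text, for example when visible wording is carried by images or styling rather than by text nodes.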

    STATISTICAL MODELING OF EMAIL SENDERS TO DETECT BUSINESS EMAIL COMPROMISE

    Publication No.: US20240356969A1

    Publication date: 2024-10-24

    Application No.: US18220065

    Filing date: 2023-07-10

    CPC classification number: H04L63/1483 G06Q10/107

    Abstract: Techniques for an email-security system to screen emails, extract information from the emails, analyze the extracted information, assign probability scores to the emails, and classify each email as suspicious or not. A disclosed method includes analyzing an email and extracting a first sender attribute and a second sender attribute from the email. The method identifies one or more sender-specific models associated with a sending device and applies them to determine a first probability value, associated with the first sender attribute, that conveys a likelihood that the first sender attribute is a misused sender attribute. The method likewise applies the one or more sender-specific models to determine a second probability value that conveys a likelihood that the second sender attribute is a misused sender attribute, and uses the first and second probability values to determine an overall probability value that conveys a likelihood that the email is suspicious.
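The per-attribute scoring and combination described in this abstract can be sketched as follows. The sketch assumes a sender-specific model that simply counts the attribute values previously seen for that sender, with Laplace smoothing, and it assumes a noisy-OR rule for the overall probability. Both choices, and the names `SenderModel` and `overall_probability`, are hypothetical; the abstract does not say which statistical model or combination rule is used.

```python
from collections import Counter


class SenderModel:
    """Hypothetical per-sender model over one attribute (e.g. a display name)."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def observe(self, value):
        """Record one legitimate occurrence of an attribute value."""
        self.counts[value] += 1
        self.total += 1

    def misuse_probability(self, value):
        """Estimate how likely the value is misused.

        Values seen often for this sender score low; unseen values score
        high. Laplace smoothing keeps the estimate strictly inside (0, 1).
        """
        seen = (self.counts[value] + 1) / (self.total + 2)
        return 1.0 - seen


def overall_probability(p_first, p_second):
    """Noisy-OR combination (an assumption): the email is suspicious if
    either attribute is misused, treating the two as independent."""
    return 1.0 - (1.0 - p_first) * (1.0 - p_second)


model = SenderModel()
for _ in range(8):
    model.observe("alice@example.com")

p_known = model.misuse_probability("alice@example.com")
p_unknown = model.misuse_probability("a1ice@examp1e.com")
p_email = overall_probability(p_known, p_unknown)
```

With noisy-OR, one strongly anomalous attribute is enough to push the overall score up, which suits impersonation cases where only a single sender field has been altered.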
