Abstract:
A computing device receives a document that was incorrectly classified as sensitive data based on a machine learning-based detection (MLD) profile. The computing device modifies a training data set that was used to generate the MLD profile by adding the document to the training data set as a negative example of sensitive data to generate a modified training data set. The computing device then analyzes the modified training data set using machine learning to generate an updated MLD profile.
Abstract:
A computer-implemented method may include (1) identifying a plurality of specific categories of sensitive information to be protected by a DLP system, (2) obtaining a training data set for each specific category of sensitive information that includes a plurality of positive and a plurality of negative examples of the specific category of sensitive information, (3) using machine learning to train, based on an analysis of the training data sets, at least one machine learning-based classifier that is capable of detecting items of data that contain one or more of the plurality of specific categories of sensitive information, and then (4) deploying the machine learning-based classifier within the DLP system to enable the DLP system to detect and protect items of data that contain one or more of the plurality of specific categories of sensitive information in accordance with at least one DLP policy of the DLP system.