Methods and systems for automated detection of personal information using neural networks

    Publication number: US11663406B2

    Publication date: 2023-05-30

    Application number: US16945525

    Filing date: 2020-07-31

    Applicant: NetApp, Inc.

    Inventor: Adam Bali

    CPC classification number: G06F40/284 G06F40/205 G06F40/253

    Abstract: A method, a computing device, and a non-transitory machine-readable medium for detecting personal information. Terms that are of interest are extracted from a corpus of raw text that has been extracted from a collection of documents. For each of the terms, a surrounding sentence is extracted to form a target sentence to thereby form a plurality of target sentences. The surrounding sentence includes at least one reference to a data subject. A matrix of feature information is generated for each of the target sentences to form a plurality of matrices. A neural network model is trained, using the matrices as input, to compute an output that indicates a likelihood of a given sentence containing personal information.
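The extraction steps in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: the word list `DATA_SUBJECT_WORDS`, the regex-based sentence splitter, and the three per-token features are all assumptions introduced here, and the neural network training step is omitted.

```python
import re

# Hypothetical markers of a data subject; the abstract does not define them.
DATA_SUBJECT_WORDS = {"he", "she", "his", "her", "mr", "mrs", "patient", "employee"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Split a sentence into alphanumeric tokens."""
    return re.findall(r"[A-Za-z0-9']+", sentence)


def target_sentences(corpus, terms):
    """Return (term, sentence) pairs where the sentence contains the term
    and at least one (assumed) data-subject reference."""
    terms_lower = {t.lower() for t in terms}
    results = []
    for sentence in split_sentences(corpus):
        tokens = [t.lower() for t in tokenize(sentence)]
        if not DATA_SUBJECT_WORDS.intersection(tokens):
            continue  # no reference to a data subject in this sentence
        for term in sorted(terms_lower.intersection(tokens)):
            results.append((term, sentence))
    return results


def feature_matrix(sentence, term):
    """One row per token: [is_term, is_data_subject_word, is_capitalized]."""
    rows = []
    for tok in tokenize(sentence):
        low = tok.lower()
        rows.append([
            int(low == term),
            int(low in DATA_SUBJECT_WORDS),
            int(tok[0].isupper()),
        ])
    return rows
```

Each matrix produced by `feature_matrix` would then serve as one training input; a real system would presumably use richer features (for example, embeddings) than these three binary flags.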

    DOCUMENT CLASSIFICATION USING ATTENTION NETWORKS

    Publication number: US20200210526A1

    Publication date: 2020-07-02

    Application number: US16237853

    Filing date: 2019-01-02

    Applicant: NETAPP, INC.

    Abstract: A system comprising at least one hardware processor; and a non-transitory computer-readable storage medium having stored thereon program instructions, the program instructions executable by the at least one hardware processor to: receive, as input, a plurality of electronic documents, apply a trained machine learning classifier to automatically classify at least some of said plurality of electronic documents, wherein said machine learning classifier comprises two or more attention layers, and wherein at least one of the attention layers comprises an adjustable parameter which controls a distribution of attention weights assigned by said attention layer.
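One common way a single parameter can control how attention weights are distributed is a softmax temperature. The sketch below assumes that interpretation purely for illustration; the abstract does not say what the adjustable parameter is, and the names `attention_weights`, `attend`, and `temperature` are hypothetical.

```python
import math


def attention_weights(scores, temperature=1.0):
    """Softmax over scores; `temperature` is an assumed stand-in for the
    adjustable parameter. Lower values concentrate weight on the top score,
    higher values spread it more evenly."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def attend(vectors, scores, temperature=1.0):
    """Weighted sum of equal-length vectors using the attention weights."""
    weights = attention_weights(scores, temperature)
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
```

With scores `[1, 2, 3]`, a temperature of 0.5 puts more weight on the last item than a temperature of 2.0 does, which is the kind of control over the weight distribution the abstract refers to.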

    METHODS AND SYSTEMS FOR AUTOMATED DETECTION OF PERSONAL INFORMATION USING NEURAL NETWORKS

    Publication number: US20220036003A1

    Publication date: 2022-02-03

    Application number: US16945525

    Filing date: 2020-07-31

    Applicant: NetApp, Inc.

    Inventor: Adam Bali

    Abstract: A method, a computing device, and a non-transitory machine-readable medium for detecting personal information. Terms that are of interest are extracted from a corpus of raw text that has been extracted from a collection of documents. For each of the terms, a surrounding sentence is extracted to form a target sentence to thereby form a plurality of target sentences. The surrounding sentence includes at least one reference to a data subject. A matrix of feature information is generated for each of the target sentences to form a plurality of matrices. A neural network model is trained, using the matrices as input, to compute an output that indicates a likelihood of a given sentence containing personal information.

    METHODS AND SYSTEMS FOR AUTOMATED DOCUMENT CLASSIFICATION WITH PARTIALLY LABELED DATA USING SEMI-SUPERVISED LEARNING

    Publication number: US20220036134A1

    Publication date: 2022-02-03

    Application number: US16945420

    Filing date: 2020-07-31

    Applicant: NetApp, Inc.

    Abstract: A method, a computing device, and a non-transitory machine-readable medium for classifying documents. A document collection is sorted into a plurality of categories. A classifier corresponding to a category of the plurality of categories is trained to output a probability that a document associated with the category is of a selected type (e.g., confidential). The training includes determining, by the processor, that a cardinality of a set of negative samples in a train set is not above a pipeline threshold but is at least one and training the classifier via a first pipeline and a second pipeline using a training group that includes a first portion of a group of positive samples in the train set, a second portion of a set of negative samples in the train set, and a third portion of a group of unlabeled samples in the train set.
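The condition and training group described in this abstract can be illustrated with a small sketch. The function names, the `fractions` parameter, and the rule for sizing each portion are assumptions made here; the abstract does not state how the portions are chosen or what happens when the condition is not met.

```python
def uses_both_pipelines(num_negative, pipeline_threshold):
    """True when the negative-sample count is at least one but not above
    the pipeline threshold, the case the abstract describes."""
    return 1 <= num_negative <= pipeline_threshold


def take_portion(items, fraction):
    """Return the leading `fraction` of items (at least one if non-empty).
    The sizing rule is hypothetical."""
    if not items:
        return []
    return items[: max(1, int(len(items) * fraction))]


def build_training_group(positives, negatives, unlabeled, fractions=(0.5, 0.5, 0.5)):
    """Combine a portion of positive, negative, and unlabeled samples."""
    pos_frac, neg_frac, unl_frac = fractions
    return {
        "positive": take_portion(positives, pos_frac),
        "negative": take_portion(negatives, neg_frac),
        "unlabeled": take_portion(unlabeled, unl_frac),
    }
```

For example, with a single negative sample and a threshold of 10, `uses_both_pipelines(1, 10)` is true, so the training group built above would be passed to both pipelines.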

    Document classification using attention networks

    Publication number: US10824815B2

    Publication date: 2020-11-03

    Application number: US16237853

    Filing date: 2019-01-02

    Applicant: NETAPP, INC.

    Abstract: A system comprising at least one hardware processor; and a non-transitory computer-readable storage medium having stored thereon program instructions, the program instructions executable by the at least one hardware processor to: receive, as input, a plurality of electronic documents, apply a trained machine learning classifier to automatically classify at least some of said plurality of electronic documents, wherein said machine learning classifier comprises two or more attention layers, and wherein at least one of the attention layers comprises an adjustable parameter which controls a distribution of attention weights assigned by said attention layer.
