-
1.
Publication No.: US11663406B2
Publication date: 2023-05-30
Application No.: US16945525
Filing date: 2020-07-31
Applicant: NetApp, Inc.
Inventor: Adam Bali
IPC: G06F40/284 , G06F40/205 , G06F40/253
CPC classification number: G06F40/284 , G06F40/205 , G06F40/253
Abstract: A method, a computing device, and a non-transitory machine-readable medium for detecting personal information. Terms that are of interest are extracted from a corpus of raw text that has been extracted from a collection of documents. For each of the terms, a surrounding sentence is extracted to form a target sentence to thereby form a plurality of target sentences. The surrounding sentence includes at least one reference to a data subject. A matrix of feature information is generated for each of the target sentences to form a plurality of matrices. A neural network model is trained, using the matrices as input, to compute an output that indicates a likelihood of a given sentence containing personal information.
-
2.
Publication No.: US20200210526A1
Publication date: 2020-07-02
Application No.: US16237853
Filing date: 2019-01-02
Applicant: NETAPP, INC.
Inventor: Guy Leibovitz , Adam Bali
Abstract: A system comprising at least one hardware processor; and a non-transitory computer-readable storage medium having stored thereon program instructions, the program instructions executable by the at least one hardware processor to: receive, as input, a plurality of electronic documents, apply a trained machine learning classifier to automatically classify at least some of said plurality of electronic documents, wherein said machine learning classifier comprises two or more attention layers, and wherein at least one of the attention layers comprises an adjustable parameter which controls a distribution of attention weights assigned by said attention layer.
-
3.
Publication No.: US20220036003A1
Publication date: 2022-02-03
Application No.: US16945525
Filing date: 2020-07-31
Applicant: NetApp, Inc.
Inventor: Adam Bali
IPC: G06F40/284 , G06F40/253 , G06F40/205 , G06N3/08
Abstract: A method, a computing device, and a non-transitory machine-readable medium for detecting personal information. Terms that are of interest are extracted from a corpus of raw text that has been extracted from a collection of documents. For each of the terms, a surrounding sentence is extracted to form a target sentence to thereby form a plurality of target sentences. The surrounding sentence includes at least one reference to a data subject. A matrix of feature information is generated for each of the target sentences to form a plurality of matrices. A neural network model is trained, using the matrices as input, to compute an output that indicates a likelihood of a given sentence containing personal information.
-
4.
Publication No.: US20220036134A1
Publication date: 2022-02-03
Application No.: US16945420
Filing date: 2020-07-31
Applicant: NetApp, Inc.
Inventor: Adam Bali , Yuval Alaluf
Abstract: A method, a computing device, and a non-transitory machine-readable medium for classifying documents. A document collection is sorted into a plurality of categories. A classifier corresponding to a category of the plurality of categories is trained to output a probability that a document associated with the category is of a selected type (e.g., confidential). The training includes determining, by the processor, that a cardinality of a set of negative samples in a train set is not above a pipeline threshold but is at least one and training the classifier via a first pipeline and a second pipeline using a training group that includes a first portion of a group of positive samples in the train set, a second portion of a set of negative samples in the train set, and a third portion of a group of unlabeled samples in the train set.
-
5.
Publication No.: US10824815B2
Publication date: 2020-11-03
Application No.: US16237853
Filing date: 2019-01-02
Applicant: NETAPP, INC.
Inventor: Guy Leibovitz , Adam Bali
Abstract: A system comprising at least one hardware processor; and a non-transitory computer-readable storage medium having stored thereon program instructions, the program instructions executable by the at least one hardware processor to: receive, as input, a plurality of electronic documents, apply a trained machine learning classifier to automatically classify at least some of said plurality of electronic documents, wherein said machine learning classifier comprises two or more attention layers, and wherein at least one of the attention layers comprises an adjustable parameter which controls a distribution of attention weights assigned by said attention layer.