Publication No.: US20220036134A1
Publication Date: 2022-02-03
Application No.: US16945420
Filing Date: 2020-07-31
Applicant: NetApp, Inc.
Inventor: Adam Bali, Yuval Alaluf
Abstract: A method, a computing device, and a non-transitory machine-readable medium for classifying documents. A document collection is sorted into a plurality of categories. A classifier corresponding to a category of the plurality of categories is trained to output a probability that a document associated with the category is of a selected type (e.g., confidential). The training includes determining, by a processor, that the cardinality of a set of negative samples in a train set is not above a pipeline threshold but is at least one, and training the classifier via a first pipeline and a second pipeline using a training group that includes a first portion of a group of positive samples in the train set, a second portion of the set of negative samples, and a third portion of a group of unlabeled samples in the train set.
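The branching condition and the composition of the training group described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented method. The names `TrainSet`, `select_training_route`, and `build_training_group`, the portion fractions, and the labels for the two cases the abstract does not describe (more negatives than the threshold, or none at all) are all hypothetical. The abstract also does not say how the first and second pipelines differ, so they are not modeled here.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TrainSet:
    # Hypothetical container for one category's train set.
    positives: List[str]
    negatives: List[str]
    unlabeled: List[str]


def select_training_route(train_set: TrainSet, pipeline_threshold: int) -> str:
    """Pick a training route from the number of negative samples.

    Only the middle branch (1 <= negatives <= threshold) is described in
    the abstract; the other two labels are placeholders.
    """
    n_neg = len(train_set.negatives)
    if n_neg > pipeline_threshold:
        return "above_threshold"  # hypothetical: not covered by the abstract
    if n_neg >= 1:
        return "first_and_second_pipeline"  # the case the abstract describes
    return "no_negatives"  # hypothetical: not covered by the abstract


def build_training_group(
    train_set: TrainSet,
    pos_frac: float,
    neg_frac: float,
    unl_frac: float,
    seed: int = 0,
) -> Tuple[List[str], List[str], List[str]]:
    """Draw a portion of the positive, negative, and unlabeled samples.

    The fractions are assumed parameters; the abstract only says each
    group contributes "a portion".
    """
    rng = random.Random(seed)

    def take(items: List[str], frac: float) -> List[str]:
        # Sample without replacement; the portion size is rounded down.
        return rng.sample(items, int(len(items) * frac))

    return (
        take(train_set.positives, pos_frac),
        take(train_set.negatives, neg_frac),
        take(train_set.unlabeled, unl_frac),
    )
```

For example, a train set with 10 positives, 2 negatives, and 20 unlabeled samples and a threshold of 5 falls into the two-pipeline case. Drawing half the positives, all negatives, and a quarter of the unlabeled samples then yields portions of 5, 2, and 5 documents.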