SYNTHETIC TRAINING DATASETS FOR PERSONALLY IDENTIFIABLE INFORMATION CLASSIFIERS

    Publication number: US20230244811A1

    Publication date: 2023-08-03

    Application number: US17589610

    Application date: 2022-01-31

    Applicant: Box, Inc.

    CPC classification number: G06F21/6245 G06F21/6272 G06F16/93

    Abstract: Handling user-demanded privacy controls over data of an electronic document collaboration system. A storage facility is configured to store content objects and associated metadata pertaining to those content objects. A user raises a privacy action request comprising a demand to change how certain content objects that contain the user's personally identifiable information (PII) are handled. A plurality of content objects are classified using a PII classifier trained on synthetically generated training set entries: rather than reading actual contents from electronic documents of the collaboration system, the training set entries are generated using words randomly selected from a repository of natural language words. When PII corresponding to the user who raised the privacy action request is discovered in content objects, the content management system modifies those content objects and/or their metadata in accordance with the demand.
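
The abstract's idea of building training entries from randomly selected words, rather than from real document contents, can be illustrated with a minimal Python sketch. This is not the patent's implementation. The word list, the function names, and in particular the step of inserting a made-up Social-Security-style number to form positive examples are all assumptions introduced here for illustration; the abstract states only that entries are generated from words randomly selected from a repository of natural language words.

```python
import random

# Hypothetical stand-in for the "repository of natural language words".
WORD_REPOSITORY = [
    "report", "meeting", "budget", "please", "review",
    "the", "attached", "quarter", "team", "update",
]


def make_fake_ssn(rng):
    """Return a made-up identifier shaped like NNN-NN-NNNN (assumed PII form)."""
    return f"{rng.randint(100, 899):03d}-{rng.randint(1, 99):02d}-{rng.randint(1, 9999):04d}"


def make_entry(rng, length, with_pii):
    """Build one (text, label) pair from randomly selected repository words.

    Assumption: a positive example is formed by inserting one synthetic
    PII-like token at a random position among the random words.
    """
    words = [rng.choice(WORD_REPOSITORY) for _ in range(length)]
    if with_pii:
        words.insert(rng.randrange(length + 1), make_fake_ssn(rng))
    return " ".join(words), int(with_pii)


def make_training_set(n, seed=0, length=8):
    """Generate n labeled entries, alternating positive and negative examples."""
    rng = random.Random(seed)
    return [make_entry(rng, length, with_pii=(i % 2 == 0)) for i in range(n)]


if __name__ == "__main__":
    for text, label in make_training_set(4):
        print(label, text)
```

Because no entry is read from a stored content object, a training set built this way contains no real user data; any classifier trained on it would still need to be evaluated separately, which this sketch does not cover.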
