Publication No.: US12210648B2
Publication Date: 2025-01-28
Application No.: US17830237
Application Date: 2022-06-01
Applicant: Microsoft Technology Licensing, LLC
Inventors: Sekhar Poornananda Chintalapati, Vinod Kumar Yelahanka Srinivas, Dattatraya Baban Rajpure, Pieter Kristian Brouwer, Gaurav Anil Yeole, Mihai Silviu Peicu
IPC: G06F21/62, G06F16/174, G06F40/284, G06F40/295
Abstract: Methods and systems for detecting personally identifiable information in data associated with a cloud computing system are described. An example method includes ingesting the data associated with the cloud computing system to generate source data. The method includes processing the source data by: performing cell-based de-duplication to generate cell-based de-duplicated data, subjecting the cell-based de-duplicated data to regular expression classification to generate a first subset of initial results, tokenizing the cell-based de-duplicated data to generate tokenized data, and de-duplicating the tokenized data and subjecting the de-duplicated tokenized data to a first named entity recognition classification to generate a second subset of the initial results. The method includes cross-referencing the cell-based de-duplicated data and the initial results and subjecting the output of the cross-referencing to a second named entity recognition classification to generate final results. The method includes processing the final results to detect any personally identifiable information in the final results.
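The abstract describes a multi-stage pipeline. The Python sketch below follows the same stage order, but every concrete piece is an assumption for illustration: source data is a list of dict rows, the regex patterns (`EMAIL`, `US_SSN`) and word tokenizer are hypothetical, a small name list (`KNOWN_FIRST_NAMES`) stands in for the first named entity recognition model, and a capitalization check on the full cell text stands in for the second, context-aware pass. None of these function names or heuristics come from the patent, and neither toy stage is a real NER classifier.

```python
import re
from collections import defaultdict

# Hypothetical regex patterns; illustrative only, not taken from the patent.
REGEX_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}
# Toy name list standing in for a first-pass NER model (assumption).
KNOWN_FIRST_NAMES = {"alice", "bob"}
CANDIDATE_PERSON = "PERSON_CANDIDATE"


def ingest(rows):
    """Flatten tabular rows into (row_index, column, value) source cells."""
    return [(i, col, str(val)) for i, row in enumerate(rows) for col, val in row.items()]


def dedupe_cells(source):
    """Cell-based de-duplication: each distinct value maps to every location holding it."""
    locations = defaultdict(list)
    for row, col, value in source:
        locations[value].append((row, col))
    return dict(locations)


def regex_classify(cells):
    """First subset of initial results: (value, label) pairs for regex matches."""
    return {(value, label) for value in cells
            for label, pattern in REGEX_PATTERNS.items() if pattern.search(value)}


def tokenize_and_dedupe(cells):
    """Split each distinct cell into word tokens; dict keys de-duplicate the tokens."""
    token_to_cells = defaultdict(set)
    for value in cells:
        for token in re.findall(r"\w+", value):
            token_to_cells[token].add(value)
    return dict(token_to_cells)


def first_ner(unique_tokens):
    """Second subset of initial results: toy token-level 'NER' via name lookup."""
    return {tok for tok in unique_tokens if tok.lower() in KNOWN_FIRST_NAMES}


def cross_reference(token_to_cells, ner_tokens, regex_hits):
    """Map every initial result back onto the de-duplicated cells it came from."""
    candidates = defaultdict(set)
    for value, label in regex_hits:
        candidates[value].add(label)
    for tok in ner_tokens:
        for value in token_to_cells[tok]:
            candidates[value].add(CANDIDATE_PERSON)
    return candidates


def second_ner(candidates):
    """Toy cell-level 'NER': confirm a person only if the name is capitalized in context."""
    final = {}
    for value, labels in candidates.items():
        confirmed = labels - {CANDIDATE_PERSON}
        if CANDIDATE_PERSON in labels and any(
            w.lower() in KNOWN_FIRST_NAMES and w[0].isupper()
            for w in re.findall(r"\w+", value)
        ):
            confirmed.add("PERSON")
        if confirmed:
            final[value] = confirmed
    return final


def detect_pii(rows):
    """Run the stages in the order the abstract lists; return {(row, column): labels}."""
    cells = dedupe_cells(ingest(rows))
    regex_hits = regex_classify(cells)
    token_to_cells = tokenize_and_dedupe(cells)
    ner_tokens = first_ner(token_to_cells)
    final = second_ner(cross_reference(token_to_cells, ner_tokens, regex_hits))
    # Expand each flagged distinct value back to every location where it appeared.
    return {loc: sorted(labels) for value, labels in final.items() for loc in cells[value]}


if __name__ == "__main__":
    rows = [
        {"user": "Alice Smith", "contact": "alice@example.com", "note": "alice-blue theme"},
        {"user": "Alice Smith", "contact": "123-45-6789", "note": "ok"},
    ]
    for location, labels in sorted(detect_pii(rows).items()):
        print(location, labels)
```

In this sketch, de-duplicating cells and tokens before classification means each distinct value is examined once, and the value-to-location map re-expands the final results to every row and column. With the sample rows, the first pass raises "alice-blue theme" and "alice@example.com" as person candidates, and the second pass discards both because the name is not capitalized in context.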