Metadata classification
Abstract:
Systems and methods are disclosed that retrieve data from a data set organized in a plurality of columns. For each column in the plurality of columns, the systems and methods generate one or more candidate semantic categories for the column, where each candidate semantic category has a corresponding probability, and create a feature vector for the column from the candidate semantic categories and the corresponding probabilities. The systems and methods determine a semantic category type of the column based on the feature vector, and anonymize the data in the column based on the semantic category type. The anonymization includes replacing more specific data in the column with less specific data based on a data hierarchy that relates the more specific data to the less specific data.
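
The per-column flow described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the disclosure: the category names, the regular-expression detectors, the city lookup, the 0.5 threshold, and the hierarchy contents are all hypothetical. The abstract also does not say how the category type is derived from the feature vector; the sketch substitutes a simple highest-probability rule where a trained classifier could be used instead.

```python
import re
from collections import Counter

# Hypothetical category order for the feature vector (not from the abstract).
CATEGORIES = ["email", "zip_code", "city"]

# Hypothetical pattern-based detectors for two of the categories.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "zip_code": re.compile(r"^\d{5}$"),
}

# Hypothetical data hierarchy: more specific value -> less specific value.
CITY_HIERARCHY = {
    "San Mateo": "California",
    "Palo Alto": "California",
    "Seattle": "Washington",
}


def candidate_categories(values):
    """Return {category: probability}, using the fraction of matching values."""
    if not values:
        return {}
    counts = Counter()
    for value in values:
        for name, pattern in PATTERNS.items():
            if pattern.match(value):
                counts[name] += 1
        if value in CITY_HIERARCHY:
            counts["city"] += 1
    return {name: count / len(values) for name, count in counts.items()}


def feature_vector(candidates):
    """Lay the candidate probabilities out in a fixed category order."""
    return [candidates.get(name, 0.0) for name in CATEGORIES]


def determine_category(vector, threshold=0.5):
    """Stand-in decision rule: most probable category, if above a threshold."""
    best = max(range(len(vector)), key=vector.__getitem__)
    return CATEGORIES[best] if vector[best] >= threshold else None


def anonymize(values, category):
    """Replace more specific values with less specific ones for the category."""
    if category == "city":
        return [CITY_HIERARCHY.get(v, "*") for v in values]
    if category == "zip_code":
        return [v[:3] + "**" for v in values]  # keep only the 3-digit prefix
    if category == "email":
        return ["*" for _ in values]  # no hierarchy assumed: suppress
    return list(values)  # unrecognized columns are left unchanged


def anonymize_dataset(columns):
    """Run the per-column pipeline over a {column_name: values} mapping."""
    result = {}
    for name, values in columns.items():
        vector = feature_vector(candidate_categories(values))
        result[name] = anonymize(values, determine_category(vector))
    return result
```

For example, calling `anonymize_dataset({"home": ["San Mateo", "Seattle"]})` generalizes each city to its state under the assumed hierarchy, while a column that matches no detector passes through unchanged.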