-
Publication No.: US12086287B2
Publication date: 2024-09-10
Application No.: US17980371
Application date: 2022-11-03
Applicant: SNOWFLAKE INC.
Inventor: David Jensen, Joseph David Jensen
CPC classification number: G06F21/6254, G06F16/221, G06F16/282, G06F21/6227
Abstract: A method receives data from a data source. The method generates a plurality of generalizations of the data. The method sends the plurality of generalizations of the data to a plurality of execution nodes, wherein each of the plurality of execution nodes includes computational resources to compute a candidate generalization using an information loss scoring function. The method receives a candidate generalization from each of the plurality of execution nodes. The method selects a preferred generalization from the plurality of candidate generalizations. The method generates an anonymized view of the data set using the preferred generalization.
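The flow in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the bucket-width generalization, the k-anonymity check, the `info_loss` formula, and the use of a thread pool as a stand-in for execution nodes are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def generalize(ages, width):
    """Replace each age with the lower bound of its width-sized bucket."""
    return [(a // width) * width for a in ages]

def is_k_anonymous(values, k):
    """Assumed privacy constraint: every generalized value occurs at least k times."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return all(c >= k for c in counts.values())

def info_loss(width, ages):
    """Hypothetical scoring function: bucket width relative to the data range."""
    span = (max(ages) - min(ages)) or 1
    return min(1.0, (width - 1) / span)

def best_candidate(batch, ages, k):
    """Work done by one 'execution node': lowest-loss valid generalization in its batch."""
    valid = [(info_loss(w, ages), w) for w in batch
             if is_k_anonymous(generalize(ages, w), k)]
    return min(valid) if valid else None

def select_preferred(ages, widths, k, n_nodes):
    """Split the generalizations across nodes, then pick the best returned candidate."""
    batches = [widths[i::n_nodes] for i in range(n_nodes)]
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        candidates = list(pool.map(lambda b: best_candidate(b, ages, k), batches))
    candidates = [c for c in candidates if c is not None]
    return min(candidates) if candidates else None

ages = [21, 22, 23, 24, 31, 32, 33, 34]
preferred = select_preferred(ages, widths=[1, 2, 5, 10, 20], k=2, n_nodes=2)
anonymized_view = generalize(ages, preferred[1]) if preferred else None
```

Each worker returns only its own best candidate, so the final selection step compares one value per node rather than every generalization.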
-
Publication No.: US11630853B2
Publication date: 2023-04-18
Application No.: US17163156
Application date: 2021-01-29
Applicant: Snowflake Inc.
Inventor: Craig E. Hawco, Joseph David Jensen
IPC: G06F16/28
Abstract: Generating semantic names for a data set is described. An example method can include retrieving data from a data set, the data organized in a plurality of columns. The method may also include generating one or more candidate semantic categories for that column, wherein each of the one or more candidate semantic categories has a corresponding probability for each of the columns. The method may also further include creating a feature vector for each column from the one or more column candidate semantic categories and the corresponding probabilities. Additionally, the method may also include selecting, for each column, a column semantic category from the one or more candidate semantic categories using at least the feature vector and a trained machine learning model.
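As a rough illustration of the feature-vector step, the following Python sketch turns a column's candidate semantic categories and probabilities into a fixed-length vector and scores it with a linear model. The category list, the `WEIGHTS` matrix (a hand-written stand-in for a trained model), and all function names are hypothetical and do not come from the patent.

```python
# Hypothetical fixed vocabulary of semantic categories.
CATEGORIES = ["email", "phone", "zip_code", "age"]

def feature_vector(candidates):
    """Map {category: probability} to a vector ordered like CATEGORIES."""
    return [candidates.get(cat, 0.0) for cat in CATEGORIES]

# Stand-in for a trained model: one weight row per output category.
# An identity matrix simply picks the most probable candidate; a real
# trained model would have learned weights instead.
WEIGHTS = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

def select_category(vector, weights=WEIGHTS):
    """Score each category as a dot product and return the highest-scoring one."""
    scores = [sum(w * x for w, x in zip(row, vector)) for row in weights]
    return CATEGORIES[scores.index(max(scores))]

column_candidates = {"zip_code": 0.6, "phone": 0.3}
vec = feature_vector(column_candidates)
chosen = select_category(vec)
```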
-
Publication No.: US11755778B2
Publication date: 2023-09-12
Application No.: US17352218
Application date: 2021-06-18
Applicant: SNOWFLAKE INC.
Inventor: David Jensen, Joseph David Jensen
CPC classification number: G06F21/6254, G06F16/221, G06F16/282, G06F21/6227
Abstract: Generating an anonymized view for a data set is described. An example method can include receiving data from a data set, wherein the data is organized in a plurality of columns. The method may also include generating a plurality of generalizations of the data. The method may also further include selecting a generalization from the plurality of generalizations using an information loss scoring function based on at least a generalization information loss. Additionally, the method may also include generating an anonymized view of the data set from the selected generalization.
-
Publication No.: US11501021B1
Publication date: 2022-11-15
Application No.: US17352217
Application date: 2021-06-18
Applicant: SNOWFLAKE INC.
Inventor: David Jensen, Joseph David Jensen
Abstract: Generating an anonymized view for a data set is described. An example method can include receiving data from a data set, wherein the data is organized in a plurality of columns. The method may also include generating a plurality of generalizations of the data. The method may also further include selecting a generalization from the plurality of generalizations using an information loss scoring function based on at least a generalization information loss. Additionally, the method may also include generating an anonymized view of the data set from the selected generalization.
-
Publication No.: US11853329B2
Publication date: 2023-12-26
Application No.: US18124415
Application date: 2023-03-21
Applicant: SNOWFLAKE INC.
Inventor: Craig E. Hawco, Joseph David Jensen
CPC classification number: G06F16/285, G06F16/221, G06N5/01
Abstract: Systems and methods are disclosed that retrieve data from a data set organized in a plurality of columns. For each column in the plurality of columns, the systems and methods generate one or more candidate semantic categories for the column, where each of the one or more candidate semantic categories has a corresponding probability. The systems and methods create a feature vector for the column from the one or more candidate semantic categories and the corresponding probabilities. The systems and methods determine a semantic category type of the column based on the feature vector. The systems and methods anonymize the data in the column based on the semantic category type, which includes replacing more specific data in the column with less specific data based on a data hierarchy that relates the more specific data to the less specific data.
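The hierarchy-based replacement described at the end of this abstract can be sketched in Python as follows. The `HIERARCHY` mapping, its sample values, and the function names are illustrative assumptions, not details taken from the patent.

```python
# Hypothetical data hierarchy: each value maps to its less specific parent.
HIERARCHY = {
    "San Mateo": "California",
    "Bozeman": "Montana",
    "California": "United States",
    "Montana": "United States",
}

def generalize(value, levels):
    """Climb `levels` steps up the hierarchy; stop early at the top."""
    for _ in range(levels):
        if value not in HIERARCHY:
            break
        value = HIERARCHY[value]
    return value

def anonymize_column(values, levels):
    """Replace every value in a column with its less specific ancestor."""
    return [generalize(v, levels) for v in values]

cities = ["San Mateo", "Bozeman", "San Mateo"]
by_state = anonymize_column(cities, levels=1)
by_country = anonymize_column(cities, levels=2)
```

In this sketch the number of levels to climb would be chosen per column, for example according to the semantic category type determined for it.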
-
Publication No.: US20230050290A1
Publication date: 2023-02-16
Application No.: US17980371
Application date: 2022-11-03
Applicant: SNOWFLAKE INC.
Inventor: David Jensen, Joseph David Jensen
Abstract: A method receives data from a data source. The method generates a plurality of generalizations of the data. The method sends the plurality of generalizations of the data to a plurality of execution nodes, wherein each of the plurality of execution nodes includes computational resources to compute a candidate generalization using an information loss scoring function. The method receives a candidate generalization from each of the plurality of execution nodes. The method selects a preferred generalization from the plurality of candidate generalizations. The method generates an anonymized view of the data set using the preferred generalization.
-
Publication No.: US20220343019A1
Publication date: 2022-10-27
Application No.: US17352217
Application date: 2021-06-18
Applicant: SNOWFLAKE INC.
Inventor: David Jensen, Joseph David Jensen
Abstract: Generating an anonymized view for a data set is described. An example method can include receiving data from a data set, wherein the data is organized in a plurality of columns. The method may also include generating a plurality of generalizations of the data. The method may also further include selecting a generalization from the plurality of generalizations using an information loss scoring function based on at least a generalization information loss. Additionally, the method may also include generating an anonymized view of the data set from the selected generalization.
-
Publication No.: US20220343012A1
Publication date: 2022-10-27
Application No.: US17352218
Application date: 2021-06-18
Applicant: SNOWFLAKE INC.
Inventor: David Jensen, Joseph David Jensen
Abstract: Generating an anonymized view for a data set is described. An example method can include receiving data from a data set, wherein the data is organized in a plurality of columns. The method may also include generating a plurality of generalizations of the data. The method may also further include selecting a generalization from the plurality of generalizations using an information loss scoring function based on at least a generalization information loss. Additionally, the method may also include generating an anonymized view of the data set from the selected generalization.