Accurately identifying members of training data in variational autoencoders by reconstruction error

    Publication No.: US11501172B2

    Publication Date: 2022-11-15

    Application No.: US16219645

    Filing Date: 2018-12-13

    Applicant: SAP SE

    Abstract: A system is described that can include a machine learning model and at least one programmable processor communicatively coupled to the machine learning model. The machine learning model can receive data, generate a continuous probability distribution associated with the data, sample a latent variable from the continuous probability distribution to generate a plurality of samples, and generate reconstructed data from the plurality of samples. The at least one programmable processor can compute a reconstruction error by determining a distance between the reconstructed data and the data, and generate, based on the reconstruction error, an indication representing whether a specific record within the received data was used to train the machine learning model. Related apparatuses, methods, techniques, non-transitory computer programmable products, non-transitory machine-readable medium, articles, and other systems are also within the scope of this disclosure.
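
The membership test described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation. The names `reconstruction_error`, `likely_training_member`, and `make_toy_model` are hypothetical, the toy "model" is a stand-in that returns the nearest memorized training point plus noise rather than a real variational autoencoder, and both the averaging over samples and the fixed threshold are assumptions made for illustration.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def reconstruction_error(
    record: Vector,
    reconstruct: Callable[[Vector], Vector],
    n_samples: int = 20,
) -> float:
    """Mean distance between a record and n_samples stochastic reconstructions.

    `reconstruct` stands in for the encode / sample-latent / decode pass of a
    generative model; each call is expected to draw a fresh latent sample.
    """
    total = sum(euclidean(record, reconstruct(record)) for _ in range(n_samples))
    return total / n_samples


def likely_training_member(
    record: Vector,
    reconstruct: Callable[[Vector], Vector],
    threshold: float,
    n_samples: int = 20,
) -> bool:
    """Flag a record as a probable training member when its error is small."""
    return reconstruction_error(record, reconstruct, n_samples) <= threshold


def make_toy_model(
    training_data: List[Vector], noise: float = 0.01, seed: int = 0
) -> Callable[[Vector], Vector]:
    """Hypothetical stand-in for a trained model (assumption, not a VAE).

    It "reconstructs" any input as the nearest training point plus Gaussian
    noise, which mimics a model that has overfit its training data.
    """
    rng = random.Random(seed)

    def reconstruct(record: Vector) -> Vector:
        nearest = min(training_data, key=lambda t: euclidean(t, record))
        return [v + rng.gauss(0.0, noise) for v in nearest]

    return reconstruct


training = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.5]]
model = make_toy_model(training)

member_flag = likely_training_member([1.0, 1.0], model, threshold=0.1)
non_member_flag = likely_training_member([5.0, 5.0], model, threshold=0.1)
```

In this toy setup a record that was in the training set reconstructs almost exactly, while a distant record does not, so only the former falls under the threshold. With a real model the threshold would have to be calibrated, for example against records known not to be in the training set.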

    Computer systems for detecting training data usage in generative models

    Publication No.: US11366982B2

    Publication Date: 2022-06-21

    Application No.: US16140022

    Filing Date: 2018-09-24

    Applicant: SAP SE

    Abstract: Various examples are directed to systems and methods for detecting training data for a generative model. A computer system may access generative model sample data and a first test sample. The computer system may determine whether a first generative model sample of the plurality of generative model samples is within a threshold distance of the first test sample and whether a second generative model sample of the plurality of generative model samples is within the threshold distance of the first test sample. The computer system may determine that a probability that the generative model was trained with the first test sample is greater than or equal to a threshold probability based at least in part on whether the first generative model sample is within the threshold distance of the first test sample, the determining also based at least in part on whether the second generative model sample is within the threshold distance of the first test sample.
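
The threshold-distance check in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the claimed method. The function names `fraction_within` and `flags_training_use` are hypothetical, and using the fraction of generated samples that fall near the test sample as a proxy score for the probability is an assumption. The abstract says only that the determination depends on whether individual generated samples lie within the threshold distance.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fraction_within(
    test_sample: Vector, generated: Sequence[Vector], eps: float
) -> float:
    """Fraction of generated samples within distance eps of the test sample."""
    if not generated:
        raise ValueError("need at least one generated sample")
    hits = sum(1 for g in generated if euclidean(g, test_sample) <= eps)
    return hits / len(generated)


def flags_training_use(
    test_sample: Vector,
    generated: Sequence[Vector],
    eps: float,
    min_fraction: float,
) -> bool:
    """Flag the test sample when enough generated samples land close to it.

    The fraction is an uncalibrated proxy score (assumption), compared here
    against a hypothetical cutoff `min_fraction`.
    """
    return fraction_within(test_sample, generated, eps) >= min_fraction


# Hypothetical samples drawn from a generative model (illustrative values).
generated_samples = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (3.0, 3.0)]

near_score = fraction_within((0.0, 0.0), generated_samples, eps=0.25)
far_score = fraction_within((10.0, 10.0), generated_samples, eps=0.25)
near_flag = flags_training_use((0.0, 0.0), generated_samples, 0.25, 0.5)
far_flag = flags_training_use((10.0, 10.0), generated_samples, 0.25, 0.5)
```

Here three of the four generated samples lie within 0.25 of the first test sample, so it is flagged, while no generated sample lies near the second one. Both the distance threshold and the cutoff are illustrative values.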

    COMPUTER SYSTEMS FOR DETECTING TRAINING DATA USAGE IN GENERATIVE MODELS

    Publication No.: US20200097763A1

    Publication Date: 2020-03-26

    Application No.: US16140022

    Filing Date: 2018-09-24

    Applicant: SAP SE

    Abstract: Various examples are directed to systems and methods for detecting training data for a generative model. A computer system may access generative model sample data and a first test sample. The computer system may determine whether a first generative model sample of the plurality of generative model samples is within a threshold distance of the first test sample and whether a second generative model sample of the plurality of generative model samples is within the threshold distance of the first test sample. The computer system may determine that a probability that the generative model was trained with the first test sample is greater than or equal to a threshold probability based at least in part on whether the first generative model sample is within the threshold distance of the first test sample, the determining also based at least in part on whether the second generative model sample is within the threshold distance of the first test sample.
