Method, apparatus, device and medium for protecting sensitive data

    Publication No.: US12019771B2

    Publication Date: 2024-06-25

    Application No.: US18539851

    Application Date: 2023-12-14

    Applicant: Lemon Inc.

    CPC classification number: G06F21/62 G06N3/04 G06N3/098

    Abstract: A method, device, apparatus, and medium for protecting sensitive data are proposed. In the method, to-be-processed data is received from a server device. A processing result of a user for the to-be-processed data is received, the processing result comprising sensitive data of the user for the processing of the to-be-processed data. A gradient for training a server model at the server device is determined based on a comparison between the processing result and a prediction result for the to-be-processed data. The gradient is updated in a change direction associated with the gradient so as to generate an updated gradient to be sent to the server device. Noise is added only in the change direction associated with the gradient. This reduces the overhead of processing noise in a plurality of directions, and no excessive noise that would interfere with training is introduced into the updated gradient.
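
    The abstract does not define the "change direction associated with the gradient". A minimal sketch, assuming it is simply the direction of the gradient vector itself: a toy linear model with squared-error loss stands in for the server model, the user's processing result acts as the label, and a single Gaussian scalar is added along the unit gradient. All function names and the `sigma` parameter are hypothetical.

```python
import math
import random
from typing import List


def squared_error_gradient(weights: List[float], features: List[float], label: float) -> List[float]:
    """Gradient of 0.5 * (w . x - y)^2 w.r.t. w: a toy stand-in for the server model's gradient."""
    prediction = sum(w * x for w, x in zip(weights, features))
    residual = prediction - label  # compares the prediction with the user's processing result
    return [residual * x for x in features]


def noise_along_gradient(gradient: List[float], sigma: float, rng: random.Random) -> List[float]:
    """Add Gaussian noise only along the gradient's own direction (hypothetical reading)."""
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm == 0.0:
        return list(gradient)  # a zero gradient has no direction to perturb along
    unit = [g / norm for g in gradient]
    scalar_noise = rng.gauss(0.0, sigma)  # one noise draw instead of one per coordinate
    return [g + scalar_noise * u for g, u in zip(gradient, unit)]
```

    Because only one scalar is sampled, the perturbed gradient stays parallel to the original one; this sketch makes no claim about the formal privacy guarantee of such a scheme.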

    MACHINE LEARNING MODEL EVALUATION

    Publication No.: US20250005459A1

    Publication Date: 2025-01-02

    Application No.: US18885135

    Application Date: 2024-09-13

    Applicant: Lemon Inc.

    Abstract: Embodiments of the disclosure provide a solution for machine learning model evaluation. The solution includes: obtaining a target answer to a test question generated by a target machine learning (ML) model; obtaining a plurality of reference answers to the test question generated respectively by a plurality of reference ML models; determining respective professional levels of the plurality of reference ML models in answering the test question; and generating an evaluation result on correctness of the target ML model in question answering based on the target answer, the plurality of reference answers and the respective professional levels of the plurality of reference ML models.
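
    One way to picture the evaluation step, assuming the professional levels are already available as non-negative weights and that answers are compared by normalised exact match (the abstract specifies neither): the correctness score is the level-weighted fraction of reference models that agree with the target model. `evaluate_correctness` and its parameters are hypothetical.

```python
from typing import Sequence


def _normalise(text: str) -> str:
    """Case- and whitespace-insensitive form used when comparing answers."""
    return " ".join(text.lower().split())


def evaluate_correctness(target_answer: str,
                         reference_answers: Sequence[str],
                         professional_levels: Sequence[float]) -> float:
    """Level-weighted share of reference models whose answer agrees with the target answer.

    Hypothetical aggregation: exact match after normalisation stands in for the
    answer comparison, and each level is used directly as a non-negative weight.
    """
    if len(reference_answers) != len(professional_levels):
        raise ValueError("need one professional level per reference answer")
    total = sum(professional_levels)
    if total <= 0:
        raise ValueError("professional levels must sum to a positive value")
    target = _normalise(target_answer)
    agreeing = sum(level for answer, level in zip(reference_answers, professional_levels)
                   if _normalise(answer) == target)
    return agreeing / total
```

    For example, with references `["paris", "Lyon", " Paris "]` weighted `[0.9, 0.2, 0.6]`, the target answer `"Paris"` scores 1.5 / 1.7, about 0.88.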

    MODEL TRAINING OF ACTOR MODEL AND CRITIC MODEL

    Publication No.: US20250061339A1

    Publication Date: 2025-02-20

    Application No.: US18936628

    Application Date: 2024-11-04

    Applicant: Lemon Inc.

    Abstract: Embodiments of the present disclosure provide a solution for model training. A method comprises: performing training of a critic model and training of an actor model according to an alternating scheme. The actor model is configured to generate a response to an input question based on feedback generated by the critic model, and the critic model is configured to generate feedback on a response generated by the actor model.
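
    The alternating scheme can be illustrated with a deliberately tiny toy. It assumes a scalar "response" to one fixed question and a critic that learns from ground-truth supervision during its own phase; the abstract does not say how the critic is trained. Even steps update only the critic, and odd steps update only the actor using the critic's feedback. All names and learning rates are hypothetical.

```python
from typing import Tuple


def critic_feedback(critic_belief: float, response: float) -> float:
    """Hypothetical critic: feedback is how far the response is from what the critic believes is right."""
    return critic_belief - response


def train_alternating(target: float, rounds: int = 50,
                      critic_lr: float = 0.5, actor_lr: float = 0.5) -> Tuple[float, float]:
    critic_belief = 0.0  # toy critic parameter
    response = 0.0       # toy actor output for one fixed question
    for step in range(rounds):
        if step % 2 == 0:
            # Critic phase (actor frozen): fit the critic to ground-truth supervision.
            critic_belief += critic_lr * (target - critic_belief)
        else:
            # Actor phase (critic frozen): revise the response using the critic's feedback.
            response += actor_lr * critic_feedback(critic_belief, response)
    return response, critic_belief
```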

    MACHINE LEARNING MODEL ALIGNMENT
    Invention application

    Publication No.: US20250021891A1

    Publication Date: 2025-01-16

    Application No.: US18900432

    Application Date: 2024-09-27

    Abstract: A method is proposed for machine learning (ML) model alignment. In the method, a first number of samples is generated by a target ML model based on samples selected from a set of samples. A sample comprises a question-answer pair. The set of samples is updated by adding at least a portion of the first number of samples to the set of samples. The target ML model is trained with at least a portion of the updated set of samples. In this way, the self-generalization ability of the ML model is unlocked to perform alignment with near-zero human supervision.
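
    The self-generation loop can be sketched as control flow, assuming few-shot selection of demonstrations, a caller-supplied filter that decides which generated question-answer pairs are kept, and a caller-supplied training step. `self_align`, `generate`, `keep` and `train` are hypothetical placeholders for real model calls, not an API from the source.

```python
import random
from typing import Callable, List, Tuple

QA = Tuple[str, str]  # one sample: (question, answer)


def self_align(model, seed_samples: List[QA],
               generate: Callable[[object, List[QA]], QA],
               keep: Callable[[QA], bool],
               train: Callable[[object, List[QA]], object],
               rounds: int = 3, per_round: int = 4, demos: int = 2, seed: int = 0):
    """Grow the sample set with model-generated QA pairs and retrain on it each round."""
    rng = random.Random(seed)
    samples = list(seed_samples)
    for _ in range(rounds):
        # Select demonstrations from the current set of samples.
        selected = rng.sample(samples, min(demos, len(samples)))
        # Let the target model generate the "first number" of new samples.
        generated = [generate(model, selected) for _ in range(per_round)]
        # Update the set with the portion that passes the filter.
        samples.extend(s for s in generated if keep(s))
        # Train the target model on the updated set (here: all of it).
        model = train(model, samples)
    return model, samples
```

    With a toy generator that leaves every second answer empty and a `keep` filter that rejects empty answers, only half of the generated pairs join the set, which corresponds to adding "at least a portion" of the first number of samples.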

    WATERMARK PROCESSING
    Invention application

    Publication No.: US20250086257A1

    Publication Date: 2025-03-13

    Application No.: US18957333

    Application Date: 2024-11-22

    Abstract: Embodiments of the present disclosure provide a solution for watermark processing. A method includes: dividing at least one portion of an original text for watermark embedding into a plurality of original text segments; determining, for an original text segment of the plurality of original text segments, a target symbol from a symbol sequence in watermark information; converting, based on respective target symbols determined for the plurality of original text segments, the plurality of original text segments into a plurality of watermarked text segments by using a set of language models for watermark embedding, the set of language models corresponding to a set of symbol values respectively; and generating a watermarked text for the original text based on the plurality of watermarked text segments.
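
    A minimal sketch of the embedding side, assuming fixed-size word segments, a watermark symbol sequence reused cyclically when there are more segments than symbols, and toy word-substitution functions standing in for the symbol-specific language models. The segmentation rule, `embed_watermark`, and `synonym_model` are hypothetical; a real system would use generative models whose outputs are statistically distinguishable.

```python
from typing import Callable, Dict, List, Sequence

Rewriter = Callable[[str], str]  # stand-in for a language model that rewrites one segment


def split_into_segments(text: str, words_per_segment: int) -> List[str]:
    """Divide the text into fixed-size word segments (one hypothetical segmentation rule)."""
    words = text.split()
    return [" ".join(words[i:i + words_per_segment])
            for i in range(0, len(words), words_per_segment)]


def embed_watermark(text: str, symbols: Sequence[str], models: Dict[str, Rewriter],
                    words_per_segment: int = 4) -> str:
    segments = split_into_segments(text, words_per_segment)
    watermarked = []
    for index, segment in enumerate(segments):
        symbol = symbols[index % len(symbols)]       # target symbol for this segment
        watermarked.append(models[symbol](segment))  # model assigned to that symbol value
    return " ".join(watermarked)


def synonym_model(table: Dict[str, str]) -> Rewriter:
    """Toy 'language model': deterministic word substitution from a symbol-specific table."""
    return lambda segment: " ".join(table.get(word, word) for word in segment.split())
```

    For instance, embedding the symbols `"10"` into `"the big dog ran home the quick cat sat down"` with four-word segments yields `"the huge dog ran home the fast cat sat down"`. The first and third segments use the model for symbol `1`, and the second uses the model for symbol `0`.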

    METHOD, APPARATUS, ELECTRONIC DEVICE AND MEDIUM FOR DETERMINING FAIRNESS IMPACT OF MODEL

    Publication No.: US20250013887A1

    Publication Date: 2025-01-09

    Application No.: US18746832

    Application Date: 2024-06-18

    Applicant: Lemon Inc.

    Abstract: Embodiments of the present disclosure relate to a method, apparatus, electronic device, and medium for determining fairness impact of a sample on a model. The method comprises generating a counterfactual sample by adjusting an original sample in an original sample set, the original sample set being used for generating an original model for performing a classification task. The method further comprises determining a fairness metric of the original model on a validation sample set. In addition, the method further comprises determining fairness impact of the original sample on the original model based on the fairness metric, the original sample, and the counterfactual sample.
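
    The idea of measuring a sample's fairness impact through a counterfactual can be illustrated by brute-force retraining, which is not necessarily how the patent computes it. This sketch makes three assumptions: a one-feature nearest-centroid classifier serves as the original model, the counterfactual is formed by flipping the sample's label (the abstract does not say which adjustment is made), and the fairness metric is the demographic-parity gap on the validation set. All names are hypothetical.

```python
from typing import Dict, List, Tuple

Sample = Tuple[float, int]    # (feature, label) in the original sample set
ValPoint = Tuple[float, str]  # (feature, sensitive group) in the validation set


def train_nearest_centroid(samples: List[Sample]) -> Dict[int, float]:
    """Toy 'original model': the mean feature value of each class."""
    centroids = {}
    for label in (0, 1):
        values = [x for x, y in samples if y == label]
        centroids[label] = sum(values) / len(values)  # assumes both classes are present
    return centroids


def predict(centroids: Dict[int, float], x: float) -> int:
    return 1 if abs(x - centroids[1]) < abs(x - centroids[0]) else 0


def parity_gap(centroids: Dict[int, float], validation: List[ValPoint]) -> float:
    """Demographic-parity gap: spread of positive-prediction rates across groups."""
    rates = []
    for group in sorted({g for _, g in validation}):
        preds = [predict(centroids, x) for x, g in validation if g == group]
        rates.append(sum(preds) / len(preds))
    return max(rates) - min(rates)


def fairness_impact(train: List[Sample], index: int, validation: List[ValPoint]) -> float:
    """Change in the parity gap when sample `index` is replaced by its counterfactual."""
    x, y = train[index]
    counterfactual_set = list(train)
    counterfactual_set[index] = (x, 1 - y)  # hypothetical adjustment: flip the label
    before = parity_gap(train_nearest_centroid(train), validation)
    after = parity_gap(train_nearest_centroid(counterfactual_set), validation)
    return after - before
```

    As an example, take training samples at 0, 1, 2 (label 0) and 4, 5, 6 (label 1), with validation points 1.5 and 3.5 in group `a` and 2.5 and 4.5 in group `b`. Flipping the label of the sample at 2 moves the decision boundary from 3.0 to 2.375 and raises the parity gap from 0 to 0.5, so the impact is 0.5.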

    UNLEARNING OF RECOMMENDATION MODELS
    Invention publication

    Publication No.: US20240070525A1

    Publication Date: 2024-02-29

    Application No.: US17897697

    Application Date: 2022-08-29

    Applicant: Lemon Inc.

    CPC classification number: G06N20/00 G06F21/6245

    Abstract: The present disclosure describes techniques of performing machine unlearning in a recommendation model. An unlearning process of the recommendation model may be initiated in response to receiving a request for deleting a fraction of user data from any particular user. The recommendation model may be pre-trained to recommend content to users based at least in part on user data. Values of entries in a matrix corresponding to the fraction of user data may be configured as zero. The matrix may comprise entries denoting preferences of users with respect to content items. Confidence values associated with the fraction of user data may be configured as zero to block influence of the fraction of user data on performance of the recommendation model. The unlearning process may be implemented by performing a number of iterations until the recommendation model has converged.
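
    The zeroing step can be made concrete with weighted alternating least squares (ALS), a common solver for confidence-weighted implicit-feedback matrix factorisation. This is an assumption, because the abstract names neither the factorisation nor its rank. The sketch uses rank 1 so that every update has a closed form. It zeroes both the preference and the confidence of each forgotten entry, then resumes ALS from the pre-trained factors until the largest factor change falls below `tol`. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def wals_rank1(prefs: Matrix, conf: Matrix, users: List[float], items: List[float],
               reg: float = 0.1, tol: float = 1e-9, max_iters: int = 1000):
    """Rank-1 weighted ALS on sum c_ui (r_ui - p_u q_i)^2 + reg * (||p||^2 + ||q||^2)."""
    n_users, n_items = len(prefs), len(items)
    for _ in range(max_iters):
        new_users = [
            sum(conf[u][i] * prefs[u][i] * items[i] for i in range(n_items))
            / (sum(conf[u][i] * items[i] ** 2 for i in range(n_items)) + reg)
            for u in range(n_users)]
        new_items = [
            sum(conf[u][i] * prefs[u][i] * new_users[u] for u in range(n_users))
            / (sum(conf[u][i] * new_users[u] ** 2 for u in range(n_users)) + reg)
            for i in range(n_items)]
        change = max(abs(a - b) for a, b in zip(new_users + new_items, users + items))
        users, items = new_users, new_items
        if change < tol:  # treat a negligible update as convergence
            break
    return users, items


def unlearn(prefs: Matrix, conf: Matrix, users: List[float], items: List[float],
            forget: Sequence[Tuple[int, int]], **kwargs):
    """Zero the forgotten entries and their confidences, then re-run ALS from the pre-trained factors."""
    prefs = [row[:] for row in prefs]  # copy so the caller's matrices are untouched
    conf = [row[:] for row in conf]
    for u, i in forget:
        prefs[u][i] = 0.0
        conf[u][i] = 0.0
    return wals_rank1(prefs, conf, users, items, **kwargs)
```

    With its confidence set to zero, a forgotten entry drops out of every sum in the update, so the retrained factors no longer depend on its original value.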

    DATA PROCESSING FOR RELEASE WHILE PROTECTING INDIVIDUAL PRIVACY

    Publication No.: US20230161899A1

    Publication Date: 2023-05-25

    Application No.: US17535398

    Application Date: 2021-11-24

    Applicant: Lemon Inc.

    CPC classification number: G06F21/6245

    Abstract: The present disclosure describes techniques of releasing data while protecting individual privacy. A dataset may be compressed by applying a first random matrix. The dataset may be owned by one of a plurality of parties, each of which may own a dataset. Noise may be added by applying a random Gaussian matrix to the compressed dataset to obtain a processed dataset. The processed dataset ensures data privacy protection. The processed dataset may be released to other parties.
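
    A minimal sketch of the two-step release follows. It assumes the first random matrix has i.i.d. Gaussian entries and compresses the record dimension (n records to k rows), and it reads "applying a random Gaussian matrix" as adding an independent Gaussian noise matrix. Calibrating `noise_std` to a formal privacy budget is outside this sketch; `release` and `matmul` are hypothetical helpers.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of an (m x n) and an (n x p) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def release(dataset: Matrix, compressed_rows: int, noise_std: float, seed: int) -> Matrix:
    """Compress an n x d dataset to k x d with a random matrix, then add Gaussian noise."""
    rng = random.Random(seed)
    n = len(dataset)
    # Hypothetical first random matrix: i.i.d. N(0, 1/k) entries, as in a Johnson-Lindenstrauss projection.
    projection = [[rng.gauss(0.0, 1.0) / compressed_rows ** 0.5 for _ in range(n)]
                  for _ in range(compressed_rows)]
    compressed = matmul(projection, dataset)  # each released row mixes all n records
    # Hypothetical noise step: add an independent N(0, noise_std^2) draw to every cell.
    return [[value + rng.gauss(0.0, noise_std) for value in row] for row in compressed]
```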

    LABEL INFERENCE IN SPLIT LEARNING DEFENSES
    Invention publication

    Publication No.: US20230143789A1

    Publication Date: 2023-05-11

    Application No.: US18149462

    Application Date: 2023-01-03

    Applicant: Lemon Inc.

    CPC classification number: G06N3/084 G06N3/045

    Abstract: Split learning is provided to train a composite neural network (CNN) model that is split into first and second submodels. The method includes receiving a noise-laden backpropagation gradient, training a surrogate submodel by optimizing a gradient distance loss, and computing an updated dummy label using the first submodel and the trained surrogate submodel to infer label information of the second submodel. Noise can be added to a label of the second submodel or to a shared backpropagation gradient to protect the label information.
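
    The core leakage idea, matching a dummy label's gradient to the observed one, can be shown in a simplified setting. The sketch assumes the attacker already has logits for the sample, which is the role the trained surrogate submodel plays in the abstract. It also assumes a softmax cross-entropy loss, and it searches over one-hot dummy labels instead of optimizing a soft dummy label. Function names are hypothetical.

```python
import math
import random
from typing import List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ce_gradient(logits: List[float], label: int) -> List[float]:
    """Gradient of softmax cross-entropy w.r.t. the logits: softmax(z) - one_hot(label)."""
    return [p - (1.0 if k == label else 0.0) for k, p in enumerate(softmax(logits))]


def noisy_gradient(logits: List[float], label: int, noise_std: float,
                   rng: random.Random) -> List[float]:
    """Label party's defense (hypothetical): share the gradient with i.i.d. Gaussian noise added."""
    return [g + rng.gauss(0.0, noise_std) for g in ce_gradient(logits, label)]


def infer_label(surrogate_logits: List[float], observed_grad: List[float]) -> int:
    """Return the one-hot dummy label whose gradient is closest (squared L2) to the observed one."""
    def gradient_distance(label: int) -> float:
        return sum((a - b) ** 2 for a, b in zip(ce_gradient(surrogate_logits, label), observed_grad))
    return min(range(len(surrogate_logits)), key=gradient_distance)
```

    Because the cross-entropy gradient with respect to the logits is `softmax(z) - one_hot(y)`, light Gaussian noise keeps the true label's gradient distance far below the others. This illustrates why light gradient noise alone may not hide the label.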
