-
Publication No.: US12019771B2
Publication date: 2024-06-25
Application No.: US18539851
Filing date: 2023-12-14
Applicant: Lemon Inc.
Inventor: Xin Yang , Junyuan Xie , Jiankai Sun , Yuanshun Yao , Chong Wang
Abstract: A method, device, apparatus, and medium for protecting sensitive data are proposed. In the method, to-be-processed data is received from a server device. A processing result of a user for the to-be-processed data is received, the processing result comprising sensitive data of the user for the processing of the to-be-processed data. A gradient for training a server model at the server device is determined based on a comparison between the processing result and a prediction result for the to-be-processed data. The gradient is updated in a change direction associated with the gradient so as to generate an updated gradient to be sent to the server device. Because noise is added only in the change direction associated with the gradient, the overhead of processing noise in a plurality of directions can be reduced, and no excessive noise that interferes with training is introduced into the updated gradient.
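The noise step in this abstract can be illustrated with a minimal sketch. It assumes that the "change direction" is the gradient's own unit direction and that the noise is a single Gaussian scalar; the abstract specifies neither. The function name `perturb_along_gradient` and the parameter `noise_scale` are hypothetical and do not come from the patent.

```python
import math
import random


def perturb_along_gradient(grad, noise_scale, rng=random):
    """Add one scalar noise draw along the direction of `grad` only.

    Assumption: Gaussian noise with standard deviation `noise_scale`.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        # A zero gradient has no direction, so it is returned unchanged.
        return list(grad)
    step = rng.gauss(0.0, noise_scale)  # a single draw, not one per coordinate
    # Every coordinate moves by step * (unit direction), so the result
    # stays collinear with the original gradient.
    return [g + step * g / norm for g in grad]
```

Only one random number is drawn per gradient, whatever its dimension, which is one way to read the abstract's claim about avoiding noise in a plurality of directions.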
-
Publication No.: US20250005459A1
Publication date: 2025-01-02
Application No.: US18885135
Filing date: 2024-09-13
Applicant: Lemon Inc.
Inventor: Yuanshun Yao , Jiaheng Wei , Jean-Francois Ton , Hongyi Guo , Andrew Estornell , Yang Liu
IPC: G06N20/00
Abstract: Embodiments of the disclosure provide a solution for machine learning model evaluation. The solution includes: obtaining a target answer to a test question generated by a target machine learning (ML) model; obtaining a plurality of reference answers to the test question generated respectively by a plurality of reference ML models; determining respective professional levels of the plurality of reference ML models in answering the test question; and generating an evaluation result on correctness of the target ML model in question answering based on the target answer, the plurality of reference answers and the respective professional levels of the plurality of reference ML models.
-
Publication No.: US20250061339A1
Publication date: 2025-02-20
Application No.: US18936628
Filing date: 2024-11-04
Applicant: Lemon Inc.
Inventor: Andrew Estornell , Jean-Francois Ton , Yuanshun Yao , Yang Liu
Abstract: Embodiments of the present disclosure provide a solution for model training. A method comprises: performing training of a critic model and training of an actor model according to an alternating scheme. The actor model is configured to generate a response for an input question based on feedback generated by the critic model, and the critic model is configured to generate feedback on a response generated by the actor model.
-
Publication No.: US20250021891A1
Publication date: 2025-01-16
Application No.: US18900432
Filing date: 2024-09-27
Applicant: Beijing Youzhuju Network Technology Co., Ltd. , Lemon Inc.
Inventor: Yuanshun Yao , Hongyi Guo , Xiaoying Zhang , Yang Liu
IPC: G06N20/00
Abstract: A method is proposed for machine learning (ML) model alignment. In the method, a first number of samples is generated by a target ML model based on samples selected from a set of samples. A sample comprises a question-answer pair. The set of samples is updated by adding at least a portion of the first number of samples to the set of samples. The target ML model is trained with at least a portion of the updated set of samples. In this way, the ML model's self-generalization ability is unlocked to perform alignment with near-zero human supervision.
-
Publication No.: US20250086257A1
Publication date: 2025-03-13
Application No.: US18957333
Filing date: 2024-11-22
Applicant: Beijing Youzhuju Network Technology Co., Ltd. , Lemon Inc.
Inventor: Xiaojun Xu , Jinghan Jia , Hang Li , Yuanshun Yao
IPC: G06F21/16 , G06F40/289
Abstract: Embodiments of the present disclosure provide a solution for watermark processing. A method includes: dividing at least one portion of an original text for watermark embedding into a plurality of original text segments; determining, for an original text segment of the plurality of original text segments, a target symbol from a symbol sequence in watermark information; converting, based on respective target symbols determined for the plurality of original text segments, the plurality of original text segments into a plurality of watermarked text segments by using a set of language models for watermark embedding, the set of language models corresponding to a set of symbol values respectively; and generating a watermarked text for the original text based on the plurality of watermarked text segments.
-
Publication No.: US20250013887A1
Publication date: 2025-01-09
Application No.: US18746832
Filing date: 2024-06-18
Applicant: Lemon Inc.
Inventor: Yuanshun Yao , Yang Liu
IPC: G06N5/022
Abstract: Embodiments of the present disclosure relate to a method, apparatus, electronic device, and medium for determining fairness impact of a sample on a model. The method comprises generating a counterfactual sample by adjusting an original sample in an original sample set, the original sample set being used for generating an original model for performing a classification task. The method further comprises determining a fairness metric of the original model on a validation sample set. In addition, the method further comprises determining fairness impact of the original sample on the original model based on the fairness metric, the original sample, and the counterfactual sample.
-
Publication No.: US20240070525A1
Publication date: 2024-02-29
Application No.: US17897697
Filing date: 2022-08-29
Applicant: Lemon Inc.
Inventor: Jiankai Sun , Xinlei Xu , Xin Yang , Yuanshun Yao , Chong Wang
CPC classification number: G06N20/00 , G06F21/6245
Abstract: The present disclosure describes techniques of performing machine unlearning in a recommendation model. An unlearning process of the recommendation model may be initiated in response to receiving a request for deleting a fraction of user data from any particular user. The recommendation model may be pre-trained to recommend content to users based at least in part on user data. Values of entries in a matrix corresponding to the fraction of user data may be configured as zero. The matrix may comprise entries denoting preferences of users with respect to content items. Confidence values associated with the fraction of user data may be configured as zero to block influence of the fraction of user data on performance of the recommendation model. The unlearning process may be implemented by performing a number of iterations until the recommendation model has converged.
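The unlearning loop described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: it uses plain full-batch gradient descent on a confidence-weighted matrix factorization loss, whereas the abstract does not say which optimizer is used. The function name `unlearn` and all parameters (`lr`, `reg`, `tol`, `max_iter`) are hypothetical.

```python
def unlearn(prefs, conf, user_f, item_f, forget, lr=0.05, reg=0.1,
            tol=1e-9, max_iter=5000):
    """Zero out forgotten (user, item) entries, then refit until the loss settles.

    Assumption: gradient descent stands in for the unspecified update rule.
    Returns the number of iterations performed.
    """
    for u, i in forget:
        prefs[u][i] = 0.0  # erase the preference entry
        conf[u][i] = 0.0   # zero confidence removes it from the loss

    n_users, n_items = len(prefs), len(prefs[0])

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def loss():
        fit = sum(conf[u][i] * (prefs[u][i] - dot(user_f[u], item_f[i])) ** 2
                  for u in range(n_users) for i in range(n_items))
        l2 = sum(x * x for row in user_f + item_f for x in row)
        return 0.5 * fit + 0.5 * reg * l2

    prev = loss()
    for step in range(1, max_iter + 1):
        # Confidence-weighted residuals; forgotten entries contribute zero.
        err = [[conf[u][i] * (prefs[u][i] - dot(user_f[u], item_f[i]))
                for i in range(n_items)] for u in range(n_users)]
        new_user = [[p + lr * (sum(err[u][i] * item_f[i][k]
                                   for i in range(n_items)) - reg * p)
                     for k, p in enumerate(user_f[u])] for u in range(n_users)]
        new_item = [[q + lr * (sum(err[u][i] * user_f[u][k]
                                   for u in range(n_users)) - reg * q)
                     for k, q in enumerate(item_f[i])] for i in range(n_items)]
        user_f[:] = new_user
        item_f[:] = new_item
        cur = loss()
        if abs(prev - cur) < tol:  # treat a stalled loss as convergence
            return step
        prev = cur
    return max_iter
```

Starting from the pre-trained factors rather than from scratch is the design choice worth noting here: the zeroed confidence values keep the deleted entries out of every gradient, and the regularization term gradually shrinks factors that are no longer supported by any remaining data.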
-
Publication No.: US20230161899A1
Publication date: 2023-05-25
Application No.: US17535398
Filing date: 2021-11-24
Applicant: Lemon Inc.
Inventor: Xin Yang , Yuanshun Yao , Tianyi Liu , Jiankai Sun , Chong Wang , Ruihan Wu
IPC: G06F21/62
CPC classification number: G06F21/6245
Abstract: The present disclosure describes techniques of releasing data while protecting individual privacy. A dataset may be compressed by applying a first random matrix. The dataset may be owned by a party among a plurality of parties and there may be a plurality of datasets owned by the plurality of parties. Noise may be added by applying a random Gaussian matrix to the compressed dataset to obtain a processed dataset. The processed dataset ensures data privacy protection. The processed dataset may be released to other parties.
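The two steps in this abstract (compress with a random matrix, then add Gaussian noise) can be sketched as below. Several choices are assumptions that the abstract does not state: the first random matrix is taken to be Gaussian, it is applied on the left so that the number of rows shrinks from n to k, and the Gaussian matrix is applied additively. The function name `release_dataset` and the parameters `k`, `sigma`, and `seed` are hypothetical, and the sketch makes no claim about the privacy level achieved.

```python
import random


def release_dataset(data, k, sigma, seed=None):
    """Compress an n x d dataset to k x d, then add Gaussian noise.

    Assumptions: Gaussian projection with variance 1/k; additive noise.
    """
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    # First random matrix (k x n), used to mix and compress the n records.
    proj = [[rng.gauss(0.0, 1.0 / k ** 0.5) for _ in range(n)]
            for _ in range(k)]
    compressed = [[sum(proj[i][r] * data[r][c] for r in range(n))
                   for c in range(d)] for i in range(k)]
    # Random Gaussian matrix (k x d) added entrywise to the compressed data.
    return [[value + rng.gauss(0.0, sigma) for value in row]
            for row in compressed]
```

A party would call this once on its own dataset, for example `release_dataset(rows, k=2, sigma=0.1)`, and share only the returned matrix with the other parties.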
-
Publication No.: US20230143789A1
Publication date: 2023-05-11
Application No.: US18149462
Filing date: 2023-01-03
Applicant: Lemon Inc.
Inventor: Shangyu Xie , Jiankai Sun , Xin Yang , Yuanshun Yao , Tianyi Liu , Taiqing Wang
Abstract: Split learning is provided to train a composite neural network (CNN) model that is split into first and second submodels, including receiving a noise-laden backpropagation gradient, training the surrogate submodel by optimizing a gradient distance loss, and computing an updated dummy label using the first submodel and the trained surrogate submodel to infer label information of the second submodel. Noise can be added to a label of the second submodel or a shared backpropagation gradient to protect the label information.