-
Publication No.: US20240354222A1
Publication date: 2024-10-24
Application No.: US18138330
Filing date: 2023-04-24
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC.
Inventor: NAN DUAN, SHENGYU FU, SHUAI LU, NEELAKANTAN SUNDARESAN, ALEXEY SVYATKOVSKIY
CPC classification number: G06F11/3636, G06N20/00
Abstract: A large language model, previously pre-trained on multiple source code modeling tasks, is further pre-trained through curriculum learning to predict a code execution trace given a source code program. The model is pre-trained on a variety of pre-training datasets, each consisting of pairs of a source code sample and its corresponding execution trace. The curriculum pre-training starts with a pre-training dataset of single-line executions and progressively adds pre-training datasets with increasingly complex behaviors. The pre-training datasets include mutation-augmented source code samples and their corresponding execution traces.
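The curriculum described in this abstract can be illustrated with a minimal sketch. It assumes the curriculum is cumulative (each stage trains on all datasets introduced so far), which is one reading of "adds in additional pre-training datasets". The function name curriculum_pools, the three toy datasets, and the trace strings are hypothetical placeholders and are not taken from the application.

```python
from typing import Iterator, List, Tuple

# A training example is a (source_code, execution_trace) pair, as in the abstract.
Example = Tuple[str, str]


def curriculum_pools(stages: List[List[Example]]) -> Iterator[List[Example]]:
    """Yield the cumulative training pool for each curriculum stage.

    Assumption: the first stage holds only the simplest dataset, and every
    later stage adds the next dataset to everything seen so far.
    """
    pool: List[Example] = []
    for dataset in stages:
        pool = pool + dataset  # build a new list so earlier pools stay unchanged
        yield pool


# Hypothetical toy datasets, ordered from simple to complex.
# The trace format shown here is invented for illustration only.
single_line = [("x = 1 + 2", "line 1: x = 3")]
functions = [("def f(a):\n    return a * 2\ny = f(4)", "line 3: call f; line 2: return 8; y = 8")]
full_programs = [("t = 0\nfor i in range(3):\n    t += i", "line 1: t = 0; loop ends with t = 3")]

pools = list(curriculum_pools([single_line, functions, full_programs]))
pool_sizes = [len(p) for p in pools]
```

In an actual pipeline each pool would feed one pre-training phase; here pool_sizes only shows that the training set grows stage by stage.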
-
Publication No.: US20240160435A1
Publication date: 2024-05-16
Application No.: US17985849
Filing date: 2022-11-12
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC.
Inventor: NAN DUAN, SHENGYU FU, SHUAI LU, NEELAKANTAN SUNDARESAN, ALEXEY SVYATKOVSKIY
Abstract: A deep learning model is pre-trained on large-scale unsupervised data from code review tasks in order to learn the relationships between code changes and code reviews. The pre-trained deep learning model predicts a code review given a code diff hunk in a code diff format. The code diff hunk includes the changed code and its surrounding context. The pre-trained deep learning model may then be fine-tuned with supervised data in order to make predictions for several code review activities, such as code change quality estimation and code refinement.
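To make the notion of a code diff hunk concrete, the following sketch separates a hunk into its changed lines and its surrounding context. It assumes the common unified diff convention ("+" for added lines, "-" for removed lines, a leading space for context, "@@" for the hunk header); the abstract does not state which diff format the model consumes. The function name split_hunk and the sample hunk are hypothetical.

```python
from typing import Dict, List


def split_hunk(hunk: str) -> Dict[str, List[str]]:
    """Split a unified-diff hunk into added, removed, and context lines."""
    parts: Dict[str, List[str]] = {"context": [], "added": [], "removed": []}
    for line in hunk.splitlines():
        if line.startswith("@@"):
            continue  # hunk header carries line ranges, not code
        if line.startswith("+"):
            parts["added"].append(line[1:])
        elif line.startswith("-"):
            parts["removed"].append(line[1:])
        else:
            # Context lines carry a single leading space in unified diffs.
            parts["context"].append(line[1:] if line.startswith(" ") else line)
    return parts


# Hypothetical hunk: one changed line plus one line of surrounding context.
hunk = "\n".join([
    "@@ -1,2 +1,2 @@",
    " def area(r):",
    "-    return 3.14 * r * r",
    "+    return math.pi * r * r",
])

parts = split_hunk(hunk)
```

A review model of the kind described would receive both the changed lines and the context; the split above only shows what each part of the hunk contains.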
-
Publication No.: US20230359441A1
Publication date: 2023-11-09
Application No.: US17740042
Filing date: 2022-05-09
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC.
Inventor: NAN DUAN, SHUAI LU, NEELAKANTAN SUNDARESAN, ALEXEY SVYATKOVSKIY
CPC classification number: G06F8/33, G06F40/30, G06N3/0454
Abstract: A retrieval-augmented code completion system uses the context of a partially formed source code snippet of a source code program and a hint to predict the source code tokens needed to complete the partially formed source code snippet. The hint is a source code segment that completes a semantically similar source code segment of the partially formed source code snippet. The hint is found in a retrieval source code database using a hybrid retrieval technique. A deep learning decoder model uses the context of the partially formed source code snippet and the hint to predict the most likely candidate sequence of source code tokens to complete the partially formed source code snippet.
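The retrieval step can be sketched as follows. The abstract does not say what the hybrid retrieval technique combines, so this example assumes a weighted sum of a lexical score (Jaccard overlap of tokens) and a vector score (cosine similarity over token counts, used here as a simple stand-in for learned embeddings). The database layout, the weight alpha, the function names, and the way the hint is prepended to the context are all hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokens(code: str) -> List[str]:
    """Very rough tokenizer: identifiers plus single non-space symbols."""
    return re.findall(r"[A-Za-z_]\w*|\S", code)


def jaccard(a: List[str], b: List[str]) -> float:
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine(a: List[str], b: List[str]) -> float:
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def retrieve_hint(query: str, database: List[Tuple[str, str]], alpha: float = 0.5) -> str:
    """Return the continuation stored with the segment most similar to the query.

    Each database entry is (segment, continuation); the continuation of the
    best-scoring segment serves as the hint.
    """
    q = tokens(query)

    def score(entry: Tuple[str, str]) -> float:
        seg = tokens(entry[0])
        return alpha * jaccard(q, seg) + (1 - alpha) * cosine(q, seg)

    return max(database, key=score)[1]


# Hypothetical retrieval database and partially formed snippet.
database = [
    ("with open(path) as f:", "    data = f.read()"),
    ("for item in items:", "    total += item.price"),
]
query = "with open(filename) as handle:"

hint = retrieve_hint(query, database)
# Assumption: the hint is simply placed before the context for the decoder.
model_input = hint + "\n" + query
```

A production system would replace the token-count cosine with similarity between learned code embeddings; the point here is only that the hint is the continuation of a similar segment, not the segment itself.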
-