1.
Publication No.: US20240161019A1
Publication Date: 2024-05-16
Application No.: US18507705
Filing Date: 2023-11-13
Inventor: Heuiseok LIM, Gyeongmin KIM
IPC: G06N20/20, G06F16/215, G06F16/901
CPC classification number: G06N20/20, G06F16/215, G06F16/9014
Abstract: Disclosed herein is a method of generating a similarity determination model for programming code, based on a cross-validation ensemble and a filtering strategy. The method is performed by a computing device including at least a processor and includes: performing preprocessing on raw data written in any one language; performing filtering on the preprocessed data; generating positive pairs and negative pairs for training; and training a pre-trained language model using the generated positive pairs and negative pairs.
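
The first three steps of the abstract (preprocessing, filtering, pair generation) can be illustrated with a minimal Python sketch. The abstract does not specify any of these details, so everything below is an assumption made for illustration: that preprocessing means stripping `#` comments and blank lines, that filtering means dropping duplicates and overly long snippets, that snippets are grouped by the problem they solve, and that a positive pair is two snippets for the same problem while a negative pair is drawn from two different problems. The function names `preprocess`, `filter_snippets`, and `make_pairs` and the `max_chars` parameter are hypothetical. The training step and the cross-validation ensemble are not shown.

```python
import itertools
import random
import re


def preprocess(code: str) -> str:
    """Assumed preprocessing: drop '#' comments and blank lines.

    Naive on purpose: a '#' inside a string literal is also cut.
    """
    lines = []
    for line in code.splitlines():
        line = re.sub(r"#.*", "", line).rstrip()
        if line.strip():
            lines.append(line)
    return "\n".join(lines)


def filter_snippets(snippets, max_chars=2000):
    """Assumed filtering: keep non-empty, non-duplicate, short snippets."""
    seen = set()
    kept = []
    for snippet in snippets:
        if snippet and len(snippet) <= max_chars and snippet not in seen:
            seen.add(snippet)
            kept.append(snippet)
    return kept


def make_pairs(by_problem, seed=0):
    """Build (code_a, code_b, label) triples; label 1 = similar, 0 = not.

    Assumption: snippets solving the same problem count as similar.
    """
    rng = random.Random(seed)
    positives = []
    for snippets in by_problem.values():
        for a, b in itertools.combinations(snippets, 2):
            positives.append((a, b, 1))
    negatives = []
    for p, q in itertools.combinations(list(by_problem), 2):
        if by_problem[p] and by_problem[q]:
            a = rng.choice(by_problem[p])
            b = rng.choice(by_problem[q])
            negatives.append((a, b, 0))
    return positives, negatives


# Hypothetical toy data: problem ID -> raw submissions.
raw = {
    "p1": ["print(1)  # first\n", "x = 1\nprint(x)\n", "print(1)  # dup\n"],
    "p2": ["print(2)\n"],
}
clean = {
    pid: filter_snippets([preprocess(s) for s in subs])
    for pid, subs in raw.items()
}
positives, negatives = make_pairs(clean)
```

The resulting labeled triples are the kind of input a pair-classification fine-tuning step for a pre-trained language model would consume. How the disclosed method actually samples pairs and combines cross-validation folds is not stated in the abstract.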