SAMPLE-DIFFERENCE-BASED METHOD AND SYSTEM FOR INTERPRETING DEEP-LEARNING MODEL FOR CODE CLASSIFICATION

    Publication (Announcement) No.: US20240192929A1

    Publication (Announcement) Date: 2024-06-13

    Application No.: US18475447

    Application Date: 2023-09-27

    CPC classification number: G06F8/35 G06F8/42

    Abstract: A sample-difference-based method and system for interpreting a deep-learning model for code classification is provided. The method includes an off-line interpreter-training step: constructing code transformations for every code sample in a training set to generate difference samples, where the difference samples are generated through feature deletion and through code-snippet extraction and feature importance scores are calculated from them; and inputting the original code samples, the difference samples and the feature importance scores into a neural network to obtain a trained interpreter. It also includes an on-line interpretation step: using the trained interpreter to extract important features from the code snippets, using an influence-function-based method to identify the training samples that contribute most to the prediction, comparing the extracted important features with those training samples, and generating interpretation results for the target samples. The disclosed system includes an off-line training module and an on-line interpretation module.
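    The feature-deletion route to difference samples can be illustrated with a minimal Python sketch. It assumes that a code sample is a list of statements, that each statement is one feature, and that the model returns the probability of its predicted class; the importance of a statement is taken as the drop in that probability when the statement is deleted. This scoring rule, `toy_model`, and every function name below are hypothetical illustrations rather than the patented method, and code-snippet extraction, the neural interpreter, and the influence-function step are not shown.

```python
from typing import Callable, List, Sequence

# Hypothetical classifier type: maps a list of code statements to the
# probability the model assigns to its predicted class.
Classifier = Callable[[Sequence[str]], float]


def deletion_difference_samples(statements: Sequence[str]) -> List[List[str]]:
    """Build one difference sample per statement by deleting that statement."""
    return [list(statements[:i]) + list(statements[i + 1:]) for i in range(len(statements))]


def deletion_importance(model: Classifier, statements: Sequence[str]) -> List[float]:
    """Score each statement by the drop in predicted-class probability
    observed when it is removed (a larger drop means more important)."""
    base = model(statements)
    return [base - model(diff) for diff in deletion_difference_samples(statements)]


def top_k_features(model: Classifier, statements: Sequence[str], k: int = 2) -> List[str]:
    """Return the k statements with the highest deletion-importance scores."""
    scores = deletion_importance(model, statements)
    ranked = sorted(range(len(statements)), key=lambda i: scores[i], reverse=True)
    return [statements[i] for i in ranked[:k]]


# Hypothetical stand-in model: confidence rises with the number of
# statements that call strcpy, used here as a "vulnerable" pattern.
def toy_model(statements: Sequence[str]) -> float:
    hits = sum("strcpy" in s for s in statements)
    return min(1.0, 0.2 + 0.4 * hits)


if __name__ == "__main__":
    code = ["char buf[8];", "strcpy(buf, input);", "printf(\"%s\", buf);"]
    print(deletion_importance(toy_model, code))
    print(top_k_features(toy_model, code, k=1))
```

    Deleting one statement at a time yields n difference samples for an n-statement sample. In the toy run only removing the `strcpy` call changes the prediction, so that statement receives the highest score.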

    METHOD, SYSTEM AND PROCESSOR FOR ENHANCING ROBUSTNESS OF SOURCE-CODE CLASSIFICATION MODEL

    Publication (Announcement) No.: US20250013463A1

    Publication (Announcement) Date: 2025-01-09

    Application No.: US18650290

    Application Date: 2024-04-30

    Abstract: A method, system and processor for enhancing the robustness of a source-code classification model based on invariant features is provided. The method includes: combining non-robustness features to generate different style templates; converting the code in an input code training set into new code of different styles to obtain a converted-code training set; merging the input-code and converted-code training sets into an expanded training set; converting the code texts in the expanded training set into code images; converting the code images into the required vectors; randomly pairing samples of the same class picked from the expanded training set and inputting the matched sample pairs into a feature extractor; iteratively updating the feature extractor and the matched sample pairs to extract target features; and training a classifier on the extracted invariant features to produce a trained model. The disclosed system includes a training-set expansion module and a model-training module.
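    The data-preparation stage of this method can be sketched in Python under several assumptions: string-level rewrites stand in for the non-robustness features (here a hypothetical variable rename and tab-to-space reindentation), a style template is any non-empty combination of those rewrites, and a "code image" is a fixed-size grid of normalized character codes. All names, the rasterization scheme, and the image size are hypothetical; the feature extractor, its iterative updates, and the classifier are not shown.

```python
import random
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[str, int]  # (code text, class label)
Rewrite = Callable[[str], str]


# Hypothetical non-robustness features: style changes that should not
# affect the class label. str.replace is a naive stand-in for a real
# AST-based variable renaming.
def rename_vars(code: str) -> str:
    return code.replace("count", "v0")


def tabs_to_spaces(code: str) -> str:
    return code.replace("\t", "    ")


NON_ROBUST: Dict[str, Rewrite] = {"rename": rename_vars, "indent": tabs_to_spaces}


def style_templates(features: Dict[str, Rewrite]) -> List[Tuple[str, ...]]:
    """Every non-empty combination of non-robustness features is one style template."""
    names = sorted(features)
    return [t for r in range(1, len(names) + 1) for t in combinations(names, r)]


def apply_template(code: str, template: Sequence[str], features: Dict[str, Rewrite]) -> str:
    """Apply the template's rewrites to the code in order."""
    for name in template:
        code = features[name](code)
    return code


def expand(train: List[Sample], features: Dict[str, Rewrite]) -> List[Sample]:
    """Merge the input set with one converted copy of each sample per template."""
    converted = [
        (apply_template(code, t, features), label)
        for code, label in train
        for t in style_templates(features)
    ]
    return list(train) + converted


def code_to_image(code: str, width: int = 16, height: int = 4) -> List[List[float]]:
    """Rasterize code text into a height x width grid of character codes
    scaled to [0, 1]; lines and rows are truncated or zero-padded."""
    lines = code.split("\n")
    grid = []
    for r in range(height):
        line = lines[r] if r < len(lines) else ""
        row = [min(ord(ch), 127) / 127 for ch in line[:width]]
        grid.append(row + [0.0] * (width - len(row)))
    return grid


def same_class_pairs(samples: List[Sample], rng: random.Random) -> List[Tuple[str, str, int]]:
    """Pair every sample with a randomly chosen sample of the same class."""
    by_class: Dict[int, List[str]] = {}
    for code, label in samples:
        by_class.setdefault(label, []).append(code)
    return [(code, rng.choice(by_class[label]), label) for code, label in samples]


if __name__ == "__main__":
    train = [("int count = 0;\n\treturn count;", 1), ("x = 1;", 0)]
    expanded = expand(train, NON_ROBUST)
    pairs = same_class_pairs(expanded, random.Random(0))
    images = [(code_to_image(a), code_to_image(b), y) for a, b, y in pairs]
    print(len(expanded), len(images))  # → 8 8
```

    Each template yields one converted copy, so two input samples and three templates give an expanded set of eight samples. Drawing each partner from the same class is what would let a downstream extractor learn to map restyled variants of a class to similar features; the abstract does not state how the extractor is updated, so that step is omitted.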
