Distributed Vectorized Representations of Source Code Commits

    公开(公告)号:US20220129261A1

    公开(公告)日:2022-04-28

    申请号:US17080520

    申请日:2020-10-26

    Applicant: SAP SE

    Abstract: Distributed vector representations of source code commits, are generated to become part of a data corpus for machine learning (ML) for analyzing source code. The code commit is received, and time information is referenced to split the source code into pre-change source code and post-change source code. The pre-change source code is converted into a first code representation (e.g., based on a graph model), and the post-change source code into a second code representation. A first particle is generated from the first code representation, and a second particle is generated from the second code representation. The first particle and the second particle are compared to create a delta. The delta is transformed into a first commit vector by referencing an embedding matrix to numerically encode the first particle and the second particle. Following classification, the commit vector is stored in a data corpus for performing ML analysis upon source code.

Patent Agency Ranking