Abstract:
Disclosed herein are systems, methods, and computer-readable storage media for identifying and remediating risky source files. An example system configured to practice the method can gather data describing each file in a source code repository and generate a risk score for each file using a weighted algorithm based on empirical relationships between the data and customer-found defects, wherein the weighted algorithm prioritizes factors based on their predictiveness of defects. The system can then generate a list of files having risk scores above a threshold and make risk-mitigation recommendations based on the risk scores. A file can be a single file or a collection of files, such as a module. The system can also identify a respective risk type for each file in the list and make the risk-mitigation recommendation for each file based on its respective risk type.
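
As a minimal, non-limiting sketch, the scoring flow summarized in the abstract could look like the following. The factor names (`churn`, `complexity`, `author_count`), the weights, the recommendation strings, and the rule that a file's risk type is its largest weighted factor are all hypothetical assumptions made for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass

# Hypothetical weights. The abstract says only that factors are weighted by
# how predictive they are of customer-found defects; these values are invented.
WEIGHTS = {"churn": 0.5, "complexity": 0.3, "author_count": 0.2}

# Hypothetical mapping from risk type to a risk-mitigation recommendation.
RECOMMENDATIONS = {
    "churn": "add regression tests before further changes",
    "complexity": "refactor into smaller units",
    "author_count": "assign a clear owner and require code review",
}


@dataclass
class FileMetrics:
    path: str      # a single file, or a module treated as one unit
    factors: dict  # factor name -> value normalized to the range [0, 1]


def risk_score(metrics):
    """Weighted sum of the normalized factors for one file."""
    return sum(w * metrics.factors.get(name, 0.0) for name, w in WEIGHTS.items())


def risk_type(metrics):
    """Assumed rule: the risk type is the factor with the largest weighted value."""
    return max(WEIGHTS, key=lambda name: WEIGHTS[name] * metrics.factors.get(name, 0.0))


def risky_files(all_metrics, threshold):
    """List files scoring above the threshold, highest risk first."""
    report = []
    for m in all_metrics:
        score = risk_score(m)
        if score > threshold:
            kind = risk_type(m)
            report.append((m.path, score, kind, RECOMMENDATIONS[kind]))
    return sorted(report, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    sample = [
        FileMetrics("net/socket.c", {"churn": 0.9, "complexity": 0.8, "author_count": 0.5}),
        FileMetrics("util/strings.c", {"churn": 0.1, "complexity": 0.2, "author_count": 0.1}),
    ]
    for path, score, kind, advice in risky_files(sample, threshold=0.5):
        print(f"{path}: score={score:.2f} type={kind} -> {advice}")
```

In this sketch the weights are fixed constants. A system of the kind described would instead derive them from the empirical relationship between the gathered data and customer-found defects.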