Hybrid deep learning model for software defect prediction using feature fusion

This part of the study describes the experimental setup for the software defect prediction models: the dataset, data preprocessing, handling of class imbalance, parameter settings, baseline methods, and performance measures. The dataset comes from the PROMISE repository and contains CSV data and Java source code files. AST nodes are extracted from the Java files and converted into semantic features with Word2vec embeddings, and these are used alongside the traditional features. Class imbalance is addressed through undersampling. Parameters are set for the proposed CNN-MLP model, which is compared against the baseline methods TR, CNN, DBN, LSTM, SDP-BB, DP-HNN, and ACGDP. Non-effort-aware evaluation uses recall, precision, F1 score, and AUC, while effort-aware evaluation uses the PofB20 metric. The study aims to improve software defect prediction by combining semantic and traditional features and by evaluating the model under both scenarios. It also stresses that handling imbalanced data, setting appropriate parameters, and choosing suitable evaluation metrics are necessary to assess model performance accurately.
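To make two of these steps concrete, the following is a minimal Python sketch of undersampling and of the PofB20 metric. It is not the study's implementation. It assumes random undersampling of the majority class (the summary says only "undersampling"), and it assumes PofB20 is computed by ranking modules by predicted defect score and inspecting them in that order until 20% of the total lines of code would be exceeded. The function names `random_undersample` and `pofb20`, their parameters, and the rule that a module crossing the budget is not inspected are all hypothetical choices made for illustration.

```python
import random
from typing import List, Sequence


def random_undersample(labels: Sequence[int], seed: int = 0) -> List[int]:
    """Return sorted indices of a class-balanced subset of binary labels.

    Assumption: the majority class is randomly reduced to the size of
    the minority class; the source does not state which variant is used.
    """
    rng = random.Random(seed)
    defective = [i for i, y in enumerate(labels) if y == 1]
    clean = [i for i, y in enumerate(labels) if y == 0]
    # Whichever class is smaller is kept whole.
    minority, majority = sorted((defective, clean), key=len)
    kept = rng.sample(majority, len(minority))
    return sorted(minority + kept)


def pofb20(
    scores: Sequence[float],
    loc: Sequence[int],
    bugs: Sequence[int],
    budget: float = 0.20,
) -> float:
    """Fraction of all bugs found within the top `budget` share of LOC.

    Assumptions: modules are inspected in descending order of predicted
    score, and a module that would push inspected LOC past the budget
    ends the inspection without being counted.
    """
    total_loc = sum(loc)
    total_bugs = sum(bugs)
    if total_loc == 0 or total_bugs == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    inspected_loc = 0
    found = 0
    for i in order:
        if inspected_loc + loc[i] > budget * total_loc:
            break
        inspected_loc += loc[i]
        found += bugs[i]
    return found / total_bugs
```

With five modules of 100, 50, 300, 50, and 500 lines holding 2, 1, 3, 0, and 4 bugs, ranked in that order by predicted score, the 20% budget is 200 lines. Only the first two modules fit, so 3 of the 10 bugs are found. Ranking by predicted defect density instead of raw score is a common alternative and would change the inspection order.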

Source link: https://www.nature.com/articles/s41598-024-65639-4