Purpose: Molecularly targeted drugs (MTD) are widely used in patients with non-small cell lung cancer (NSCLC). Epidermal growth factor receptor (EGFR) tyrosine kinase inhibitors are one class of MTD, and treatment with them yields longer progression-free survival (PFS) than conventional chemotherapy. The purpose of this study was to predict EGFR mutation status from CT images of NSCLC using machine learning.
Methods: This study included 172 patients with NSCLC from whom biopsy or surgical specimens had been obtained. All tumors were pathologically confirmed as adenocarcinoma, and EGFR mutation status (mutant or wild-type) was determined. EGFR mutations were then classified as common or uncommon, and common mutations were further subtyped as 19del, L858R, or other. Patient data were divided into training and test sets. A total of 1048 radiomic features were extracted; these radiomic features and 4 clinical features were used to construct three prediction models (EGFR mutations vs. wild-type, common vs. uncommon mutations, and 19del vs. L858R). Lasso was used to select features, and two machine learning algorithms, support vector machine (SVM) and logistic regression (LR), were applied with five-fold cross-validation. The prediction performance of each model was evaluated by the area under the receiver operating characteristic curve (AUC) in the training and test data.
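The modeling steps described above (Lasso feature selection, then SVM and LR with five-fold cross-validation, scored by AUC) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation. The synthetic data, the 70/30 split, the Lasso penalty `alpha=0.05`, the linear SVM kernel, and the choice to run feature selection inside each cross-validation fold are all assumptions, since the abstract does not specify them.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import (StratifiedKFold, cross_val_predict,
                                     train_test_split)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# ASSUMPTION: synthetic stand-in for the study data. 172 samples and
# 1048 radiomic + 4 clinical = 1052 feature columns, with a binary label
# (e.g. EGFR mutation vs. wild-type). No real patient data is used here.
X, y = make_classification(n_samples=172, n_features=1052, n_informative=15,
                           n_redundant=0, random_state=0)

# ASSUMPTION: a stratified 70/30 split; the abstract gives no split ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)


def build_model(classifier):
    """Scale features, keep those with nonzero Lasso weights, then classify."""
    return make_pipeline(
        StandardScaler(),
        # ASSUMPTION: alpha=0.05 is illustrative, not a tuned value.
        SelectFromModel(Lasso(alpha=0.05, max_iter=10000)),
        classifier,
    )


models = {
    "SVM": build_model(SVC(kernel="linear", random_state=0)),
    "LR": build_model(LogisticRegression(max_iter=1000)),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, model in models.items():
    # Out-of-fold decision scores on the training set (five-fold CV).
    # Feature selection is refit inside every fold to avoid leakage.
    cv_scores = cross_val_predict(model, X_train, y_train, cv=cv,
                                  method="decision_function")
    cv_auc = roc_auc_score(y_train, cv_scores)

    # Refit on the full training set and score the held-out test set.
    model.fit(X_train, y_train)
    test_auc = roc_auc_score(y_test, model.decision_function(X_test))

    results[name] = {"cv_auc": cv_auc, "test_auc": test_auc}
    print(f"{name}: CV AUC = {cv_auc:.2f}, test AUC = {test_auc:.2f}")
```

Wrapping the selector and classifier in one pipeline means the Lasso step sees only the training portion of each fold, so the cross-validated AUC is not inflated by selecting features on held-out samples. The same sketch would apply to each of the three binary tasks by swapping in the corresponding labels.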
Results: For EGFR mutations vs. wild-type, the AUCs of SVM and LR were 0.85 and 0.84 in the training data and 0.78 and 0.77 in the test data, respectively. For common vs. uncommon mutations, the AUCs of SVM and LR were 0.84 and 0.84 in the training data and 0.82 and 0.79 in the test data, respectively. For 19del vs. L858R, the AUCs of SVM and LR were 0.82 and 0.84 in the training data and 0.71 and 0.67 in the test data, respectively.
Conclusion: The constructed models showed relatively high predictive performance and may be useful in selecting appropriate MTD.