A Comparative Study of Thyroid Dysfunction Prediction Models Using Machine Learning
Keywords:
Decision Tree, Gradient Boosting, Logistic Regression, Machine Learning, Thyroid Dysfunction, Confusion Matrix, Early Detection
Abstract
The prevalence of Thyroid Dysfunction (TD) is now alarming worldwide, particularly in Africa, owing to environmental factors and increasingly poor nutrition. Treatment of TD is effective only when the disease is detected and diagnosed accurately at an early stage. Diagnosing thyroid dysfunction requires experience and sound knowledge to analyze test results; however, the current manual method of interpreting test results in most developing countries is subjective and error-prone, whereas the ideal is to predict and detect the disease as early as possible. Data mining techniques have been explored in the literature to predict diseases automatically from patients' data in hospitals and clinics; however, the features used were inadequate by the standards of modern methods. Hence, this work explores the use of Machine Learning to evaluate twelve (12) elements or features of patient blood test data. Machine Learning (ML), a subset of Artificial Intelligence (AI), employs various statistical, probabilistic, and optimization techniques that let a computer "learn" from earlier cases and then detect hard-to-recognize patterns in large, noisy, or complex datasets. The motivation for the current research is to develop an efficient model that predicts thyroid dysfunction at an early stage. Such models are relatively inexpensive to build, which helps make quality healthcare affordable and accessible to marginalized populations in most developing countries. In this research, the dataset used was acquired from the UCI (University of California, Irvine) Machine Learning Repository and contains three thousand, seven hundred and seventy-four (3774) patient records. These records include levels of Free Triiodothyronine (FT3), Thyroid-Stimulating Hormone (TSH), Triiodothyronine (T3), and Thyroxine (T4), amongst others.
The Machine Learning models used are Gradient Boosting, Decision Tree, and Logistic Regression, and their accuracy, precision, and recall were compared. The Logistic Regression model achieved the best accuracy (0.981), precision (0.827), and recall (0.727). Therefore, integrating the Logistic Regression model into a real-time Hospital Management System can enable medical experts to use the T3, T4, FTI, and TSH levels obtained from blood test results to predict whether a patient has thyroid dysfunction.
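The three models were compared on accuracy, precision, and recall, all of which are derived from the confusion matrix listed among the keywords. The following is a minimal Python sketch of how those metrics follow from the four confusion-matrix counts. It assumes a binary labelling in which 1 means thyroid dysfunction is present. The names `ConfusionMatrix`, `confusion_matrix`, and `evaluate` are hypothetical helpers for illustration, not the study's own code, and the toy labels are not data from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier; positive = thyroid dysfunction (assumed label 1)."""
    tp: int  # predicted dysfunction, patient has dysfunction
    fp: int  # predicted dysfunction, patient is healthy
    fn: int  # predicted healthy, patient has dysfunction
    tn: int  # predicted healthy, patient is healthy


def confusion_matrix(y_true, y_pred, positive=1):
    """Tally true/false positives and negatives from paired label sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return ConfusionMatrix(tp=tp, fp=fp, fn=fn, tn=tn)


def evaluate(cm):
    """Accuracy, precision, and recall; each defaults to 0.0 when its denominator is 0."""
    total = cm.tp + cm.fp + cm.fn + cm.tn
    accuracy = (cm.tp + cm.tn) / total if total else 0.0
    predicted_pos = cm.tp + cm.fp  # everything the model flagged as dysfunction
    actual_pos = cm.tp + cm.fn     # every patient who truly has dysfunction
    precision = cm.tp / predicted_pos if predicted_pos else 0.0
    recall = cm.tp / actual_pos if actual_pos else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the study.
    y_true = [1, 0, 0, 1, 0, 1, 0, 0]
    y_pred = [1, 0, 1, 1, 0, 0, 0, 1]
    cm = confusion_matrix(y_true, y_pred)
    print(cm)            # ConfusionMatrix(tp=2, fp=2, fn=1, tn=3)
    print(evaluate(cm))  # accuracy 0.625, precision 0.5, recall ~0.667
```

When dysfunction cases are rare relative to healthy ones, a model can reach high accuracy largely by classifying the majority class correctly. This is why precision and recall, which focus on the positive class, are reported alongside accuracy.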