Diagnosis Classification

Problem Definition

In the context of medical diagnosis, a machine learning classification problem could involve predicting the presence or absence of a particular disease or condition based on a set of symptoms.

The focus of this project is to find the best possible approach for predicting a diagnosis from the patient's symptoms as accurately as possible. There are obstacles to achieving this: most medical data lacks features and also includes "noisy data". In addition, the classification algorithms must be carefully selected to suit the features the dataset provides, so that the best accuracy can be achieved. The model parameters must be selected appropriately to produce a generalized model that can accurately classify any given input.

Goals

The goal of this classification problem is to develop a machine learning model that can accurately predict the diagnosis of a patient based on a set of input features, such as symptoms, medical history, and test results. The model can be trained using a dataset of labeled diagnoses, where each patient is represented by a set of symptoms and a corresponding diagnosis label.
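One common way to represent "a set of symptoms and a corresponding diagnosis label" is a binary vector with one position per known symptom. The sketch below assumes that encoding; the symptom names, the labels and the `encode_symptoms` helper are hypothetical and not taken from the project's dataset.

```python
# Hypothetical symptom vocabulary; the project's dataset has 132 symptoms.
SYMPTOMS = ["fever", "cough", "headache", "rash", "nausea"]


def encode_symptoms(reported, vocabulary=SYMPTOMS):
    """Return a binary vector with 1 where the symptom was reported."""
    reported = set(reported)
    unknown = reported - set(vocabulary)
    if unknown:
        raise ValueError(f"unknown symptoms: {sorted(unknown)}")
    return [1 if symptom in reported else 0 for symptom in vocabulary]


# Labeled examples: (reported symptoms, diagnosis label), invented for illustration.
records = [
    (["fever", "cough"], "flu"),
    (["rash"], "allergy"),
]

X = [encode_symptoms(symptoms) for symptoms, _ in records]  # feature vectors
y = [label for _, label in records]                         # diagnosis labels
```

With this layout, every patient becomes one row of `X` and one entry of `y`, which is the shape most classification libraries expect.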

Methodology

Classification models and algorithms are used to predict a diagnosis from input data. Given the symptoms reported by the patient, an algorithm classifies them into a diagnosis. The model is first trained on the diagnoses and their associated symptoms and can later determine a patient's diagnosis with high accuracy.
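The train-then-predict flow can be illustrated with a very small classifier. The sketch below is a from-scratch Bernoulli Naive Bayes on binary symptom vectors; it is a simplified stand-in, not the Multinomial Naive Bayes variant evaluated in this project, and the toy data and function names are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(X, y, alpha=1.0):
    """Fit a Bernoulli Naive Bayes model on binary symptom vectors."""
    n_features = len(X[0])
    class_counts = Counter(y)
    feature_counts = defaultdict(lambda: [0] * n_features)
    for row, label in zip(X, y):
        for j, value in enumerate(row):
            feature_counts[label][j] += value
    model = {}
    for label, count in class_counts.items():
        log_prior = math.log(count / len(y))
        # Laplace-smoothed probability that symptom j is present given the class.
        probs = [
            (feature_counts[label][j] + alpha) / (count + 2 * alpha)
            for j in range(n_features)
        ]
        model[label] = (log_prior, probs)
    return model


def predict(model, row):
    """Return the diagnosis label with the highest posterior log-probability."""
    def log_posterior(label):
        log_prior, probs = model[label]
        return log_prior + sum(
            math.log(p if value else 1.0 - p) for value, p in zip(row, probs)
        )
    return max(model, key=log_posterior)


# Toy data: columns are [fever, cough, rash]; labels are invented.
X_train = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
y_train = ["flu", "flu", "allergy", "allergy"]

model = train_naive_bayes(X_train, y_train)
print(predict(model, [1, 1, 0]))
print(predict(model, [0, 0, 1]))
```

Training only counts how often each symptom appears with each diagnosis; prediction then picks the diagnosis under which the observed symptoms are most probable.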

Six different algorithms were chosen to analyze and solve the problem: Naive Bayes Classifier, Random Forest Classifier, Decision Tree, Support Vector Machines, K-Nearest Neighbors and Backpropagation (a Multi-layer Perceptron classifier).
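A side-by-side comparison of these six algorithm families might look like the following sketch. It assumes scikit-learn and NumPy are installed and uses synthetic binary data as a stand-in for the real dataset; the hyperparameters are library defaults (apart from `max_iter` and `random_state`), not the project's tuned values.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: 5 diagnosis classes, 20 binary symptoms.
rng = np.random.default_rng(0)
n_classes, n_symptoms, per_class = 5, 20, 40
prototypes = rng.integers(0, 2, size=(n_classes, n_symptoms))
y = np.repeat(np.arange(n_classes), per_class)
# Flip roughly 10% of the symptom bits to imitate noisy records.
noise = rng.random((len(y), n_symptoms)) < 0.1
X = np.where(noise, 1 - prototypes[y], prototypes[y])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "Naive Bayes": MultinomialNB(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Support Vector Machine": SVC(),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Multi-layer Perceptron": MLPClassifier(max_iter=1000, random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
accuracies = {}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    accuracies[name] = clf.score(X_test, y_test)
    print(f"{name}: {accuracies[name]:.3f}")
```

The numbers printed here describe only the synthetic data and say nothing about the accuracies reported in the Results section.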

The selected algorithms are evaluated on a dataset containing 132 symptoms and 41 diagnosis classes. In addition, parameter optimization was applied to improve the models. This optimization was based on several tests and methods for analyzing each model’s performance. All the algorithms achieve high accuracy, but the ones that tests showed to have the best accuracy are Multinomial Naive Bayes, the Multi-layer Perceptron classifier and the Random Forest algorithm.
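The text does not say which parameters were tuned or how, so the following is only an illustration of the general idea: try each candidate value, score it with cross-validation, and keep the best. It uses a from-scratch k-nearest-neighbor classifier and a grid over `k`; the data, the fold scheme and all function names are assumptions made for this example.

```python
import random
from collections import Counter


def knn_predict(X_train, y_train, row, k):
    """Majority vote among the k training rows closest in Hamming distance."""
    ranked = sorted(
        range(len(X_train)),
        key=lambda i: sum(a != b for a, b in zip(X_train[i], row)),
    )
    votes = Counter(y_train[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, k, n_folds=5):
    """Mean accuracy of k-NN over n_folds interleaved folds."""
    scores = []
    for fold in range(n_folds):
        test_idx = [i for i in range(len(X)) if i % n_folds == fold]
        train_idx = [i for i in range(len(X)) if i % n_folds != fold]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        correct = sum(knn_predict(X_tr, y_tr, X[i], k) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / n_folds


# Synthetic data: three invented diagnoses, eight binary symptoms, 15% bit noise.
random.seed(0)
prototypes = {
    "A": [1, 1, 1, 0, 0, 0, 0, 0],
    "B": [0, 0, 0, 1, 1, 1, 0, 0],
    "C": [0, 0, 0, 0, 0, 1, 1, 1],
}
X, y = [], []
for _ in range(20):
    for label, proto in prototypes.items():
        X.append([bit ^ (random.random() < 0.15) for bit in proto])
        y.append(label)

# Grid search: evaluate every candidate k and keep the best-scoring one.
grid = [1, 3, 5, 7, 9]
results = {k: cross_val_accuracy(X, y, k) for k in grid}
best_k = max(results, key=results.get)
print(results, "best k:", best_k)
```

Scoring on held-out folds rather than on the training data is what keeps the selected parameter from simply rewarding overfitting.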

Results

Figure: Algorithm accuracy before optimization
Figure: Algorithm accuracy after optimization