Dataset: Predict Dropout or Academic Success
Background: This assignment was to build a feature-based classifier that predicts whether a student will drop out or graduate.
Goal: I hope to become more familiar with the sklearn library and to build a successful eight-feature classifier.
Methods: To achieve my goal, I used the Spyder IDE and the sklearn library to train and test classifiers on different sets of features in order to build a strong classifier. The features I chose to test were marital status, gender, debt, scholarship, educational special needs, displacement, course, and nationality. The classifier was first run on one feature at a time, and the strongest feature (based on F1 score) was chosen. The classifier was then run using two features, and the process continued until it reached eight features.
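The selection procedure described above resembles greedy forward feature selection. Below is a minimal sketch of that loop, assuming each round keeps the features chosen in earlier rounds and adds the single feature that gives the highest score. The names `forward_select` and `toy_score` are hypothetical, and the weights in `toy_score` are made up purely so the example runs; they are not results from this assignment.

```python
def forward_select(features, score_fn, max_features):
    """Greedily grow a feature set, one feature per round.

    score_fn takes a list of feature names and returns a score
    (for example, an F1 score); higher is better.
    """
    selected = []
    remaining = list(features)
    history = []  # (features chosen so far, score) after each round
    while remaining and len(selected) < max_features:
        # Score every candidate set formed by adding one remaining feature.
        scored = [(score_fn(selected + [f]), f) for f in remaining]
        best_score, best_feature = max(scored, key=lambda pair: pair[0])
        selected.append(best_feature)
        remaining.remove(best_feature)
        history.append((list(selected), best_score))
    return history


# Stand-in scorer with made-up weights (NOT real results).
# In practice this would train and evaluate a classifier on the given columns.
TOY_WEIGHTS = {"course": 0.5, "scholarship": 0.3, "debt": 0.2, "gender": 0.1}


def toy_score(feature_subset):
    return sum(TOY_WEIGHTS[f] for f in feature_subset)


history = forward_select(["gender", "course", "debt", "scholarship"], toy_score, 3)
for chosen, score in history:
    print(chosen, round(score, 3))
```

Note that the loop keeps adding features even when the score drops, which matches running the process all the way to eight features and comparing the rounds afterward. With sklearn, `score_fn` could wrap `cross_val_score` with `scoring="f1_weighted"`, though that is only one possible way to score a feature set.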
Results and Analysis:


The tables above show the accuracy, precision, recall, and F1 score for each of the feature combinations. The conditional formatting of the F1 column provides a visual reference for the F1 scores of each classifier model. The most successful one-feature classifier used the course category, the most successful two-feature classifier used course and scholarship, and so on, until all eight features were included. The final classifier used course, scholarship, debt, nationality, gender, marital status, educational special needs, and displacement. This model has an accuracy of 60.6%, a precision of 55.2%, a recall of 60.6%, and an F1 score of 54.7%.
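The reported recall (60.6%) equals the reported accuracy (60.6%), which is what support-weighted averaging across classes produces. The sketch below computes the four metrics by hand under that assumption; whether weighted averaging was actually used here is not stated, and the labels in the example are made up for illustration. The function name `weighted_metrics` is hypothetical.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1."""
    n = len(y_true)
    support = Counter(y_true)  # number of true samples per class
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        p_c = tp / predicted if predicted else 0.0  # per-class precision
        r_c = tp / count                            # per-class recall
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = count / n                          # class share of the data
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for illustration only.
y_true = ["Dropout", "Dropout", "Graduate", "Graduate", "Graduate", "Enrolled"]
y_pred = ["Dropout", "Graduate", "Graduate", "Graduate", "Dropout", "Graduate"]

metrics = weighted_metrics(y_true, y_pred)
print(metrics)
```

Under this averaging, the F1 score is a weighted mean of per-class F1 scores rather than the harmonic mean of the averaged precision and recall, so it can fall below both of them, as it does in the figures above.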

Course
The one-feature classifier shows a 55.8% accuracy and a 0.484 F1 score, making it the best of the tested one-feature classifiers.

Course, Scholarship
The two-feature classifier shows a 56.8% accuracy and a 0.516 F1 score, an improvement over the one-feature classifier and the best result among the two-feature classifiers.

Course, Scholarship, Debt
The three-feature classifier shows a 59.6% accuracy and a 0.540 F1 score, making it the best of the three-feature classifiers.

Course, Scholarship, Debt, Nationality
The four-feature classifier shows a 59.7% accuracy and a 0.542 F1 score, making it the best of the four-feature classifiers.

Course, Scholarship, Debt, Nationality, Gender
The five-feature classifier shows a 60.7% accuracy and a 0.552 F1 score, making it the best of the five-feature classifiers.

Course, Scholarship, Debt, Nationality, Gender, Marital Status
The six-feature classifier shows a 59.5% accuracy and a 0.537 F1 score, making it the best of the six-feature classifiers.

Course, Scholarship, Debt, Nationality, Gender, Marital Status, Educational Special Needs
The seven-feature classifier shows a 58.8% accuracy and a 0.531 F1 score, making it the best of the seven-feature classifiers.

Course, Scholarship, Debt, Nationality, Gender, Marital Status, Educational Special Needs, Displacement
The eight-feature classifier shows a 60.6% accuracy and a 0.547 F1 score. Because it uses every feature tested, it is the only eight-feature classifier. Overall, it has the second-highest accuracy and the second-highest F1 score. The five-feature classifier scores better on both measures, making it the best classifier overall.
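The comparison across the eight models can be checked with a short script. The numbers below are the accuracy and F1 values reported above for the best classifier at each feature count; the variable names are only illustrative.

```python
# Feature count -> (accuracy in %, F1 score), as reported above.
results = {
    1: (55.8, 0.484),
    2: (56.8, 0.516),
    3: (59.6, 0.540),
    4: (59.7, 0.542),
    5: (60.7, 0.552),
    6: (59.5, 0.537),
    7: (58.8, 0.531),
    8: (60.6, 0.547),
}

# Rank the feature counts from best to worst on each measure.
by_accuracy = sorted(results, key=lambda k: results[k][0], reverse=True)
by_f1 = sorted(results, key=lambda k: results[k][1], reverse=True)

print("Top two by accuracy:", by_accuracy[:2])  # [5, 8]
print("Top two by F1:", by_f1[:2])              # [5, 8]
```

Both rankings put the five-feature classifier first and the eight-feature classifier second.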
Future Directions:
I would be interested to explore more features and study their impact on the accuracy of the classifier.