HackRush'22 | Team PRAY Submission
In an alternate universe, due to the unbalanced workload among the faculty of Stanford University, the university is suffering from high faculty attrition and has become an object of mockery among the public. There has also been a petition to rename it to Standard University due to this mismanagement.
To combat the inevitable backlash, the university aims to build a system that can predict the number of students who will enrol in a course in a given academic year. Such a system will not only allow the university's stakeholders to recruit faculty smartly and balance faculty workload, but also gauge students' interest in a given course to decide whether the course should be offered.
Now the question is: how do we build such a system? That's where you come in!
In this challenge, you will develop a model that tries to forecast the future total student enrolment for courses offered at the university based on the historic enrolment trend of the last 200 years.
- Timestep
  - The dtype is object, so we converted it to a numerical value.
  - Created a new column named "Year" that contains the first academic year.
  - Example: "AY1810-AY1811" (dtype object) is converted to 1810 (dtype int).
- Course and Faculty
  - The dtype is object, so we converted both to numerical values.
  - Used one-hot encoding.
Dropped the following columns: 'Id', 'Timestep', 'Course', 'Faculty'
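The preprocessing steps above can be sketched with pandas roughly as follows. This is a minimal sketch, not the team's actual code: the function name `preprocess`, the target column name `Enrolment`, and the two sample rows are made up for illustration. Only the column names 'Id', 'Timestep', 'Course', and 'Faculty' and the "AY1810-AY1811" format come from the description above.

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper that mirrors the preprocessing steps listed above."""
    out = df.copy()

    # "AY1810-AY1811" -> 1810: keep the first academic year as an integer.
    out["Year"] = out["Timestep"].str.extract(r"AY(\d{4})", expand=False).astype(int)

    # One-hot encode the two categorical columns.
    dummies = pd.get_dummies(out[["Course", "Faculty"]], prefix=["Course", "Faculty"])
    out = pd.concat([out, dummies], axis=1)

    # Drop the raw columns that are no longer needed.
    return out.drop(columns=["Id", "Timestep", "Course", "Faculty"])


# Made-up sample rows; "Enrolment" is an assumed name for the target column.
sample = pd.DataFrame(
    {
        "Id": [0, 1],
        "Timestep": ["AY1810-AY1811", "AY1811-AY1812"],
        "Course": ["Course A", "Course B"],
        "Faculty": ["Faculty X", "Faculty X"],
        "Enrolment": [120, 95],
    }
)
features = preprocess(sample)
print(features.columns.tolist())
```

If the training and test sets are encoded separately, their one-hot columns may not match, so it is usually safer to encode them together or reindex the test frame to the training columns.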
Used CatBoostRegressor with the following hyperparameters:
- learning_rate = 0.75
- depth = 8
- n_estimators = 2000
Added a bias of 25
- Linear Regression (scikit-learn)
- Random Forest Regression (scikit-learn)
- CatBoostRegressor (catboost)
- Sequential model (tensorflow)
- n_estimators: number of trees in the forest
- depth: depth of the tree
- learning_rate: determines the step size at each iteration while moving toward a minimum of a loss function
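To make the role of the step size concrete, here is a small illustrative sketch of plain gradient descent on a one-dimensional quadratic. It is an analogy rather than what CatBoost does internally (in gradient boosting the learning rate scales the contribution of each new tree); the function `gradient_descent` and the loss f(x) = (x - 3)^2 are chosen purely for illustration.

```python
def gradient_descent(learning_rate, steps=20, start=0.0):
    """Minimise f(x) = (x - 3)^2 using a fixed step size."""
    x = start
    for _ in range(steps):
        grad = 2 * (x - 3)         # derivative of (x - 3)^2
        x -= learning_rate * grad  # move against the gradient
    return x


slow = gradient_descent(0.1)       # small steps: approaches 3 slowly
fast = gradient_descent(0.75)      # larger steps: gets much closer in 20 steps
unstable = gradient_descent(1.1)   # too large: every step overshoots further

for name, value in [("slow", slow), ("fast", fast), ("unstable", unstable)]:
    print(f"{name}: x = {value:.4f}, distance from minimum = {abs(value - 3):.4f}")
```

On this toy problem a moderate step size converges fastest, a small one converges slowly, and one that is too large diverges, which is why the learning rate usually needs tuning together with the number of trees.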
- Faculty and Course are given as labels, but machine learning models require numerical data for processing
- Normalizing inputs and outputs
- Splitting data for training and testing
- Finding good hyperparameters for our model
- High training times
- Overfitting
- Presence of outliers and missing entries
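Two of the challenges above, splitting and normalizing, can be sketched with scikit-learn as follows. This is only an illustration under stated assumptions: the arrays `X` and `y` are random placeholder data, not the competition dataset, and the 80/20 split and `StandardScaler` are common defaults rather than the choices the team necessarily made.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Placeholder data standing in for the preprocessed features and targets.
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(100, 3))
y = rng.normal(loc=200.0, scale=30.0, size=100)

# Hold out 20% of the rows for testing (assumed ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit the scaler on the training split only, then apply it to both splits,
# so that no information from the test set leaks into training.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```

Tree-based models such as CatBoost are generally insensitive to feature scaling, so normalization matters mainly for the linear regression and neural network models.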
- sklearn.ensemble.RandomForestRegressor
- CatBoostRegressor
- tf.keras.Sequential
- Stack Overflow
- Towards Data Science
- Yash Meshram
- Anupam Kumar
- Pradeep Saini
- Robin Kumar