HackRush'22 | Team PRAY Submission

Problem Statement

In an alternate universe, the unbalanced workload among the faculty of Stanford University has led to high faculty attrition, and the university has become an object of public mockery. There has even been a petition to rename it "Standard University" because of this mismanagement.

To combat the inevitable backlash, the university aims to build a system that can predict the number of students who will enrol in a course in a given academic year. Such a system would not only let the university's stakeholders recruit faculty smartly to balance the workload, but also let them gauge students' interest in a course and decide whether it should be offered at all.

Now the question is: how do we build such a system? That's where you come in!

In this challenge, you will develop a model that tries to forecast the future total student enrolment for courses offered at the university based on the historic enrolment trend of the last 200 years.

Our approach

Feature Engineering

Timestep
- Its dtype is object, so we converted it to a numerical value.
- Created a new column named "Year" containing the first year of the academic year.
- e.g. "AY1810-AY1811" (dtype object) is converted to 1810 (dtype int).
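A minimal pandas sketch of the Timestep conversion, assuming the column is literally named `Timestep` and every value follows the "AY1810-AY1811" pattern. `add_year_column` is a hypothetical helper name, not taken from the team's notebooks.

```python
import pandas as pd


def add_year_column(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with an integer 'Year' column parsed from 'Timestep'."""
    out = df.copy()
    # Capture the four digits right after the first "AY" prefix, i.e. the first
    # year of the academic year. astype(int) raises if any row does not match.
    out["Year"] = out["Timestep"].str.extract(r"AY(\d{4})", expand=False).astype(int)
    return out


df = pd.DataFrame({"Timestep": ["AY1810-AY1811", "AY2019-AY2020"]})
print(add_year_column(df)["Year"].tolist())  # [1810, 2019]
```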
Course and Faculty
- Their dtype is object, so we converted them to numerical values.
- Used one-hot encoding.
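One way to do the one-hot encoding is `pd.get_dummies`, sketched below under the assumption that the raw columns are named `Course` and `Faculty`. The optional `reference` argument is our own addition, not something the README describes: it reindexes the test set to the dummy columns seen in training, so a category that appears in only one split does not change the feature layout.

```python
from typing import List, Optional, Sequence, Tuple

import pandas as pd


def one_hot_encode(
    df: pd.DataFrame,
    columns: Sequence[str] = ("Course", "Faculty"),
    reference: Optional[List[str]] = None,
) -> Tuple[pd.DataFrame, List[str]]:
    """Append 0/1 dummy columns for each categorical column; return (frame, dummy names)."""
    cols = list(columns)
    dummies = pd.get_dummies(df[cols], prefix=cols, dtype=int)
    if reference is not None:
        # Align to the training layout: unseen categories are dropped and
        # categories missing from this split are filled with 0.
        dummies = dummies.reindex(columns=reference, fill_value=0)
    return pd.concat([df, dummies], axis=1), list(dummies.columns)
```

Typical use: `train_enc, dummy_cols = one_hot_encode(train)` followed by `test_enc, _ = one_hot_encode(test, reference=dummy_cols)`.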

Feature Selection

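Dropped the following columns: 'Id', 'Timestep', 'Course', 'Faculty'. 'Id' is only an identifier, and the other three are replaced by the engineered "Year" and one-hot columns.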
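A short sketch of this selection step. The dropped column names come from the README; the helper name `select_features` and the extra check that every remaining column is numeric are our own additions, meant to catch an object column that slipped through feature engineering.

```python
import pandas as pd

# Columns removed before training, as listed above.
DROP_COLUMNS = ["Id", "Timestep", "Course", "Faculty"]


def select_features(df: pd.DataFrame) -> pd.DataFrame:
    """Drop identifier/raw categorical columns and verify the rest are numeric."""
    features = df.drop(columns=DROP_COLUMNS, errors="ignore")
    non_numeric = [
        c for c in features.columns if not pd.api.types.is_numeric_dtype(features[c])
    ]
    if non_numeric:
        raise ValueError(f"Non-numeric feature columns remain: {non_numeric}")
    return features
```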

Model Building

We used CatBoostRegressor with the following hyperparameters:

learning_rate = 0.75
depth = 8
n_estimators = 2000

Added the bias of 25
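A sketch of this setup. The three hyperparameter values come from this README, and CatBoost accepts `n_estimators` as an alias of `iterations`. We assume "bias of 25" means a constant added to every prediction, which the README does not spell out; `build_model` and `predict_with_bias` are hypothetical helpers. CatBoost is imported inside `build_model` so the rest of the snippet runs without it installed.

```python
import numpy as np

# Hyperparameters listed above.
CATBOOST_PARAMS = {"learning_rate": 0.75, "depth": 8, "n_estimators": 2000}
# Assumption: the "bias of 25" is a constant offset applied to predictions.
PREDICTION_BIAS = 25


def build_model(**overrides):
    """Create a CatBoostRegressor with the listed hyperparameters (overridable)."""
    from catboost import CatBoostRegressor  # imported lazily

    return CatBoostRegressor(**{**CATBOOST_PARAMS, **overrides})


def predict_with_bias(model, X, bias=PREDICTION_BIAS):
    """Return the model's predictions shifted up by a constant bias."""
    return np.asarray(model.predict(X), dtype=float) + bias
```

Usage would look like `model = build_model(verbose=0)`, then `model.fit(X_train, y_train)` and `preds = predict_with_bias(model, X_test)`.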

Models we used

Linear Regression (scikit-learn)
Random Forest Regressor (scikit-learn)
CatBoostRegressor (CatBoost)
Sequential model (TensorFlow)
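Comparing several models is easiest with one shared evaluation loop. The sketch below fits any object with scikit-learn-style `fit`/`predict` methods on a training split and ranks the models by validation RMSE. RMSE is our assumption here, since the README does not name the metric, and `compare_models` is a hypothetical helper.

```python
import math
from typing import Dict, Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def compare_models(models: Dict[str, object], X_train, y_train, X_val, y_val) -> Dict[str, float]:
    """Fit each model on the training split; return validation RMSE, best first."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = rmse(list(y_val), list(model.predict(X_val)))
    return dict(sorted(scores.items(), key=lambda item: item[1]))
```

For example, `compare_models({"linear": LinearRegression(), "forest": RandomForestRegressor()}, X_train, y_train, X_val, y_val)` with the scikit-learn estimators imported from `sklearn.linear_model` and `sklearn.ensemble`.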

Tuning Hyperparameters

n_estimators: number of trees, i.e. boosting iterations
depth: depth of each tree
learning_rate: the step size at each iteration while moving toward a minimum of the loss function
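A plain grid search over these three hyperparameters can be sketched as below. `score_fn` is a hypothetical callback that would train a model with the given parameters and return a validation error (lower is better); the grid values in the usage note, apart from 0.75, 8 and 2000, are illustrative only.

```python
import itertools
import math
from typing import Callable, Dict, Iterable, Optional, Tuple


def grid_search(
    score_fn: Callable[[Dict], float], grid: Dict[str, Iterable]
) -> Tuple[Optional[Dict], float]:
    """Try every combination in grid; return the params with the lowest score."""
    keys = list(grid)
    best_params, best_score = None, math.inf
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = score_fn(params)
        # Strict "<" keeps the first combination found when scores tie.
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

For example, `grid_search(score_fn, {"learning_rate": [0.1, 0.3, 0.75], "depth": [6, 8], "n_estimators": [500, 2000]})` trains 12 models, which is one reason high training times show up among the challenges below.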

Challenges we faced!

Faculty and Course are given as categorical labels, but most machine learning models require numerical input
Normalizing inputs and outputs
Splitting data for training and testing
Finding good hyperparameters for our model
High training times
Overfitting
Presence of outliers and missing entries
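Because the goal is to forecast future years, one option for the train/test split is to hold out the most recent years instead of sampling rows at random, so validation resembles the real forecasting task. This is a suggestion, not necessarily what the team did; `chronological_split` is a hypothetical helper and it assumes the engineered `Year` column exists.

```python
import pandas as pd


def chronological_split(df: pd.DataFrame, year_col: str = "Year", val_years: int = 5):
    """Use the last `val_years` distinct years as validation, the rest as training."""
    years = sorted(df[year_col].unique())
    if not 1 <= val_years < len(years):
        raise ValueError("val_years must be at least 1 and fewer than the distinct years")
    cutoff = years[-val_years]  # first year that belongs to the validation split
    return df[df[year_col] < cutoff], df[df[year_col] >= cutoff]
```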

Repository: yash-meshram/Hackrush-22-ML-Challenge