Welcome to my attempt at exploring an unfamiliar dataset, running a range of analyses on it, and finishing with a predictive modeling task. I learned a great deal along the way, and I have kept this work both as a showcase and as a reference for my future self. To make the findings easier to digest, I have also provided a PDF document summarizing them.
During this analysis, I identified several areas for improvement. They are listed below, and I plan to address them in the future:
- Building a program to update the dataset on the fly using the Socrata API
- Incorporating additional external data sources such as temperature and holidays
- Handling the large size of the dataset (~1.6 GB)
- Exploring the possibility of displaying all geographical points on a map
- Working with choropleths and GeoJSON for geospatial analysis
- Continuing to tackle the challenges of feature engineering
- Performing time series analysis and forecasting
- Considering non-linear models instead of relying solely on linear models
- Utilizing GPU acceleration for boosting algorithms
- Employing hyperparameter tuning with Optuna
- Running boosting algorithms in PySpark for scalability
- Determining the appropriate evaluation metric, considering Type I and Type II errors
- Creating a dynamic table to display results directly within the notebook
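
As a rough illustration of the evaluation-metric item above, here is a minimal sketch of how Type I errors (false positives) and Type II errors (false negatives) can be weighed against each other with an F-beta score. It assumes a binary target encoded as 0/1, which may not match the notebook's actual target, and the function names `confusion_counts` and `f_beta` are hypothetical helpers written for this example, not code taken from the notebook.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count outcomes for binary labels (1 = positive, 0 = negative)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            counts["tp"] += 1
        elif pred == 1 and truth == 0:
            counts["fp"] += 1  # Type I error: predicted positive, actually negative
        elif pred == 0 and truth == 1:
            counts["fn"] += 1  # Type II error: predicted negative, actually positive
        else:
            counts["tn"] += 1
    return counts


def f_beta(counts: Dict[str, int], beta: float = 1.0) -> float:
    """F-beta score: beta > 1 penalizes Type II errors more, beta < 1 Type I errors."""
    b2 = beta ** 2
    denom = (1 + b2) * counts["tp"] + b2 * counts["fn"] + counts["fp"]
    return (1 + b2) * counts["tp"] / denom if denom else 0.0


# Made-up labels purely for illustration (not results from the dataset).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
for beta in (0.5, 1.0, 2.0):
    print(f"F{beta}: {f_beta(counts, beta):.4f}")
```

Choosing `beta` is the actual design decision here: if missing a true positive is costlier than raising a false alarm, a value above 1 fits better, and the reverse holds for a value below 1.
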
Feel free to explore the notebook yourself using the following options:
I hope you find my analysis engaging and insightful. If you have any suggestions or feedback, please don't hesitate to share. Let's continue our journey of exploring and understanding Chicago crime data together!