This repository contains a solution to the recommendation-system coding challenge.
- The recommendation system uses data from the IMDB/MovieLens dataset.
- Concepts used here: PageRank, content-based recommendation, and collaborative filtering.
- Machine learning terminology you will come across: one-hot encoding, cross-validation, and the R-squared metric.
- The notebook uses and compares Linear Regression and Decision Tree models to predict users' movie ratings.
- ML pipeline used here: data wrangling -> exploratory data analysis -> feature engineering -> baseline model -> best model.
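As a rough illustration of the model comparison described above, the sketch below one-hot encodes categorical features, fits a linear regression and a decision tree, and compares them by cross-validated R-squared. It is not the notebook's code: the data is synthetic, and the feature names, the rating rule, the tree depth, and the number of folds are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (NOT MovieLens): two categorical features per row.
rng = np.random.default_rng(0)
genres = rng.choice(["Action", "Comedy", "Drama"], size=300)
occupations = rng.choice(["student", "engineer", "artist"], size=300)
X = np.column_stack([genres, occupations])

# Hypothetical rating rule plus noise, so both models have a signal to learn.
genre_effect = {"Action": 3.0, "Comedy": 3.5, "Drama": 4.0}
noise = rng.normal(0.0, 0.3, size=300)
y = np.array([genre_effect[g] for g in genres]) + noise

models = {
    "linear_regression": LinearRegression(),
    "decision_tree": DecisionTreeRegressor(max_depth=3, random_state=0),
}

scores = {}
for name, model in models.items():
    # One-hot encoding happens inside the pipeline, so each fold is encoded
    # using only its own training split.
    pipeline = make_pipeline(OneHotEncoder(handle_unknown="ignore"), model)
    fold_scores = cross_val_score(pipeline, X, y, cv=5, scoring="r2")
    scores[name] = float(fold_scores.mean())

for name, score in scores.items():
    print(f"{name}: mean R^2 = {score:.3f}")
```

Keeping the encoder inside the pipeline is a design choice for the sketch: it avoids leaking category information from the validation fold into training.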
Python libraries: re, ast, time, heapq, decimal, operator, subprocess, numpy, scipy, pandas, seaborn, networkx, rpy2, itertools, matplotlib, datetime, collections, sklearn, surprise
R libraries: doMC, Kmisc, igraph, data.table
- The Jupyter notebook recommendationSystem.ipynb (Python kernel) is the master file.
- It makes use of:
  - wd_um_graph.txt, generated by weightedDirectedUserGraph.ipynb (R kernel)
  - wu_movie_graph.txt, generated by weightedUndirectedMovieGraph.ipynb (R kernel)
- The repository directory structure given below must be maintained for the code to run successfully.
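For readers unfamiliar with PageRank on a weighted directed graph (the kind of graph the user-graph notebook produces), here is a minimal pure-Python sketch using power iteration. The on-disk format of wd_um_graph.txt and the way the master notebook consumes it are not described in this README, so the example works on hypothetical in-memory edges; the node names, weights, and damping factor are assumptions.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Weighted PageRank by power iteration.

    edges: iterable of (source, target, weight) tuples with positive weights.
    Returns a dict mapping each node to its rank; ranks sum to 1.
    """
    edges = list(edges)
    nodes = sorted({n for src, dst, _ in edges for n in (src, dst)})
    n_nodes = len(nodes)

    # Total outgoing weight per node, used to normalise edge weights.
    out_weight = {n: 0.0 for n in nodes}
    for src, _, w in edges:
        out_weight[src] += w

    rank = {n: 1.0 / n_nodes for n in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0.0)
        base = (1.0 - damping) / n_nodes + damping * dangling / n_nodes
        new_rank = {n: base for n in nodes}
        for src, dst, w in edges:
            new_rank[dst] += damping * rank[src] * w / out_weight[src]
        delta = sum(abs(new_rank[n] - rank[n]) for n in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical edges: (source user, target user, weight).
example_edges = [
    ("u1", "u2", 2.0),
    ("u1", "u3", 1.0),
    ("u2", "u3", 1.0),
    ("u3", "u1", 1.0),
]
ranks = pagerank(example_edges)
for node, value in sorted(ranks.items()):
    print(f"{node}: {value:.4f}")
```

The networkx library listed above also provides a `pagerank` function that accepts edge weights, which is likely the more practical choice on real data.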
The directory structure for my repo is as follows:
├── README.md
├── Data
│ ├── u.data
│ ├── u.genre
│ ├── u.info
│ ├── u.item
│ ├── u.occupation
│ └── u.user
├── Files
│ └── *
├── Scripts
└── recommendationSystem.ipynb
└── weightedDirectedUserGraph.ipynb
└── weightedUndirectedMovieGraph.ipynb
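
The file names under Data match the MovieLens 100K release, in which u.data is a tab-separated file with the columns user id, item id, rating, and timestamp. Assuming that layout, a minimal loading sketch with pandas could look like the following. The column names and the `load_ratings` helper are hypothetical, and the sample rows are made up so the example runs without the dataset.

```python
import io

import pandas as pd

# Hypothetical column names for the four tab-separated fields.
RATING_COLUMNS = ["user_id", "item_id", "rating", "timestamp"]


def load_ratings(source):
    """Read MovieLens-style ratings from a path or a file-like object."""
    return pd.read_csv(source, sep="\t", names=RATING_COLUMNS, header=None)


# Made-up rows in the same layout, used in place of Data/u.data.
sample = io.StringIO(
    "1\t10\t4\t880000000\n"
    "1\t20\t3\t880000100\n"
    "2\t10\t5\t880000200\n"
)
ratings = load_ratings(sample)

# Simple sanity check: average rating per item.
mean_by_item = ratings.groupby("item_id")["rating"].mean()
print(mean_by_item)
```

With the directory structure above in place, `load_ratings("Data/u.data")` would read the real ratings file from the repository root.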