A final project for CSE 590 Networks and Data Mining Techniques at Stony Brook University.
Includes a project writeup with references.
Abstract
We compare a variety of collaborative filtering recommender systems on a public dataset of Reddit.com votes. Our goal is to apply these systems to the novel task of recommending item categories rather than the items themselves. On Reddit, this means making “subreddits” beyond the default set more discoverable. We compute descriptive statistics of the data to inform the recommender systems’ parameters and find that Reddit data frequently follows a power-law distribution. We then evaluate memory-based and model-based recommender systems on both rating prediction and top-N recommendation tasks, and conclude by suggesting how Reddit could use the better-performing systems.
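To make the top-N category recommendation task concrete, here is a minimal sketch of one memory-based approach: user-based collaborative filtering with cosine similarity. It is an illustration only and is not necessarily one of the systems evaluated in the writeup. The subreddit names, the `AFFINITY` matrix, and the function names are hypothetical, and the sketch assumes each user's votes have already been aggregated into a per-subreddit affinity value.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def top_n_subreddits(affinity, user, n=2):
    """Return indices of the n unseen subreddits with the highest
    similarity-weighted affinity among the other users."""
    target = affinity[user]
    scores = [0.0] * len(target)
    for other, row in enumerate(affinity):
        if other == user:
            continue  # a user is not their own neighbor
        sim = cosine(target, row)
        for j, value in enumerate(row):
            scores[j] += sim * value
    # Only recommend subreddits the target user has no activity in.
    candidates = [j for j, seen in enumerate(target) if seen == 0]
    candidates.sort(key=lambda j: scores[j], reverse=True)
    return candidates[:n]

# Hypothetical toy data: rows are users, columns are subreddits.
SUBREDDITS = ["pics", "programming", "python", "cooking"]
AFFINITY = [
    [1.0, 1.0, 0.0, 0.0],  # user 0
    [0.0, 1.0, 1.0, 0.0],  # user 1
    [1.0, 0.0, 0.0, 1.0],  # user 2
    [0.0, 1.0, 1.0, 0.0],  # user 3
]

if __name__ == "__main__":
    picks = top_n_subreddits(AFFINITY, user=0, n=2)
    print([SUBREDDITS[j] for j in picks])  # ['python', 'cooking']
```

In this toy example, user 0 is equally similar to the other three users, so "python" ranks first because two neighbors are active there, while only one is active in "cooking". A model-based system would instead fit parameters (for example, latent factors) to the same matrix.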