A recommendation system that suggests similar universities.
Using Neo4j, we aim to offer tailored university recommendations that help each student find institutions matching their needs and goals. Traditional university recommendation systems are often static and one-size-fits-all, so they fail to reflect individual student needs and aspirations. Modeling universities and their relationships in Neo4j's graph database enables personalized, context-aware recommendations.
- Handling Complex Relationships
- Dynamic and Real-Time
- Personalized Recommendations
- Visualizing Recommendations
- Data Accuracy and Completeness
- Graph Database Architecture
- Flexible Query Language (Cypher)
- Performance Advantage (efficient traversal & graph algorithms)
- Intuitive Data Model
- Compatibility with Python
The dataset contains information about educational institutions in the USA, with various columns providing details about rankings, enrollment, location, and other relevant metrics.
- act-avg: Average ACT scores for admitted students.
- sat-avg: Average SAT scores for admitted students.
- enrollment: Total student enrollment.
- city: Location city of the institution.
- zip: ZIP code of the institution's location.
- acceptance-rate: Percentage of applicants accepted.
- percent-receiving-aid: Percentage of students receiving financial aid.
- cost-after-aid: Cost for students after receiving financial aid.
- state: State where the institution is located.
- hs-gpa-avg: Average high school GPA of admitted students.
- rankingDisplayRank: Displayed rank in rankings.
- businessRepScore: Reputation score for the business department.
- tuition: Tuition fees for students.
- engineeringRepScore: Reputation score for the engineering department.
- displayName: Name used for display purposes.
- institutionalControl: Control of the institution (e.g., public, private).
Source: Kaggle
Number of Rows: 311
Number of Columns: 39
- Dropping unnecessary columns
- Handling missing values
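
The two preprocessing steps above can be sketched in plain Python as follows. This is a minimal illustration, not the project's actual code: the column `primaryPhoto` is a hypothetical example of an unnecessary column, and filling gaps with the column mean is an assumption, since the strategy used for missing values is not stated here.

```python
from statistics import mean


def drop_columns(rows, unwanted):
    """Return copies of the rows without the columns named in `unwanted`."""
    return [{k: v for k, v in row.items() if k not in unwanted} for row in rows]


def fill_missing_with_mean(rows, column):
    """Replace None in `column` with the mean of the values that are present.

    Mean imputation is an assumption made for this sketch.
    """
    present = [row[column] for row in rows if row[column] is not None]
    fill = mean(present)
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]


# Made-up rows for illustration; "primaryPhoto" is a hypothetical column.
rows = [
    {"displayName": "University A", "act-avg": 30, "primaryPhoto": "a.jpg"},
    {"displayName": "University B", "act-avg": None, "primaryPhoto": "b.jpg"},
    {"displayName": "University C", "act-avg": 34, "primaryPhoto": "c.jpg"},
]

cleaned = fill_missing_with_mean(drop_columns(rows, {"primaryPhoto"}), "act-avg")
```

After this runs, no row in `cleaned` has a `primaryPhoto` key, and the missing `act-avg` for University B is replaced by the mean of the other two values.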
The image above shows a network of universities. Each node represents a university and carries properties such as acceptance rate, average ACT score, average SAT score, average GPA, city, and state. The edges between nodes, named 'Similarity_edge', represent similarity relationships between universities. The similarity score for a pair of universities is the Euclidean distance between their feature values: ACT score, SAT score, GPA, acceptance rate, Business Reputation Score, and Engineering Reputation Score. A smaller distance means the two universities are more similar. The image below shows the Euclidean distance between Johns Hopkins and Northwestern universities.
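
As a rough sketch of the distance calculation described above, the function below computes the Euclidean distance between two universities represented as dictionaries. The feature names follow the dataset columns; the two example universities and their values are made up for illustration and are not taken from the dataset. Whether the project scales the features first is not stated; without scaling, a wide-range feature such as the SAT score dominates the distance.

```python
import math

# Feature columns used for similarity, named as in the dataset description.
FEATURES = [
    "act-avg",
    "sat-avg",
    "hs-gpa-avg",
    "acceptance-rate",
    "businessRepScore",
    "engineeringRepScore",
]


def euclidean_distance(a, b, features=FEATURES):
    """Euclidean distance between two universities over the given features."""
    return math.sqrt(sum((a[f] - b[f]) ** 2 for f in features))


# Made-up values for illustration only.
uni_a = {
    "act-avg": 33, "sat-avg": 1500, "hs-gpa-avg": 3.9,
    "acceptance-rate": 10, "businessRepScore": 4.0, "engineeringRepScore": 4.5,
}
uni_b = {
    "act-avg": 30, "sat-avg": 1496, "hs-gpa-avg": 3.9,
    "acceptance-rate": 10, "businessRepScore": 4.0, "engineeringRepScore": 4.5,
}

distance = euclidean_distance(uni_a, uni_b)
```

Here the two universities differ by 3 ACT points and 4 SAT points and match on everything else, so the distance is sqrt(3² + 4²) = 5.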
K-Nearest Neighbors (KNN) is a machine learning algorithm commonly used for classification and regression tasks. Here it is used as a nearest-neighbor search: for a given university, the five universities with the best similarity score (the smallest Euclidean distance) are recommended.
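
The top-five lookup can be sketched as a brute-force nearest-neighbor search. This is a minimal illustration under stated assumptions, not the project's implementation: the function name `top_k_similar` is hypothetical, the universities and their values are made up, and a single feature is used only to keep the example easy to follow.

```python
import heapq
import math


def top_k_similar(target, universities, features, k=5):
    """Return the k universities closest to `target` by Euclidean distance."""

    def distance(a, b):
        return math.sqrt(sum((a[f] - b[f]) ** 2 for f in features))

    # A university should not be recommended as similar to itself.
    candidates = [
        u for u in universities if u["displayName"] != target["displayName"]
    ]
    return heapq.nsmallest(k, candidates, key=lambda u: distance(target, u))


# Made-up data for illustration only.
universities = [
    {"displayName": "University A", "act-avg": 30},
    {"displayName": "University B", "act-avg": 31},
    {"displayName": "University C", "act-avg": 25},
    {"displayName": "University D", "act-avg": 36},
    {"displayName": "University E", "act-avg": 28},
    {"displayName": "University F", "act-avg": 33},
    {"displayName": "University G", "act-avg": 20},
]

recommended = top_k_similar(universities[0], universities, ["act-avg"], k=5)
names = [u["displayName"] for u in recommended]
```

With 311 universities, comparing the target against every other row is cheap, so a brute-force search like this is sufficient at this scale.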
- Ease of Setup and Flexibility
- Integration with Neo4j
- User Interface and Visualization
- Scalability and Deployment