Created a social network analysis tool to determine the degrees of separation between actors using the Breadth-First Search (BFS) algorithm. The program processes data from IMDb, computes the shortest path of connections through shared movie roles, and outputs the sequence of movies that connect any two actors.
Write a program that computes the shortest path between any two actors by identifying the sequence of movies that connects them.
The project uses datasets based on IMDb, containing information about actors, movies, and the relationships between them. These datasets include people.csv
, movies.csv
, and stars.csv
, which provide the necessary details to compute the degrees of separation.
- Data Input: Load data from CSV files into memory. The files include
people.csv
,movies.csv
, andstars.csv
. - Data Structures: Use dictionaries to map names to person IDs, person IDs to details (name, birth year, movies), and movie IDs to details (title, year, stars).
- User Input: Prompt the user to input two actor names.
- Shortest Path Calculation: Implement the
shortest_path
function to find the shortest path using BFS. The function returns a list of (movie_id, person_id) pairs representing the path. - Output: Display the sequence of movies that connects the two actors.
$ python degrees.py large
Loading data...
Data loaded.
Name: Emma Watson
Name: Jennifer Lawrence
3 degrees of separation.
1: Emma Watson and Brendan Gleeson starred in Harry Potter and the Order of the Phoenix
2: Brendan Gleeson and Michael Fassbender starred in Trespass Against Us
3: Michael Fassbender and Jennifer Lawrence starred in X-Men: First Class
- Download the distribution code: Download from CS50 AI Degrees project and unzip it.
- Load Data: Run the script with the appropriate data directory, e.g.,
python degrees.py large
.
- Loading Data: The script reads from
people.csv
,movies.csv
, andstars.csv
to build the necessary mappings and sets for people and movies. - Finding Connections: The
shortest_path
function utilizes BFS to explore connections between actors via movies. Each node in the search represents an actor, and edges represent movies connecting actors. - Efficiency Considerations: To optimize, check for the goal when adding nodes to the frontier rather than when removing them. This can reduce unnecessary expansions.
For more details on the project, its background, and specific implementation requirements, visit the CS50 AI 2024 Degrees Project Page.