A tool to facilitate mentor matching.
Started by James Hulett at mentor matching in February 2018. Further developed by Scott Numamoto and Vivien Nguyen in 2019. Further developed again by James Hulett in January 2020 and January 2021.
Usage
Mentor Data Format
Team Data Format
How To Modify the Input Format
How To Change the Weights
How To Modify the Convex Program
Description of the Convex Program
TODOs

Usage
- Run pip install -i https://pypi.gurobi.com gurobipy.
- Follow the instructions here under "Individual Academic Licenses" to sign up for a (free) academic license for Gurobi.
- Put mentor data in a file called mentors.csv and team data in a file called teams.csv. Data should be formatted as described in the next two sections. See mentors-example.csv and teams-example.csv for example data formatting.
- Ensure that there are no commas in any of the data. Commas may cause the csv to be parsed incorrectly.
- Run assign.py. The matching will be output to matching.csv; a mentor-team compatibility matrix will be output to compatibility.csv.
- On finishing, assign.py will print out the value of the solution it found. If this value is negative, you should manually check the matching to see what's going on and whether it needs fixing. A negative value probably means that (i) a mentor was assigned to a team that they have insufficient time overlap with, (ii) a mentor was not assigned to a team they were required to be assigned to, or (iii) mentors who were required to be assigned together were not. If this does happen, the two most likely culprits are (i) a mentor was required to be paired with a team they have insufficient time overlap with (fix by removing that requirement, or just ignore it if we know it won't be an issue), or (ii) there is no matching such that every mentor is paired with a team they have sufficient time overlap with (no easy fix, other than potentially bugging mentors / teams to give us more availabilities to work with).
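Because a stray comma shifts every later column, it can help to check both files before running assign.py. The sketch below is not part of the repository: find_ragged_rows is a hypothetical helper, and it assumes each line is split on every comma with no quoting. It flags any row whose field count differs from that of the header row.

```python
def find_ragged_rows(lines):
    """Return (row_number, field_count) for rows whose width differs from the header.

    Assumption: a naive comma split with no quoting, so an extra comma inside
    a cell shows up as an extra field. Row numbers are 1-indexed (header = 1).
    """
    expected = None
    ragged = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        count = len(line.rstrip("\r\n").split(","))
        if expected is None:
            expected = count  # the first non-blank line is the header row
        elif count != expected:
            ragged.append((number, count))
    return ragged


def check_file(path):
    """Print a message for every suspicious row in the csv at the given path."""
    with open(path, newline="") as handle:
        for number, count in find_ragged_rows(handle):
            print(f"{path}: row {number} has {count} fields")
```

Calling check_file("mentors.csv") and check_file("teams.csv") before running assign.py would list the rows worth inspecting by hand.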

Mentor Data Format
There should be one header row (which will be ignored), and every row thereafter should correspond to a mentor.
The columns should be organized as follows:
- 1 column for mentor name.
- 168 columns for availability, where a 1 represents the mentor being available in that time slot and a 0 represents them being unavailable.
- 4 columns for which team type(s) (e.g., new, small coach presence, etc.) the mentor would like to work with, where a 1 represents the mentor wanting to work with that team type and a 0 represents them not.
- 1 column for team(s) this mentor would like to be matched with, separated by a semicolon if there are multiple (blank if none). Any such names must appear exactly as they do in teams.csv.
- 1 column for team(s) this mentor must be matched with, separated by a semicolon if there are multiple (blank if none). Any such names must appear exactly as they do in teams.csv.
- 1 column for other mentor(s) this mentor would like to be matched with, separated by a semicolon if there are multiple (blank if none). Any such names must be exactly the same as the name given in the first column of the other mentor(s)' row.
- 1 column for other mentor(s) this mentor must be matched with, separated by a semicolon if there are multiple (blank if none). Any such names must be exactly the same as the name given in the first column of the other mentor(s)' row.
- 1 column for comfort mentoring alone (must be a number from 1 to 5, with 5 as most confident).
- 1 column for convenience of different transit types (each must be "Not possible", "Inconvenient", or "Convenient").
- 2 columns for confidence in skills (each must be "Not Confident", "Somewhat", "Neutral", "Confident", or "Very Confident").
In cases where there are multiple columns (availability, transit conveniences, etc.), the columns must be in the same order as in teams.csv.
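To make the column layout concrete, here is a minimal sketch of reading the leading columns of one mentor row. It is illustrative only: parse_mentor_prefix and split_names are hypothetical names, the actual parsing lives in the Mentor class in utils.py and may differ, and the transit and skill columns that follow are deliberately left unparsed.

```python
NUM_SLOTS = 168       # availability columns, as listed above
NUM_TEAM_TYPES = 4    # team type columns, as listed above


def split_names(cell):
    """Split a semicolon-separated cell into names; a blank cell gives []."""
    return [name.strip() for name in cell.split(";") if name.strip()]


def parse_mentor_prefix(row):
    """Parse the leading columns of one mentor row (a list of strings).

    Hypothetical helper, not the repository's code. All csv values arrive
    as strings, so flags are compared against "1" rather than the integer 1.
    """
    pos = 0
    name = row[pos]
    pos += 1
    availability = [cell == "1" for cell in row[pos:pos + NUM_SLOTS]]
    pos += NUM_SLOTS
    team_types = [cell == "1" for cell in row[pos:pos + NUM_TEAM_TYPES]]
    pos += NUM_TEAM_TYPES
    # Four name-list columns: teams wanted/required, mentors wanted/required.
    teams_wanted, teams_required, mentors_wanted, mentors_required = (
        split_names(cell) for cell in row[pos:pos + 4]
    )
    pos += 4
    comfort_alone = int(row[pos])  # 1 to 5, with 5 as most confident
    return {
        "name": name,
        "availability": availability,
        "team_types": team_types,
        "teams_wanted": teams_wanted,
        "teams_required": teams_required,
        "mentors_wanted": mentors_wanted,
        "mentors_required": mentors_required,
        "comfort_alone": comfort_alone,
    }
```

Tracking a running position rather than hard-coding offsets means that changing a column count only requires editing one constant.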

Team Data Format
There should be one header row (which will be ignored), and every row thereafter should correspond to a team.
The columns should be organized as follows:
- 1 column for team name.
- 168 columns for availability, where a 1 represents the team being available in that time slot and a 0 represents them being unavailable.
- 4 columns for which team type (e.g., new, small coach presence, etc.) this team is, where a 1 represents the team being that type and a 0 represents the team not being that type.
- 1 column for how good/bad it would be for this team to get only one mentor (must be "Bad", "Neutral", or "Good").
- 1 column for how long each travel method would take, as an integer number of minutes. If the team plans on working with their mentor(s) on the Berkeley campus, put in 0 for these columns.
- 2 columns for how much the team wants the different skills (each must be a number from 1 to 5, with 1 as most requested).
In cases where there are multiple columns (availability, transit times, etc.), the columns must be in the same order as in mentors.csv.
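The matching column order is what makes a slot-by-slot comparison between the two files meaningful. As a rough illustration, the sketch below counts the time slots in which a mentor and a team are both available. availability_overlap is a hypothetical helper; the scoring actually used in utils.py may weight or threshold the overlap differently.

```python
def availability_overlap(mentor_slots, team_slots):
    """Count the time slots in which both the mentor and the team are available.

    Both arguments are lists of "0"/"1" strings in the same slot order,
    matching how the values are read from the csv files.
    """
    if len(mentor_slots) != len(team_slots):
        raise ValueError("availability lists must cover the same slots")
    return sum(
        1
        for mentor_flag, team_flag in zip(mentor_slots, team_slots)
        if mentor_flag == "1" and team_flag == "1"
    )


# Toy example with 4 slots instead of the full 168.
overlap = availability_overlap(["1", "1", "0", "1"], ["1", "0", "0", "1"])
```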

How To Modify the Input Format
- If you want to modify what the values in a column type should look like or how many columns of that type there should be, change the appropriate variable in utils.py, update the lines in Mentor Data Format and Team Data Format, and update the format in mentors-example.csv and teams-example.csv. Note that all values are read from the csv as strings.
- If you want to remove a column type, delete
  - The corresponding variables in utils.py.
  - The lines in the __init__ functions for the Mentor and Team classes (in utils.py) that read in that column type.
  - The name and description of any corresponding attributes in the comments above the Mentor and Team classes.
  - The corresponding line(s) in Mentor Data Format and Team Data Format.
  - The corresponding column(s) in mentors-example.csv and teams-example.csv.
- If you want to add a column type, add
  - Any necessary new variables in utils.py (number of columns, possible values, etc.), with comments for what each is for. Note that all values are read from the csv as strings.
  - Lines in the __init__ functions for the Mentor and Team classes (in utils.py) to read in that column type.
  - The name and description of any corresponding attributes in the comments above the Mentor and Team classes.
  - Corresponding line(s) in Mentor Data Format and Team Data Format.
  - Corresponding column(s) in mentors-example.csv and teams-example.csv.
- If you want to change the order of the columns,
  - Rearrange the corresponding blocks in the __init__ functions for the Mentor and Team classes (in utils.py).
  - Rearrange the corresponding lines in Mentor Data Format and Team Data Format.
  - Rearrange the columns in mentors-example.csv and teams-example.csv.

How To Change the Weights
- If you just want to change the weight given to some parameter, update the corresponding variable in utils.py.
- If you want to modify how a component of the mentor-team compatibility score is calculated, modify the corresponding function in utils.py.
- If you want to add / delete a component of the mentor-team compatibility score, add / delete the corresponding function in utils.py and the corresponding block in getTeamCompatibility (in utils.py).
- If you want to change how many mentors are assigned to each team, modify minNumMentors and maxNumMentors in utils.py.

How To Modify the Convex Program
- Variables, constraints, and the objective function are each created in their own block in assign.py.
  - Variables should be added to the list variables, as well as to the dictionaries varByType, varByMentor, varByTeam, varByPair, and groupByVar (where appropriate) for easy access later.
  - Constraints should be appended to the list constraints.
  - Terms in the objective function should be appended to the list objectiveTerms. The objective function is just the sum of all these terms.
- If you modify the structure of the program, please update Description of the Convex Program accordingly.
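The bookkeeping described above might look roughly like the following. This is only a sketch: the container names come from the list above, but the key shapes, the register helper, and the meaning given to groupByVar are all assumptions, and plain strings stand in for solver variables. Check assign.py for the real structure before relying on any of it.

```python
from collections import defaultdict

variables = []                    # every decision variable, in creation order
varByType = defaultdict(list)     # assumed key: variable type (1 to 4)
varByMentor = defaultdict(list)   # assumed key: mentor name
varByTeam = defaultdict(list)     # assumed key: team name
varByPair = defaultdict(list)     # assumed key: (mentor name, team name)
groupByVar = {}                   # assumed: variable -> (mentors, team) it describes


def register(var, var_type, mentors, team):
    """Hypothetical helper: record one variable in the list and every index."""
    variables.append(var)
    varByType[var_type].append(var)
    varByTeam[team].append(var)
    for mentor in mentors:
        varByMentor[mentor].append(var)
        varByPair[(mentor, team)].append(var)
    groupByVar[var] = (tuple(mentors), team)


# Strings stand in for solver variables in this toy example.
register("x[Ann,Team A]", 1, ["Ann"], "Team A")
register("w[Ann,Bob,Team A]", 4, ["Ann", "Bob"], "Team A")
```

Indexing each variable several ways up front lets the constraint-building code look up, say, all of a team's type 1 variables without rescanning the full list.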

Description of the Convex Program
This section describes the convex program that is used to find a matching. Hopefully no one will ever have to read this.
Variables:
- Type 1: One boolean variable for each mentor-team pair, representing if that mentor is paired with that team (independent of co-mentors)
- Type 2: One boolean variable for each team, representing if that team has only one mentor
- Type 3: One boolean variable for each mentor-team pair, representing if that mentor is paired with that team and is alone
- Type 4: One boolean variable for each mentor-mentor-team group, representing if both those mentors are paired with the team
Constraints:
- The sum of a mentor's type 1 variables must equal 1. This ensures that every mentor is paired with exactly one team.
- The sum of a team's type 1 variables must be between utils.minNumMentors and utils.maxNumMentors. This ensures that every team is paired with an appropriate number of mentors.
- Letting M be the number of mentors, each team's type 2 variable must be less than or equal to (1/M) * (M + 1 - the sum of the team's type 1 variables). This ensures that every team's type 2 variable is set to zero if it has more than one mentor assigned.
- Each team's type 2 variable must be greater than or equal to (2 - the sum of the team's type 1 variables). This ensures that every team's type 2 variable is set to 1 if it has one mentor. This will break if the sum is zero, but that cannot happen so long as utils.minNumMentors is strictly greater than zero.
- The sum of a team's type 3 variables is equal to its type 2 variable. This ensures that type 3 variables are used if and only if there is exactly one mentor assigned to the team.
- Each variable of type 3 is less than or equal to the corresponding variable of type 1. This ensures that type 3 variables are only used when the corresponding mentor and team are actually paired.
- Letting M be the number of mentors, the sum of a mentor-team pair's type 4 variables is at most M times its type 1 variable. This ensures that a type 4 variable can only be set to 1 if both corresponding mentors are assigned to the corresponding team.
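The two bounds on a team's type 2 variable can be checked by brute force. The sketch below is an illustration rather than code from assign.py, and feasible_single_mentor_flags is a hypothetical name. It lists which 0/1 values the two bounds allow for a given number of assigned mentors, using integer arithmetic for the upper bound to avoid rounding.

```python
def feasible_single_mentor_flags(num_assigned, num_mentors):
    """Return the 0/1 values of a team's type 2 variable allowed by both bounds.

    num_assigned is the sum of the team's type 1 variables; num_mentors is M.
    """
    allowed = []
    for flag in (0, 1):
        # Upper bound: flag <= (1/M) * (M + 1 - sum), multiplied through by M.
        upper_ok = num_mentors * flag <= num_mentors + 1 - num_assigned
        # Lower bound: flag >= 2 - sum.
        lower_ok = flag >= 2 - num_assigned
        if upper_ok and lower_ok:
            allowed.append(flag)
    return allowed


# With M = 5 mentors, tabulate the allowed values for 0 to 5 assigned mentors.
table = {s: feasible_single_mentor_flags(s, 5) for s in range(6)}
```

With five mentors, exactly one assigned mentor forces the flag to 1, two to five assigned mentors force it to 0, and zero assigned mentors leaves no feasible value, which is why utils.minNumMentors must be strictly greater than zero.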
Terms in the Objective Function:
- For each type 1 variable, we have the value of that mentor-team matching (independent of co-mentors) times the variable.
- For each type 3 variable, we have the value the mentor gives the team alone times the variable.
- For each type 4 variable, we have the value the two mentors give the team together times the variable.
- For each pair of mentors that must be together, subtract utils.mentorRequiredValue. Similarly, for each mentor that must be with a specific team, subtract utils.teamRequiredValue. Note that these offsets are independent of the solution, and so won't change the optimum; their only purpose is to make it such that solutions that don't satisfy all requirements have a negative value, making it easier to spot if this happens.
Note that based on how the constraints are set up, there is nothing requiring type 4 variables to be set to 1. Hence, we need to ensure that type 4 variables can only give positive value to the program. In particular, this means that the cost for not having time overlaps between a mentor and a team has to be charged to the type 1 variables, not to the type 3/4 ones.
Additionally, note that the type 7 constraints (the last constraint listed above) allow us to set all type 4 variables to 1 provided that both corresponding mentors are assigned to the corresponding team. Hence, the value we get from type 3 objective function terms (the terms for type 4 variables) grows quadratically with the number of mentors assigned to a team. For this reason, it is recommended that utils.minNumMentors and utils.maxNumMentors differ by at most 1. If the difference is larger than 1, the program will likely prefer assignments that give some teams many mentors and other teams few mentors, whereas we would prefer it to assign all teams an approximately equal number of mentors.
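A small count makes the quadratic growth concrete. The sketch below assumes one type 4 variable per unordered pair of mentors on a team, each contributing some positive value. co_mentor_pairs is a hypothetical helper, not code from the repository.

```python
from math import comb


def co_mentor_pairs(team_sizes):
    """Total number of mentor pairs sharing a team, summed over all teams."""
    return sum(comb(size, 2) for size in team_sizes)


# Six mentors spread over three teams in two different ways.
balanced = co_mentor_pairs([2, 2, 2])   # 1 + 1 + 1 pairs
lopsided = co_mentor_pairs([1, 1, 4])   # 0 + 0 + 6 pairs
```

The lopsided split has twice as many pairs as the balanced one, so if every pair adds positive value the objective is pulled toward uneven team sizes unless the minimum and maximum are kept close together.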

TODOs
- Calculate the amount of availability overlap between a team and two mentors in a less overly-optimistic way.
- Put a cap on the amount of availability overlap that can be counted.
- Expand the number of classifications for how good / bad it is for a team to have a single mentor.