This is a work in progress.
Currently I have deduced that the 100x optimization was obtained by optimizing the neighbours function.
In that function an undirected graph was converted into a directed graph, and that led to the largest gain in speed.
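
To give a rough idea of why such a conversion can help, here is a minimal sketch. It is not the code from the notebook: the edge list, the function names (`neighbours_undirected`, `to_directed`, `neighbours_directed`) and the dict-of-sets layout are all assumptions made for illustration. The idea it shows is that an undirected edge stored once forces a lookup to check both endpoints of every edge, while storing each edge as two directed arcs keyed by source node turns the lookup into a single dictionary access.

```python
from collections import defaultdict

# Hypothetical undirected edge list; each pair is stored only once.
EDGES = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]


def neighbours_undirected(edges, node):
    """Scan every edge and check both endpoints (linear in the number of edges)."""
    found = set()
    for u, v in edges:
        if u == node:
            found.add(v)
        elif v == node:
            found.add(u)
    return found


def to_directed(edges):
    """Store each undirected edge as two directed arcs, keyed by source node."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return dict(adjacency)


def neighbours_directed(adjacency, node):
    """Look the node up directly (one dictionary access on average)."""
    return adjacency.get(node, set())


# Build the directed form once, then reuse it for every lookup.
ADJACENCY = to_directed(EDGES)
```

Both functions return the same set of neighbours for any node. The conversion costs one pass over the edges up front, so it should only pay off when the neighbours function is called many times, which the heatmap in the notebook suggests is the case here.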
Please check out the notebook for more details and the thought process behind it. It has a cool heatmap showing where the actual slowness was.
Feel free to contribute!