Manual

Here's a quick manual to reproduce our results. Run the following commands from the main.py file.

Mandatory files and directories

4 folders:
- A data dir containing node_information.csv, testing_set.txt, training_set.txt
- A graph_dict dir containing 5 npz arrays, if you want to speed up the process and avoid recomputing them.
- A variables dir containing npz files, if you want to speed up the process, even though these are cheaper to compute than the graph dictionaries.
- A wv_model dir, where you can store your word embedding models.
3 .py files:
- compute_features.py contains the ComputeFeatures class and many useful functions
- my_nets.py contains some nets used for prediction
- utils.py contains the WordEmbedding and GraphStructure classes.
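
Before running anything, it can help to check that the layout above is in place. The snippet below is a minimal sketch, assuming all directories and files sit under a common project root. The helper name missing_paths is hypothetical and not part of the repository. Note that graph_dict and variables only matter if you want to reuse precomputed arrays.

```python
from pathlib import Path

# Names taken from the list above.
REQUIRED_DIRS = ["data", "graph_dict", "variables", "wv_model"]
REQUIRED_FILES = [
    "data/node_information.csv",
    "data/testing_set.txt",
    "data/training_set.txt",
    "compute_features.py",
    "my_nets.py",
    "utils.py",
]


def missing_paths(root="."):
    """Return the expected directories and files not found under `root`.

    Hypothetical helper for illustration; not part of the repository.
    """
    root = Path(root)
    missing = [d for d in REQUIRED_DIRS if not (root / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    for path in missing_paths():
        print(f"missing: {path}")
```

Running the script from the project root lists whatever is missing, and prints nothing when the layout is complete.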

Get train and test matrices

Build your ComputeFeatures object:
- c = ComputeFeatures() to build it from scratch (~20s with the dictionaries loaded). Warning: load_graph_dict is set to True by default, so the dictionaries in the graph_dict directory will be loaded. If you set it to False, please note that recomputing them is very expensive.
- c = ComputeFeatures.import_from_file(<path>) to load it directly
Get your train matrix with: X_train = c.compute_multiple_variables("all", train=True, scale=<bool>). Instead of "all", you can pass a list of handled variables to select the ones you want, e.g. ["l2_distance", "betweenness"]
Get your test matrix with: X_test = c.compute_multiple_variables("all", train=False)

Example

The following code loads the train and test matrices and runs various classifiers:

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import scale

from compute_features import ComputeFeatures
from train import bagging, neural_net, naive_bayes, svc, sgd

c = ComputeFeatures()

# Compute every handled variable for the training and test sets
X_tot = c.compute_multiple_variables("all", train=True, scale=False)
X_test = c.compute_multiple_variables("all", train=False, scale=True)

# Hold out a validation set; the labels are in the third column of train_array
X_train, X_validation, y_train, y_validation = train_test_split(X_tot, c.train_array[:, 2])
X_train = scale(X_train)
X_validation = scale(X_validation)

# Fit each classifier and keep the results
models = []
for f in naive_bayes, bagging, neural_net, svc, sgd:
    models.append(f(X_train, y_train, X_validation, y_validation, False))
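
In the example above, X_train and X_validation are each standardized with their own statistics. A common alternative is to compute the mean and standard deviation on the training split only and reuse them for the validation split, so that both are expressed on the same scale. The sketch below illustrates that alternative with plain NumPy on toy data. It is not what the repository does, and fit_standardizer and apply_standardizer are hypothetical names.

```python
import numpy as np


def fit_standardizer(X):
    """Compute per-column mean and standard deviation (hypothetical helper)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Leave constant columns unscaled to avoid dividing by zero.
    std[std == 0] = 1.0
    return mean, std


def apply_standardizer(X, mean, std):
    """Standardize X with statistics computed elsewhere (hypothetical helper)."""
    return (X - mean) / std


# Toy data standing in for the feature matrices.
X_train = np.array([[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]])
X_validation = np.array([[2.0, 10.0], [4.0, 12.0]])

# Statistics come from the training split only.
mean, std = fit_standardizer(X_train)
X_train_scaled = apply_standardizer(X_train, mean, std)
X_validation_scaled = apply_standardizer(X_validation, mean, std)
```

With this approach the validation set is not guaranteed to have zero mean and unit variance, which is expected: it is measured against the training distribution.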

If you want to train your model using only specific variables, you just need to do:

my_var = ["l2_distance", "common_neighbors", "degree"]
X_tot = c.compute_multiple_variables(my_var, train=True, scale=False)

This can turn out to be really useful for running tests.

Other useful functions

In the utils.py file, you can find plot_distribution_of_prediction, which is useful to get an idea of how well your features separate the data. E.g.:

plot_distribution_of_prediction(5, 3, c.handled_variables[:15], X_validation, y_validation)

This will plot the distribution of every variable! Nice, right?

From the train.py file, you can run the f1_of_variables or the decrease_of_acc functions.

The following reports the F1 score obtained by the bagging classifier on each variable taken alone, to see how well it fares on the training or validation set (this requires computing a proper validation set).

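import numpy as np
import pandas as pd
from train import f1_of_variables

result = f1_of_variables(c, c.handled_variables)
# Average the bagging ("bag") scores stored under the "80" and "20" keys of each variable
result_bag = dict()
for k in result.keys():
    result_bag[k] = dict()
    result_bag[k]["80"] = np.mean(result[k]["80"]["bag"])
    result_bag[k]["20"] = np.mean(result[k]["20"]["bag"])
# One row per variable, one column per key
result_bag = pd.DataFrame(result_bag).transpose()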
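
For reference, the F1 score is the harmonic mean of precision and recall. The snippet below is a small self-contained illustration of that definition for binary labels. f1_binary is a hypothetical helper, and the repository's own f1_of_variables may compute the score differently.

```python
def f1_binary(y_true, y_pred):
    """F1 score for binary labels in {0, 1} (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        # No true positives: precision or recall is zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
score = f1_binary(y_true, y_pred)
```

Here precision and recall are both 2/3, so the F1 score is 2/3 as well.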

The following prints the decrease in accuracy obtained by leaving out each feature (using random forest properties).

decrease_of_acc(X_train, y_train, <my_var>)
