
Fix/config param #1

Open: savan77 wants to merge 71 commits into master from fix/config_param.
Changes shown are from 68 of the 71 commits.

Commits
039a931
use python instead of python3
savan77 Oct 27, 2020
060677f
store models in the /mnt/output directory
savan77 Oct 28, 2020
5a0087f
add template for nas
savan77 Oct 30, 2020
4e04cb3
place viz logs into /mnt/output
savan77 Oct 30, 2020
63610ed
update paths for tf enas
savan77 Nov 2, 2020
c1c5a31
allow users to specify dataset
savan77 Nov 3, 2020
812bdbb
perform resizing before converting it into tensor
savan77 Nov 3, 2020
d04ab5e
add data processing script
savan77 Nov 5, 2020
da463dd
add generic pytorch classifier
savan77 Nov 7, 2020
7688fd2
train model with specific parameters
savan77 Nov 9, 2020
4ba1420
add workflow template
savan77 Nov 9, 2020
6c6333d
use shutil to move files between different file systems
savan77 Nov 9, 2020
a1e447b
add script for model training with specific parameters
savan77 Nov 9, 2020
4609a4c
split dataset into train and test set
savan77 Nov 9, 2020
3e05a4b
update logic for dataset split
savan77 Nov 9, 2020
d96cd13
add support for vgg and alexnet
savan77 Nov 9, 2020
1a99b51
resolve indentation and subscription issue
savan77 Nov 9, 2020
fb23a16
add alexnet and vgg support for specific param training
savan77 Nov 10, 2020
00ba638
changes to persist metrics
savan77 Nov 10, 2020
fd00e11
add model comparison script
savan77 Nov 10, 2020
d76ac41
update template
savan77 Nov 10, 2020
cf5a6cd
correct typos in template
savan77 Nov 10, 2020
c9a3904
update path for processed data
savan77 Nov 11, 2020
c628e56
handle case when loss is NaN
savan77 Nov 11, 2020
72f491d
store model after every epoch
savan77 Nov 11, 2020
d899e4c
change nas log directory for visualization
savan77 Nov 11, 2020
f085f9e
get best parameter for hyper param tuning
savan77 Nov 12, 2020
f4ac735
update node package version
savan77 Nov 12, 2020
918f773
fixed issue with metrics dumping of tuner
savan77 Nov 12, 2020
e25b239
fixed a typo
savan77 Nov 12, 2020
47f4982
fixed a typo in a path
savan77 Nov 12, 2020
5b048d6
update template
savan77 Nov 12, 2020
603638c
update workflow
savan77 Nov 16, 2020
199aacf
handle case when loss is NaN
savan77 Nov 16, 2020
b19ef2a
reduce default epochs for testing
savan77 Nov 16, 2020
3da6fb7
shut down process when experiment is finished
savan77 Nov 16, 2020
5acfbe4
stop process when experiment is done
savan77 Nov 17, 2020
9265381
accept settings in a single param
savan77 Nov 17, 2020
a89139e
get config args from user
savan77 Nov 17, 2020
c6a7706
add log statements
savan77 Nov 18, 2020
ed43e9c
revert changes related to single param config
savan77 Nov 18, 2020
7e49f77
revert parameter changes
savan77 Nov 18, 2020
2f6f5c7
convert metrics to float explicitly
savan77 Nov 18, 2020
6dad848
read parameters from config
savan77 Nov 18, 2020
4454f50
accept config for enas
savan77 Nov 18, 2020
b4673e7
convert score to float
savan77 Nov 18, 2020
70517e6
add error handling in comparison script
savan77 Nov 18, 2020
c54a626
remove subscription from argparse var
savan77 Nov 18, 2020
e3a8f93
convert args to dictionary
savan77 Nov 18, 2020
3fc027b
assign config back to args
savan77 Nov 18, 2020
ccee20f
convert numerical strings to int
savan77 Nov 18, 2020
b890319
round numbers to 2 decimal points
savan77 Nov 18, 2020
63fb183
add a flag to skip the preprocessing
savan77 Nov 20, 2020
aeeeded
change argument type for skip
savan77 Nov 20, 2020
d85c27c
change argument type to string
savan77 Nov 20, 2020
198997d
update template
savan77 Nov 20, 2020
5ad2cfe
handle case when data is already processed
savan77 Nov 20, 2020
5235b99
add log lines
savan77 Nov 23, 2020
d68b5f7
correct typo in var name
savan77 Nov 23, 2020
50b9016
replace logger with print
savan77 Nov 23, 2020
56df7a8
update template
savan77 Nov 24, 2020
9c6fb6f
handle case when search params aren't provided
savan77 Nov 29, 2020
1e2f39e
update maximum trials
savan77 Dec 1, 2020
24088cd
update data directory for preprocessing
savan77 Dec 2, 2020
2d29f3f
delete lost+found directories
savan77 Dec 2, 2020
0bf7194
pass model_type to main script
savan77 Dec 3, 2020
686a1b5
allow user to update settings in config.yml
savan77 Dec 4, 2020
48baeb6
resolve key access error
savan77 Dec 4, 2020
1a840b5
add comments
savan77 Dec 8, 2020
346eee1
add comments and support for new models
savan77 Dec 8, 2020
eabc9a6
Merge branch 'master' into fix/config_param
savan77 Dec 8, 2020
30 changes: 30 additions & 0 deletions compare.py
@@ -0,0 +1,30 @@
import json

accuracies = {}

try:
    with open('/tmp/nas-metrics.json') as f:
        nas = json.load(f)
    print("Metrics for Neural Architecture Search: ", nas)
    accuracies['nas_acc'] = [float(i['value']) for i in nas if i['name'] == 'accuracy'][0]
except (OSError, ValueError, IndexError) as e:
    # Missing file, malformed JSON, or no "accuracy" entry.
    print("Error occurred while reading metrics for NAS: ", e)

try:
    with open('/tmp/hyperop-metrics.json') as f:
        hyper = json.load(f)
    print("Metrics for hyperparameter optimization: ", hyper)
    accuracies['hyper_acc'] = [float(i['value']) for i in hyper if i['name'] == 'accuracy'][0]
except (OSError, ValueError, IndexError) as e:
    print("Error occurred while reading metrics for hyperparameter optimization: ", e)

try:
    with open('/tmp/singlemodel-metrics.json') as f:
        fm = json.load(f)
    print("Metrics for model trained with fixed parameters: ", fm)
    accuracies['fm_acc'] = [float(i['value']) for i in fm if i['name'] == 'accuracy'][0]
except (OSError, ValueError, IndexError) as e:
    print("Error occurred while reading metrics for fixed-param model: ", e)

max_acc_name = max(accuracies, key=accuracies.get)
print("Maximum accuracy was {} for {}".format(max(accuracies.values()), max_acc_name))
2 changes: 1 addition & 1 deletion examples/nas/enas-tf/search.py
@@ -5,7 +5,7 @@
 from tensorflow.keras.losses import Reduction, SparseCategoricalCrossentropy
 from tensorflow.keras.optimizers import SGD

-from nni.nas.tensorflow import enas
+from nni.algorithms.nas.tensorflow import enas

 import datasets
 from macro import GeneralNetwork
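If both old and new NNI releases had to be supported, a compatibility sketch (not part of this PR) could fall back across the two import paths the diff shows:

```python
# Version-tolerant import sketch: newer NNI releases moved enas under
# nni.algorithms, which is exactly the rename this diff applies.
try:
    from nni.algorithms.nas.tensorflow import enas  # newer layout
except ImportError:
    from nni.nas.tensorflow import enas  # older layout
```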
18 changes: 17 additions & 1 deletion examples/nas/enas/datasets.py
@@ -3,9 +3,23 @@

 from torchvision import transforms
 from torchvision.datasets import CIFAR10
+from torchvision.datasets import ImageFolder
+
+
+def get_custom_dataset(train_dir, valid_dir):
+    """Load a custom classification dataset using ImageFolder.
+
+    The train and valid directories should each contain one
+    subdirectory per class, named after the label.
+    """
+    transform = transforms.Compose([
+        transforms.Resize((32, 32)),
+        transforms.ToTensor()
+    ])
+    train_dataset = ImageFolder(root=train_dir, transform=transform)
+    valid_dataset = ImageFolder(root=valid_dir, transform=transform)
+    return train_dataset, valid_dataset
+
+
-def get_dataset(cls):
+def get_dataset(cls, train_dir=None, valid_data=None):
     MEAN = [0.49139968, 0.48215827, 0.44653124]
     STD = [0.24703233, 0.24348505, 0.26158768]
     transf = [

@@ -23,6 +37,8 @@ def get_dataset(cls):
     if cls == "cifar10":
         dataset_train = CIFAR10(root="./data", train=True, download=True, transform=train_transform)
         dataset_valid = CIFAR10(root="./data", train=False, download=True, transform=valid_transform)
+    elif cls == "custom_classification":
+        dataset_train, dataset_valid = get_custom_dataset(train_dir, valid_data)
     else:
         raise NotImplementedError
     return dataset_train, dataset_valid
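As a usage sketch (paths and class names hypothetical), a custom dataset only needs one folder per class in each split, since ImageFolder infers labels from directory names:

```python
import datasets

# Hypothetical directory layout expected by ImageFolder:
#   /data/train/cat/xxx.jpg   /data/train/dog/yyy.jpg
#   /data/valid/cat/zzz.jpg   /data/valid/dog/www.jpg
train_ds, valid_ds = datasets.get_dataset(
    "custom_classification",
    train_dir="/data/train",
    valid_data="/data/valid",  # note: the keyword is valid_data in this diff
)
print(len(train_ds), train_ds.classes)  # class names come from the folder names
```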
54 changes: 35 additions & 19 deletions examples/nas/enas/search.py
@@ -2,17 +2,16 @@
 # Licensed under the MIT license.

 import logging
-import time
 from argparse import ArgumentParser

+import json
 import torch
 import torch.nn as nn

 import datasets
 from macro import GeneralNetwork
 from micro import MicroNetwork
-from nni.nas.pytorch import enas
-from nni.nas.pytorch.callbacks import (ArchitectureCheckpoint,
+from nni.algorithms.nas.pytorch import enas
+from nni.nas.pytorch.callbacks import (ArchitectureCheckpoint, ModelCheckpoint,
                                        LRSchedulerCallback)
 from utils import accuracy, reward_accuracy

@@ -21,21 +20,35 @@

 if __name__ == "__main__":
     parser = ArgumentParser("enas")
-    parser.add_argument("--batch-size", default=128, type=int)
+    # parser.add_argument("--batch-size", default=128, type=int)
     parser.add_argument("--log-frequency", default=10, type=int)
-    parser.add_argument("--search-for", choices=["macro", "micro"], default="macro")
-    parser.add_argument("--epochs", default=None, type=int, help="Number of epochs (default: macro 310, micro 150)")
-    parser.add_argument("--visualization", default=False, action="store_true")
+    parser.add_argument("--num-classes", default=2, type=int)
+    parser.add_argument("--dataset", default="cifar10", choices=["cifar10", "custom_classification"])
+    # parser.add_argument("--search-for", choices=["macro", "micro"], default="macro")
+    # parser.add_argument("--epochs", default=None, type=int, help="Number of epochs (default: macro 310, micro 150)")
+    parser.add_argument("--visualization", default=True, action="store_true")
+    parser.add_argument("--train-data-dir", default="/home/savan/Documents/train_data", help="train dataset for classification")
+    parser.add_argument("--valid-data-dir", default="/home/savan/Documents/test_data", help="validation dataset for classification")
+    parser.add_argument("--config", default="batch-size=128 \n search-for=macro \n epochs=30")
     args = parser.parse_args()

-    dataset_train, dataset_valid = datasets.get_dataset("cifar10")
-    if args.search_for == "macro":
-        model = GeneralNetwork()
-        num_epochs = args.epochs or 310
+    extras = args.config.split("\n")
+    print("nas extras", extras)
+    extras_processed = [i.split("#")[0].replace(" ","") for i in extras if i]
+    print("nas extra processed", extras_processed)
+    config = {i.split('=')[0]:i.split('=')[1] for i in extras_processed}
+    print("nas config", config)
+    config.update(vars(args))
+    args = config
+
+    dataset_train, dataset_valid = datasets.get_dataset(args['dataset'], train_dir=args['train_data_dir'], valid_data=args['valid_data_dir'])
+    if args['search_for'] == "macro":
+        model = GeneralNetwork(num_classes=int(args['num_classes']))
+        num_epochs = int(args['epochs']) or 310
         mutator = None
-    elif args.search_for == "micro":
-        model = MicroNetwork(num_layers=6, out_channels=20, num_nodes=5, dropout_rate=0.1, use_aux_heads=True)
-        num_epochs = args.epochs or 150
+    elif args['search_for'] == "micro":
+        model = MicroNetwork(num_layers=6, out_channels=20, num_nodes=5, dropout_rate=0.1, num_classes=int(args['num_classes']), use_aux_heads=True)
+        num_epochs = int(args['epochs']) or 150
         mutator = enas.EnasMutator(model, tanh_constant=1.1, cell_exit_extra_step=True)
     else:
         raise AssertionError

@@ -49,13 +62,16 @@
                                metrics=accuracy,
                                reward_function=reward_accuracy,
                                optimizer=optimizer,
-                               callbacks=[LRSchedulerCallback(lr_scheduler), ArchitectureCheckpoint("./checkpoints")],
-                               batch_size=args.batch_size,
+                               callbacks=[LRSchedulerCallback(lr_scheduler), ArchitectureCheckpoint("/mnt/output"), ModelCheckpoint("/mnt/output")],
+                               batch_size=int(args['batch_size']),
                                num_epochs=num_epochs,
                                dataset_train=dataset_train,
                                dataset_valid=dataset_valid,
-                               log_frequency=args.log_frequency,
+                               log_frequency=args['log_frequency'],
                                mutator=mutator)
-    if args.visualization:
+    if args['visualization']:
         trainer.enable_visualization()
     trainer.train()
+    metrics = [{'name':'accuracy', 'value':round(trainer.val_model_summary['acc1'].avg, 2)}, {'name':'loss', 'value':round(trainer.val_model_summary['loss'].avg,2)}]
+    with open('/tmp/sys-metrics.json', 'w') as f:
+        json.dump(metrics, f)
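To see what the new --config plumbing produces, here is the same parsing logic run standalone on a hypothetical config string:

```python
# Standalone rerun of the --config parsing above, with a hypothetical string.
config_str = "batch-size=64 \n search-for=micro \n epochs=10 # text after '#' is dropped"
extras = config_str.split("\n")
extras_processed = [i.split("#")[0].replace(" ", "") for i in extras if i]
config = {i.split("=")[0]: i.split("=")[1] for i in extras_processed}
print(config)  # {'batch-size': '64', 'search-for': 'micro', 'epochs': '10'}
# Note: keys keep their hyphens and values stay strings, so lookups such as
# args['search_for'] rely on the caller spelling keys with underscores, and
# numeric values need the int(...) casts used in the script.
```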
2 changes: 1 addition & 1 deletion examples/trials/mnist-tfv2/config.yml
@@ -12,6 +12,6 @@ tuner:
   classArgs:
     optimize_mode: maximize # choices: maximize, minimize
 trial:
-  command: python3 mnist.py
+  command: python mnist.py
   codeDir: .
   gpuNum: 0
21 changes: 21 additions & 0 deletions examples/trials/pytorch-classifier/config.yml
@@ -0,0 +1,21 @@
authorName: default
experimentName: pytorch_classifier
trialConcurrency: 1
maxExecDuration: 10h
maxTrialNum: 1
#choice: local, remote, pai
trainingServicePlatform: local
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
tuner:
  #choice: TPE, Random, Anneal, Evolution, BatchTuner, MetisTuner, GPTuner
  #SMAC (SMAC should be installed through nnictl)
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
trial:
  command: python3 main.py
  codeDir: .
  gpuNum: 0
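The config points at search_space.json, which is not included in this diff. A hypothetical search space in NNI's standard _type/_value format, with illustrative parameter names and ranges, could be generated like this:

```python
import json

# Hypothetical search_space.json for the classifier trials; the real file is
# not shown in this PR, so the parameter names and ranges are illustrative.
search_space = {
    "lr": {"_type": "loguniform", "_value": [0.0001, 0.1]},
    "batch_size": {"_type": "choice", "_value": [16, 32, 64]},
    "epochs": {"_type": "choice", "_value": [5, 10]},
}
with open("search_space.json", "w") as f:
    json.dump(search_space, f, indent=2)
```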
25 changes: 25 additions & 0 deletions examples/trials/pytorch-classifier/config_aml.yml
@@ -0,0 +1,25 @@
authorName: default
experimentName: pytorch_classifier
trialConcurrency: 1
maxExecDuration: 10h
maxTrialNum: 10
trainingServicePlatform: aml
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
tuner:
  #choice: TPE, Random, Anneal, Evolution, BatchTuner, MetisTuner, GPTuner
  #SMAC (SMAC should be installed through nnictl)
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
trial:
  command: python3 main.py
  codeDir: .
  image: msranni/nni
amlConfig:
  subscriptionId: ${replace_to_your_subscriptionId}
  resourceGroup: ${replace_to_your_resourceGroup}
  workspaceName: ${replace_to_your_workspaceName}
  computeTarget: ${replace_to_your_computeTarget}
27 changes: 27 additions & 0 deletions examples/trials/pytorch-classifier/config_assessor.yml
@@ -0,0 +1,27 @@
authorName: default
experimentName: pytorch_classifier
trialConcurrency: 1
maxExecDuration: 10h
maxTrialNum: 50
#choice: local, remote
trainingServicePlatform: local
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
tuner:
  #choice: TPE, Random, Anneal, Evolution, BatchTuner, MetisTuner, GPTuner
  #SMAC (SMAC should be installed through nnictl)
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
assessor:
  #choice: Medianstop, Curvefitting
  builtinAssessorName: Curvefitting
  classArgs:
    epoch_num: 20
    threshold: 0.9
trial:
  command: python3 main.py
  codeDir: .
  gpuNum: 0
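The Curvefitting assessor can only act if each trial reports per-epoch results. A minimal sketch of the trial side using the standard NNI trial API (the training step is a stand-in, not this PR's main.py):

```python
import random

import nni

def train_one_epoch(params, epoch):
    # Stand-in for a real training step; returns a fake, slowly rising accuracy.
    return min(0.5 + 0.02 * epoch + random.random() * 0.01, 1.0)

if __name__ == "__main__":
    params = nni.get_next_parameter()  # hyperparameters drawn by the TPE tuner
    best = 0.0
    for epoch in range(20):  # matches epoch_num: 20 in the assessor config
        acc = train_one_epoch(params, epoch)
        nni.report_intermediate_result(acc)  # the Curvefitting assessor watches these
        best = max(best, acc)
    nni.report_final_result(best)  # fed back to the tuner
```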
40 changes: 40 additions & 0 deletions examples/trials/pytorch-classifier/config_frameworkcontroller.yml
@@ -0,0 +1,40 @@
authorName: default
experimentName: pytorch_classifier
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai, kubeflow
trainingServicePlatform: frameworkcontroller
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
tuner:
  #choice: TPE, Random, Anneal, Evolution, BatchTuner, MetisTuner, GPTuner
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
assessor:
  builtinAssessorName: Medianstop
  classArgs:
    optimize_mode: maximize
trial:
  codeDir: .
  taskRoles:
    - name: worker
      taskNum: 1
      command: python3 main.py
      gpuNum: 1
      cpuNum: 1
      memoryMB: 8192
      image: msranni/nni:latest
      frameworkAttemptCompletionPolicy:
        minFailedTaskCount: 1
        minSucceededTaskCount: 1
frameworkcontrollerConfig:
  storage: nfs
  nfs:
    # Your NFS server IP, like 10.10.10.10
    server: {your_nfs_server_ip}
    # Your NFS server export path, like /var/nfs/nni
    path: {your_nfs_server_export_path}
32 changes: 32 additions & 0 deletions examples/trials/pytorch-classifier/config_kubeflow.yml
@@ -0,0 +1,32 @@
authorName: default
experimentName: pytorch_classifier
trialConcurrency: 1
maxExecDuration: 10h
maxTrialNum: 1
#choice: local, remote, pai, kubeflow
trainingServicePlatform: kubeflow
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
tuner:
  #choice: TPE, Random, Anneal, Evolution, BatchTuner, MetisTuner, GPTuner
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
trial:
  codeDir: .
  worker:
    replicas: 1
    command: python3 main.py
    gpuNum: 0
    cpuNum: 1
    memoryMB: 8192
    image: msranni/nni:latest
kubeflowConfig:
  operator: tf-operator
  apiVersion: v1alpha2
  storage: nfs
  nfs:
    server: 10.10.10.10
    path: /var/nfs/general
35 changes: 35 additions & 0 deletions examples/trials/pytorch-classifier/config_pai.yml
@@ -0,0 +1,35 @@
authorName: default
experimentName: pytorch_classifier
trialConcurrency: 1
maxExecDuration: 10h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: pai
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
tuner:
  #choice: TPE, Random, Anneal, Evolution, BatchTuner, MetisTuner, GPTuner
  #SMAC (SMAC should be installed through nnictl)
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
trial:
  command: python3 main.py
  codeDir: .
  gpuNum: 0
  cpuNum: 1
  memoryMB: 8196
  #The docker image to run nni job on pai
  image: msranni/nni:latest
  nniManagerNFSMountPath: {replace_to_your_nfs_mount_path}
  containerNFSMountPath: {replace_to_your_container_mount_path}
  paiStorageConfigName: {replace_to_your_storage_config_name}
paiConfig:
  #The username to login pai
  userName: username
  #The token to login pai
  token: token
  #The host of restful server of pai
  host: 10.10.10.10
21 changes: 21 additions & 0 deletions examples/trials/pytorch-classifier/config_windows.yml
@@ -0,0 +1,21 @@
authorName: default
experimentName: pytorch_classifier
trialConcurrency: 1
maxExecDuration: 10h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: local
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
tuner:
  #choice: TPE, Random, Anneal, Evolution, BatchTuner, MetisTuner, GPTuner
  #SMAC (SMAC should be installed through nnictl)
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
trial:
  command: python main.py
  codeDir: .
  gpuNum: 0