AlexNet backward shape mismatch + ReLU returns a tuple #681
Comments
Hi, as pointed out by @chrishkchris, the convention is to use ReLU as a stateless layer. For the shape mismatch, you might need to check the shapes of the layers again. Let me know if further info is required.
OK, I'll try, but why provide a stateful ReLU layer? Is it for a specific purpose?
I compared my implementation to other frameworks, and the shapes are the same.
@dcslin Did you try to run the code pasted by @Belegkarnil?
I am still checking the code.
Hi @Belegkarnil, you might need to change 256 * 6 * 6, 4096 to 256, 4096 to make it work. Also, it is recommended to use relu/dropout/flatten as in https://github.com/apache/singa/blob/master/examples/cnn/model/cnn.py#L40
OK, thanks a lot! I assumed it would work like other frameworks, but the result of AvgPool has a different shape.
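A minimal sketch of the stateless style recommended above (assuming the functional autograd.relu / autograd.dropout / autograd.flatten calls used in the linked cnn.py example; the TinyNet model and its layer sizes are hypothetical, only for illustration): parametric layers stay as objects, while activations, dropout, and flatten become plain function calls that return a single Tensor, so no tuple unpacking is needed.

```python
from singa import autograd
from singa import module


class TinyNet(module.Module):
    # Hypothetical toy model, only to illustrate the stateless call style.

    def __init__(self, num_classes=10):
        super(TinyNet, self).__init__()
        self.conv = autograd.Conv2d(3, 32, kernel_size=3, padding=1)
        self.pool = autograd.AvgPool2d(32, stride=1)  # collapses a 32x32 map to 1x1
        self.linear = autograd.Linear(32, num_classes)

    def forward(self, x):  # x: (batch, 3, 32, 32)
        x = self.conv(x)
        x = autograd.relu(x)      # function call: returns a Tensor, not a tuple
        x = self.pool(x)
        x = autograd.flatten(x)   # (batch, 32), since the pooled map is 1x1
        x = autograd.dropout(x)
        return self.linear(x)
```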
Hi,
I have implemented AlexNet in SINGA, but I get an error at the backward_and_update call. I am using SINGA 3.0.0.rc1 on CPU.
This is my AlexNet implementation:
```python
from singa import autograd
from singa import module
from singa import opt

__all__ = ['AlexNet', 'alexnet']


class AlexNet(module.Module):

    def __init__(self, num_classes=1000):
        super(AlexNet, self).__init__()
        # 12 on GPU, so 6 & 6
        self.features1 = [
            autograd.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            autograd.ReLU(),
            autograd.MaxPool2d(kernel_size=3, stride=2),
            autograd.Conv2d(64, 192, kernel_size=5, padding=2),
            autograd.ReLU(),
            autograd.MaxPool2d(kernel_size=3, stride=2),
            autograd.Conv2d(192, 384, kernel_size=3, padding=1),
            autograd.ReLU(),
            autograd.Conv2d(384, 256, kernel_size=3, padding=1),
            autograd.ReLU()
        ]
        self.features2 = [
            autograd.Conv2d(256, 256, kernel_size=3, padding=1),
            autograd.ReLU(),
            autograd.MaxPool2d(kernel_size=3, stride=2)
        ]
        self.avgpool = autograd.AvgPool2d(6, stride=1)
        self.flatten = autograd.Flatten()
        self.classifier = [
            autograd.Dropout(),
            autograd.Linear(256 * 6 * 6, 4096),
            autograd.ReLU(),
            autograd.Dropout(),
            autograd.Linear(4096, 4096),
            autograd.ReLU(),
            autograd.Linear(4096, num_classes)
        ]
        self.optimizer = opt.SGD(lr=0.001, momentum=0.9)

    def loss(self, out, ty):
        return autograd.softmax_cross_entropy(out, ty)

    def optim(self, loss, dist_option, spars):
        if dist_option == 'fp32':
            self.optimizer.backward_and_update(loss)
        elif dist_option == 'fp16':
            self.optimizer.backward_and_update_half(loss)
        elif dist_option == 'partialUpdate':
            self.optimizer.backward_and_partial_update(loss)
        elif dist_option == 'sparseTopK':
            self.optimizer.backward_and_sparse_update(loss, topK=True, spars=spars)
        elif dist_option == 'sparseThreshold':
            self.optimizer.backward_and_sparse_update(loss, topK=False, spars=spars)

    def forward(self, x):
        for (i, layers) in enumerate([self.features1, self.features2,
                                      [self.avgpool, self.flatten], self.classifier]):
            for (j, fn) in enumerate(layers):
                x = fn(x)
                if type(x) is tuple:  # FIXME: I have to do this because of a bug in Singa? (ReLU)
                    x = x[0]
        return x


def alexnet(**kwargs):
    return AlexNet(**kwargs)
```
And I get: AssertionError: ('shape mismatch', (9216, 4096), (256, 4096))
This corresponds to my first linear layer: 256 * 6 * 6, 4096.
When I use my VGG16 implementation, I get a similar error:
AssertionError: ('shape mismatch', (25088, 4096), (512, 4096))
It seems that the backward operation does not map the correct shape to the corresponding layer.
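For reference, the output-size arithmetic below (a sketch, assuming a 224x224 input as in the reference AlexNet) traces where the 256 in the error comes from: AvgPool2d(6, stride=1) collapses the 6x6 feature map to 1x1, so Flatten yields 256 features rather than 256 * 6 * 6 = 9216. The same arithmetic would explain the VGG16 error (512 instead of 512 * 7 * 7 = 25088).

```python
# Spatial size through the network, assuming a 224x224 input.
# Standard conv/pool output-size formula: out = floor((in + 2*pad - kernel) / stride) + 1

def out_size(n, kernel, stride=1, pad=0):
    return (n + 2 * pad - kernel) // stride + 1

n = 224
n = out_size(n, 11, 4, 2)  # Conv2d(3, 64, kernel_size=11, stride=4, padding=2) -> 55
n = out_size(n, 3, 2)      # MaxPool2d(kernel_size=3, stride=2)                 -> 27
n = out_size(n, 5, 1, 2)   # Conv2d(64, 192, kernel_size=5, padding=2)          -> 27
n = out_size(n, 3, 2)      # MaxPool2d(kernel_size=3, stride=2)                 -> 13
n = out_size(n, 3, 1, 1)   # the 3x3 convs with padding=1 keep the size         -> 13
n = out_size(n, 3, 2)      # MaxPool2d(kernel_size=3, stride=2)                 -> 6
n = out_size(n, 6, 1)      # AvgPool2d(6, stride=1)                             -> 1
print(256 * n * n)         # 256: hence the gradient shape (256, 4096), not (9216, 4096)
```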
Moreover, the ReLU class returns a 1-tuple containing a Tensor. Is that intended, or is it a bug?