Replication track in the NeurIPS 2019 Reproducibility Challenge for *Learning Where to Look: Semantic-Guided Multi-Attention Localization for Zero-Shot Learning*
This project is not finished yet; more information will be updated soon!
Pipeline, module by module:

- VGG19(?) backbone
  - input: image
  - output: feature representation
- K-means (groups the channels of the feature representation)
  - input: feature representation
  - output: 2 groups of feature representation
- Global average pooling + 2 fully connected layers (ReLU) + Sigmoid
  - input: 2 groups of feature representation
  - intermediate result: channel descriptors p1, p2
  - output: channel-wise attention weight vectors a1, a2
- Weighted sum
  - input: feature representation, channel-wise attention weights
  - output: 2 attention maps
- f_CNet (2 fully connected layers)
  - input: attention maps
  - output: [t_x, t_y, t_s]
- Boxcar mask (cropping operation x ∘ V_i)
  - input: original image x, [t_x, t_y, t_s]
  - output: masked images (x_i^part)
- VGG backbone + global average pooling
  - input: original image / masked image
  - output: visual feature vector $\theta$
- Transformation
  - input: visual feature vector
  - output: semantic feature vector (the candidate semantic vectors differ between seen and unseen classes)
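The attention branch above (K-means channel grouping, then GAP + 2 FC + sigmoid, then a weighted sum over channels) can be sketched in numpy. This is a minimal illustration, not the authors' code: the feature-map size, the FC bottleneck width, and the random weights are all assumptions.

```python
# Sketch of the channel-wise attention branch:
# K-means channel grouping -> GAP -> 2 FC layers (ReLU) -> sigmoid -> weighted sum.
# Shapes and layer sizes are assumptions, not the paper's exact values.
import numpy as np

rng = np.random.default_rng(0)

C, H, W = 512, 14, 14                       # assumed VGG19 conv feature size
features = rng.standard_normal((C, H, W))

def kmeans_channels(feats, k=2, iters=10):
    """Cluster channels (each a point in R^(H*W)) into k groups."""
    pts = feats.reshape(feats.shape[0], -1)            # (C, H*W)
    centers = pts[rng.choice(len(pts), k, replace=False)]
    for _ in range(iters):
        d = ((pts[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = pts[labels == j].mean(0)
    return labels

labels = kmeans_channels(features)

def channel_attention(feats, w1, w2):
    p = feats.mean(axis=(1, 2))                        # channel descriptor p_i (GAP)
    h = np.maximum(w1 @ p, 0.0)                        # FC + ReLU
    return 1.0 / (1.0 + np.exp(-(w2 @ h)))             # FC + sigmoid -> a_i

attention_maps = []
for g in range(2):
    group = features[labels == g]                      # (C_g, H, W)
    c_g = group.shape[0]
    w1 = rng.standard_normal((c_g // 4 + 1, c_g)) * 0.1   # assumed bottleneck width
    w2 = rng.standard_normal((c_g, c_g // 4 + 1)) * 0.1
    a = rng_a = channel_attention(group, w1, w2)       # (C_g,) attention weights
    # weighted sum over channels -> one spatial attention map per group
    attention_maps.append((a[:, None, None] * group).sum(0))

print([m.shape for m in attention_maps])   # two (H, W) attention maps
```

Each of the two channel groups yields one spatial attention map, matching the "2 attention maps" output above.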
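The cropping stage (f_CNet predicting [t_x, t_y, t_s], then the boxcar mask x ∘ V) might look like the following sketch. The two FC layer sizes, the sigmoid sharpness `k`, and the use of a difference-of-sigmoids boxcar (a common differentiable stand-in for a hard crop) are assumptions here.

```python
# Sketch of the cropping stage: f_CNet (2 FC layers) maps a flattened attention
# map to box parameters [t_x, t_y, t_s]; a boxcar mask V built from them is
# multiplied element-wise with the image (x o V). All weights are random.
import numpy as np

rng = np.random.default_rng(1)

H, W = 64, 64
attention_map = rng.random((H, W))
image = rng.random((3, H, W))

# f_CNet: 2 fully connected layers, sigmoid output keeps parameters in [0, 1]
w1 = rng.standard_normal((32, H * W)) * 0.01
w2 = rng.standard_normal((3, 32)) * 0.01
h = np.maximum(w1 @ attention_map.ravel(), 0.0)
t_x, t_y, t_s = 1.0 / (1.0 + np.exp(-(w2 @ h)))   # normalized box centre and size

def boxcar(n, center, half_size, k=20.0):
    """1-D soft boxcar: difference of two sigmoid steps (differentiable)."""
    u = np.linspace(0.0, 1.0, n)
    return (1 / (1 + np.exp(-k * (u - (center - half_size))))
            - 1 / (1 + np.exp(-k * (u - (center + half_size)))))

# 2-D mask V is the outer product of the per-axis boxcars
V = boxcar(H, t_y, t_s / 2)[:, None] * boxcar(W, t_x, t_s / 2)[None, :]
x_part = image * V[None, :, :]                    # masked image x^part
print(x_part.shape)                               # (3, 64, 64)
```

The masked image x^part is then fed to the second VGG backbone listed above.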
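The final embedding step (GAP to get $\theta$, then a learned transformation into semantic space) can be sketched as below. The attribute dimension, the dot-product compatibility score, and the random class semantic vectors are assumptions for illustration; in the zero-shot setting only unseen-class semantic vectors are candidates at test time, which is what "differ between seen and unseen classes" refers to.

```python
# Sketch of the joint embedding: global-average-pooled visual features theta(x)
# are linearly mapped into the class semantic (attribute) space, and an image is
# assigned to the class whose semantic vector scores highest. Dimensions and
# all vectors/weights here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)

C, H, W = 512, 7, 7
d_sem = 85                                   # assumed attribute dimension

conv_features = rng.standard_normal((C, H, W))
theta = conv_features.mean(axis=(1, 2))      # visual feature vector theta(x)

W_t = rng.standard_normal((d_sem, C)) * 0.01 # learned transformation (random here)
phi = W_t @ theta                            # predicted semantic feature vector

# At test time only unseen-class semantic vectors are candidates
unseen_class_semantics = rng.standard_normal((10, d_sem))
scores = unseen_class_semantics @ phi        # dot-product compatibility
predicted_class = int(scores.argmax())
print(predicted_class)
```

Training would fit `W_t` on seen classes only; the same transformation is then applied against unseen-class semantics at test time.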