PyTorch implementation of the paper "Reference-Based Sketch Image Colorization using Augmented-Self Reference and Dense Semantic Correspondence" (CVPR 2020)
- PyTorch
- torchvision
- numpy
- PIL
- OpenCV
- tqdm
- Clone the repository
git clone https://github.com/Snailpong/style_transfer_implementation.git
- Dataset download
- Tag2Pix (filtered Danbooru2020): Link
- You need to change 'danbooru2018' to 'danbooru2020' in the download script (the year can be changed as needed)
- In my experiment, I used about 6000 images filtered by
python preprocessor/tagset_extractor.py
- I stopped the process when the 0080 folder finished downloading.
- Sketch image generation
- XDoG: Link
- For automatic generation, I edited the main function as follows:
import os
import cv2

if __name__ == '__main__':
    # convert every color image to a sketch and save it under the same filename
    # (xdog() is defined in this XDoG script)
    for file_name in os.listdir('../data/danbooru/color'):
        print(file_name, end='\r')
        image = cv2.imread(f'../data/danbooru/color/{file_name}', cv2.IMREAD_GRAYSCALE)
        result = xdog(image)
        cv2.imwrite(f'../data/danbooru/sketch/{file_name}', result)
- folder structure example
.
└── data
    ├── danbooru
    │   ├── color
    │   │   ├── 7.jpg
    │   │   └── ...
    │   └── sketch
    │       ├── 7.jpg
    │       └── ...
    └── val
        ├── color
        │   ├── 1.jpg
        │   └── ...
        └── sketch
            ├── 1.jpg
            └── ...
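- For reference, a minimal sketch of how paired color/sketch images could be loaded from this layout. The class name, transforms, and image size are illustrative assumptions, not the repository's actual data loader:

```python
import os
from PIL import Image
from torch.utils.data import Dataset
import torchvision.transforms as T

class PairedSketchDataset(Dataset):
    """Pairs color and sketch images that share the same filename."""
    def __init__(self, root='./data/danbooru', image_size=256):
        self.color_dir = os.path.join(root, 'color')
        self.sketch_dir = os.path.join(root, 'sketch')
        # keep only filenames present in both folders
        self.files = sorted(set(os.listdir(self.color_dir)) & set(os.listdir(self.sketch_dir)))
        self.transform = T.Compose([T.Resize((image_size, image_size)), T.ToTensor()])

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        name = self.files[idx]
        color = Image.open(os.path.join(self.color_dir, name)).convert('RGB')
        sketch = Image.open(os.path.join(self.sketch_dir, name)).convert('L')
        return self.transform(sketch), self.transform(color)
```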
- TPS transformation module
- TPS: Link
- Place the thinplate folder in the main (project root) folder
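- As a reference for how TPS warping is typically used here, below is a minimal sketch that assumes the linked thinplate code follows the common py-thin-plate-spline API (tps_theta_from_points / tps_grid / tps_grid_to_remap); if the linked module differs, adjust the calls accordingly. The control-point grid and jitter amount are illustrative:

```python
import cv2
import numpy as np
import thinplate as tps  # assumed API: tps_theta_from_points / tps_grid / tps_grid_to_remap

def random_tps_warp(image, jitter=0.05):
    """Warp an image with a randomly perturbed thin-plate-spline grid."""
    h, w = image.shape[:2]
    # source control points: a 3x3 grid in normalized coordinates
    xs, ys = np.meshgrid(np.linspace(0, 1, 3), np.linspace(0, 1, 3))
    c_src = np.stack([xs.ravel(), ys.ravel()], axis=1)
    # destination points: the same grid plus small random offsets
    c_dst = c_src + np.random.uniform(-jitter, jitter, c_src.shape)
    theta = tps.tps_theta_from_points(c_src, c_dst, reduced=True)
    grid = tps.tps_grid(theta, c_dst, (h, w))
    mapx, mapy = tps.tps_grid_to_remap(grid, image.shape)
    return cv2.remap(image, mapx, mapy, cv2.INTER_CUBIC)
```

- In the paper's augmented-self-reference scheme, a spatial warp like this (together with an appearance transformation such as color jitter) is applied to the ground-truth color image to create the reference input.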
- Train
python train.py
- arguments
- load_model: True/False
- cuda_visible: CUDA_VISIBLE_DEVICES (e.g. 1)
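- A hypothetical sketch of how these two arguments could be parsed and applied; the actual train.py may define them differently:

```python
import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument('--load_model', type=str, default='False', help='True/False: resume from a saved checkpoint')
parser.add_argument('--cuda_visible', type=str, default='0', help='value for CUDA_VISIBLE_DEVICES, e.g. 1')
args = parser.parse_args()

# restrict visible GPUs before any CUDA initialization
os.environ['CUDA_VISIBLE_DEVICES'] = args.cuda_visible
load_model = args.load_model == 'True'
```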
- Test
- python test.py
- arguments
- image_path: path to the folder of images to convert
- cuda_visible
- Results: example outputs shown as Sketch | Reference | Result comparisons
- In Eq. (1), I could not scale by the number of activation maps; instead, I scaled the activation map into .
- In Eq. (5), I implemented the negative region as the same spatial region taken from different samples in the batch, since the negative region is ambiguous (see the triplet-loss sketch after this list).
- In Eq. (9), since the choice of activation map is unclear in contrast to Eq. (8), I computed the style (Gram) loss with the relu5_1 activation map (see the Gram-loss sketch after this list).
- In this experiment, there was little difference in quality with or without the similarity-based triplet loss. After it converged from 20 to 0 within about 1 epoch, there was little further change.
- When the test images were predicted every epoch after the content loss had converged, the difference in color quality between epochs was still remarkable.
- The converged adversarial losses of the generator and discriminator were 0.7 ~ 0.8 and 0.15 ~ 0.2, respectively.
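- A minimal sketch of the in-batch-negative choice described above for the similarity-based triplet loss: the anchor and positive come from the same sample, while the negative is the same spatial region taken from another sample in the batch (here, by rolling the batch). The margin and feature shapes are illustrative; this is not the repository's exact implementation:

```python
import torch
import torch.nn.functional as F

def similarity_triplet_loss(anchor, positive, margin=1.0):
    """anchor, positive: (B, C, H, W) feature maps from corresponding regions.
    The negative for each sample is the positive of another sample in the batch."""
    negative = torch.roll(positive, shifts=1, dims=0)  # same region, different batch sample
    a = anchor.flatten(1)
    p = positive.flatten(1)
    n = negative.flatten(1)
    return F.triplet_margin_loss(a, p, n, margin=margin)
```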
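- And a minimal sketch of a style (Gram) loss computed on the relu5_1 activation map mentioned above, using torchvision's pretrained VGG19 (relu5_1 is index 29 of vgg19.features, so the slice features[:30] includes it); the choice of L1 distance and the lack of weighting are assumptions:

```python
import torch
import torch.nn as nn
from torchvision import models

# VGG19 features up to and including relu5_1 (index 29 in torchvision's layer list)
vgg_relu5_1 = models.vgg19(pretrained=True).features[:30].eval()
for p in vgg_relu5_1.parameters():
    p.requires_grad_(False)

def gram_matrix(feat):
    b, c, h, w = feat.size()
    f = feat.view(b, c, h * w)
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

def style_loss(fake, real):
    """L1 distance between Gram matrices of relu5_1 activations."""
    return nn.functional.l1_loss(gram_matrix(vgg_relu5_1(fake)), gram_matrix(vgg_relu5_1(real)))
```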