
Support of CuDNN8 #7000

Open · wants to merge 3 commits into master
Conversation

artyom-beilis

Support of CuDNN8

Some of the APIs that Caffe used were removed in cuDNN 8. Without them it is impossible to run Caffe on the Ampere architecture.

It required:

  • switching to the cudnnFind* API in place of the cudnnGet* calls that were removed in version 8.
  • caching the search results so that the algorithm search only runs when the shape really changes - otherwise reshape costs too much (a sketch follows this list).
  • fixing the cuDNN version detection to support cuDNN 8.
  • adding the missing error code that was introduced in version 8.
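
A minimal sketch of the caching idea, assuming a shape-keyed map around cudnnFindConvolutionForwardAlgorithm; the ConvAlgoCache name and the choice of key are hypothetical, not the patch's actual code:

```cpp
// Sketch: pick a forward algorithm with cudnnFindConvolutionForwardAlgorithm
// and cache it per input shape, so the expensive benchmarking runs only when
// the shape really changes. Error checking omitted for brevity.
#include <cudnn.h>
#include <array>
#include <map>

struct ConvAlgoCache {
  // Key: N,C,H,W of the bottom blob (hypothetical choice of key).
  std::map<std::array<int, 4>, cudnnConvolutionFwdAlgo_t> cache_;

  cudnnConvolutionFwdAlgo_t Get(cudnnHandle_t handle,
                                const std::array<int, 4>& shape,
                                cudnnTensorDescriptor_t bottom_desc,
                                cudnnFilterDescriptor_t filter_desc,
                                cudnnConvolutionDescriptor_t conv_desc,
                                cudnnTensorDescriptor_t top_desc) {
    auto it = cache_.find(shape);
    if (it != cache_.end()) return it->second;  // shape unchanged: reuse

    int returned = 0;
    cudnnConvolutionFwdAlgoPerf_t perf;
    // Actually runs the available algorithms and returns them sorted by time.
    cudnnFindConvolutionForwardAlgorithm(handle, bottom_desc, filter_desc,
                                         conv_desc, top_desc,
                                         /*requestedAlgoCount=*/1,
                                         &returned, &perf);
    cache_[shape] = perf.algo;
    return perf.algo;
  }
};
```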

The change was tested on

  • 3070/cuda11.2/cudnn8.1
  • 1080/cuda8/cudnn7
  • 1080/cuda8/cudnn6

- switch to cudnnFind* API instead of cudnnGet* that was removed in 8
- fixed cudnn version search
- search of the algorithms happens only when the shape really changed
@artyom-beilis (Author)

Anybody here?

@borisgribkov

@artyom-beilis Thanks for your patch! I have tried it, the same as #6970, but encountered large memory utilization with cuDNN 8.
After some tests I tried a model with a single conv layer and a (20 × 3 × 1280 × 720) input; it is the "head" of a ResNet used for a detection task. With CUDA 10 and cuDNN 7.6 I observed about 1.7 GB usage for a forward pass; with CUDA 11 and cuDNN 8, ~2.6 GB. Maybe this comparison is not fully correct, because different GPUs were used: a Titan XP in the first case and a 3060 in the second.
Have you seen something like this with the 3070 and 1080? Thank you!
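
For anyone debugging this, a minimal standalone sketch that asks cuDNN how much workspace the heuristically chosen algorithm wants for a 20 × 3 × 1280 × 720 input; a larger workspace under cuDNN 8 could account for part of the gap. The 64-filter 7×7/stride-2 convolution is an assumed ResNet-style stem, not taken from the actual model:

```cpp
// Sketch: query the workspace size requested by the chosen forward algorithm.
// Error checking omitted for brevity.
#include <cudnn.h>
#include <cstdio>

int main() {
  cudnnHandle_t handle;
  cudnnCreate(&handle);

  cudnnTensorDescriptor_t x, y;
  cudnnFilterDescriptor_t w;
  cudnnConvolutionDescriptor_t conv;
  cudnnCreateTensorDescriptor(&x);
  cudnnCreateTensorDescriptor(&y);
  cudnnCreateFilterDescriptor(&w);
  cudnnCreateConvolutionDescriptor(&conv);

  // 20 x 3 x 1280 x 720 input as in the comment; 64 filters of 7x7, stride 2,
  // pad 3 is an assumption, not the model's actual layer.
  cudnnSetTensor4dDescriptor(x, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT, 20, 3, 1280, 720);
  cudnnSetFilter4dDescriptor(w, CUDNN_DATA_FLOAT, CUDNN_TENSOR_NCHW, 64, 3, 7, 7);
  cudnnSetConvolution2dDescriptor(conv, 3, 3, 2, 2, 1, 1,
                                  CUDNN_CROSS_CORRELATION, CUDNN_DATA_FLOAT);

  int n, c, h, wd;
  cudnnGetConvolution2dForwardOutputDim(conv, x, w, &n, &c, &h, &wd);
  cudnnSetTensor4dDescriptor(y, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT, n, c, h, wd);

  // Heuristic query for the best candidate, then its workspace requirement.
  int returned = 0;
  cudnnConvolutionFwdAlgoPerf_t perf;
  cudnnGetConvolutionForwardAlgorithm_v7(handle, x, w, conv, y, 1, &returned, &perf);

  size_t ws = 0;
  cudnnGetConvolutionForwardWorkspaceSize(handle, x, w, conv, y, perf.algo, &ws);
  std::printf("algo %d workspace: %.1f MB\n", (int)perf.algo, ws / (1024.0 * 1024.0));

  cudnnDestroy(handle);
  return 0;
}
```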

@artyom-beilis (Author) commented Oct 14, 2021 via email

@borisgribkov commented Oct 14, 2021

> AFAIR I noticed the difference in memory use of cudnn7 vs cudnn8 with other frameworks as well.

Could you tell me more about the other frameworks? I have tried to find mentions of similar GPU memory problems, but without success.

@artyom-beilis (Author)

> AFAIR I noticed the difference in memory use of cudnn7 vs cudnn8 with other frameworks as well.
>
> Could you tell me more about the other frameworks? I have tried to find mentions of similar GPU memory problems, but without success.

I don't really remember - it was either PyTorch or MXNet. It was a long time ago.

@borisgribkov

Anyway, thank you! )

…ch, so

switched to the cudnnGet*_v7 API instead of the much heavier cudnnFind*, and
query the optimal algorithm on _any_ reshape - not ignoring batch size reduction
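
For illustration, a minimal sketch of the *_v7 heuristic query this commit refers to; the helper name and the fallback choice are assumptions, not the PR's actual code:

```cpp
// Sketch: the *_v7 query returns candidate algorithms ranked by cuDNN's
// heuristic without running them, so it is much cheaper than cudnnFind*
// and can be called on every reshape.
#include <cudnn.h>

cudnnConvolutionFwdAlgo_t pick_forward_algo(cudnnHandle_t handle,
                                            cudnnTensorDescriptor_t bottom_desc,
                                            cudnnFilterDescriptor_t filter_desc,
                                            cudnnConvolutionDescriptor_t conv_desc,
                                            cudnnTensorDescriptor_t top_desc) {
  int returned = 0;
  cudnnConvolutionFwdAlgoPerf_t perf[CUDNN_CONVOLUTION_FWD_ALGO_COUNT];
  // Heuristic-only query; results come back sorted best-first.
  cudnnGetConvolutionForwardAlgorithm_v7(handle, bottom_desc, filter_desc,
                                         conv_desc, top_desc,
                                         CUDNN_CONVOLUTION_FWD_ALGO_COUNT,
                                         &returned, perf);
  // Take the first candidate that cuDNN reports as usable.
  for (int i = 0; i < returned; ++i) {
    if (perf[i].status == CUDNN_STATUS_SUCCESS) return perf[i].algo;
  }
  return CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM;  // conservative fallback
}
```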
@kmmanto commented Jan 11, 2022

Following this, as the current Caffe that I built with nvcr.io/nvidia/cuda:11.4.1-cudnn8-devel-ubuntu20.04 and OpenPose results in a much larger GPU RAM footprint on an AWS G5 (Ampere).

@BigMuscle85 commented Jan 20, 2023

I tried the proposed changes to make cuDNN 8 work, but training immediately ends with the following error:

I0120 10:25:10.763470 1539595 solver.cpp:60] Solver scaffolding done.
I0120 10:25:10.765404 1539595 caffe.cpp:239] Starting Optimization
I0120 10:25:10.765410 1539595 solver.cpp:292] Solving squeezenet-ssd
I0120 10:25:10.765413 1539595 solver.cpp:293] Learning Rate Policy: poly
F0120 10:25:10.835502 1539595 cudnn_conv_layer.cu:118] Check failed: status == CUDNN_STATUS_SUCCESS (4 vs. 0)  CUDNN_STATUS_INTERNAL_ERROR
*** Check failure stack trace: ***
    @     0x7f09cdf8f1c3  google::LogMessage::Fail()
    @     0x7f09cdf9425b  google::LogMessage::SendToLog()
    @     0x7f09cdf8eebf  google::LogMessage::Flush()
    @     0x7f09cdf8f6ef  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f09ce7753f0  caffe::CuDNNConvolutionLayer<>::Backward_gpu()
    @     0x7f09ce711c6a  caffe::Net<>::BackwardFromTo()
    @     0x7f09ce711da5  caffe::Net<>::Backward()
    @     0x7f09ce6ecaab  caffe::Solver<>::Step()
    @     0x7f09ce6ed492  caffe::Solver<>::Solve()
    @     0x55739e9b4a7a  train()
    @     0x55739e9b1eac  main
    @     0x7f09cd2fb083  __libc_start_main
    @     0x55739e9b290e  _start

Ubuntu 20.04
NVIDIA GeForce RTX 3060 12 GB
Driver Version: 510.108.03
CUDA Version: 11.6
cuDNN Version: 8.6

Build without cuDNN runs without problems.
