User interface of our visual analysis system to explore scene-graph-based visual question answering. Background image: GQA dataset under Creative Commons CC BY 4.0.

GraphVQA Explorer

GraphVQA Explorer is a visual analysis system to explore scene-graph-based visual question answering. It is built on top of the state-of-the-art GraphVQA framework which was trained on the GQA dataset.

Setup instructions (Docker)

Requirements

About 30 to 40 gigabytes of free disk space (*)
Operating system:
- Any recent Linux distribution
- Windows with a WSL2 setup (untested)
- macOS (untested)
Applications:
- Docker (if on Windows, inside WSL)
- git
Up to 1 hour of spare time
- This does not account for the time it takes to download extra files in step 0
- Most of the setup is unattended and just takes some time to execute

(*) data volume breakdown (estimated, base 10 units):

Required files
- Clone of code repository: 40-60MB
- GQA dataset scenegraphs: 363MB
- Auto-generated cache & working data: 4GB
- Resulting Docker image: 5GB
- Space for temporary build files: ?
Optional, highly recommended files
- GQA dataset questions (decompressed): 938MB
  - Note: without these, you won't be able to generate evaluation results with server/generate_eval_data.py.
- Pregenerated evaluation data (decompressed): 780MB
  - Note: without this, the evaluation browser component is not of any use. You can omit this if you plan on generating and supplying your own using server/generate_eval_data.py.
- Pretrained GAT, GCN, GINE & LCGN model parameters (decompressed): 2.6GB
  - Note: without these, generated VQA predictions will not be any useful. You can omit them if you plan on training and supplying your own instead.
- GQA dataset images (decompressed): 22GB
  - Note: without these, you will see a placeholder background image for any scene you inspect. It is recommended to at least source the images of the specific scenes that you wish to work with.

Preliminary

Make sure that you have cloned the repository recursively in order to fetch the required GraphVQA code.
This can be achieved using git's --recursive flag when cloning, i.e. git clone --recursive [...].

To verify that everything is in order, check the contents of server/GraphVQA/ and make sure that the directory is populated.
If this is the case, feel free to proceed.

Please note that steps 0, 1 and 2 can be executed in parallel and in any order. Steps 3 and onwards require the previous step to have been completed.

Step 0: Download recommended files [? hours]

While the project can run without these files, they're highly recommended for a complete user experience.

Download, extract and place the following files accordingly:

Description	Download Size	Link	Target Location
Pretrained model parameters	2.6GB	graphvqa_parameters.7z from https://doi.org/10.18419/darus-3597 (*)	server/dataset/parameters/{model name}/checkpoint.pth
Pregenerated evaluation data	165MB	evaluation_data.7z from https://doi.org/10.18419/darus-3597 (**)	server/dataset/evaluation/{model name}/results_{val,test,train}_balanced_questions.json
GQA dataset questions	1.4GB	https://downloads.cs.stanford.edu/nlp/data/gqa/questions1.2.zip	server/dataset/evaluation/{val,test,train}_balanced_questions.json
GQA dataset images	20.3GB	https://downloads.cs.stanford.edu/nlp/data/gqa/images.zip	server/dataset/images/{id}.jpg

(*): graphvqa_parameters.7z, sha256: cdc2be5c98e701608f7be20cd89cab3b3f42bf82f807c73b2d3c004aff0d2591
(**): evaluation_data.7z, sha256: 1f70217036a60c2c08a930e68dd275be48f7148dd4c4393ab38b06d86455c031

Note: Some of these files may not download quickly in spite of your internet connection.
To get around this, you could use a download accelerator like axel or similar. These work by opening multiple simultaneous connections to the server, each of which download different chunks of the file at the same time.
Keep in mind however that the download speed limitations are likely in place for a reason, and that tools like axel can strain the server.