Please always report problems by filling an issue here on GitHub. This also covers reporting problems with the documentation (e.g. if sth is unclear).
General rules:
- The code style suggestions of PyCharm should be followed.
Job general:
- If a class inherits directly/indirectly from the sisyphus class
Job
the class ends with "Job". - In a Sisyphus job class the output variables start with
out_*
. - If a Job has non-trivial requirements it should have a
self.rqmt
.
Job constructor order:
- set inputs
- set outputs (should be prefixed with
out_
) - define rqmt
Job function order:
__init__
tasks()
- task functions
- helper functions
- class functions
__hash__
Rasr-Jobs:
should always accept crp
, extra_config
+ extra_post_config
Jobs that get a path to a specific software/toolkit (e.g. RETURNN or RASR) should have this
as last parameter in the __init__()
definition.
Import order:
- standard library
- external libraries (including sisyphus, yes I know it's not like that atm)
- recipies
Some general things:
- Don't break a job's hash.
- Really don't break a job's hash.
- Very strong preference to avoid config files / scripts in the recipe repo itself. Everything should be code.
- Jobs have to be deterministic. Especially the code in the Job itself and the computed output (files). If this is not possible at all, please add a clear warning.
sh
function should definitely not be used.
Black formatting:
The check for black formatting is a fixed test case that will automatically run for new pull requests. To prohibit errors before committing, it is recommended to add a git hook that will automatically perform the check before submission.
Create an executable .git/hooks/pre-commit
with:
#!/bin/bash
set -eu
black --check .
exit 0
Corpus variable naming: in sisyphus setups there are different ways to hold the corpus information (str, Path, CorpusObject, RasrConfig). At the moment corpus
covers all. To distinguish the different forms to hold the corpus information, we strongly encourage using the following variable naming:
corpus_key -> str (basically corpus identifier)
corpus_object -> CorpusObject
corpus_file -> Path (would be filepath)
corpus_config -> RasrConfig describing the corpus
corpus any, but usage is discouraged