Some Kaggle segmentation competitions use run-length encoding of pixel values to compress prediction files. Both TGS and DSB use this standard. There are good pure-Python implementations in some Kaggle Kernels, such as unet-with-depth. But as the amount of test data grows, this Python code becomes the bottleneck of your system and slows down your submission. Here is a Cython-accelerated mask-to-run-length encoder. Hope it helps Kagglers.
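For reference, the encoding itself can be sketched in plain Python. This is a minimal illustration, not the code in this repository. It assumes the convention of the TGS and DSB submission formats, where pixels are numbered from 1, top to bottom and then left to right, and each run of mask pixels is written as a start index followed by a length. The function name `rle_encode` is only illustrative.

```python
def rle_encode(mask):
    """Encode a binary 2-D mask (a sequence of rows) as 'start length' pairs.

    Assumption: pixels are numbered from 1, column by column
    (top to bottom, then left to right).
    """
    height, width = len(mask), len(mask[0])
    runs = []          # flat list: start1, length1, start2, length2, ...
    run_start = None   # 1-based index where the current run began
    position = 0
    for col in range(width):
        for row in range(height):
            position += 1
            if mask[row][col]:
                if run_start is None:
                    run_start = position
            elif run_start is not None:
                # The run ended on the previous pixel.
                runs += [run_start, position - run_start]
                run_start = None
    if run_start is not None:
        # Close a run that reaches the last pixel.
        runs += [run_start, position - run_start + 1]
    return " ".join(str(v) for v in runs)


mask = [[0, 1],
        [1, 1]]
print(rle_encode(mask))  # prints: 2 3
```

A per-pixel loop like this is easy to read, but it is the kind of code that gets slow on thousands of images, which is the cost a compiled version avoids.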
Clone this repository:
$ git clone https://github.com/princewang1994/cython-run-length.git
This code depends on a few Python packages. Run the following command to install them:
$ pip install -r requirements.txt
The Cython code must be compiled with setup.py. Run make to build it:
$ cd rlen
$ make
Use from rlen import make_submission in your project to convert your predictions to run-length format. Here is the declaration:
def make_submission(preds, names, fast=True, path='submission.csv')
- preds: (list of np.array), [pred1, pred2, ...], each of size [H, W]
- names: (list), [name1, name2, ...]
- fast: (bool), whether to use the Cython-accelerated version
- path: (str), path of the submission CSV file, default = 'submission.csv'
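To make the argument shapes concrete, here is a small stand-in written with NumPy and the standard csv module. It is a sketch, not the repository's implementation: `make_submission_sketch` is a hypothetical name, the column names `id` and `rle_mask` are assumed from the TGS sample submission, and the encoder assumes binary masks with column-major, 1-based numbering.

```python
import csv

import numpy as np


def rle_encode(mask):
    """Vectorized run-length encoding (column-major order, 1-based starts)."""
    pixels = np.concatenate([[0], mask.flatten(order="F"), [0]])
    changes = np.where(pixels[1:] != pixels[:-1])[0] + 1
    changes[1::2] -= changes[::2]  # turn run ends into run lengths
    return " ".join(str(v) for v in changes)


def make_submission_sketch(preds, names, path="submission.csv"):
    """Hypothetical stand-in taking the same preds/names/path arguments."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "rle_mask"])  # assumed column names
        for name, pred in zip(names, preds):
            writer.writerow([name, rle_encode(pred)])


# Two 4x4 binary masks: one empty, one fully covered.
preds = [np.zeros((4, 4), dtype=np.uint8), np.ones((4, 4), dtype=np.uint8)]
names = ["img_a", "img_b"]
make_submission_sketch(preds, names, path="sketch_submission.csv")
```

With the package built, the real call would presumably take the same `preds` and `names` lists, for example `make_submission(preds, names, fast=True, path='submission.csv')`.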
test_rlen.py is an example of using make_submission. Before running it, make sure that tqdm and pandas are also installed. It also needs train.pkl, which you can download from https://drive.google.com/file/d/1op6WD4X91uWqf-FLI7b7X0AQNV2wGyC6/view?usp=sharing and put in $RLEN_HOME.
Attention: this train.pkl requires Python 3. If you are using Python 2, make your own pickle by following tgs_pickle.ipynb.
- Run test_rlen.py. It generates slow_submission.csv and fast_submission.csv. The fast version is nearly 200 times faster (14292.84 it/s versus 77.36 it/s in the run below).
$ cd $RLEN_HOME
$ python test_rlen.py
100%|█████████████████████████████████████████████████| 4000/4000 [00:00<00:00, 14292.84it/s]
Exporting to fast_submission.csv.
Done.
100%|████████████████████████████████████████████████████| 4000/4000 [00:51<00:00, 77.36it/s]
Exporting to slow_submission.csv.
Done.
- Then check that the two files have the same contents:
$ diff slow_submission.csv fast_submission.csv
If there is no output, the two files are identical.
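Where diff is not available, the same check can be sketched in Python with the standard filecmp module. The helper name `same_contents` is only illustrative.

```python
import filecmp
import os


def same_contents(path_a, path_b):
    """Return True when both files have identical contents."""
    # shallow=False compares file contents rather than only os.stat() signatures.
    return filecmp.cmp(path_a, path_b, shallow=False)


# Only run the comparison when both outputs of test_rlen.py are present.
if os.path.exists("slow_submission.csv") and os.path.exists("fast_submission.csv"):
    print(same_contents("slow_submission.csv", "fast_submission.csv"))
```

A result of True corresponds to diff printing nothing.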
- Prince Wang: http://blog.prince2015.club
- Github: https://github.com/princewang1994
This project is licensed under the MIT License - see the LICENSE file for details