ZipROFS is a FUSE file system that acts as a pass-through to another file system, except that it expands zip files like folders and allows direct, transparent access to their contents.
We created a branch of ZipROFS to adapt it to the needs of mass spectrometry software. Our mass spectrometry records are stored in ZIP files:
├── brukertimstof
│   └── 202302
│       ├── 20230209_hsapiens_Sample_001.d.Zip
│       ├── 20230209_hsapiens_Sample_002.d.Zip
│       └── 20230209_hsapiens_Sample_003.d.Zip
...
With the original version of ZipROFS, we would see folders ending in .d.Zip. However, the software requires folders ending in .d, like this:
├── brukertimstof
│   └── 202302
│       ├── 20230209_hsapiens_Sample_001.d
│       │   ├── analysis.tdf
│       │   └── analysis.tdf_bin
│       ├── 20230209_hsapiens_Sample_002.d
│       │   ├── analysis.tdf
│       │   └── analysis.tdf_bin
│       └── 20230209_hsapiens_Sample_003.d
│           ├── analysis.tdf
│           └── analysis.tdf_bin
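The renaming shown above can be sketched as a pure name-mapping step. This is a minimal illustration and not the branch's actual code: `virtual_name` and `find_archive` are hypothetical helpers, and the case-insensitive `.d.zip` suffix rule is an assumption drawn from the example file names.

```python
from typing import List, Optional

# Assumption: archives named "<name>.d.Zip" (any capitalisation of "zip")
# should appear in the mount as folders named "<name>.d".
_ARCHIVE_SUFFIX = ".d.zip"


def virtual_name(real_name: str) -> str:
    """Return the name shown in the mount for a real directory entry."""
    if real_name.lower().endswith(_ARCHIVE_SUFFIX):
        return real_name[: -len(".zip")]  # drop only the trailing ".Zip"
    return real_name


def find_archive(virtual: str, real_entries: List[str]) -> Optional[str]:
    """Find the real archive that backs a virtual ".d" folder, if any."""
    for entry in real_entries:
        if entry != virtual and virtual_name(entry) == virtual:
            return entry
    return None
```

A directory listing would pass every real entry through `virtual_name`, while a lookup of a `.d` path would use `find_archive` to locate the ZIP file to open.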
A current problem is that computation is slower with ZipROFS than with conventional file systems.
The reason lies in the closed-source shared library timsdata.dll. Reading proprietary mass spectrometry files with this library generates a huge number of file system requests, each of which has to cross the boundary between user space and kernel. Another reason for the reduced performance is that file reading is not sequential.
To solve the performance problem, we are pursuing two approaches:
- Re-implementing ZipROFS in C: ZIPsFS.
- Intercepting calls to the file API with the LD_PRELOAD technique, filtering them, and caching directory listings: cache_readdir_stat.
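The caching idea behind the second approach can be illustrated in a few lines. The sketch below is in Python for readability only; cache_readdir_stat itself works through LD_PRELOAD, and nothing here reflects its real code. `DirListingCache` is a hypothetical name, and the sketch assumes the underlying tree is read-only, so cached entries never go stale.

```python
import os
from typing import Callable, Dict, List


class DirListingCache:
    """Memoise directory listings so repeated lookups skip the file system.

    Assumption: the underlying tree is read-only, so entries never go stale.
    """

    def __init__(self, lister: Callable[[str], List[str]] = os.listdir):
        self._lister = lister
        self._cache: Dict[str, List[str]] = {}
        self.misses = 0  # number of real file system requests issued

    def listdir(self, path: str) -> List[str]:
        key = os.path.normpath(path)
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = sorted(self._lister(key))
        return list(self._cache[key])  # copy so callers cannot alter the cache

    def exists(self, path: str) -> bool:
        """Answer an existence check from the parent's cached listing."""
        parent, name = os.path.split(os.path.normpath(path))
        return name in self.listdir(parent or ".")
```

Once a directory has been listed, every later existence check for a file inside it is answered from memory, which is the kind of request that a library issuing many repeated lookups would otherwise send across the kernel boundary.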
Dependencies:
- FUSE
- fusepy
Limitations:
- Read only
- Nested zip files are not expanded; they remain plain files
To mount, run ziprofs.py:
$ ./ziprofs.py ~/root ~/mount -o allowother,cachesize=2048
Example results:
$ tree root
root
├── folder
├── test.zip
└── text.txt
$ tree mount
mount
├── folder
├── test.zip
│   ├── folder
│   │   ├── emptyfile
│   │   └── subfolder
│   │       └── file.txt
│   ├── script.sh
│   └── text.txt
└── text.txt
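To show how an archive can be presented as a folder, here is a minimal sketch that derives one directory level from the flat member names stored in a zip file, using only Python's standard `zipfile` module. It is not ZipROFS's actual code; `list_zip_dir` and `build_example_zip` are hypothetical helpers, and the in-memory archive merely mirrors the layout of test.zip above.

```python
import io
import zipfile
from typing import List


def list_zip_dir(zf: zipfile.ZipFile, inner: str = "") -> List[str]:
    """Return the immediate children of `inner` inside an open archive."""
    prefix = inner.strip("/")
    prefix = prefix + "/" if prefix else ""
    children = set()
    for name in zf.namelist():
        if name.startswith(prefix) and name != prefix:
            # Keep only the first path component below the prefix.
            children.add(name[len(prefix):].split("/", 1)[0])
    return sorted(children)


def build_example_zip() -> zipfile.ZipFile:
    """Build an in-memory archive shaped like test.zip above."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("folder/emptyfile", "")
        zf.writestr("folder/subfolder/file.txt", "hello")
        zf.writestr("script.sh", "#!/bin/sh\n")
        zf.writestr("text.txt", "text")
    return zipfile.ZipFile(buf)  # reopen the finished archive for reading
```

Calling `list_zip_dir(build_example_zip(), "folder")` yields the entries a directory listing of `mount/test.zip/folder` would need to report.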
You can later unmount it using:
$ fusermount -u ~/mount
Or:
$ umount ~/mount
Full help:
$ ./ziprofs.py -h
usage: ziprofs.py [-h] [-o options] [root] [mountpoint]

ZipROFS read only transparent zip filesystem.

positional arguments:
  root        filesystem root (default: None)
  mountpoint  filesystem mount point (default: None)

optional arguments:
  -h, --help  show this help message and exit
  -o options  comma separated list of options: foreground, debug, allowother, async, cachesize=N (default: {})

The foreground and allowother options are passed to FUSE directly.
The debug option prints all syscall details to stdout.
By default, ZipROFS disables async reads to improve performance, since async syscalls can be reordered in FUSE, which heavily impacts read speeds. If async reads are preferable, pass the async option on mount.
The cachesize option sets the size of the in-memory zipfile cache; it defaults to 1000.
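
As an illustration of how a comma-separated option string and a bounded cache of open archives might fit together, consider the sketch below. It is an assumption-laden example rather than ZipROFS's real implementation: `parse_mount_options` and `LRUCache` are hypothetical names, and least-recently-used eviction is one plausible policy that the text above does not confirm.

```python
from collections import OrderedDict
from typing import Any, Callable, Dict


def parse_mount_options(text: str) -> Dict[str, Any]:
    """Parse a string such as "allowother,cachesize=2048" into a dict."""
    options: Dict[str, Any] = {}
    for item in filter(None, text.split(",")):
        key, sep, value = item.partition("=")
        if not sep:
            options[key] = True        # bare flag such as allowother
        elif value.isdigit():
            options[key] = int(value)  # numeric value such as cachesize=N
        else:
            options[key] = value
    return options


class LRUCache:
    """Keep at most `maxsize` opened objects, closing the least recently used."""

    def __init__(self, maxsize: int, opener: Callable[[str], Any],
                 closer: Callable[[Any], None] = lambda obj: None):
        self._maxsize = maxsize
        self._opener = opener
        self._closer = closer
        self._items = OrderedDict()

    def get(self, key: str) -> Any:
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        obj = self._opener(key)
        self._items[key] = obj
        if len(self._items) > self._maxsize:
            _, evicted = self._items.popitem(last=False)  # oldest entry
            self._closer(evicted)
        return obj

    def __len__(self) -> int:
        return len(self._items)
```

With real archives, the opener could be `zipfile.ZipFile` and the closer `lambda zf: zf.close()`, with `maxsize` taken from `parse_mount_options(...).get("cachesize", 1000)`.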