Imaris Reader: support for LZ4 compression and performance improvements #4249

marcobitplane · 2024-10-24T13:10:04Z

Hello,

this pull request adds support for LZ4 compressed ims files and modifies ImarisHDFReader to avoid multiple reads of the same 3D chunks. See also #4217.

LZ4 support is added using NetCDF-Java's ucar.nc2.filter package, which provides a mechanism for user-supplied filters as described here.

Regarding performance, ImarisHDFReader is modified to have a caching mechanism that reads a stack of planes (as many planes as the chunk z-size) from all channels into a buffer, which only needs to be updated after all data in it has been read. If the size of the buffer would exceed 1GB, the reader falls back to reading the requested plane only. The exact performance improvement will depend on the details of the dataset: in our testing, for 3D datasets the new reader can be over an order of magnitude faster than the existing implementation.

melissalinkert

Thanks, @marcobitplane, this definitely looks like a reasonable approach.

Two higher-level questions before we proceed with more thorough review and testing:

lz4-java is added as a dependency, but the existing dependency on aircompressor via ome-codecs should already support LZ4. Is there a reason to use lz4-java specifically, or could aircompressor be used instead?
The new LZ4Filter uses the ucar.nc2.filter package; if at all possible, I'd prefer to see this in the existing Bio-Formats package structure (loci.formats.*) rather than ucar.nc2.filter. A brief read of https://docs.unidata.ucar.edu/netcdf-java/5.5/userguide/reading_zarr.html#implementing-a-filter suggests the package name of the custom filter itself doesn't matter, as long as it extends/implements the correct class/interface. Could you please either change the package name, or explain a bit why putting LZ4Filter in ucar.nc2.filter is necessary?

Finally, in order to merge this we would need a signed Contributor License Agreement. That's not urgent at this point, but it would be good to have sooner rather than later.

…s/filter

marcobitplane · 2024-11-05T14:22:07Z

Thank you @melissalinkert!

I modified the PR to use aircompressor instead of lz4-java and moved LZ4Filter to loci/formats/filter, let me know if that is not ideal or there's more to change there. I'll send the CLA today.

melissalinkert · 2024-11-11T22:15:35Z

Adding to tonight's build, so we should know in the morning if there are any test failures on existing data.

@marcobitplane : maybe I missed this, but I don't see here or in #4217 a reference to lz4 test files. Do you have any Imaris HDF files with lz4 compression that we can use to test this?

marcobitplane · 2024-11-12T08:25:55Z

Thank you @melissalinkert, here are two lz4-compressed cropped versions of Imaris demo images (retina and CellDemoMembrane3D). Feel free to let me know if you would prefer to have more images or have them shared in a different way.

sbesson · 2024-11-14T17:06:20Z

@marcobitplane thanks for sharing some sample data. Our preferred way to collect such samples is to upload them to the Bio-Formats Zenodo community collection. For the OME team and the community, has the benefit of unambiguously assigning a license for the distribution and re-usage of these samples. In addition, it is possible to add additional samples to the upload and create new versions of the dataset as necessary during the review process.

I had a quick go at testing these with the proposed changed and the Bio-Formats command-line utility. With the just released 8.0.1 command-line tools, Bio-Formats fails with

sbesson@Sebastiens-MacBook-Pro-3 bioformats % ~/Downloads/bftools/showinf -nopix ~/Downloads/demoImagesLz4/CellDemoMembrane3Dlz4.ims
Checking file format [Bitplane Imaris 5.5 (HDF)]
Initializing reader
ImarisHDFReader initializing /Users/sbesson/Downloads/demoImagesLz4/CellDemoMembrane3Dlz4.ims
Exception in thread "main" java.lang.RuntimeException: Unknown filter type=32004
	at ucar.nc2.iosp.hdf5.H5tiledLayoutBB$DataChunk.getByteBuffer(H5tiledLayoutBB.java:227)
	at ucar.nc2.iosp.LayoutBBTiled.hasNext(LayoutBBTiled.java:101)
	at ucar.nc2.iosp.hdf5.H5tiledLayoutBB.hasNext(H5tiledLayoutBB.java:125)
	at ucar.nc2.iosp.IospHelper.readData(IospHelper.java:332)
	at ucar.nc2.iosp.IospHelper.readDataFill(IospHelper.java:292)
	at ucar.nc2.iosp.hdf5.H5iosp.readData(H5iosp.java:161)
	at ucar.nc2.iosp.hdf5.H5iosp.readData(H5iosp.java:134)
	at ucar.nc2.NetcdfFile.readData(NetcdfFile.java:2122)
	at ucar.nc2.Variable.reallyRead(Variable.java:817)
	at ucar.nc2.Variable._read(Variable.java:768)
	at ucar.nc2.Variable.read(Variable.java:600)
	at ucar.nc2.Variable.read(Variable.java:546)
	at loci.formats.services.NetCDFServiceImpl.getArray(NetCDFServiceImpl.java:172)
	at loci.formats.in.ImarisHDFReader.getImageData(ImarisHDFReader.java:473)
	at loci.formats.in.ImarisHDFReader.initFile(ImarisHDFReader.java:308)
	at loci.formats.FormatReader.setId(FormatReader.java:1480)
	at loci.formats.ImageReader.setId(ImageReader.java:864)
	at loci.formats.ReaderWrapper.setId(ReaderWrapper.java:692)
	at loci.formats.tools.ImageInfo.testRead(ImageInfo.java:1048)
	at loci.formats.tools.ImageInfo.main(ImageInfo.java:1159)

With a local build of Bio-Formats using the HEAD of this PR, I have

sbesson@Sebastiens-MacBook-Pro-3 bioformats % ./tools/showinf -nopix  -nopix ~/Downloads/demoImagesLz4/CellDemoMembrane3Dlz4.ims 
Checking file format [Bitplane Imaris 5.5 (HDF)]
Initializing reader
ImarisHDFReader initializing /Users/sbesson/Downloads/demoImagesLz4/CellDemoMembrane3Dlz4.ims
Failure during the reader initialization

adding the -debug flag

sbesson@Sebastiens-MacBook-Pro-3 bioformats % ./tools/showinf -nopix  -nopix ~/Downloads/demoImagesLz4/CellDemoMembrane3Dlz4.ims -debug  
...
loci.formats.FormatException: loci.common.services.ServiceException: java.io.IOException: ucar.nc2.filter.UnknownFilterException: Unknown filter: no filter found with id 32004
	at loci.formats.in.ImarisHDFReader.getImageData(ImarisHDFReader.java:518)
	at loci.formats.in.ImarisHDFReader.initFile(ImarisHDFReader.java:332)
	at loci.formats.FormatReader.setId(FormatReader.java:1480)
	at loci.formats.ImageReader.setId(ImageReader.java:864)
	at loci.formats.ReaderWrapper.setId(ReaderWrapper.java:692)
	at loci.formats.tools.ImageInfo.testRead(ImageInfo.java:1048)
	at loci.formats.tools.ImageInfo.main(ImageInfo.java:1158)
Caused by: loci.common.services.ServiceException: java.io.IOException: ucar.nc2.filter.UnknownFilterException: Unknown filter: no filter found with id 32004
	at loci.formats.services.NetCDFServiceImpl.getArray(NetCDFServiceImpl.java:183)
	at loci.formats.in.ImarisHDFReader.getImageData(ImarisHDFReader.java:514)
	... 6 common frames omitted
Caused by: java.io.IOException: ucar.nc2.filter.UnknownFilterException: Unknown filter: no filter found with id 32004
	at ucar.nc2.internal.iosp.hdf5.H5tiledLayoutBB.<init>(H5tiledLayoutBB.java:90)
	at ucar.nc2.internal.iosp.hdf5.H5iospNew.readData(H5iospNew.java:226)
	at ucar.nc2.internal.iosp.hdf5.H5iospNew.readData(H5iospNew.java:204)
	at ucar.nc2.NetcdfFile.readData(NetcdfFile.java:2122)
	at ucar.nc2.Variable.reallyRead(Variable.java:817)
	at ucar.nc2.Variable._read(Variable.java:768)
	at ucar.nc2.Variable.read(Variable.java:600)
	at ucar.nc2.Variable.read(Variable.java:546)
	at loci.formats.services.NetCDFServiceImpl.getArray(NetCDFServiceImpl.java:175)
	... 7 common frames omitted
Caused by: ucar.nc2.filter.UnknownFilterException: Unknown filter: no filter found with id 32004
	at ucar.nc2.filter.Filters.getFilter(Filters.java:80)
	at ucar.nc2.internal.iosp.hdf5.H5tiledLayoutBB.<init>(H5tiledLayoutBB.java:88)
	... 15 common frames omitted

The other sample retinalz4.ims fails with the same stack trace so there still seems to be some issue with the registration of the custom LZ4 filter.

marcobitplane · 2024-11-18T13:21:06Z

Thank you @sbesson for the clarification, I'll check with our team if it is ok to share the images on Zenodo for usage by the community. I think it should be fine since we already distribute retina.ims and CellDemoMembrane3D.ims with our free ImarisViewer, but they have been modified in the version I shared with you so I'd first like to make sure everything is fine there.

Regarding the error with showinf, apologies, I have been testing the changes to the reader in Fiji (by replacing the formats-gpl jar) and have no issues there with the filter, but I indeed have the same error with the command line tool. I will investigate that and hope to understand the reason for this as soon as possible.

marcobitplane · 2024-11-19T09:43:27Z

The issue with showinf was a conflict between META-INF/services/ucar.nc2.filter.FilterProvider files. Bioformats_package.jar bundles many things including the NetCDF-Java library, which contains its own META-INF/services/ucar.nc2.filter.FilterProvider file providing the default filters like deflate and shuffle. Now we would also have a META-INF/services/ucar.nc2.filter.FilterProvider for the lz4 filter coming from formats-gpl, which creates a conflict since there can be only one file with the same path and name in a jar. Merging the two files seems to me a good solution here, and I read here one way to do it using the maven assembly plugin and Descriptor Handlers. I already committed a fix for the command line issue, let me know if you see problems with this approach.

melissalinkert · 2024-11-19T15:57:36Z

With 2c4dfbd, I still see the same error as in #4249 (comment). The META-INF/services/ucar.nc2.filter.FilterProvider included in bioformats_package.jar looks like this:

$ cat META-INF/services/ucar.nc2.filter.FilterProvider 
ucar.nc2.filter.Blosc$Provider
ucar.nc2.filter.Deflate$Provider
ucar.nc2.filter.Checksum32$Fletcher32Provider
ucar.nc2.filter.Checksum32$Adler32Provider
ucar.nc2.filter.Checksum32$CRC32Provider
ucar.nc2.filter.ScaleOffset$Provider
ucar.nc2.filter.Shuffle$Provider

marcobitplane · 2024-11-19T16:36:04Z

I tried a fresh build of Bio-Formats using the HEAD of this PR and I get:

% cat META-INF/services/ucar.nc2.filter.FilterProvider 
loci.formats.filter.LZ4Filter$LZ4FilterProvider
ucar.nc2.filter.Blosc$Provider
ucar.nc2.filter.Deflate$Provider
ucar.nc2.filter.Checksum32$Fletcher32Provider
ucar.nc2.filter.Checksum32$Adler32Provider
ucar.nc2.filter.Checksum32$CRC32Provider
ucar.nc2.filter.ScaleOffset$Provider
ucar.nc2.filter.Shuffle$Provider

I'm now thinking, I built Bio-Formats with maven for this test, maybe @melissalinkert your build of bioformats_package.jar was an Ant build? I will investigate this first thing tomorrow, if so I'll research a similar merging mechanism for Ant.

melissalinkert · 2024-11-19T16:51:58Z

I'm now thinking, I built Bio-Formats with maven for this test, maybe @melissalinkert your build of bioformats_package.jar was an Ant build? I will investigate this first thing tomorrow, if so I'll research a similar merging mechanism for Ant.

Yes, that's correct, I used ant clean jars tools.

marcobitplane · 2024-11-21T13:06:19Z

Thank you, I committed a fix for the ant build, let me know if you encounter any other issues.

The two lz4-compressed ims demo images have also been uploaded to Zenodo.

melissalinkert

Thanks, @marcobitplane. I can now run showinf on the test files after building bioformats_package.jar with Ant, and see images displayed without error.

I have re-included this for automated testing, together with the new public sample files uploaded to Zenodo. I don't expect any test failures at this point, but we'll know for sure tomorrow.

The few in-line comments I have are fairly minor, but should be addressed before we merge.

melissalinkert · 2024-11-21T16:57:03Z

components/formats-gpl/pom.xml

+      <groupId>io.airlift</groupId>
+      <artifactId>aircompressor</artifactId>
+      <version>0.27</version>
+    </dependency>


I was initially going to suggest removing this and updating ome-codecs instead, but realized we don't have a released version of ome-codecs that includes ome/ome-codecs#39.

In the context of this PR, I think I'm OK with leaving this dependency as-is. Separately, though, and before 8.1.0, we'll need to get ome-codecs released and updated so that we don't have two different versions of aircompressor.

components/formats-gpl/src/loci/formats/filter/LZ4Filter.java

components/formats-gpl/src/loci/formats/services/NetCDFServiceImpl.java

…rray.

marcobitplane · 2024-11-22T08:56:07Z

Thank you again for the help and feedback, I added a few commits to address the requested changes.

melissalinkert

Latest changes look good to me, and most recent build is passing: https://merge-ci.openmicroscopy.org/jenkins/job/BIOFORMATS-test-repo/264/

sbesson

Thanks @marcobitplane for sharing the samples using the Zenodo community (https://zenodo.org/records/14197622). We have copied them to our curated QA repository so that they can be integrated in the nightly CI repository tests. As indicated by @melissalinkert above, these have been passing for the last execution under all Java LTS versions.

Thanks for the changes to the Ant build infrastructure. I don't know off-hand of a better solution that to manually unzip and combine as you have done. Confirmed that the new filter is now included in the JAR bundle created either by Maven or Ant

sbesson@Sebastiens-MacBook-Pro-3 bioformats % unzip -qc artifacts/bioformats_package.jar META-INF/services/ucar.nc2.filter.FilterProvider
ucar.nc2.filter.Blosc$Provider
ucar.nc2.filter.Deflate$Provider
ucar.nc2.filter.Checksum32$Fletcher32Provider
ucar.nc2.filter.Checksum32$Adler32Provider
ucar.nc2.filter.Checksum32$CRC32Provider
ucar.nc2.filter.ScaleOffset$Provider
ucar.nc2.filter.Shuffle$Provider
loci.formats.filter.LZ4Filter$LZ4FilterProvider
sbesson@Sebastiens-MacBook-Pro-3 bioformats % unzip -qc  components/bundles/bioformats_package/target/bioformats_package-8.1.0-SNAPSHOT.jar  META-INF/services/ucar.nc2.filter.FilterProvider
loci.formats.filter.LZ4Filter$LZ4FilterProvider
ucar.nc2.filter.Blosc$Provider
ucar.nc2.filter.Deflate$Provider
ucar.nc2.filter.Checksum32$Fletcher32Provider
ucar.nc2.filter.Checksum32$Adler32Provider
ucar.nc2.filter.Checksum32$CRC32Provider
ucar.nc2.filter.ScaleOffset$Provider
ucar.nc2.filter.Shuffle$Provider

On the functional testing of the LZ4 compression, I confirm that the two sample LZ4 compressed Imaris files cannot be opened with Bio-Formats 8.0.1 with a RuntimeException due to an unknown filter. With this PR included both samples can be initialized using the command-line utility and the binary data can be read and compared to the ground truth as shown in Imaris Viewer.

Do you have specific samples and testing scenarios that would allow us to test the performance/caching improvements?

sbesson · 2024-11-27T09:36:33Z

components/formats-gpl/src/loci/formats/in/ImarisHDFReader.java

+        int[] idcs = new int[] {blockNumber * blockSizeZPerResolution[resolutionIndex], 0, 0};
+        try {
+          String path;
+          for (int ch = 0; ch < getSizeC(); ch++) {


While I understand the rationale for caching an entire block for a given channel given the way the data is stored, my understanding is that different channels are stored separately.

While there are scenarios where preloading all channels makes sense, would this add some unnecessary overhead in some others e.g. when only one channel is requested or in the case of a large number of channels where only a few are rendered?

Different channels are indeed stored separately.

If, when reading an image, planes were requested sequentially from a single channel (therefore moving along z before moving along channels) then the buffer could simply hold a stack of planes from that single channel. However, in my testing i saw that Bio-Formats (or Fiji?) requests the same image plane for all channels before moving to the next plane (moving along channels before moving along z). If the buffer just cached a stack from a single channel, the buffer would need to be updated every time, making it useless.

Let me know if I misunderstood your comment

The order in which planes are being retrieved including the index of the channels to be retrieved is fundamentally the decision of the caller. In the case of Fiji, I assume a single reader will indeed iterate through all the active channels for the given timepoint/Z-section before moving onto the next ones.

Other applications might make different decisions or even use concurrent requests in which case the benefit of caching all the channels might be limited. We will need to assess the cost/benefit of the caching logic across these scenarios as part of the testing.

marcobitplane · 2024-11-27T13:27:04Z

Thank you @sbesson!

With any 3D dataset you should be able to see a significant performance boost, for example an image of a few GBs should be read in a few seconds instead of a few minutes. How much faster depends on the chunk z size. For smaller images it might not be as obvious, I can look to provide an example of a larger ims file if you don't have one in the QA repository, just let me know.

sbesson · 2024-11-28T10:31:25Z

For smaller images it might not be as obvious, I can look to provide an example of a larger ims file if you don't have one in the QA repository, just let me know.

If you could share some sample IMS files that you have used for validating the cache improvements, it would be really useful to review these changes.

marcobitplane · 2024-11-28T12:13:40Z

I uploaded an example dataset to Zenodo (I created a new submission and set the access to restricted since this is not one of our demo images and I would again have to ensure it can be uploaded to a public repository). It is a typical 200MB, 3D image with two channels which should be read about 10 times faster with the caching mechanism.

marcobitplane · 2024-12-02T08:56:59Z

The new dataset on Zenodo is now open access.

sbesson

The latest uploaded sample image, now available at https://zenodo.org/records/14235726, was used for testing the caching improvements. Measurements were made using the command-line tools using the different options available for reading different ranges of data.

Plane read times were compared between Bio-Formats built with this PR and Bio-Formats 8.0.1 using the average/standard deviation computed from three consecutive executions of the showinf command. In all scenarios, the reader setId time was unmodified by this PR. The output of the command-line utility e.g. 2.825s elapsed (14.413265ms per plane) was used as the source of the times.

	Bio-Formats 8.0.1	This PR
all planes / first resolution (no options)	17.74s +/- 0.21s	3.41s +/- 0.08s
1/4 of all planes / first resolution (-crop 512,512,512,512)	6.95s +/ 0.02s	2.84s +/- 0.03s
first plane / first resolution (-range 0 0)	0.289s +/- 0.003s	0.172s +/- 0.001s
first plane / second resolution (-noflat -resolution 1 -range 0 0)	0.200s +/- 0.001s	0.195s +/- 0.003s
51st plane / first resolution (-range 50 50)	0.264s +/- 0.001s	0.341s +/- 0.007s
101st plane / first resolution (-range 100 100)	0.299s +/- 0.003s	0.182s +/- 0.004s
last plane / first resolution (-range 195 195)	0.221s +/- 0.005s	0.279s +/- 0.001s

As shown in the two first rows, the benefit of the caching is most visible in the scenario where multiple / all planes are being read using the same initialized reader. In the most advantageous scenario, each block will only be read once and the internal cache re-used for all subsequent reads. In practice, this should be the behavior of the Fiji/ImageJ Bio-Formats plugin when using the Hyperstack view without the Use virtual stack option selected since the stack order is XYCZT.

For individual planes, the read times with this PR included are also consistent with the measurements using the latest release of Bio-Formats taking ~200ms per plane. It is worth noting the table above show an increase in the read times variance with this PR included. Although this might result from the different way the underlying netcdf library is used, my proposal would be to follow-up with a more systematic measurement of the openBytes time for every plane index of the sample dataset.

marcobitplane added 6 commits October 24, 2024 09:48

Added LZ4 filter provider for NetCDF-Java

40ceefa

Add lz4-java dependency to pom.xml

37cba60

Update NetCDFServiceImpl.java to use the new version of NetCDF-Java

8941acc

Add caching to ImarisHDFReader.java

db73caf

Clean up LZ4Filter.java

60a1cfe

Clean up LZ4Filter.java formatting

2afc620

melissalinkert reviewed Nov 4, 2024

View reviewed changes

marcobitplane added 4 commits November 5, 2024 14:46

Aircompressor dependency instead of lz4-java

d15e633

Delete components/formats-gpl/src/ucar directory

4760003

Add filter directory and LZ4Filter

a550ffe

Update ucar.nc2.filter.FilterProvider to the new location loci/format…

32bcfcc

…s/filter

melissalinkert self-requested a review November 5, 2024 14:43

melissalinkert added the include label Nov 11, 2024

melissalinkert removed the include label Nov 14, 2024

Update assembly.xml to ensure META-INF/services files are merged

2c4dfbd

marcobitplane added 2 commits November 21, 2024 10:37

Ant build fix: add ucar.nc2.filter.FilterProvider to formats-gpl jar

f0a0711

Ant build fix: merge ucar.nc2.filter.FilterProvider files

58e5dfd

melissalinkert added the include label Nov 21, 2024

melissalinkert added this to the 8.1.0 milestone Nov 21, 2024

melissalinkert requested changes Nov 21, 2024

View reviewed changes

melissalinkert requested a review from sbesson November 21, 2024 17:13

marcobitplane added 3 commits November 22, 2024 09:42

Add copyright statement to LZ4Filter.java

860b334

Throw UnsupportedOperationException instead of just returning input a…

fc379cc

…rray.

Remove unnecessary import.

50de137

melissalinkert approved these changes Nov 22, 2024

View reviewed changes

sbesson reviewed Nov 27, 2024

View reviewed changes

sbesson reviewed Dec 2, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Imaris Reader: support for LZ4 compression and performance improvements #4249

Imaris Reader: support for LZ4 compression and performance improvements #4249

marcobitplane commented Oct 24, 2024

melissalinkert left a comment

marcobitplane commented Nov 5, 2024

melissalinkert commented Nov 11, 2024

marcobitplane commented Nov 12, 2024

sbesson commented Nov 14, 2024

marcobitplane commented Nov 18, 2024

marcobitplane commented Nov 19, 2024

melissalinkert commented Nov 19, 2024

marcobitplane commented Nov 19, 2024

melissalinkert commented Nov 19, 2024

marcobitplane commented Nov 21, 2024

melissalinkert left a comment

melissalinkert Nov 21, 2024

marcobitplane commented Nov 22, 2024

melissalinkert left a comment

sbesson left a comment

sbesson Nov 27, 2024

marcobitplane Nov 27, 2024

sbesson Nov 28, 2024

marcobitplane commented Nov 27, 2024

sbesson commented Nov 28, 2024

marcobitplane commented Nov 28, 2024

marcobitplane commented Dec 2, 2024

sbesson left a comment

Imaris Reader: support for LZ4 compression and performance improvements #4249

Are you sure you want to change the base?

Imaris Reader: support for LZ4 compression and performance improvements #4249

Conversation

marcobitplane commented Oct 24, 2024

melissalinkert left a comment

Choose a reason for hiding this comment

marcobitplane commented Nov 5, 2024

melissalinkert commented Nov 11, 2024

marcobitplane commented Nov 12, 2024

sbesson commented Nov 14, 2024

marcobitplane commented Nov 18, 2024

marcobitplane commented Nov 19, 2024

melissalinkert commented Nov 19, 2024

marcobitplane commented Nov 19, 2024

melissalinkert commented Nov 19, 2024

marcobitplane commented Nov 21, 2024

melissalinkert left a comment

Choose a reason for hiding this comment

melissalinkert Nov 21, 2024

Choose a reason for hiding this comment

marcobitplane commented Nov 22, 2024

melissalinkert left a comment

Choose a reason for hiding this comment

sbesson left a comment

Choose a reason for hiding this comment

sbesson Nov 27, 2024

Choose a reason for hiding this comment

marcobitplane Nov 27, 2024

Choose a reason for hiding this comment

sbesson Nov 28, 2024

Choose a reason for hiding this comment

marcobitplane commented Nov 27, 2024

sbesson commented Nov 28, 2024

marcobitplane commented Nov 28, 2024

marcobitplane commented Dec 2, 2024

sbesson left a comment

Choose a reason for hiding this comment