We appear to be using up a lot of disk space without much benefit (e.g. IO is slower than compute). It might be a glitch on this specific experiment.

For a real example (hc5590), all the raw pixels are only 2 GB compressed, while the peaks table is saved without any compression.
This can be repacked to give a 1 GB file with gzip:1 on the glabel and pk_props datasets. The problem is that the 1 GB file loads in 18 seconds versus 3 seconds for the uncompressed one (an HDF5 limitation: decompression is single-threaded). Regenerating it from the sparse pixels takes about 1 minute on a 20-core machine, so this file is worth keeping.
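A minimal sketch of that repacking step with h5py, assuming glabel and pk_props live at the root of the peaks-table file (the file names here are placeholders); the h5repack command-line tool can do the same job:

```python
import h5py

SRC, DST = "pks_table.h5", "pks_table_gz1.h5"   # placeholder file names
BIG = ("glabel", "pk_props")                     # the two large datasets

with h5py.File(SRC, "r") as src, h5py.File(DST, "w") as dst:
    for name in src:
        if name in BIG:
            # rewrite the big arrays chunked, with gzip level 1
            dst.create_dataset(name, data=src[name][()], chunks=True,
                               compression="gzip", compression_opts=1)
        else:
            # copy groups / small datasets across unchanged
            src.copy(name, dst)
```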
4D merged peaks have ~9 spare columns out of 19 (~50%). It seems risky to write the computed columns, since they will change when the parameters are updated later.
Reading the uncompressed columnfile takes 6 seconds, versus recomputing it from the pks_table:
```python
%%time
# merge the 2D peaks into 4D peaks over omega / dty
p4d = p0.pk2dmerge( ds.omega, ds.dty )
# apply the detector spatial distortion correction
spat = ImageD11.blobcorrector.eiger_spatial(dxfile=ds.e2dxfile, dyfile=ds.e2dyfile)
# build a columnfile, then compute the derived geometry columns
cf_4new = ImageD11.columnfile.colfile_from_dict( spat(p4d) )
cf_4new.parameters.loadparameters('LMGO_small_cubic.par')
cf_4new.updateGeometry()
```
```
CPU times: user 7.24 s, sys: 1.9 s, total: 9.15 s
Wall time: 4.38 s
```
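For comparison, the plain read path is a single call; the 4D file name below is only a guess at this dataset's naming, and colfile_from_hdf is assumed to be available in ImageD11.columnfile as in recent versions:

```python
%%time
# read the uncompressed 4D columnfile directly
# (file name is a placeholder for this experiment)
cf_4old = ImageD11.columnfile.colfile_from_hdf('LMGO_BT6_01_slice_01_peaks_4d.h5')
```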
The 2D unmerged peaks also have ~9 of 19 columns (~50%) that are not needed, and the file is also uncompressed. It needs 16 seconds to read:
15G Oct 11 15:00 LMGO_BT6_01_slice_01_peaks_2d.h5
Needed (9): Number_of_pixels dty f_raw fc omega s_raw sc spot3d_id sum_intensity
Results (9): ds eta gx gy gz tth xl yl zl
This can be generated from the pks_table faster than it can be read (although the ~3 s currently needed to read the pks_table itself should be added):
```python
%%time
# compute the 2D (unmerged) peak properties from the pks_table
cf2d = p0.pk2d( ds.omega, ds.dty )
# apply the detector spatial distortion correction
spat = ImageD11.blobcorrector.eiger_spatial(dxfile=ds.e2dxfile, dyfile=ds.e2dyfile)
# build a columnfile, then compute the derived geometry columns
cf_new = ImageD11.columnfile.colfile_from_dict( spat(cf2d) )
cf_new.parameters.loadparameters('LMGO_small_cubic.par')
cf_new.updateGeometry()
```
```
CPU times: user 17.3 s, sys: 5.5 s, total: 22.8 s
Wall time: 3.2 s
```
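Given that, one option would be to write back only the measured columns from the "Needed (9)" list above and recompute the derived geometry on load. A minimal sketch, assuming getcolumn and colfile_to_hdf behave as in recent ImageD11 versions (the output file name is a placeholder):

```python
# keep only the measured columns; ds, eta, tth, gx/gy/gz, xl/yl/zl can be
# recomputed with updateGeometry() after any parameter change
needed = ("Number_of_pixels dty f_raw fc omega s_raw sc "
          "spot3d_id sum_intensity").split()
slim = ImageD11.columnfile.colfile_from_dict(
    { t: cf_new.getcolumn(t) for t in needed } )
ImageD11.columnfile.colfile_to_hdf( slim, 'LMGO_BT6_01_slice_01_peaks_2d_slim.h5' )
```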
In total, we are writing 25 GB of data that were derived from 2 GB of pixels. To be checked/verified:
- How fast is IO compared to computation in general? In a sane universe, we should be able to compute peak properties faster than reading them over the network.
- Is there a fast HDF5 compression plugin that is better suited for columnfiles? Perhaps Blosc; a sketch using hdf5plugin follows below.
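A minimal sketch of what that could look like with the hdf5plugin package, which registers Blosc (and related) filters for h5py; the array, file and dataset names are placeholders, and whether this actually beats gzip:1 on real peak columns would need to be timed:

```python
import h5py
import hdf5plugin   # registers Blosc / Blosc2 / LZ4 / Zstd filters with HDF5
import numpy as np

# placeholder array standing in for one peak-table column
col = np.repeat(np.arange(1_000_000), 10).astype(np.float64)

with h5py.File("blosc_test.h5", "w") as h:
    h.create_dataset("sum_intensity", data=col, chunks=True,
                     **hdf5plugin.Blosc(cname="lz4", clevel=5,
                                        shuffle=hdf5plugin.Blosc.SHUFFLE))

# decompression speed on read-back is the number that matters here
with h5py.File("blosc_test.h5", "r") as h:
    back = h["sum_intensity"][()]
```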