Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Plan to consolidate Nexus loading code #38332

Open
robertapplin opened this issue Oct 31, 2024 · 4 comments
Open

Plan to consolidate Nexus loading code #38332

robertapplin opened this issue Oct 31, 2024 · 4 comments
Labels
Epic Maintenance Unassigned issues to be addressed in the next maintenance period.

Comments

@robertapplin
Copy link
Contributor

robertapplin commented Oct 31, 2024

Describe the outcome that is desired.
Mantid currently uses multiple mechanisms to load in data. This is outlined in the following issue mantidproject/roadmap#21

These mechanisms primarily involve using the Nexus API library to load data, however this library is now unsupported. We therefore no longer want to use it, and it would be preferable to use the HDF5 API directly. This would provide us with several other benefits which were not previously available when using Nexus API, and could result in a performance improvement when loading data.

To do this switch, the following plan was discussed with @peterfpeterson and @martyngigg .

Describe any solutions you are considering

  1. Move H5Util into the nexus subpackage Move H5Util file to Mantid::Nexus library #38406
  2. Pull the Nexus c-API and cpp-API binding into the Framework/Nexus location in Mantid. This will not include anything other than these two languages Pull nexus library into mantid #38455
    1. The developer package and conda recipe will need to be modified to replace nexus as a requirement with hdf4, hdf5, and whatever else is needed to make the moved version work
    2. #import statements in files will need to be updated
    3. cmake will need changes
    4. Rather than template napi.h.in to allow for cmake to fill in the version, create a variant napi.h which has the version number that mantid bases its changes from
  3. Remove XML, Fortran, etc. sections of this code (both files and #ifdef sections) and re-evaluate if any conda dependencies can be removed
  4. Move the two Muon HDF4 algorithms (LoadMuonNexus1 and LoadMuonNexus2) to the Framework/Muon library to isolate them
  5. Use HDF4 directly in the two Muon HDF4 algorithms rather than using the Nexus API. (Link HDF4 to the Muon library, and add HDF4 to the conda environment)
  6. Remove HDF4 files and #ifdef sections of the Nexus API code within the Framework/Nexus location
  7. Move code from napi5.c into napi.c to remove a layer of abstraction. This may be better to combine with the next step.
  8. Move the C code of napi.c into NeXus::File.cpp
  9. Review the old wrappers such as NexusFileIO and NexusIOHelper to understand how they are used, and where they are used. NexusFileIO can likely be removed/refactored as it is only used by the SaveNexusProcessed algorithm.
  10. Refactor these wrappers to simplify and make them "lightweight". Ideally, we want to use one common mechanism to load/save HDF5 data across the Mantid codebase, and this mechanism might be a very lightweight wrapper around the HDF5 library

After this work is done we are in a position to investigate what functionality we want to expose/add to improve performance of reading/writing hdf5-based nexus files.

Things still to be decided:

  • Where do the various FileDescripttor, NexusFileDescriptor, and NexusHDF5FileDescriptor go and should they have a hierarchy?
  • Do we still want to have objects know how to serialize/deserialize themselves to nexus? If so, should we keep the nexus subpackage unaware of mantid objects (e.g. Kernel, API)
@robertapplin robertapplin added Maintenance Unassigned issues to be addressed in the next maintenance period. Epic labels Oct 31, 2024
@peterfpeterson
Copy link
Member

As of the time of this writing, the dependencies of the current nexus conda package are (using conda-tree)

$ conda tree deptree
...
  ├─ nexus 4.4.3 [required: 4.4.*]
  │  ├─ hdf4 4.2.15 [required: >=4.2.15,<4.2.16.0a0]
  │  │  ├─ libgcc-ng 14.1.0 [required: >=12]
  │  │  │  └─ dependencies of libgcc-ng displayed above
  │  │  ├─ libjpeg-turbo 3.0.0 [required: >=3.0.0,<4.0a0]
  │  │  │  └─ dependencies of libjpeg-turbo displayed above
  │  │  ├─ libstdcxx-ng 14.1.0 [required: >=12]
  │  │  │  └─ dependencies of libstdcxx-ng displayed above
  │  │  └─ libzlib 1.3.1 [required: >=1.2.13,<2.0.0a0]
  │  │     └─ dependencies of libzlib displayed above
  │  ├─ hdf5 1.14.3 [required: >=1.14.3,<1.14.4.0a0]
  │  │  └─ dependencies of hdf5 displayed above
  │  ├─ libgcc-ng 14.1.0 [required: >=12]
  │  │  └─ dependencies of libgcc-ng displayed above
  │  ├─ libjpeg-turbo 3.0.0 [required: >=3.0.0,<4.0a0]
  │  │  └─ dependencies of libjpeg-turbo displayed above
  │  ├─ libstdcxx-ng 14.1.0 [required: >=12]
  │  │  └─ dependencies of libstdcxx-ng displayed above
  │  ├─ libxml2 2.12.7 [required: >=2.12.6,<3.0a0]
  │  │  └─ dependencies of libxml2 displayed above
  │  ├─ libzlib 1.3.1 [required: >=1.2.13,<2.0.0a0]
  │  │  └─ dependencies of libzlib displayed above
  │  ├─ ncurses 6.5 [required: >=6.4.20240210,<7.0a0]
  │  │  └─ dependencies of ncurses displayed above
  │  └─ readline 8.2 [required: >=8.2,<9.0a0]
  │     └─ dependencies of readline displayed above

@martyngigg
Copy link
Member

6. Move code from napi5.c into napi.c to remove a layer of abstraction. This may be better to combine with the next step.

7. Move the C code of napi.c into NeXus::File.cpp

8. Review the old wrappers such as `NexusFileIO` and `NexusIOHelper` to understand how they are used, and where they are used

9. Refactor these wrappers to simplify and make them "lightweight". Ideally, we want to use one common mechanism to load/save HDF5 data across the Mantid codebase, and this mechanism might be a very lightweight wrapper around the HDF5 library

I think this looks good and is a good summary of our discussion 😸. I'd probably say with steps 6-7 might be better done after 8 and the review step encompasses what an ideal API for the new internal NeXusFile class would look like and how it interacts with things like NexusIOHelper rather than just picking up the neXus API code wholesale and putting it inside.

I also would say that I think NexusFileIO should be deleted as it's pretty ancient and whatever the new NeXus::File/NexusIOHelpers look like can probably subsume all of that behaviour in a more modern way.

@peterfpeterson
Copy link
Member

@peterfpeterson
Copy link
Member

Consider not setting the filecache size automatically during file opening. This was suggested by @ekapadi.

@peterfpeterson peterfpeterson moved this to Release 6.12 in Mantid project roadmap Nov 14, 2024
peterfpeterson added a commit to peterfpeterson/mantid that referenced this issue Nov 27, 2024
Refs mantidproject#38332

The dependencies of https://github.com/conda-forge/nexus-feedstock/blob/main/recipe/meta.yaml
* hdf4 - added
* hdf5 - already present
* libjpeg-turbo - required by hdf4
* zlib - added
* libxml2 - required by libxslt
* ncurses - required by readline
* readline - required by python
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Epic Maintenance Unassigned issues to be addressed in the next maintenance period.
Projects
Status: Release 6.12
Development

No branches or pull requests

3 participants