Library generation fails with Dirent pointer table outside (or not fully inside) ZIM file #283

Open
benoit74 opened this issue Oct 14, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@benoit74
Collaborator

Installed /usr/local/bin/library-maint from https://raw.githubusercontent.com/kiwix/k8s/main/zim/library-mgmt/library-maint.py
starting…
2024-10-13 05:00:32,347 INFO Starting library-maint for read, write-libraries
2024-10-13 05:00:32,347 INFO [READ] Loading previous Public Library
2024-10-13 05:00:32,347 WARNING [READ] Unbale to read previous library. Purging disabled.
2024-10-13 05:00:32,437 DEBUG [READ] 0000 dev/Bundesministerium_fuer_Inneres_2024-07.zim
2024-10-13 05:00:32,439 DEBUG [READ] 0001 dev/africanstorybook.org_mul_all_newui_2023-10.zim
2024-10-13 05:00:32,443 DEBUG [READ] 0002 dev/alexandria.dk_en_all_2024-10.zim
2024-10-13 05:00:32,445 DEBUG [READ] 0003 dev/ancient.eu_en_all_2024-08.zim
2024-10-13 05:00:32,451 DEBUG [READ] 0004 dev/api.plos.org_en_all_2024-08.zim
2024-10-13 05:00:32,453 DEBUG [READ] 0005 dev/ashevillerelief.com_en_all_2024-10.zim
2024-10-13 05:00:32,455 DEBUG [READ] 0006 dev/avanti-3dimensional-geometry_2024-10.zim
2024-10-13 05:00:32,537 DEBUG [READ] 0007 dev/banrepcultural.org_es_enciclopedia_2024-09.zim
2024-10-13 05:00:32,539 DEBUG [READ] 0008 dev/benyehuda.org_he_all_2024-10.zim
2024-10-13 05:00:32,540 ERROR FAILED. An error occurred: Dirent pointer table outside (or not fully inside) ZIM file.
2024-10-13 05:00:32,540 ERROR Dirent pointer table outside (or not fully inside) ZIM file.
Traceback (most recent call last):
  File "/usr/local/bin/library-maint", line 948, in entrypoint
    sys.exit(maint.run())
  File "/usr/local/bin/library-maint", line 747, in run
    self.readfs()
  File "/usr/local/bin/library-maint", line 514, in readfs
    entry = self.read_zimfile_info(
  File "/usr/local/bin/library-maint", line 458, in read_zimfile_info
    zim = Archive(fpath)
  File "libzim/libzim.pyx", line 717, in libzim.Archive.__cinit__
RuntimeError: Dirent pointer table outside (or not fully inside) ZIM file.
Stream closed EOF for zim/dev-library-generator-28813260-vhsbr (debian)

This happened twice in a row for the new ZIM dev/benyehuda.org_he_all_2024-10.zim. After that, the job ran successfully. I'm quite sure this means the library generation job does not gracefully handle a file that is still being uploaded while the job runs. This file is pretty big (22G). It is, however, a bit surprising that we never encountered this situation before. Something probably changed quite recently ...
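
To illustrate what handling this case could look like on the reading side, here is a minimal sketch. It is not the code of library-maint: the function name `read_readable_zims` and the injectable `open_archive` parameter are hypothetical. The only thing taken from the traceback above is that `Archive(fpath)` raises `RuntimeError` on a truncated file. Whether skipping is preferable to failing the whole run is a judgment call.

```python
import logging

logger = logging.getLogger("library-maint-sketch")  # hypothetical logger name


def read_readable_zims(paths, open_archive):
    """Open each path with open_archive; skip files that cannot be opened yet.

    Returns a (loaded, skipped) pair: loaded is a list of (path, archive)
    tuples, skipped is the list of paths that raised RuntimeError.
    """
    loaded, skipped = [], []
    for fpath in paths:
        try:
            loaded.append((fpath, open_archive(fpath)))
        except RuntimeError as exc:
            # Same exception type as in the traceback: the file is most
            # likely still being uploaded, so leave it for a later run.
            logger.warning("Skipping %s: %s", fpath, exc)
            skipped.append(fpath)
    return loaded, skipped
```

With python-libzim installed, this would presumably be called as `read_readable_zims(paths, Archive)` after `from libzim.reader import Archive`. The opener is passed in as a parameter so the loop can be exercised without real ZIM files.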

@benoit74 benoit74 added the bug Something isn't working label Oct 14, 2024
@rgaudin
Member

rgaudin commented Oct 14, 2024

We have encountered this in the past (started in 2022).

As you guessed, the problem happens when libzim tries to read a ZIM that is still being transferred to the filesystem.
Given that this is libzim failing on a ZIM, I think it's wise to keep it as an error that crashes the refresh. We get a clear event and log, and it recovers by itself in a later job.

What we should do, though, is what I suggested in that initial comment: move the file to its final folder (mount point) under a temporary name, and only then rename it to .zim.
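
A minimal sketch of that idea follows, assuming the transfer is a local copy onto the mounted volume. The real upload mechanism is not described in this thread, and `publish_zim`, `list_zims` and the `.tmp` suffix are hypothetical names chosen for illustration. The point is that the file only gets a name ending in `.zim` once it is complete, and that the rename happens inside the destination filesystem, where `os.replace` is atomic on POSIX.

```python
import os
import shutil
from pathlib import Path


def publish_zim(src, dest_dir, tmp_suffix=".tmp"):
    """Copy src into dest_dir under a temporary name, then rename it to .zim."""
    src, dest_dir = Path(src), Path(dest_dir)
    final_path = dest_dir / src.name
    tmp_path = dest_dir / (src.name + tmp_suffix)  # e.g. foo.zim.tmp
    with open(src, "rb") as fin, open(tmp_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        fout.flush()
        os.fsync(fout.fileno())  # make sure the data is on disk before renaming
    # Both paths are in the same directory, so this is a rename within one
    # filesystem: readers see either no .zim file or the complete one.
    os.replace(tmp_path, final_path)
    return final_path


def list_zims(root):
    """List only files whose name ends in .zim, ignoring in-progress uploads."""
    return sorted(p for p in Path(root).rglob("*.zim") if p.is_file())
```

With this scheme, a scanner that only picks up `*.zim` names never sees a partially transferred file, since `foo.zim.tmp` does not match the pattern.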

To me, this is getting more frequent because we are creating more large ZIMs, and because library generation is faster and thus runs a lot more often than it used to.
