ZIM Deletion Policy #103
Here is a proposed deletion policy. Please provide feedback quickly: I would like to enforce it by the end of May 2024 at the latest. WDYT?
Deletion policy
On an exceptional basis, it is possible to delete a ZIM which has been published on library.kiwix.org; we need to ensure this stays exceptional. As a publisher, we somehow promise users to make our best effort so that they keep access to content we have published. And as a publisher, we need to enforce Q/A so that ZIMs are known to be OK before publication.
Scope details:
The only acceptable reasons to delete a ZIM are:
Except for the last one, none of these reasons is expected to occur on a regular basis, and some have never occurred in the past, so we expect them to keep ZIM deletions at a very low level. The following reasons are not acceptable:
|
@RavanJAltaie FYI, suggestions are welcome |
Zimfarm issue about metadata update is here: openzim/zimfarm#956 |
LGTM ; @Popolechien do you remember the I think this would match “ZIM content is now known to be wrong” but we'd still have to discuss case-by-case whether it's worth deleting (as we know vandalized articles are most likely included in every ZIM). |
Yeah, I don't think this particular case fits the reasons listed, but then this seems fairly common sense. Maybe add something along the lines of "Zim content deviates significantly from educational mission". There's another ZIM that has been flagged recently as moving away from prepper content/thematics to simple product placement: I still need to look into it, but to me that would also warrant removal. Other than that, I would remove this sentence from the intro:
Not sure about the |
I don't agree: a policy is meant to avoid relying on common sense, since this is clearly too much a matter of interpretation. I would add a reason like "ZIM contains vandalized / defaced content on important pages". I'm a bit afraid this is still too subjective, but the past shows that we decided to delete a ZIM for one single vandalized page, so this seems to be the path we want to follow.
I would make it even broader with "ZIM content does not match acceptable content policy (educational mission, ...)"
I don't mind removing the "somehow". But I still don't think this phrase makes us an archival project, and I consider it very important. Most content providers have the same kind of core promise. For instance, StackExchange gets contributions because they promise users will continue to get access to the published content for "the time being". StackExchange has a strong policy on which questions might get deleted at https://meta.stackexchange.com/help/deleted-questions (and they do delete a lot AFAIK). Without both, I'm quite sure the project would fade out quickly.
If we remove this sentence, then I don't get why we would really need a deletion policy, nor what could help us decide what is acceptable or not in this policy. I would consider that we might delete any ZIM which does not suit any of us anymore, whatever the reason, since it is clearly the path of least effort and our available bandwidth is very limited anyway.
To help me better understand, I would probably benefit from another "core promise" which explains why the deletions I've listed as not acceptable are indeed not acceptable. Otherwise it looks to me like this will always be a topic of debate.
That being said, if at least we are all aligned today on the acceptable reasons, I don't mind if we remove the phrase when it is not OK for a majority (I don't like consensus ^^) |
Very important clarification: we did not remove that content from the Catalog. We removed one ZIM file, because we keep two specifically for such reasons. If the latest one out of the Zimfarm has an issue, we can delete it and continue to serve the content (we only serve one version of a Title at once). Also, that content is being refreshed periodically (but recreating is fragile and takes time). I think in my mind the policy was for removing content and not individual ZIMs when there's another one, but this is probably the place to clarify both situations |
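To make the distinction between deleting one ZIM file and removing content concrete, here is a minimal sketch of the "keep two, serve one" behaviour described above. It is only an illustration under stated assumptions: the `Title` class, its methods, and the file names are hypothetical and are not taken from the Zimfarm or library code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Title:
    """Hypothetical model of one Title in the Catalog (illustration only)."""
    name: str
    # ZIM file names, oldest first; only the `keep` most recent are retained.
    zims: List[str] = field(default_factory=list)
    keep: int = 2  # assumption: two versions are kept, as stated in the comment above

    def publish(self, zim: str) -> None:
        """Add a new ZIM and drop the oldest ones beyond the retention limit."""
        self.zims.append(zim)
        del self.zims[:-self.keep]

    def served(self) -> Optional[str]:
        """Only one version of a Title is served at once: the most recent one."""
        return self.zims[-1] if self.zims else None

    def delete_latest(self) -> None:
        """Delete a faulty latest ZIM; the previous one gets served again."""
        if self.zims:
            self.zims.pop()


# Hypothetical file names, for illustration only.
title = Title("example_title")
title.publish("example_title_2024-03.zim")
title.publish("example_title_2024-04.zim")
title.delete_latest()  # the April file had an issue
print(title.served())  # the March file is served; the content stays in the Catalog
```

In this model, deleting the latest ZIM only removes content from the Catalog when no other version remains, which is the case a policy would need to treat separately.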
It is named "ZIM deletion policy", so I thought we wanted to deal with individual ZIMs. This is intentional on my side, and the reason why I clearly mentioned these "two more recent versions". It is probably the right granularity for such a policy, since deletion requests are usually made at the ZIM level anyway (not the content level). |
We will never be able to cover every possible way things can go wrong, unless the policy goes into so much detail that it becomes irrelevant. There will always be some level of arbitrary decision. For the case referred to, of a specific ZIM file with problematic content, the informal policy we had with @RavanJAltaie is "Do people complain, which means that people notice?". That allows us to identify high-traffic, high-visibility ZIM files/pages that need immediate action (whereas low-traffic ones can be automatically handled by the next scraper iteration). A choice has to be made between "Delete old zim files, with exceptions" and "Do not delete zim files, with exceptions". Finding a wording that intersects both would be ideal. |
As agreed during the Hackathon, here's a request to the Content Team for a ZIM Deletion Policy. We want a Wiki entry that lists all the possible reasons for deleting a ZIM. Requests for deletion will then need to provide said reason.
We've discussed that one of the reasons could be metadata that is no longer aligned with our Q/A standard. We want to ultimately allow the content team to fix that themselves (@benoit74 to open a ticket on zimfarm). In the meantime, individual delete-requests can exceptionally be fixed by developers using zimrecreate.