Re-work image resizing #2101
… of 320px, assuming it can use URL hacking to generate the proper URL to download
With this change, we see the following:
Scraping of BM with these settings results in a ZIM that is about 2 MB, or 15% bigger.
With image max size set to 280px, we save another half megabyte. Compared to the 1.13 ZIM, the size increases by about 11%.
By the way, I spent way too long looking through the repo here and even posted a Phabricator ticket, but my intuition is that they were just setting max sizes and we should do the same.
Just checking, the current max appears to be 264px (looking at full English Wikipedia). I quite like the larger 320px, and it is something users have been requesting... I suppose this needs weighing up carefully. 15% on ~100GB doesn't seem like too high a price to pay to me...
How did you find that value? Just by finding some 264px-wide images and assuming there's nothing bigger?
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #2101 +/- ##
=======================================
Coverage ? 74.71%
=======================================
Files ? 41
Lines ? 3188
Branches ? 703
=======================================
Hits ? 2382
Misses ? 686
Partials ? 120
Oh sorry, I did that a bit fast, and having checked, I've found some at 280px down the right-hand side, so no, no guarantee that that's the max, more an average! There are some much wider images centred in some pages and we'll have to be careful not to clobber those. For example here in the "Paris" article:
Okay, when investigating the Paris panorama, I found a bug, which I fixed and added a test for above. Now the sizes look like this:
That's a 1.8% increase for 280px width images, and a 7.65% increase for 320px width images. These numbers have gotten smaller because we're catching more images and squishing them to our max size.
And of course, as you predicted @Jaifroid, we get this:
Looks like we need to add logic to only scale in the dominant dimension.
… image in dominant dimension (width or height)
Okay, added logic for scaling the smallest dimension to 320px, while keeping the aspect ratio. This, unsurprisingly, gives us the biggest ZIM yet (
And
Re-work image sizing algorithm. It now enforces a maximum image width of 320px.
This value is chosen because it is a reasonable size on both mobile and desktop, but saves bandwidth compared to "full size" images (which was what the code might have been grabbing from data-data-original-file-src before). This should give us a nice tradeoff between efficiency and quality. We can always adjust this value later, of course, but the logic for retrieving and resizing the image URLs remains the same.
Of note, it takes the smallest of:
- src attribute set on the <span> itself that will become the image
- data-data-original-file-src attribute, of the original size the image was in the article
Tests have been added/adjusted.
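For readers following along, the "smallest of" selection and the URL rewriting can be sketched roughly as below. This is only an illustration, not the code in this PR: the helper names (thumbWidth, pickTargetWidth, withThumbWidth) are hypothetical, and it assumes the usual Wikimedia thumbnail layout in which the last path segment starts with "<N>px-".

```typescript
// Hypothetical helpers for illustration; names are not taken from this PR.
const MAX_WIDTH_PX = 320;

// Assumption: thumbnail URLs end in ".../<N>px-<file name>".
const THUMB_WIDTH_RE = /\/(\d+)px-([^/]+)$/;

/** Width encoded in a thumbnail URL, or undefined when there is no "<N>px-" segment. */
function thumbWidth(url: string): number | undefined {
  const match = THUMB_WIDTH_RE.exec(url);
  return match ? Number(match[1]) : undefined;
}

/** Smallest of the configured maximum and whichever candidate widths are known. */
function pickTargetWidth(
  candidates: Array<number | undefined>,
  maxWidth: number = MAX_WIDTH_PX,
): number {
  const known = candidates.filter((w): w is number => w !== undefined && w > 0);
  return Math.min(maxWidth, ...known);
}

/** Rewrite the "<N>px-" segment so the URL requests the given width. */
function withThumbWidth(url: string, width: number): string {
  return url.replace(THUMB_WIDTH_RE, `/${width}px-$2`);
}

// Usage: the width found on the <span> and the width from the original file URL
// are both candidates; the result never exceeds the 320px cap.
const exampleUrl =
  "https://upload.wikimedia.org/wikipedia/commons/thumb/a/ab/Example.jpg/500px-Example.jpg";
const target = pickTargetWidth([thumbWidth(exampleUrl), 264]);
const resized = withThumbWidth(exampleUrl, target);
```

URLs without a "<N>px-" segment are returned unchanged by withThumbWidth, so in this sketch anything that is not a thumbnail is simply left alone.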
EDIT:
Additionally, based on feedback in this thread, I have modified the algorithm so that it never scales a dimension below 320px. For large panorama images such as the one in the enwiki Paris article, this keeps the image from being destroyed: because it is so wide, scaling it to a width of 320px would leave a height of only 50px.
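To make the panorama case concrete, here is a minimal sketch of scaling by the smaller dimension while keeping the aspect ratio. It is not the implementation in this PR: the Size type and scaleKeepingSmallestSide are hypothetical names, and the 6400x1000 input is an assumed size chosen only because it reproduces the 320px by 50px result mentioned above.

```typescript
// Hypothetical sketch; names and the example dimensions are assumptions.
interface Size {
  width: number;
  height: number;
}

const MIN_SIDE_PX = 320;

/** Naive approach: always force the width to the limit (this is what squashes panoramas). */
function scaleToWidth(original: Size, width: number = MIN_SIDE_PX): Size {
  const factor = width / original.width;
  return { width, height: Math.round(original.height * factor) };
}

/** Scale so the smaller side lands on the limit, preserving the aspect ratio. */
function scaleKeepingSmallestSide(original: Size, minSide: number = MIN_SIDE_PX): Size {
  const smallest = Math.min(original.width, original.height);
  // Never upscale: if the smaller side is already at or below the limit, keep the size.
  if (smallest <= minSide) {
    return { ...original };
  }
  const factor = minSide / smallest;
  return {
    width: Math.round(original.width * factor),
    height: Math.round(original.height * factor),
  };
}

const panorama: Size = { width: 6400, height: 1000 };
const squashed = scaleToWidth(panorama); // 320 x 50
const preserved = scaleKeepingSmallestSide(panorama); // 2048 x 320
```

The trade-off is the one discussed in this thread: very wide or very tall images stay legible, at the cost of a larger ZIM.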