Cache the titles, descriptions and subtitles #200
Comments
How is that a problem? How measurable is it? I'm not in favour of upstream synchronisation based on time delays. ETag-based solutions should be used.
As mentioned, the first-order problem is that this data is time-consuming to fetch (especially when the video has been translated into tens of languages). I don't have measurements to share yet, since we are currently re-encoding all the videos, so re-encoding makes up most of the task duration. Once re-encoding is complete, most tasks will just download videos from the cache, and I will share measurements once they are available. ETags are indeed available; I'm not sure how well they work, but they should be OK. See https://www.ted.com/talks/oral_mcguire_how_to_live_with_fire?delay=5s&subtitle=en&trigger=30s
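To make the ETag idea concrete, here is a minimal sketch of revalidating a cached page with a conditional request. It is not taken from the scraper's code: `fetch_with_etag`, `CachedEntry` and the `fetch` callable are hypothetical names, and the in-memory dict stands in for whatever store (for example S3) would actually hold the cache. It also assumes the server returns a stable `ETag` header and honours `If-None-Match`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Hypothetical fetcher signature: (url, request headers) -> (status, etag, body)
Response = Tuple[int, Optional[str], Optional[str]]
Fetcher = Callable[[str, Dict[str, str]], Response]


@dataclass
class CachedEntry:
    etag: str
    body: str


def fetch_with_etag(url: str, cache: Dict[str, CachedEntry], fetch: Fetcher) -> str:
    """Return the body for `url`, reusing the cached copy when the server
    answers 304 Not Modified to a conditional request."""
    headers: Dict[str, str] = {}
    entry = cache.get(url)
    if entry is not None:
        # Ask the server to send the body only if it changed.
        headers["If-None-Match"] = entry.etag

    status, etag, body = fetch(url, headers)

    if status == 304 and entry is not None:
        # Unchanged upstream: the cached body is still valid.
        return entry.body
    if status != 200 or body is None:
        raise RuntimeError(f"unexpected response {status} for {url}")
    if etag:
        # Only cache responses that can be revalidated later.
        cache[url] = CachedEntry(etag=etag, body=body)
    return body
```

With the `requests` library, `fetch` could be a thin wrapper around `requests.get(url, headers=headers)` that returns `resp.status_code`, `resp.headers.get("ETag")` and `resp.text`. Note that a conditional request still costs one round trip per page and language, so this avoids stale data but only saves the transfer and parsing, not the request itself.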
For instance, on https://farm.openzim.org/pipeline/3241d2f3-c4d9-489d-98dc-67820f39e6c0/debug, these are the stats (all images and re-encoded videos are already in the S3 cache): Download video infos from TED website: 16 mins. So we spend more time downloading info from TED than downloading videos from the cache.
(In the task mentioned above, we ended up with 23 videos to add to the ZIM.)
Video titles, descriptions and subtitles are not yet cached on S3.
They are, however, not expected to change much and are rather time-consuming to fetch (especially when the video has been translated into tens of languages).
Getting titles and descriptions requires fetching the HTML page of the video for every language and parsing it with BeautifulSoup to extract them.
Subtitles have to be converted to the proper format.
We should cache them and only refresh them when someone complains or once in a while, especially if we want to keep updating the ZIM on a very regular basis to fetch the few new videos that have been published.
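The proposal above could be sketched roughly as follows. This is an illustration under assumptions, not the scraper's implementation: `get_metadata`, the `fetch` callable, the `video_id/lang` key layout and the 30-day default are all hypothetical, and a plain dict stands in for the S3 cache. The `force_refresh` flag covers the "when someone complains" case, and `max_age_s` covers "once in a while".

```python
import time
from typing import Callable, Dict, Optional, Tuple

# key -> (timestamp of last fetch, metadata dict); stand-in for an S3-backed cache
Store = Dict[str, Tuple[float, dict]]


def get_metadata(
    video_id: str,
    lang: str,
    store: Store,
    fetch: Callable[[str, str], dict],
    max_age_s: float = 30 * 24 * 3600,  # assumed default: refresh after ~30 days
    force_refresh: bool = False,
    now: Optional[float] = None,
) -> dict:
    """Return title/description/subtitles for one video and language,
    hitting the upstream site only when the cached copy is missing,
    too old, or a refresh is explicitly requested."""
    now = time.time() if now is None else now
    key = f"{video_id}/{lang}"

    cached = store.get(key)
    if cached is not None and not force_refresh:
        fetched_at, metadata = cached
        if now - fetched_at < max_age_s:
            return metadata  # fresh enough: no upstream request

    # Cache miss, stale entry, or forced refresh: fetch and store again.
    metadata = fetch(video_id, lang)
    store[key] = (now, metadata)
    return metadata
```

Keying the cache per video and per language means a newly added translation only triggers a fetch for that language, while regular ZIM updates mostly pay for the few newly published videos.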