as a patron I want the pdf endpoint to extract all urls from Global Connectivity Report so I can check their status #844
Comments
Would it be useful to have a debug=true parameter that dumps all the text and annotations?
if that is the best way to dump the text, then yes!
possible solution #852
in this edge case it would work to not remove the linebreaks and instead remove all spaces
dpriskorn moved this from New to Save for future sprint in Internet Archive Reference Inventory on Jun 7, 2023
dpriskorn changed the title from "PDF Link Parsing Error: Only 1 Link found, where there should be hundreds" to "as a patron I want the pdf endpoint to extract all urls from Global Connectivity Report so I can check their status" on Jun 7, 2023
IARE:
https://internetarchive.github.io/iare/?url=https://www.itu.int/dms_pub/itu-d/opb/ind/d-ind-global.01-2022-pdf-e.pdf
produces only 1 URL link.
There are hundreds in the document, as you can see by looking at the document directly:
https://www.itu.int/dms_pub/itu-d/opb/ind/d-ind-global.01-2022-pdf-e.pdf