Symbiota unable to access Arctos images #6088

Open
Jegelewicz opened this issue Apr 3, 2023 · 20 comments
Labels
Aggregator issues e.g., GBIF, iDigBio, etc Function-Media Grant funded (Arctos member) This issue is related to an Arctos member grant funded project.

Comments

@Jegelewicz
Member

Hi Teresa,

We've run into a snag with creating thumbnail images from Arctos-provided images for the University of Alaska Museum (ALA), I think because when you first try to navigate to an image (e.g., http://arctos.database.museum/media/10348285?open), you're taken to this welcome screen:
image.png

Obviously, our processing scripts are having trouble understanding the terms and conditions 😉 Do you have any suggestions for how we might be able to get around this? Does Arctos have thumbnail URLs it could send instead?

Best,

Katie D. Pearson

@Jegelewicz Jegelewicz added Function-Media Aggregator issues e.g., GBIF, iDigBio, etc Priority - Wildfire Potential ignore this at everyone's peril, may smolder for now ... labels Apr 3, 2023
@Jegelewicz
Member Author

Katie,

This is a known issue that is also affecting images at GBIF. We are trying to balance security (no bots) with the ability to monitor usage. I've created an issue in our GitHub repo so that we can figure out the best way to handle this.

Thanks!

Teresa J. Mayfield-Meyer

@Jegelewicz
Member Author

Perhaps we should consider only sending thumbnail links for the aggregators?

@Jegelewicz
Member Author

See also gbif/model-material#108. I think this is related?

@campmlc

campmlc commented Apr 3, 2023

This is also NSF funding related - we need to have our ecto images available as links through the TPT/SCAN portal, before July 31.

@Jegelewicz Jegelewicz added the Grant funded (Arctos member) This issue is related to an Arctos member grant funded project. label Apr 3, 2023
@Jegelewicz
Member Author

@campmlc this is a community issue and we need to get it settled so we can just proceed.

We can provide direct links to the images at TACC, but that will NOT allow you to know how many people click on them from Symbiota.

We can provide direct links to the thumbnails at TACC - same issue as above, but it's "just" the thumbnails. I don't know how people would know to get the "better" images through Arctos (which allows you to track clicks).

We can do what we are doing now - which isn't working for the aggregators (but maybe it should).

We can ask the aggregators to provide click-through information. I am not sure if this is feasible, but it seems like something they could do?

In any case, we are attempting to balance a lot of issues with very few resources and we need to decide what is most important.

@Jegelewicz Jegelewicz added this to the Next AWG Meeting milestone Apr 3, 2023
@campmlc

campmlc commented Apr 3, 2023 via email

@Jegelewicz
Member Author

Katie,
Do you guys offer the ability to track click-throughs to media? Our collections would like to be able to report media usage. If we pass the direct URIs for media, we lose that ability, but if we could get reports from Symbiota on media click-through, that would be at least part of the puzzle (the rest would be downloads - but I don't see how you guys could monitor that).
Adios,

Teresa J. Mayfield-Meyer

@dustymc
Contributor

dustymc commented Apr 3, 2023

> funding related

That really needs to at least involve some communication!

> have our ecto images available

If they're in Arctos they're "available."

> you guys

I don't think that can be a solution - we can't control who does what with the DWC data.

https://github.com/ArctosDB/internal/issues/240 is central to this.

@dustymc
Contributor

dustymc commented Apr 4, 2023

> Does Arctos have thumbnail URLs it could send instead?

I think we do?? I don't actually know where it lands and can't find anything useful in e.g. http://ipt.vertnet.org:8080/ipt/resource?r=almnh_inv but Arctos has been providing Audubon Core data since forever.

> known issue that is also affecting images at GBIF.

I don't think this is true and I'm not sure where it came from.

> We can do what we are doing now

And changing that is going to require some sort of nod from the collections - IDK why this is on the AWG Agenda, AWG doesn't own any Media????

> balance a lot of issues with very few resources

Yes - and, I don't actually know, but I suspect eg ALA would prefer NOT to have their images replicated in a million places, this really needs to involve the collections (perhaps at the MOU level??).

@Jegelewicz
Member Author

> IDK why this is on the AWG Agenda, AWG doesn't own any Media????

Because the collections ARE the AWG.

@dustymc
Contributor

dustymc commented Apr 4, 2023

> collections ARE the AWG.

Pretty sure I haven't entirely imagined a bunch of the recent issues...

@campmlc

campmlc commented Apr 4, 2023

So where are we at with this? I just spoke with @mkoo earlier today. Perhaps we need to set up a meeting - she has a Symbiota programmer at MVZ.

@Jegelewicz
Member Author

We need to understand what the collections want us to do here, which is why I put it on the AWG agenda, but I guess @dustymc has veto powers over what gets discussed in AWG?

We cannot just make a blanket decision for this and if we do not bring it to the AWG, there will never be a resolution.

@campmlc

campmlc commented Apr 4, 2023 via email

@dustymc
Contributor

dustymc commented Apr 5, 2023

I don't really want veto powers on the agenda, but I'd also like to stick to discussions which might lead to some sort of conclusion if at all possible. Unless there's been some change to the MOUs or similar, I do not think the AWG is interchangeable with collections, nor can it speak for them, and so I do not think this can be a productive discussion. The initial issue involves @StefanieBond and as far as I'm aware Steffi is the only person who can speak for Steffi. I don't know what the proper platform for that is, maybe this is all just a reflection of my ignorance, but I don't think an AWG meeting is the right venue.

This does not potentially involve multiple collections: It DOES involve every collection that uses Media (unless maybe I'm also not understanding something about some proposed solution?).

@Jegelewicz
Member Author

> It DOES involve every collection that uses Media

Sorta - it DOES involve every collection that uses media AND publishes to the data aggregators.

@Jegelewicz
Member Author

And we could address this in the media policy.

@dustymc
Contributor

dustymc commented Apr 5, 2023

> we could address this in the media policy.

I think that's not quite right, it doesn't include the single-uploader thing and maybe whatever we'll build next, but yes, agreed, I think this requires policy (and I think we can't just unilaterally change that, which is why I don't think the AWG can solve this).

The amendment would be simply 'Arctos can't track Media clicks' (and so those who need that would need to arrange something with TACC-or-whomever) - but I wonder if #4343 would change that landscape in some way?

> AND publishes to the data aggregators

I would also strongly prefer (as in, I don't think sustainability allows anything else) that to be policy as well - https://github.com/ArctosDB/internal/issues/260

@camwebb

camwebb commented Apr 5, 2023

This comment bypasses the complex issue of logging media use. But in reply to the original Q from Katie Pearson, she could simply add this at the beginning of her thumbnail script:

curl -c jar -H 'Accept: application/json' --data 'method=agree_terms' --data 'returnformat=json'  \
   'https://arctos.database.museum/component/functions.cfc'

then repeated calls for media will work:

curl -b jar -L -o 10003181.jpg 'https://arctos.database.museum/media/10003181?open'
curl -b jar -L -o 10003182.jpg 'https://arctos.database.museum/media/10003182?open'
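
For a batch of media IDs, the two steps above could be wrapped in a small script. This is only a sketch, not something Arctos documents or supports: the endpoint and form fields are the ones in the commands above, while the function names, the `DELAY` pause, and the `DRY_RUN` switch are made up for illustration. The pause is there to keep the request rate low.

```shell
#!/bin/sh
# Sketch only. The endpoint and form fields are copied from the curl
# commands above; media_url, agree_terms, fetch_media, DELAY and DRY_RUN
# are hypothetical names introduced for this example.
BASE='https://arctos.database.museum'
JAR="${JAR:-jar}"        # cookie jar file, as in the commands above
DELAY="${DELAY:-2}"      # seconds to wait between downloads (assumed value)

# Print the media URL for one media ID.
media_url() {
    printf '%s/media/%s?open\n' "$BASE" "$1"
}

# Accept the terms once and store the session cookie in $JAR.
agree_terms() {
    curl -s -c "$JAR" -H 'Accept: application/json' \
        --data 'method=agree_terms' --data 'returnformat=json' \
        "$BASE/component/functions.cfc" > /dev/null
}

# Download each media ID given as an argument, pausing between requests.
# With DRY_RUN set, only print what would be fetched.
fetch_media() {
    for id in "$@"; do
        if [ -n "${DRY_RUN:-}" ]; then
            echo "would fetch $(media_url "$id")"
        else
            curl -s -b "$JAR" -L -o "$id.jpg" "$(media_url "$id")"
            sleep "$DELAY"
        fi
    done
}
```

Calling `agree_terms && fetch_media 10003181 10003182` would accept the terms once and then download both files; setting `DRY_RUN=1` first only prints the URLs without touching the network.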

😁

@dustymc
Contributor

dustymc commented Apr 5, 2023

> repeated calls for media will work:

We do not have the resources to support scraping and anything which reaches a point that TACC or I cannot ignore will result in network blocks.

@dustymc dustymc removed the Priority - Wildfire Potential ignore this at everyone's peril, may smolder for now ... label Aug 23, 2023
@dustymc dustymc modified the milestones: Needs Discussion, DWC Apr 9, 2024
@dustymc dustymc modified the milestones: DWC, Community Forum Sep 20, 2024