Collection crawling seems to be broken #79
I'm relatively sure that I did not expand the "All collections" section before taking the screenshot yesterday, so it's interesting to note that the behaviour has since changed: initially, 434 collections are listed, and only after expanding the section is that number replaced with 921. That's another sign that this is probably due to inconsistent database content (see also #78 (comment)).
Okay, the problem is the primary key of the collections table: It's set on This was done to minimise changes (see also #56), because the service URL most likely never changes, and I thought the same of the But the grouping of all raw documents into the individual backend entries is done on the And because one unsuccessful endpoint* is enough to deem the whole backend unsuccessful, GEE was flagged as such. Line 15 in 6c64fdf
* of any of the endpoints
Questions arising from this:
For 2. I'd say yes, it kinda would've prevented this bug (when the For 3. I'd say no, but how crawling errors are communicated to the user should be discussed anyway, which is why #23 exists.
Thanks for investigating.
That can happen regularly (like every x months or so)
I don't fully understand that yet. Can you use the https://earthengine.openeo.org/v1.0 URL?
Fine with "no".
Assume a backend changes its Now the difference is: Grouping on -> 1 backend Grouping on -> 2 backends
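To make the "1 backend vs. 2 backends" difference concrete, here is a minimal Python sketch. The field names `stable_key` and `changing_key` are hypothetical stand-ins (the actual field names did not survive in the comment above), and the documents are made up for illustration; this is not the Hub's real code.

```python
from collections import defaultdict

# Hypothetical raw documents from two crawls of the same backend.
# Between the crawls, one field ("changing_key") changed its value,
# while the other ("stable_key") stayed the same.
raw_docs = [
    {"stable_key": "backend-A", "changing_key": "old-value", "path": "/collections"},
    {"stable_key": "backend-A", "changing_key": "new-value", "path": "/collections"},
]

def group_by(docs, field):
    """Group documents into 'backends' by the value of the given field."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[field]].append(doc)
    return dict(groups)

by_stable = group_by(raw_docs, "stable_key")      # one group: same backend
by_changing = group_by(raw_docs, "changing_key")  # two groups: looks like two backends

print(len(by_stable), len(by_changing))  # prints "1 2"
```

Whichever field is used for grouping decides whether documents from before and after the change end up in the same backend entry.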
Grouping on backend seems correct, but I guess the question is why crawling doesn't drop old collections. It sounds like the original issue is that old data doesn't get removed or correctly updated on crawling, right?
That's right, and the cause was the same: Old data was removed based on the I tested the crawling several times and it worked both for the GEE case and also for EODC -- they changed to
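As a rough illustration of why old collections can pile up, the following Python sketch removes stale entries by matching on a chosen field before inserting the freshly crawled ones. The field names `stable_key` and `changing_key`, the `recrawl` helper, and the tiny data set are all assumptions made for this example, not the Hub's actual schema or code.

```python
def recrawl(store, new_docs, delete_field):
    """Drop stored docs whose delete_field value matches any new doc, then add the new docs."""
    fresh_values = {doc[delete_field] for doc in new_docs}
    kept = [doc for doc in store if doc[delete_field] not in fresh_values]
    return kept + new_docs

# Hypothetical state: 3 collections from an earlier crawl, 4 from the current one.
# The backend is the same, but one of its fields changed in between.
old_docs = [
    {"stable_key": "backend-A", "changing_key": "old", "id": f"old-{i}"} for i in range(3)
]
new_docs = [
    {"stable_key": "backend-A", "changing_key": "new", "id": f"new-{i}"} for i in range(4)
]

# Deleting by the field that changed matches nothing, so the old entries survive.
stale = recrawl(old_docs, new_docs, "changing_key")
# Deleting by the field that stayed the same replaces the old entries.
clean = recrawl(old_docs, new_docs, "stable_key")

print(len(stale), len(clean))  # prints "7 4"
```

The `7` in this toy run corresponds to the roughly 440 + 480 = 920 collections reported in the issue description: old and new entries coexisting because the cleanup never matched the old ones.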
I'm confident this works fine, so I merged it onto
It seems fixed. I restarted the server this morning and couldn't reproduce it any longer (although the server is on the dev branch, I think).
Right now, dev and master are identical. As long as you only pull when I tell you to do so, you can leave it on dev :D
@m-mohr informed me that there's something wrong with how the Earth Engine driver is listed in the Hub:
The backend is reported as being unavailable, but it doubtlessly is available.
This bug seems to be due to the collection crawling: More than 900 collections are being reported, but GEE actually has "only" around 480, of which quite a few are rather new, so probably both the old and the new ones (~440 + ~480 = ~920) are floating around the database and causing errors...