upgrade to proj 8.2 #97
09be782 to 52d5cb6
Just to make sure I'm understanding this:
And you're going to try to eliminate (1). That's quite a surprising source! |
Ha, your instincts are better than mine. I'm pretty sure that's not the issue after poking at it a bit. |
My kingdom for even a wild theory as to how this could be happening...
The "bundled_proj" script builds from: https://github.com/georust/proj/pull/97/files#diff-27d669a056374d86fef7e0627a9011d3ef7388e4c16e67a981b1bfb6cc83413cR84 And that seems to be working all around. 👍
The containers which are not using bundled_proj are using the system proj. But it's not installed from apt or anything. It's built from this (seemingly identical) source code: built here: and copied here:
How would this be different? |
(Again, just so I'm clear here) the staged build copies the binary? Where on earth could a difference be coming from? |
That's right. In a little more detail: The libproj-builder container compiles libproj (from the 8.2.0 source), but doesn't install it. Containers which want libproj pre-installed (like proj-ci and geo-ci) copy it from the libproj-builder container into their own system path. I can't figure out what the difference would be in the behavior of the proj compiled by the bundled_proj flag vs copied from the libproj-builder container. |
Even wilder: when running all tests, three fail.
But when I run the failed tests individually:
passes!
passes!
passes! |
Don't feel like you have to spend time on this, I'll keep plugging away, just documenting my findings as I go. One more wild one... when running these two tests repeatedly, sometimes one fails, sometimes the other. It seems like there is some weird external effect at play, like some kind of race condition.
Also maybe/maybe not interesting: when I specify test-threads=1, tests still fail, but they seem to follow a slightly different pattern. Vague theories:
There's also the risk of some kind of memory corruption thing, but if that were the case, I might expect the incorrect values to be less close and/or less consistent. |
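One way to put numbers on "less close and/or less consistent" is to run the same conversion many times and tally the distinct results. This is only a sketch: `tally_runs` is a hypothetical helper, not part of this crate, and the closure stands in for whatever conversion is under suspicion.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: call `f` `runs` times and count how often each
/// distinct result shows up. Results are keyed on the exact f64 bit
/// pattern so that small discrepancies are not rounded away.
pub fn tally_runs<F>(runs: usize, mut f: F) -> BTreeMap<String, usize>
where
    F: FnMut() -> f64,
{
    let mut counts = BTreeMap::new();
    for _ in 0..runs {
        let key = format!("{:016x}", f().to_bits());
        *counts.entry(key).or_insert(0) += 1;
    }
    counts
}
```

A deterministic conversion should yield a single entry; two or more entries with stable values would point at a few alternative code paths rather than random memory corruption.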
Oh no. What happens when you run two of them together, or add another conversion test? Because that sounds like memory might be getting clobbered by an FFI error somewhere… |
The network grid seems less likely, if only because we throw errors if it doesn't work. I guess an easy way to confirm that is to add a test that doesn't rely on the network and see if it fails in the same way? |
Re FFI: the issue causing #76 has been in the back of my mind. |
Ugh sorry, I was thinking of #75, though they are related. |
Continuing to noodle on this... Just as a sanity check...
This is an invalid coordinate, right?
Since it's a geodetic coordinate, shouldn't y always be less than 0.5 radians? |
I don't think so…you can readily convert it to degrees, which is a valid point in Romania (which is expected, given the Stereo70 projection) |
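For reference, a small sketch of that sanity check. The 43.5 to 48.5 degree latitude band for Romania is a rough approximation I'm assuming for illustration, and `looks_like_romanian_latitude` is a hypothetical helper, not something in this crate or in proj.

```rust
/// Rough latitude band for Romania in degrees.
/// (Approximation assumed for illustration only.)
const ROMANIA_LAT_DEG: (f64, f64) = (43.5, 48.5);

/// Hypothetical helper: does a latitude given in radians fall in that band?
pub fn looks_like_romanian_latitude(lat_rad: f64) -> bool {
    let deg = lat_rad.to_degrees();
    deg >= ROMANIA_LAT_DEG.0 && deg <= ROMANIA_LAT_DEG.1
}

/// 0.5 rad converted to degrees. It comes out well south of Romania,
/// so a latitude above 0.5 rad is expected here, not suspicious.
pub fn half_radian_in_degrees() -> f64 {
    0.5_f64.to_degrees()
}
```

Latitudes in radians only need to stay within ±π/2 (about 1.57), so there is nothing special about 0.5.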
Ugh, of course. Sorry, just confusing myself with really basic stuff. On the plus side, I seem to have gotten tests passing locally with fb89caf. I'm not 100% sure exactly what's happening, but I'm continuing to investigate. Let's see what CI says... |
src/proj.rs
Outdated
@@ -632,21 +632,30 @@ impl Proj {
         };
         let c_x: c_double = point.x().to_f64().ok_or(ProjError::FloatConversion)?;
         let c_y: c_double = point.y().to_f64().ok_or(ProjError::FloatConversion)?;
         let coord = if inverse {
             // Converting from degrees
whoops, got my comment backwards. 🙃
Ok, it seems to be passing CI now. Honestly, I think the change is somewhat "intuitive", in that if we're passing in degrees to project we use PJ_LP, and if we're doing an inverse projection, then we'd use PJ_XY to denote a projected Point (e.g. in meters). But I confess I have no idea why this fixes anything! It's just a regular ole dumb C-union, so the representation of the PJ_XY vs PJ_LP variants of PJ_COORD should be identical. ... right? |
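To spell out why the layout "should be identical", here is a minimal sketch using stand-in types. `Lp`, `Xy`, `Coord`, `make_coord` and `raw_bits` are hypothetical mirrors written for this comment, not the generated proj-sys bindings, and the real PJ_COORD has more members than these two.

```rust
/// Stand-in for PJ_LP: angular coordinates (hypothetical mirror type).
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Lp {
    pub lam: f64,
    pub phi: f64,
}

/// Stand-in for PJ_XY: projected coordinates (hypothetical mirror type).
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Xy {
    pub x: f64,
    pub y: f64,
}

/// Stand-in for PJ_COORD, reduced to the two members discussed here.
#[repr(C)]
#[derive(Clone, Copy)]
pub union Coord {
    pub lp: Lp,
    pub xy: Xy,
}

/// Pick the member by direction: angular input for a forward projection,
/// projected input for an inverse one.
pub fn make_coord(x: f64, y: f64, inverse: bool) -> Coord {
    if inverse {
        Coord { xy: Xy { x, y } }
    } else {
        Coord { lp: Lp { lam: x, phi: y } }
    }
}

/// Read the raw bits back through one member, whichever one was written.
pub fn raw_bits(c: Coord) -> [u64; 2] {
    // SAFETY: both members are two f64s with the same #[repr(C)] layout,
    // and every bit pattern is a valid f64.
    let xy = unsafe { c.xy };
    [xy.x.to_bits(), xy.y.to_bits()]
}
```

With these definitions, `raw_bits(make_coord(a, b, true))` and `raw_bits(make_coord(a, b, false))` are equal for any inputs, which is why the member choice alone shouldn't change what the C side sees.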
(disclaimer: I'm not well versed on c memory layout) |
🟢! |
That's also my understanding – when I originally implemented it it looked like a simple union so I just shrugged and moved on, but I didn't look at the proj source to see whether it was doing anything different based on whether it was PJ_XY or PJ_LP… |
Only intuition after reading PJ_COORD and its various flavors. My read was that PJ_LP was for angular coords and PJ_XY was for projected coords - it seemed like we were using PJ_LP always. But as I said above, I don't actually understand why this fixes anything, because my understanding is that their memory layout should be identical. I'm going to read some proj source and try to learn something more about c unions... |
But as I understand it, it's equally both of those things. A basic union (as opposed to a tagged union) has no notion of what variant it is; it just relies on the programmer to handle that. |
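A quick sketch of that distinction in Rust terms, again with hypothetical stand-in types rather than the real bindings: the untagged union is exactly as big as its largest member, while the tagged version has to carry a discriminant that can be matched on.

```rust
use std::mem::size_of;

/// Hypothetical two-double payloads standing in for PJ_LP and PJ_XY.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Lp {
    pub lam: f64,
    pub phi: f64,
}

#[repr(C)]
#[derive(Clone, Copy)]
pub struct Xy {
    pub x: f64,
    pub y: f64,
}

/// Untagged: nothing records which member was written last.
#[repr(C)]
pub union Untagged {
    pub lp: Lp,
    pub xy: Xy,
}

/// Tagged: the variant is stored next to the payload.
#[repr(C)]
pub enum Tagged {
    Lp(Lp),
    Xy(Xy),
}

/// Sizes in bytes of the untagged and tagged forms, in that order.
pub fn sizes() -> (usize, usize) {
    (size_of::<Untagged>(), size_of::<Tagged>())
}

/// Only the tagged form can answer "which variant is this?".
pub fn describe(t: &Tagged) -> &'static str {
    match t {
        Tagged::Lp(_) => "angular",
        Tagged::Xy(_) => "projected",
    }
}
```

There is no equivalent of `describe` for `Untagged`; the caller has to know which member is meaningful.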
I am out over my skis if that isn't the case, but given that it's just fixed what we suspected was a threading-related bug… |
Just by way of an update: The tests are still occasionally failing for me, but seemingly much less with the changes around LP vs. XY. (I intermittently get failing results) I feel pretty confident that this change is inducing different behavior, but not in any way that makes sense. I assume there is still some yet to be understood bug (presumably in our code) and that this change has just jostled the machine code around in such a way that we're avoiding the bug more often. I'll keep looking into this, but maybe not for a couple days. |
I wonder if maybe it's static vs. dynamic linking that's causing the bug to present vs. not. That could explain why It doesn't help fix the bug, but at least it's a theory for how the two scenarios could be behaving differently. |
bors try |
tryBuild failed: |
ebef601 to eea0065
eea0065 to bed17eb
superseded by #118 |
Just a draft for now as I work through some outstanding issues...
I've made the update, and rebuilt containers, but I'm still seeing some mysterious failures...
For some reason, using the pre-built libproj seems to be causing some smallish discrepancies on older containers. TBH I suspect the base system rather than the actual Rust version.
Next step is to see if just doing an `apt upgrade` fixes anything... https://github.com/georust/proj/actions/runs/1580354293