Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

upgrade to proj 8.2 #97

Closed
wants to merge 2 commits into from
Closed

upgrade to proj 8.2 #97

wants to merge 2 commits into from

Conversation

michaelkirk
Copy link
Member

@michaelkirk michaelkirk commented Dec 14, 2021

Just a draft for now as I work through some outstanding issues...

I've made the update, and rebuilt containers, but I'm still seeing some mysterious failures...

  • ✅ proj-sys on rust-1.57
  • ✅ proj-sys on rust-1.52
  • ✅ proj on rust-1.57
    • ✅ proj w/ --features=bundle
    • ✅ proj using prebuilt proj
  • ⁉️ proj on rust-1.52
    • ✅ proj w/ --features=bundle
    • 🔥 proj using prebuilt proj

For some reason using the pre-built libproj seems to be causing some smallish discrepencies on older containers. TBH I suspect the base system rather than the actual rust version.

next step is to try to see if just doing an apt upgrade fixes anything...

https://github.com/georust/proj/actions/runs/1580354293

failures:
---- proj::test::test_inverse_projection stdout ----
thread 'proj::test::test_inverse_projection' panicked at 'assert_relative_eq!(t.x(), 0.43633200013698786)

    left  = 0.43636512759931073
    right = 0.43633200013698786

', src/proj.rs:1069:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- proj::test::test_london_inverse stdout ----
thread 'proj::test::test_london_inverse' panicked at 'assert_relative_eq!(t.x(), 0.0023755864830313977)

    left  = 0.002407322568220306
    right = 0.0023755864830313977
', src/proj.rs:1086:9

---- proj::test::test_projection stdout ----
thread 'proj::test::test_projection' panicked at 'assert_relative_eq!(t.x(), 500119.7035366755, epsilon = 1e-5)

    left  = 499972.70746607287
    right = 500119.7035366755

', src/proj.rs:1054:9


failures:
    proj::test::test_inverse_projection
    proj::test::test_london_inverse
    proj::test::test_projection

@urschrei
Copy link
Member

Just to make sure I'm understanding this:
There are currently two possible sources of difference:

  1. Container libraries
  2. Rust version

And you're going to try to eliminate (1). That's quite a surprising source!

@michaelkirk
Copy link
Member Author

Ha, your instincts are better than mine. I'm pretty sure that's not the issue after poking at it a bit.

@michaelkirk
Copy link
Member Author

My kingdom for even a wild theory as to how this could be happening...

The "bundled_proj" script builds from: https://github.com/georust/proj/pull/97/files#diff-27d669a056374d86fef7e0627a9011d3ef7388e4c16e67a981b1bfb6cc83413cR84

And that seems to be working all around. 👍

The containers which are not using bundled_proj are using the system proj. But it's not installed from apt or anything. It's built from this (seemingly identical) source code:

built here:
https://github.com/georust/docker-images/blob/mkirk/update-proj/rust-1.52/libproj-builder.Dockerfile

and copied here:
https://github.com/georust/docker-images/blob/mkirk/update-proj/rust-1.52/proj-ci.Dockerfile#L10

How would this be different?

@urschrei
Copy link
Member

(Again, just so I'm clear here) the staged build copies the binary? Where on earth could a difference be coming from?

@michaelkirk
Copy link
Member Author

(Again, just so I'm clear here) the staged build copies the binary? Where on earth could a difference be coming from?

That's right. In a little more detail:

The libproj-builder container compiles libproj (from the 8.2.0 source), but doesn't install it.

Containers which want libproj pre-installed (like proj-ci and geo-ci) copy it from the libproj-builder container into their own system path.

I can't figure out what the difference would be in the behavior of the proj compiled by the bundled_proj flag vs copied from the libproj-builder container.

@michaelkirk
Copy link
Member Author

even wilder:

So when running all tests, three fail.

rust-1.52> cargo test
failures:
    proj::test::test_inverse_projection
    proj::test::test_london_inverse
    proj::test::test_projection

But when I run the failed tests indivudally:

rust-1.52> cargo test proj::test::test_inverse_projection
running 1 test
test proj::test::test_inverse_projection ... ok

passes!

rust-1.52> cargo test proj::test::test_london_inverse
running 1 test
test proj::test::test_london_inverse ... ok

passes!

rust-1.52> cargo test test_projection
test proj::test::test_projection ... ok

passes!

@michaelkirk
Copy link
Member Author

Don't feel like you have to spend time on this, I'll keep plugging away, just documenting my findings as I go.

One more wild one... when running these two tests repeatedly, sometimes one fails, sometimes the other. It seems like there is some weird external effect at play, like some kind of race condition.

rust-1.52> cargo test projection
    Finished test [unoptimized + debuginfo] target(s) in 0.06s
     Running unittests (/tmp/cargo-target/debug/deps/proj-ec6d6ab38686495b)

running 2 tests
test proj::test::test_inverse_projection ... ok
test proj::test::test_projection ... FAILED

failures:

---- proj::test::test_projection stdout ----
thread 'proj::test::test_projection' panicked at 'assert_relative_eq!(t.x(), 500119.7035366755, epsilon = 1e-5)

    left  = 499972.70746607287
    right = 500119.7035366755

', src/proj.rs:1054:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    proj::test::test_projection

test result: FAILED. 1 passed; 1 failed; 0 ignored; 0 measured; 13 filtered out; finished in 0.01s
root@6009bf331ef0:/tmp/georust/proj# /usr/local/cargo/bin/cargo test projection
    Finished test [unoptimized + debuginfo] target(s) in 0.06s
     Running unittests (/tmp/cargo-target/debug/deps/proj-ec6d6ab38686495b)

running 2 tests
test proj::test::test_projection ... ok
test proj::test::test_inverse_projection ... FAILED

failures:

---- proj::test::test_inverse_projection stdout ----
thread 'proj::test::test_inverse_projection' panicked at 'assert_relative_eq!(t.x(), 0.43633200013698786)

    left  = 0.43636512759931073
    right = 0.43633200013698786

', src/proj.rs:1069:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    proj::test::test_inverse_projection

test result: FAILED. 1 passed; 1 failed; 0 ignored; 0 measured; 13 filtered out; finished in 0.01s

Also maybe/maybe not interesting, when I specify test-threads=1, test still fail, but they seem to follow a slightly different pattern.

vague theories:

  • some side effect caching bug?
  • maybe something in the network grid?

There's also the risk of some kind of memory corruption thing, but if that were the case, I might expect the incorrect values to be less close and/or less consistent.

@urschrei
Copy link
Member

Oh no. What happens when you run two of them together, or add another conversion test? Because that sounds like memory might be getting clobbered by an FFI error somewhere…

@urschrei
Copy link
Member

The network grid seems less likely, if only because we throw errors if it doesn't work. I guess an easy way to confirm that is to add a test that doesn't rely on the network and see if it fails in the same way?

@urschrei
Copy link
Member

Re FFI: the issue causing #76 has been in the back of my mind.

@michaelkirk
Copy link
Member Author

Re FFI: the issue causing #76 has been in the back of my mind.

Can you refresh my memory - what was the issue that lead to #76?

@urschrei
Copy link
Member

urschrei commented Dec 15, 2021

Ugh sorry, I was thinking of #75, though they are related.

@michaelkirk
Copy link
Member Author

Continuing to noodle on this...

Just as a sanity check...

    // Carry out a projection from geodetic coordinates
    fn test_projection() {
        let stereo70 = Proj::new(
            "+proj=sterea +lat_0=46 +lon_0=25 +k=0.99975 +x_0=500000 +y_0=500000
            +ellps=krass +towgs84=33.4,-146.6,-76.3,-0.359,-0.053,0.844,-0.84 +units=m +no_defs",
        )
        .unwrap();
        // Geodetic -> Pulkovo 1942(58) / Stereo70 (EPSG 3844)
        let t = stereo70
            .project(MyPoint::new(0.436332, 0.802851), false)
            .unwrap();
        assert_relative_eq!(t.x(), 500119.7035366755, epsilon = 1e-5);
        assert_relative_eq!(t.y(), 500027.77901023754, epsilon = 1e-5);
    }

This is an invalid coordinate, right?

MyPoint::new(0.436332, 0.802851)

Since it's a geodetic coordinate, shouldn't y always be less than 0.5 radians?

@urschrei
Copy link
Member

urschrei commented Dec 16, 2021

I don't think so…you can readily convert it to degrees, which is a valid point in Romania (which is expected, given the Stereo70 projection)

@michaelkirk
Copy link
Member Author

I don't think so…you can readily convert it to degrees, which is a valid point in Romania (which is expected, given the Stereo70 projection)

Ugh, of course. Sorry just confusing myself with really basic stuff.

On the plus side, I seem to have gotten tests passing locally with fb89caf

I'm not 100% sure on exactly what's happening, but I'm continuing to investigate.

Let's see what CI says...

src/proj.rs Outdated
@@ -632,21 +632,30 @@ impl Proj {
};
let c_x: c_double = point.x().to_f64().ok_or(ProjError::FloatConversion)?;
let c_y: c_double = point.y().to_f64().ok_or(ProjError::FloatConversion)?;
let coord = if inverse {
// Converting from degrees
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoops, got my comment backwards. 🙃

@michaelkirk
Copy link
Member Author

Ok, it seems to be passing CI now.

Honestly, I think the change is somewhat "intuitive", in that if we're passing in degrees to project we use PJ_LP, and if we're doing an inverse projection, then we'd use PJ_XY to denote a projected Point (e.g. in meters).

But I confess I have no idea why this fixes anything! It's just a regular ole dumb C-union, so the representation of the PJ_XY vs PJ_LP variants of PJ_COORD should be identical.

... right?

@michaelkirk
Copy link
Member Author

(disclaimer: I'm not well versed on c memory layout)

@urschrei
Copy link
Member

🟢!
How did you decide to try that particular refactor?

@urschrei
Copy link
Member

Ok, it seems to be passing CI now.

Honestly, I think the change is somewhat "intuitive", in that if we're passing in degrees to project we use PJ_LP, and if we're doing an inverse projection, then we'd use PJ_XY to denote a projected Point (e.g. in meters).

But I confess I have no idea why this fixes anything! It's just a regular ole dumb C-union, so the representation of the PJ_XY vs PJ_LP variants of PJ_COORD should be identical.

... right?

That's also my understanding – when I originally implemented it it looked like a simple union so I just shrugged and moved on, but I didn't look at the proj source to see whether it was doing anything different based on whether it was PJ_XY or PJ_LP…

@michaelkirk
Copy link
Member Author

How did you decide to try that particular refactor?

Only intuition after reading PJ_COORD and its various flavors.

My read was that PJ_LP was for angular coords and PJ_XY was for projected coords - it seemed like we were using PJ_LP always.

But as I said above, I don't actually understand why this fixes anything, because my understanding is that their memory layout should be identical.

I'm going to read some proj source and try to learn something more about c unions...

@michaelkirk
Copy link
Member Author

on whether it was PJ_XY or PJ_LP…

But as I understand it, it's equally both of those things. A basic union (as opposed to a tagged union) has no notion of what variant it is it just relies on the programmer to handle that.

@urschrei
Copy link
Member

on whether it was PJ_XY or PJ_LP…

But as I understand it, it's equally both of those things. A basic union (as opposed to a tagged union) has no notion of what variant it is it just relies on the programmer to handle that.

I am out over my skis if that isn't the case, but given that it's just fixed what we suspected was a threading-related bug…

@michaelkirk
Copy link
Member Author

michaelkirk commented Dec 17, 2021

Just by way of an update:

The tests are still occasionally failing for me, but seemingly much less with the changes around LP vs. XY. (I intermittently get failing results)

I feel pretty confident that this change is inducing different behavior, but not in any way that makes sense. I assume there is still some yet to be understood bug (presumably in our code) and that this change has just jostled the machine code around in such a way that we're avoiding the bug more often.

I'll keep looking into this, but maybe not for a couple days.

@michaelkirk
Copy link
Member Author

michaelkirk commented Dec 22, 2021

I wonder if maybe it's static vs. dynamic linking that's causing the bug to present vs. not.

That could explain why --features bundled_proj (which statically links libproj) could succeed while linking to the system build (which links dynamically if available) of the exactly same build of proj (e.g. on proj-ci) could intermittently fail.

It doesn't help fix the bug, but at least it's a theory for how the two scenarios could be behaving differently.

@michaelkirk michaelkirk mentioned this pull request Feb 1, 2022
@michaelkirk
Copy link
Member Author

bors try

bors bot added a commit that referenced this pull request Feb 4, 2022
@bors
Copy link
Contributor

bors bot commented Feb 4, 2022

try

Build failed:

@michaelkirk
Copy link
Member Author

superseded by #118

bors bot added a commit that referenced this pull request Mar 21, 2022
118: update libproj to 9.0.0 r=michaelkirk a=michaelkirk

- Fixes #96
- supersedes #97 

Co-authored-by: Michael Kirk <michael.code@endoftheworl.de>
@michaelkirk michaelkirk deleted the mkirk/proj-8.2 branch April 11, 2022 19:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants