Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disallow _Atomic(uint64_t) in structs, since some C compilers have an alignment bug #56626

Open
NHDaly opened this issue Nov 20, 2024 · 11 comments
Labels
bug Indicates an unexpected problem or unintended behavior system:32-bit Affects only 32-bit systems

Comments

@NHDaly
Copy link
Member

NHDaly commented Nov 20, 2024

Per @vtjnash:

_Atomic(uint64_t) does not have correct alignment in many C compilers.
gcc is particularly bad about accounting for the size of this object when generating field accesses
[...]
the problem is internal to gcc it sees atomic and then chooses to disregard the alignment
it was fixed eventually, just not on our CI versions

This can result in segfaults and other issues (on 32-bit machines only?).

Currently, we have atomic uint64s in these two structs:

typedef struct jl_timing_counts_event_t {
    const char *name;
    _Atomic(uint64_t) self;
    _Atomic(uint64_t) total;
} jl_timing_counts_event_t;

typedef struct {
    _Atomic(int64_t) allocd;
    _Atomic(int64_t) pool_live_bytes;
    _Atomic(uint64_t) malloc;
    _Atomic(uint64_t) realloc;
    _Atomic(uint64_t) poolalloc;
    _Atomic(uint64_t) bigalloc;
    _Atomic(int64_t) free_acc;
    _Atomic(uint64_t) alloc_acc;
} jl_thread_gc_num_common_t;

We should change these to 32-bit uints if we can, or find some other encoding if not.

Additionally, we should make a static analyzer rule that disallows this for the future.
A clang-tidy plugin could check this for us.

@NHDaly NHDaly added bug Indicates an unexpected problem or unintended behavior system:32-bit Affects only 32-bit systems labels Nov 20, 2024
@NHDaly
Copy link
Member Author

NHDaly commented Nov 20, 2024

For example, here is a CI failure we saw on a PR (#56320) that introduced new _Atomic(uint64_t) fields:

[11354] signal 11 (1): Segmentation fault
in expression starting at boot.jl:264
_jl_mutex_lock at /cache/build/builder-amdci4-5/julialang/julia-master/src/threading.c:958
jl_mutex_lock at /cache/build/builder-amdci4-5/julialang/julia-master/src/julia_locks.h:65 [inlined]
jl_generate_fptr_for_unspecialized_impl at /cache/build/builder-amdci4-5/julialang/julia-master/src/jitlayers.cpp:865
jl_compile_method_internal at /cache/build/builder-amdci4-5/julialang/julia-master/src/gf.c:2857
_jl_invoke at /cache/build/builder-amdci4-5/julialang/julia-master/src/gf.c:3298 [inlined]
ijl_invoke at /cache/build/builder-amdci4-5/julialang/julia-master/src/gf.c:3313
jl_toplevel_eval_flex at /cache/build/builder-amdci4-5/julialang/julia-master/src/toplevel.c:1056
jl_parse_eval_all at /cache/build/builder-amdci4-5/julialang/julia-master/src/toplevel.c:1192
ijl_load_ at /cache/build/builder-amdci4-5/julialang/julia-master/src/toplevel.c:1239
ijl_load at /cache/build/builder-amdci4-5/julialang/julia-master/src/toplevel.c:1252
_finish_julia_init at /cache/build/builder-amdci4-5/julialang/julia-master/src/init.c:881
julia_init at /cache/build/builder-amdci4-5/julialang/julia-master/src/init.c:846

https://buildkite.com/julialang/julia-master/builds/42153#_

@vchuravy
Copy link
Member

Additionally, we should make a static analyzer rule that disallows this for the future.

Can we not do the other way? Disallow C compilers that don't support this?

@NHDaly
Copy link
Member Author

NHDaly commented Nov 20, 2024

That would make sense to me.

@vtjnash?

@oscardssmith
Copy link
Member

iiuc that would mean disallowing GCC entirely. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65146 is still open

@Keno
Copy link
Member

Keno commented Nov 21, 2024

I believe the status is just wrong and this got fixed in the commit linked at the bottom of that issue.

@Keno
Copy link
Member

Keno commented Nov 21, 2024

In the meantime, why not just manually force the padding and set the alignment?

@NHDaly
Copy link
Member Author

NHDaly commented Nov 21, 2024

In the meantime, why not just manually force the padding and set the alignment?

@Keno: that would be excellent if it would work in all cases. Do you mean something like this?:

    alignas(8) _Atomic(uint64_t) cpu_time_ns;

or even this?:

    alignas(8) _Atomic(uint64_t) cpu_time_ns : 64;

Don't we need to worry about _Atomic() introducing its own alignment and/or padding requirements on some architectures?
Like what if it needs to insert a lock on some architectures? Or have like 4-word alignment or something?
Do we know that we only run on any architectures where this is safe?

CC: @d-netto

(PS I'm trying this in a commit, here)

@Keno
Copy link
Member

Keno commented Nov 21, 2024

@Keno: that would be excellent if it would work in all cases. Do you mean something like this?:

If that works, that'd be great. From @vtjnash explanation it sounded like it wouldn't, in which case I would have suggested aligning the outer struct. That said, from the commit that fixed it, I don't see anything to suggest that the attribute would be ignored (of course it's possible the surrounding code would do something like that).

@NHDaly
Copy link
Member Author

NHDaly commented Nov 21, 2024

Huh, it does seem to have fixed the segfault in CI, anyway! 👀 Is that enough to consider this fixed, then?

@NHDaly
Copy link
Member Author

NHDaly commented Nov 21, 2024

(This also seems to be what Stackoverflow says)

@NHDaly
Copy link
Member Author

NHDaly commented Nov 22, 2024

🤔 but the tests are failing on i686 (32-bit), which makes me think that actually forcing the alignment just causes us to read/write garbage even though it doesn't crash? 😞

Or maybe the issue is that now the struct layouts don't match between julia and C?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Indicates an unexpected problem or unintended behavior system:32-bit Affects only 32-bit systems
Projects
None yet
Development

No branches or pull requests

4 participants