zlib and io_uring #40

Open
AlfieC opened this issue Dec 21, 2020 · 21 comments

Comments

@AlfieC

AlfieC commented Dec 21, 2020

hey,

we have a pretty large codebase so I'll try to pull out the most important parts

we essentially dropped in epoll support whereas before we used nio - no issue there.
we later deployed kernel 5 + io_uring with the io_uring module, and we started to have issues. we compress the network stream data with zlib (mostly implemented native to avoid copying bytebuf)

error flows from here: https://github.com/SpigotMC/BungeeCord/blob/master/native/src/main/c/NativeCompressImpl.cpp#L76

it always fails with error code -2. I would usually attribute this to a bug on our side, but the issue only surfaces when we put the "proxy" type server on io_uring; epoll and nio work without issue. not sure what kind of logs you guys need, but I can try to provide anything requested.

@normanmaurer
Member

Can you show me how you call the code and the full stack trace? -2 is a valid zlib error code (stream error).

Bye
Norman
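
For readers following the numbers in this thread, the zlib return codes can be looked up as below. This is only a small reference sketch: the constant names and values are the ones defined in `zlib.h`, while the class and method names (`ZlibReturnCodes`, `describe`) are made up for illustration and are not part of Netty or BungeeCord.

```java
import java.util.Map;

public class ZlibReturnCodes {
    // Return codes as defined in zlib.h.
    private static final Map<Integer, String> NAMES = Map.of(
            0, "Z_OK",
            1, "Z_STREAM_END",
            2, "Z_NEED_DICT",
            -1, "Z_ERRNO",
            -2, "Z_STREAM_ERROR",
            -3, "Z_DATA_ERROR",
            -4, "Z_MEM_ERROR",
            -5, "Z_BUF_ERROR",
            -6, "Z_VERSION_ERROR");

    // Hypothetical helper: turns a raw zlib return code into its constant name.
    public static String describe(int code) {
        return NAMES.getOrDefault(code, "unknown (" + code + ")");
    }

    public static void main(String[] args) {
        System.out.println(describe(-2));
        System.out.println(describe(-3));
    }
}
```

Z_STREAM_ERROR generally points at inconsistent stream state or invalid parameters, while Z_DATA_ERROR means the input bytes themselves did not look like valid compressed data.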

@AlfieC
Author

AlfieC commented Dec 21, 2020

these errors only originate when we use this in the pipeline:

	private final BungeeZlib zlib = CompressFactory.zlib.newInstance();

	@Override
	public void handlerAdded(ChannelHandlerContext ctx) throws Exception {
		zlib.init(true, Deflater.DEFAULT_COMPRESSION);
	}

	@Override
	public void handlerRemoved(ChannelHandlerContext ctx) throws Exception {
		zlib.free();
	}

	@Override
	protected void encode(ChannelHandlerContext ctx, ByteBuf msg, ByteBuf out) throws Exception {
		int origSize = msg.readableBytes();
		if (origSize < 256) {
			writeVarInt(0, out);
			out.writeBytes(msg);
		} else {
			writeVarInt(origSize, out);
			zlib.process(msg, out);
		}
	}

	public static void writeVarInt(int val, ByteBuf out) {
		while ((val & -128) != 0) {
			out.writeByte(val & 127 | 128);
			val >>>= 7;
		}

		out.writeByte(val);
	}
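
The length prefix written by `writeVarInt` above can be checked in isolation, without Netty. The sketch below repeats the same encoding loop against a plain byte stream and pairs it with a reader. The reader is an assumption written for this example; it is not the actual `DefinedPacket.readVarInt` used on the decoding side, which is not shown in this thread.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

public class VarIntRoundTrip {
    // Same loop as writeVarInt in the handler, but targeting a byte array
    // instead of a ByteBuf so it runs without Netty.
    public static byte[] writeVarInt(int val) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((val & -128) != 0) {
            out.write(val & 127 | 128); // low 7 bits plus continuation flag
            val >>>= 7;
        }
        out.write(val); // last group, continuation flag clear
        return out.toByteArray();
    }

    // Hypothetical reader: accumulates 7 bits per byte, at most 5 bytes.
    public static int readVarInt(ByteBuffer in) {
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            byte b = in.get();
            result |= (b & 127) << shift;
            if ((b & 128) == 0) {
                return result;
            }
        }
        throw new IllegalArgumentException("VarInt longer than 5 bytes");
    }

    public static void main(String[] args) {
        int[] samples = {0, 127, 128, 256, 300, Integer.MAX_VALUE};
        for (int v : samples) {
            byte[] encoded = writeVarInt(v);
            int decoded = readVarInt(ByteBuffer.wrap(encoded));
            System.out.println(v + " -> " + encoded.length + " byte(s) -> " + decoded);
        }
    }
}
```

If the reader starts even one byte too early or too late in the frame, the decoded size is garbage, which is one way a check like `size >= compressionThreshold` can fail without the compressor itself being at fault.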

code from other side:

    private final int compressionThreshold;
    private final BungeeZlib zlib = CompressFactory.zlib.newInstance();

    @Override
    public void handlerAdded(ChannelHandlerContext ctx) throws Exception
    {
        zlib.init( false, 0 );
    }

    @Override
    public void handlerRemoved(ChannelHandlerContext ctx) throws Exception
    {
        zlib.free();
    }

    @Override
    protected void decode(ChannelHandlerContext ctx, ByteBuf in, List<Object> out) throws Exception
    {
        int size = DefinedPacket.readVarInt( in );
        if ( size == 0 )
        {
            out.add( in.slice().retain() );
            in.skipBytes( in.readableBytes() );
        } else
        {
            Preconditions.checkArgument( size >= compressionThreshold, "Decompressed size %s less than compression threshold %s", size, compressionThreshold);
            ByteBuf decompressed = ctx.alloc().directBuffer();

            try
            {
                zlib.process( in, decompressed );
                Preconditions.checkArgument( decompressed.readableBytes() == size, "Decompressed size %s is not equal to actual decompressed bytes %s", size, decompressed.readableBytes());

                out.add( decompressed );
                decompressed = null;
            } finally
            {
                if ( decompressed != null )
                {
                    decompressed.release();
                }
            }
        }
    }

apologies, error here is this one:

Preconditions.checkArgument( size >= compressionThreshold, "Decompressed size %s less than compression threshold %s", size, compressionThreshold);

I'd normally attribute this to an error of ours but it only occurs when we use io_uring - no issues using epoll or nio

@AlfieC
Author

AlfieC commented Dec 21, 2020

when we remove the checkArgument, we get the -2 from zlib decompression instead

@HookWoods
Contributor

I ran some tests a while ago, and it appears that the decoded buffer is made up of multiple NIO buffers. That's why you get the -2 from zlib: it can't decompress across multiple buffers.

@AlfieC
Author

AlfieC commented Dec 22, 2020

> I ran some tests a while ago, and it appears that the decoded buffer is made up of multiple NIO buffers. That's why you get the -2 from zlib: it can't decompress across multiple buffers.

so is this a limitation of zlib? or bug?
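
On the zlib half of that question: the zlib format is a stream, and an inflater can be fed the same stream in several pieces as long as they arrive in order. The sketch below shows this with the JDK's `java.util.zip` classes. It says nothing about BungeeCord's native wrapper, which takes a single memory address and is not reproduced here; the class and method names (`ChunkedInflate`, `deflate`, `inflateChunks`) are made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class ChunkedInflate {
    // Compress a whole array into one zlib stream.
    public static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Inflate one zlib stream that arrives split across several arrays.
    public static byte[] inflateChunks(byte[]... chunks) {
        Inflater inflater = new Inflater();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        int next = 0;
        try {
            while (!inflater.finished()) {
                if (inflater.needsInput()) {
                    if (next == chunks.length) {
                        throw new IllegalStateException("stream ended early");
                    }
                    inflater.setInput(chunks[next++]); // hand over the next piece
                }
                int n = inflater.inflate(buf);
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt zlib stream", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] original = "io_uring and zlib ".repeat(64).getBytes(StandardCharsets.UTF_8);
        byte[] compressed = deflate(original);
        int mid = compressed.length / 2;
        byte[] first = Arrays.copyOfRange(compressed, 0, mid);
        byte[] second = Arrays.copyOfRange(compressed, mid, compressed.length);
        System.out.println(Arrays.equals(original, inflateChunks(first, second)));
    }
}
```

So a stream split over several buffers is not a problem for zlib as such; whether a particular wrapper can accept more than one buffer per call is a separate question.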

@AlfieC
Author

AlfieC commented Dec 24, 2020

> I ran some tests a while ago, and it appears that the decoded buffer is made up of multiple NIO buffers. That's why you get the -2 from zlib: it can't decompress across multiple buffers.

thinking about this some more, I'm not sure why the issue only appears on io_uring; on epoll and nio there is no issue.

@HookWoods
Contributor

Actually, I've done some more testing. I have a custom fork of BungeeCord (https://github.com/SpigotMC/BungeeCord) with io_uring and a custom fork of PaperSpigot (https://github.com/PaperMC/Paper) with io_uring enabled. I tried launching both the BungeeCord server and the Spigot server with io_uring on, and it doesn't work. I got this error in the BungeeCord logs:
[21:35:32] [Netty io_uring Worker #0/INFO]: [HookWood_] disconnected with: Exception Connecting:DecoderException : net.md_5.bungee.jni.NativeCodeException: Unknown z_stream return code : -3 @ io.netty.handler.codec.MessageToMessageDecoder:98

When I launch the Spigot server on epoll, it works. I don't know why yet and I'm going to look into it further, but zlib compression doesn't work when io_uring is enabled on both BungeeCord and Spigot.

@normanmaurer
Member

Can you provide a reproducer that I can run locally?

@normanmaurer
Member

@AlfieC @HookWoods ping

@HookWoods
Contributor

OK, I will set that up when I'm home (in 3-4h).

@Janmm14

Janmm14 commented Sep 10, 2021

> I ran some tests a while ago, and it appears that the decoded buffer is made up of multiple NIO buffers. That's why you get the -2 from zlib: it can't decompress across multiple buffers.

While CompositeByteBuf exists in Netty, it only gives us a native address when it has exactly one component; otherwise it errors. So that cannot be the problem.
It is not clear whether this is a Netty bug or a bug in BungeeCord's zlib usage.
BungeeCord's native zlib has been overhauled since the last comment here, so the issue creator should check again as well.

I'd suggest to close this issue.

@chrisvest
Collaborator

It sounds like the only variable between working and non-working systems is the io_uring transport, and in this case the error shows up as a corrupted (I guess) zlib stream. It could be that the io_uring transport doesn't set correct read- or write-offsets on the buffers in some cases, and it just happens to get caught by zlib because it sanity checks the data it gets.
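
That hypothesis is easy to illustrate outside of Netty. The sketch below uses the JDK's `java.util.zip` classes (not BungeeCord's native zlib, and not the io_uring transport) to show that an inflater started at the correct offset succeeds, while the same bytes read from one position further in are rejected as corrupt. The class and method names (`OffsetCorruption`, `inflateFrom`, `failsAt`) are invented for this example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class OffsetCorruption {
    // Compress a whole array into one zlib stream.
    public static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Inflate starting at the given offset, as if a reader index were wrong.
    public static byte[] inflateFrom(byte[] data, int offset) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(data, offset, data.length - offset);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or unsupported stream");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end();
        }
    }

    // True if zlib rejects the data when reading begins at the given offset.
    public static boolean failsAt(byte[] data, int offset) {
        try {
            inflateFrom(data, offset);
            return false;
        } catch (DataFormatException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        byte[] compressed = deflate("hello hello hello".getBytes(StandardCharsets.UTF_8));
        System.out.println("offset 0 fails: " + failsAt(compressed, 0));
        System.out.println("offset 1 fails: " + failsAt(compressed, 1));
    }
}
```

Skipping a single byte already breaks the two-byte zlib header, so the decompressor is often the first component to notice an offset bug that other handlers would silently pass along.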

@rafi67000

any update on this?

@PedroMPagani

This seems to still be an issue.

@rafi67000

rafi67000 commented Nov 16, 2024

It's fixed in Netty 4.2.0 (Tested using Beta1)

@AlfieC
Author

AlfieC commented Nov 16, 2024

sorry to get back to this years later. we fixed it by enabling compression one tick after login

@Janmm14

Janmm14 commented Nov 16, 2024

> sorry to get back to this years later. we fixed it by enabling compression one tick after login

That is not a solution; that's at most a quirky workaround which just so happens to work for you by accident.

@AlfieC
Author

AlfieC commented Nov 16, 2024

with said solution we were able to hit 1.1k players on an instance, 3k-ish per bungee

@rafi67000

> > sorry to get back to this years later. we fixed it by enabling compression one tick after login
>
> That is not a solution, thats at most a quirky workaround which just so happens to work for you by accident.

this issue is fixed in netty 4.2.0, maybe the fix could just be backported here?

@PedroMPagani

The reason it worked for him is that the first tick usually carries the most packets. That is specific to how Minecraft behaves in production, not something that applies to other software.

@AlfieC
Author

AlfieC commented Nov 16, 2024

ah my bad! Was responding via email so thought this was the Bungee issue. apologies, yes, it was a workaround.
