EventHub receiver.listenForMessage slows down significantly #240

Open
ChiefAlexander opened this issue Oct 22, 2021 · 4 comments

@ChiefAlexander

Expected Behavior

Receive messages consistently

Actual Behavior

After a few seconds of receiving messages, the latency increases drastically and doesn't come down

I have attached a screenshot from Jaeger showing the increase in function times from an average of ~40ms to an average of ~600ms after processing less than 200 messages from EventHub. This all happens within a few seconds of startup.
[Jaeger screenshots: before / after]

I have locally patched the AMQP library to try to dig into the issue and identify where this slowdown is occurring, but was unable to reach any conclusions. If this is an issue within the AMQP library I am happy to open another issue there, but I wanted to start here to see if anyone had any ideas.

Environment

  • OS: gcr.io/distroless/static:nonroot
  • Go version: 1.17.1
  • Version of Library:
    github.com/Azure/azure-event-hubs-go/v3 v3.3.16
    github.com/Azure/go-amqp v0.16.1

Workaround

Utilizing the WebSocket connection option allowed us to have consistent message delivery. After switching, our latency within the functions maintains a stable execution time of ~40ms, which 10x'd our data throughput.
[Chart: collected data rate]
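For anyone else wanting to try this, here is a minimal sketch of the change. The EVENTHUB_CONNECTION_STRING environment variable name is just illustrative; the relevant piece is the HubWithWebSocketConnection option from azure-event-hubs-go/v3:

```go
package main

import (
	"os"

	eventhub "github.com/Azure/azure-event-hubs-go/v3"
)

func main() {
	// Connection string pulled from the environment (variable name is illustrative).
	connStr := os.Getenv("EVENTHUB_CONNECTION_STRING")

	// HubWithWebSocketConnection tunnels AMQP over WebSockets (port 443)
	// instead of using plain AMQP over TCP (port 5671).
	hub, err := eventhub.NewHubFromConnectionString(connStr,
		eventhub.HubWithWebSocketConnection())
	if err != nil {
		panic(err)
	}
	_ = hub // call hub.Receive(...) as usual from here
}
```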

@ChiefAlexander
Author

This is also probably related to #122 based on some of the comments.

@richardpark-msft
Member

richardpark-msft commented Dec 2, 2021

Hi @ChiefAlexander,

I've written a stress program which ran for 24 hours without showing this issue, so I think I'm missing some key component. Do you have a second to look at what I have and see if it matches what you're doing? Or even better, if you can create a small sample that replicates the problem you're seeing, I could start with that.

Some things that might be different:

  • I'm only writing to/reading from a single partition
  • I'm not using any of the checkpoint/leasing code - just purely reading from an offset, continually (roughly the pattern sketched after the link below)
  • I'm using an alpine base image (I imagine this is similar to what you're using, but yours is probably even leaner)

stress program
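For reference, the read pattern is roughly the following (a sketch, not the stress program itself; the partition ID "0" is just an example):

```go
package stress

import (
	"context"
	"fmt"

	eventhub "github.com/Azure/azure-event-hubs-go/v3"
)

// receiveFromPartition reads continually from a single partition with no
// checkpoint/lease machinery, starting from the latest offset.
func receiveFromPartition(ctx context.Context, hub *eventhub.Hub) error {
	handler := func(ctx context.Context, event *eventhub.Event) error {
		fmt.Printf("received %d bytes\n", len(event.Data))
		return nil
	}

	listener, err := hub.Receive(ctx, "0", handler,
		eventhub.ReceiveWithLatestOffset())
	if err != nil {
		return err
	}
	defer listener.Close(ctx)

	// Block until the caller cancels; the handler keeps firing as events
	// arrive on the partition.
	<-ctx.Done()
	return ctx.Err()
}
```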

@ChiefAlexander
Author

I will try to replicate what we are seeing with your stress program.

A few notes:

  • We are reading from multiple partitions in some cases (4 partitions). I believe that when I was running the above tests it was only reading from one of those partitions.
  • We are using a checkpoint but not a lease. We have actually written our own persister that writes to Firestore (the rough shape is sketched below). However, while testing we confirmed that this occurs both when not using a checkpoint at all and when using the default persister included with this library.
  • We were able to replicate this both in a container and running locally (macOS)

I think we could argue about which base image is leaner 😄 . I don't think in this case the underlying OS actually matters based on our own testing.
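To give a sense of the persister shape: this is not our Firestore code, just an in-memory stand-in implementing the persist.CheckpointPersister interface, wired in through HubWithOffsetPersistence (if I recall the option name correctly):

```go
package checkpoint

import (
	"fmt"
	"sync"

	eventhub "github.com/Azure/azure-event-hubs-go/v3"
	"github.com/Azure/azure-event-hubs-go/v3/persist"
)

// mapPersister stands in for our Firestore-backed persister; it keeps
// checkpoints in memory, keyed the same way the library keys them.
type mapPersister struct {
	mu     sync.Mutex
	points map[string]persist.Checkpoint
}

func key(namespace, name, consumerGroup, partitionID string) string {
	return fmt.Sprintf("%s/%s/%s/%s", namespace, name, consumerGroup, partitionID)
}

func (p *mapPersister) Write(namespace, name, consumerGroup, partitionID string, checkpoint persist.Checkpoint) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.points[key(namespace, name, consumerGroup, partitionID)] = checkpoint
	return nil
}

func (p *mapPersister) Read(namespace, name, consumerGroup, partitionID string) (persist.Checkpoint, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if cp, ok := p.points[key(namespace, name, consumerGroup, partitionID)]; ok {
		return cp, nil
	}
	// Nothing stored yet: start from the beginning of the stream.
	return persist.NewCheckpointFromStartOfStream(), nil
}

// newHub wires the custom persister into the hub.
func newHub(connStr string) (*eventhub.Hub, error) {
	p := &mapPersister{points: map[string]persist.Checkpoint{}}
	return eventhub.NewHubFromConnectionString(connStr,
		eventhub.HubWithOffsetPersistence(p))
}
```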

@slamgundam

Can I ask where I can switch the connection option to WebSocket?
