
Possible memory leak in rails-autoscale-web #65

Open
carrabi opened this issue Sep 23, 2022 · 15 comments

@carrabi commented Sep 23, 2022

I recently migrated from rails_autoscale_agent to rails-autoscale-web with rails-autoscale-sidekiq. I've been testing the update in my development environment to ensure things are working properly. Over the last two days I've noticed a jump in web dyno memory consumption. The screenshot below shows the span when I added the gem and when I rolled back to the old version. After the rollback, memory consumption returned to normal. Note: I didn't experience the memory jump on my worker dynos, which points to an issue with the rails-autoscale-web gem.

[Screenshot: Heroku memory metrics]

I also found #39, a now-closed issue that references a memory leak in rails_autoscale_agent. I tried using heapy to get a heap dump, but I was getting errors running it. I'm also not sure how heapy would work on Heroku, since it's a shared, virtualized dyno.

@adamlogic (Collaborator)

Thanks for opening the issue, and with so much detail!

I assume your development environment does not have the Rails Autoscale add-on installed?

What I think is happening is the request middleware is collecting request queue time in memory, but since it's never reporting (if the add-on isn't installed), that in-memory store is never flushed.

I'm still working out the best way to fix it, but I'm betting that if you install the free version of Rails Autoscale in your development environment, you won't see the memory leak.
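
To sketch what I mean (illustrative only, not the actual gem code; the class names and header handling below are made up): a Rack middleware buffers queue-time samples into an in-memory store, and only the reporter ever drains that buffer, so if reporting never happens the buffer grows without bound.

# Illustrative sketch only (hypothetical names, not rails-autoscale-web's code).
class MetricsStore
  def initialize
    @mutex = Mutex.new
    @samples = []
  end

  # The middleware calls this on every request.
  def push(value, time = Time.now)
    @mutex.synchronize { @samples << [time, value] }
  end

  # Only the reporter thread calls this. If it never runs, @samples grows forever.
  def flush
    @mutex.synchronize { @samples.dup.tap { @samples.clear } }
  end
end

class QueueTimeMiddleware
  def initialize(app, store)
    @app = app
    @store = store
  end

  def call(env)
    # Assumes X-Request-Start carries milliseconds since epoch (router formats vary).
    if (started = env["HTTP_X_REQUEST_START"])
      queue_time_ms = (Time.now.to_f * 1000.0) - started.to_f
      @store.push(queue_time_ms)
    end
    @app.call(env)
  end
end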

@carrabi (Author) commented Sep 23, 2022

Hi @adamlogic. Sure. Happy to help. The add-on has been a huge help for us.

I do have the Rails Autoscale add-on installed; however, I have "Autoscaling" toggled off for my web and worker dynos so it doesn't scale the infrastructure at night, since it's a dev env. With auto scaling toggled off, would it behave the same as if the add-on wasn't installed?

@adamlogic (Collaborator)

With auto scaling toggled off, would it behave the same as if the add-on wasn't installed?

No, it shouldn't matter whether it's toggled on/off. Just having it installed disproves my theory above. 🤔

@adamlogic (Collaborator)

What I think is happening is the request middleware is collecting request queue time in memory, but since it's never reporting (if the add-on isn't installed), that in-memory store is never flushed.

We do have a safeguard in place for this scenario, so this doesn't appear to be the issue.

@carrabi Could you check your logs to see if any exceptions are being raised by the gems? The easiest way to do this is to tail & grep your logs, then restart your app to see the logs during boot:

heroku logs -t -d web -a [app-name] | grep 'RailsAutoscale\|rails-autoscale'
heroku ps:restart -d web -a [app-name]

Feel free to email me these logs privately. Adam at railsautoscale.com. Thanks!

@carrabi (Author) commented Sep 26, 2022

Yep. Will do

@jnebeker

Hello! Just wanted to report that we're seeing the same web memory consumption increase after switching to rails-autoscale-web (no memory issues with worker processes). This is what I see in our application logs when restarting the app:

2022-09-26T19:58:08.678265+00:00 app[web.1]: I, [2022-09-26T19:58:08.672175 #3]  INFO -- : [RailsAutoscale] Preparing request middleware
2022-09-26T19:58:12.456215+00:00 app[web.1]: I, [2022-09-26T19:58:12.456095 #3]  INFO -- : [RailsAutoscale] Reporter starting, will report every 10 seconds or so. Adapters: [rails-autoscale-core, rails-autoscale-sidekiq, rails-autoscale-web]
2022-09-26T19:58:12.456351+00:00 app[web.1]: I, [2022-09-26T19:58:12.456285 #3]  INFO -- : [RailsAutoscale] Sidekiq enabled
2022-09-26T19:58:12.948030+00:00 app[web.1]: I, [2022-09-26T19:58:12.947964 #3]  INFO -- : [RailsAutoscale] Reporting 22 metrics

Let me know if there's any other information I can give to assist!

@adamlogic (Collaborator)

@jnebeker Thanks for the report! Any chance you have a before & after screenshot of Heroku memory metrics?

Also, if you're able to search your logs for "rails-autoscale" (not "RailsAutoscale"), are you seeing any exceptions/backtraces?

@carrabi (Author) commented Sep 27, 2022

@adamlogic Reporting back from our offline conversation (recapped below). Great catch on running puma in "cluster mode" with a single process. Once I switched puma into "single mode", I no longer experienced the memory leak. I also let these changes sit for 24 hours. As expected, switching to "single mode" resulted in a drop in overall memory consumption and an increase in response time (no thread concurrency). Overall my app performance remained unchanged, and I plan on rolling this out to my production environment after further monitoring. What I didn't expect was how much the memory would drop: roughly 50%. This article does a good job of explaining how puma operates at a high level: https://medium.com/@CGA1123/reducing-rails-memory-usage-by-15-56090b6294bd

[Screenshot: 2022-09-27 10:22 AM]

What remains unanswered is the behaviour when running puma in "cluster mode". I still believe there is an issue there.

For those of you who want to try running puma in single mode, you'll need to comment out the following lines in puma.rb (a minimal example follows below):

  • preload_app!
  • workers
  • before_fork
  • on_worker_boot

One gotcha that I ran into: if you set WEB_CONCURRENCY as an ENV var and it's > 0, puma will continue to run in "cluster mode". I believe puma looks for this env var and switches modes based on its value, even if workers isn't defined in puma.rb. The other option would be to hardcode the workers value to 0. If you want more information on what tipped me off to the WEB_CONCURRENCY setting, see puma/puma#2393.
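
To make this concrete, here's a minimal single-mode puma.rb along these lines (illustrative values, not my exact config):

# config/puma.rb -- minimal single-mode sketch (illustrative values)
max_threads_count = ENV.fetch("RAILS_MAX_THREADS") { 5 }
threads max_threads_count, max_threads_count

port        ENV.fetch("PORT") { 3000 }
environment ENV.fetch("RACK_ENV") { "development" }

# Hardcoding workers to 0 keeps puma in single mode even if WEB_CONCURRENCY is set.
workers 0

# Cluster-mode directives removed:
# preload_app!
# before_fork { ... }
# on_worker_boot { ... }

plugin :tmp_restart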

@jnebeker

@jnebeker Thanks for the report! Any chance you have a before & after screenshot of Heroku memory metrics?

Also, if you're able to search your logs for "rails-autoscale" (not "RailsAutoscale"), are you seeing any exceptions/backtraces?

Sure thing @adamlogic. Below are some screenshots of dyno memory metrics from our telemetry tool, as well as a screenshot of Heroku's memory metrics. Each of the dotted lines in the telemetry screenshots is a dyno restart from a new release.

[Screenshot] Web dyno memory usage 24 hours before (left) and after (right) we switched from rails_autoscale_agent to rails-autoscale-web

[Screenshot] Web dyno memory usage 24 hours before (left) and after (right) we switched back to rails_autoscale_agent from rails-autoscale-web

[Screenshot] Heroku web memory metrics over the past 72 hours; the 51be6fa8 release was when we reverted back to rails_autoscale_agent


I searched our logs for rails-autoscale and see a few different types of exceptions, though there were no exceptions during app boot when I followed the steps in this comment. The full stack trace for each is below.

Net::ReadTimeout
Sep 22 10:53:50Z app/web.1 E, [2022-09-22T10:53:50.022864 #79] ERROR -- : [RailsAutoscale] Reporter error: #<Net::ReadTimeout: Net::ReadTimeout>
Sep 22 10:53:50Z app/web.1 [RailsAutoscale] /app/vendor/ruby-2.7.5/lib/ruby/2.7.0/net/protocol.rb:217:in `rbuf_fill'
Sep 22 10:53:50Z app/web.1 [RailsAutoscale] /app/vendor/ruby-2.7.5/lib/ruby/2.7.0/net/protocol.rb:191:in `readuntil'
Sep 22 10:53:50Z app/web.1 [RailsAutoscale] /app/vendor/ruby-2.7.5/lib/ruby/2.7.0/net/protocol.rb:201:in `readline'
Sep 22 10:53:50Z app/web.1 [RailsAutoscale] /app/vendor/ruby-2.7.5/lib/ruby/2.7.0/net/http/response.rb:42:in `read_status_line'
Sep 22 10:53:50Z app/web.1 [RailsAutoscale] /app/vendor/ruby-2.7.5/lib/ruby/2.7.0/net/http/response.rb:31:in `read_new'
Sep 22 10:53:50Z app/web.1 [RailsAutoscale] /app/vendor/ruby-2.7.5/lib/ruby/2.7.0/net/http.rb:1528:in `block in transport_request'
Sep 22 10:53:50Z app/web.1 [RailsAutoscale] /app/vendor/ruby-2.7.5/lib/ruby/2.7.0/net/http.rb:1519:in `catch'
Sep 22 10:53:50Z app/web.1 [RailsAutoscale] /app/vendor/ruby-2.7.5/lib/ruby/2.7.0/net/http.rb:1519:in `transport_request'
Sep 22 10:53:50Z app/web.1 [RailsAutoscale] /app/vendor/ruby-2.7.5/lib/ruby/2.7.0/net/http.rb:1492:in `request'
Sep 22 10:53:50Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/newrelic_rpm-8.10.1/lib/new_relic/agent/instrumentation/net_http/prepend.rb:16:in `block in request'
Sep 22 10:53:50Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/newrelic_rpm-8.10.1/lib/new_relic/agent/instrumentation/net_http/instrumentation.rb:27:in `block (2 levels) in request_with_tracing'
Sep 22 10:53:50Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/newrelic_rpm-8.10.1/lib/new_relic/agent/tracer.rb:352:in `capture_segment_error'
Sep 22 10:53:50Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/newrelic_rpm-8.10.1/lib/new_relic/agent/instrumentation/net_http/instrumentation.rb:26:in `block in request_with_tracing'
Sep 22 10:53:50Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/newrelic_rpm-8.10.1/lib/new_relic/agent.rb:507:in `disable_all_tracing'
Sep 22 10:53:50Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/newrelic_rpm-8.10.1/lib/new_relic/agent/instrumentation/net_http/instrumentation.rb:25:in `request_with_tracing'
Sep 22 10:53:50Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/newrelic_rpm-8.10.1/lib/new_relic/agent/instrumentation/net_http/prepend.rb:16:in `request'
Sep 22 10:53:50Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/rails-autoscale-core-1.0.1/lib/rails_autoscale/adapter_api.rb:39:in `block in post_raw'
Sep 22 10:53:50Z app/web.1 [RailsAutoscale] /app/vendor/ruby-2.7.5/lib/ruby/2.7.0/net/http.rb:933:in `start'
Sep 22 10:53:50Z app/web.1 [RailsAutoscale] /app/vendor/ruby-2.7.5/lib/ruby/2.7.0/net/http.rb:606:in `start'
Sep 22 10:53:50Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/rails-autoscale-core-1.0.1/lib/rails_autoscale/adapter_api.rb:34:in `post_raw'
Sep 22 10:53:50Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/rails-autoscale-core-1.0.1/lib/rails_autoscale/adapter_api.rb:26:in `post_json'
Sep 22 10:53:50Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/rails-autoscale-core-1.0.1/lib/rails_autoscale/adapter_api.rb:19:in `report_metrics'
Sep 22 10:53:50Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/rails-autoscale-core-1.0.1/lib/rails_autoscale/reporter.rb:81:in `report'
Sep 22 10:53:50Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/rails-autoscale-core-1.0.1/lib/rails_autoscale/reporter.rb:63:in `block in run_metrics_collection'
Sep 22 10:53:50Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/rails-autoscale-core-1.0.1/lib/rails_autoscale/reporter.rb:92:in `log_exceptions'
Sep 22 10:53:50Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/rails-autoscale-core-1.0.1/lib/rails_autoscale/reporter.rb:63:in `run_metrics_collection'
Sep 22 10:53:50Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/rails-autoscale-core-1.0.1/lib/rails_autoscale/reporter.rb:49:in `block (2 levels) in run_loop'
Sep 22 10:53:50Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/rails-autoscale-core-1.0.1/lib/rails_autoscale/reporter.rb:48:in `loop'
Sep 22 10:53:50Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/rails-autoscale-core-1.0.1/lib/rails_autoscale/reporter.rb:48:in `block in run_loop'
Sep 22 10:53:50Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/logging-2.3.0/lib/logging/diagnostic_context.rb:474:in `block in create_with_logging_context'
OpenSSL::SSL::SSLError
Sep 25 21:12:36Z app/web.1 E, [2022-09-25T21:12:36.115125 #3] ERROR -- : [RailsAutoscale] Reporter error: #<OpenSSL::SSL::SSLError: SSL_read: sslv3 alert bad record mac>
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/ruby-2.7.5/lib/ruby/2.7.0/openssl/buffering.rb:182:in `sysread_nonblock'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/ruby-2.7.5/lib/ruby/2.7.0/openssl/buffering.rb:182:in `read_nonblock'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/redis-4.8.0/lib/redis/connection/ruby.rb:55:in `block in _read_from_socket'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/redis-4.8.0/lib/redis/connection/ruby.rb:54:in `loop'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/redis-4.8.0/lib/redis/connection/ruby.rb:54:in `_read_from_socket'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/redis-4.8.0/lib/redis/connection/ruby.rb:47:in `gets'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/redis-4.8.0/lib/redis/connection/ruby.rb:382:in `read'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/redis-4.8.0/lib/redis/client.rb:311:in `block in read'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/redis-4.8.0/lib/redis/client.rb:299:in `io'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/redis-4.8.0/lib/redis/client.rb:310:in `read'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/redis-4.8.0/lib/redis/client.rb:161:in `block in call'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/redis-4.8.0/lib/redis/client.rb:279:in `block (2 levels) in process'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/redis-4.8.0/lib/redis/client.rb:420:in `ensure_connected'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/redis-4.8.0/lib/redis/client.rb:269:in `block in process'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/redis-4.8.0/lib/redis/client.rb:356:in `logging'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/redis-4.8.0/lib/redis/client.rb:268:in `process'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/honeycomb-beeline-2.11.0/lib/honeycomb/integrations/redis.rb:107:in `process'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/redis-4.8.0/lib/redis/client.rb:161:in `call'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/newrelic_rpm-8.10.1/lib/new_relic/agent/instrumentation/redis/prepend.rb:12:in `block in call'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/newrelic_rpm-8.10.1/lib/new_relic/agent/instrumentation/redis/instrumentation.rb:19:in `block in call_with_tracing'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/newrelic_rpm-8.10.1/lib/new_relic/agent/instrumentation/redis/instrumentation.rb:45:in `block in with_tracing'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/newrelic_rpm-8.10.1/lib/new_relic/agent/tracer.rb:352:in `capture_segment_error'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/newrelic_rpm-8.10.1/lib/new_relic/agent/instrumentation/redis/instrumentation.rb:45:in `with_tracing'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/newrelic_rpm-8.10.1/lib/new_relic/agent/instrumentation/redis/instrumentation.rb:19:in `call_with_tracing'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/newrelic_rpm-8.10.1/lib/new_relic/agent/instrumentation/redis/prepend.rb:12:in `call'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/redis-4.8.0/lib/redis.rb:270:in `block in send_command'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/redis-4.8.0/lib/redis.rb:269:in `synchronize'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/redis-4.8.0/lib/redis.rb:269:in `send_command'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/redis-4.8.0/lib/redis/commands/keys.rb:451:in `_scan'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/redis-4.8.0/lib/redis/commands/sets.rb:197:in `sscan'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/redis-4.8.0/lib/redis/commands/sets.rb:216:in `block in sscan_each'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/redis-4.8.0/lib/redis/commands/sets.rb:215:in `loop'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/redis-4.8.0/lib/redis/commands/sets.rb:215:in `sscan_each'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/sidekiq-6.5.7/lib/sidekiq/api.rb:250:in `each'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/sidekiq-6.5.7/lib/sidekiq/api.rb:250:in `to_a'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/sidekiq-6.5.7/lib/sidekiq/api.rb:250:in `block in all'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/sidekiq-6.5.7/lib/sidekiq.rb:164:in `block in redis'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/connection_pool-2.2.5/lib/connection_pool.rb:63:in `block (2 levels) in with'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/connection_pool-2.2.5/lib/connection_pool.rb:62:in `handle_interrupt'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/connection_pool-2.2.5/lib/connection_pool.rb:62:in `block in with'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/connection_pool-2.2.5/lib/connection_pool.rb:59:in `handle_interrupt'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/connection_pool-2.2.5/lib/connection_pool.rb:59:in `with'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/sidekiq-6.5.7/lib/sidekiq.rb:161:in `redis'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/sidekiq-6.5.7/lib/sidekiq/api.rb:250:in `all'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/rails-autoscale-sidekiq-1.0.1/lib/rails_autoscale/sidekiq/metrics_collector.rb:15:in `collect'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/rails-autoscale-core-1.0.1/lib/rails_autoscale/reporter.rb:60:in `block (2 levels) in run_metrics_collection'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/rails-autoscale-core-1.0.1/lib/rails_autoscale/reporter.rb:92:in `log_exceptions'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/rails-autoscale-core-1.0.1/lib/rails_autoscale/reporter.rb:60:in `block in run_metrics_collection'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/rails-autoscale-core-1.0.1/lib/rails_autoscale/reporter.rb:59:in `each'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/rails-autoscale-core-1.0.1/lib/rails_autoscale/reporter.rb:59:in `flat_map'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/rails-autoscale-core-1.0.1/lib/rails_autoscale/reporter.rb:59:in `run_metrics_collection'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/rails-autoscale-core-1.0.1/lib/rails_autoscale/reporter.rb:49:in `block (2 levels) in run_loop'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/rails-autoscale-core-1.0.1/lib/rails_autoscale/reporter.rb:48:in `loop'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/rails-autoscale-core-1.0.1/lib/rails_autoscale/reporter.rb:48:in `block in run_loop'
Sep 25 21:12:36Z app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/logging-2.3.0/lib/logging/diagnostic_context.rb:474:in `block in create_with_logging_context'
Sep 25 21:12:36Z app/web.1 I, [2022-09-25T21:12:36.115292 #3]  INFO -- : [RailsAutoscale] Reporting 0 metrics
Net::OpenTimeout
Sep 27 01:30:46 app/web.1 E, [2022-09-27T01:30:46.125348 #3] ERROR -- : [RailsAutoscale] Reporter error: #<Net::OpenTimeout: execution expired>
Sep 27 01:30:46 app/web.1 [RailsAutoscale] /app/vendor/ruby-2.7.5/lib/ruby/2.7.0/net/http.rb:960:in `initialize'
Sep 27 01:30:46 app/web.1 [RailsAutoscale] /app/vendor/ruby-2.7.5/lib/ruby/2.7.0/net/http.rb:960:in `open'
Sep 27 01:30:46 app/web.1 [RailsAutoscale] /app/vendor/ruby-2.7.5/lib/ruby/2.7.0/net/http.rb:960:in `block in connect'
Sep 27 01:30:46 app/web.1 [RailsAutoscale] /app/vendor/ruby-2.7.5/lib/ruby/2.7.0/timeout.rb:105:in `timeout'
Sep 27 01:30:46 app/web.1 [RailsAutoscale] /app/vendor/ruby-2.7.5/lib/ruby/2.7.0/net/http.rb:958:in `connect'
Sep 27 01:30:46 app/web.1 [RailsAutoscale] /app/vendor/ruby-2.7.5/lib/ruby/2.7.0/net/http.rb:943:in `do_start'
Sep 27 01:30:46 app/web.1 [RailsAutoscale] /app/vendor/ruby-2.7.5/lib/ruby/2.7.0/net/http.rb:932:in `start'
Sep 27 01:30:46 app/web.1 [RailsAutoscale] /app/vendor/ruby-2.7.5/lib/ruby/2.7.0/net/http.rb:606:in `start'
Sep 27 01:30:46 app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/rails-autoscale-core-1.0.0/lib/rails_autoscale/adapter_api.rb:33:in `post_raw'
Sep 27 01:30:46 app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/rails-autoscale-core-1.0.0/lib/rails_autoscale/adapter_api.rb:26:in `post_json'
Sep 27 01:30:46 app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/rails-autoscale-core-1.0.0/lib/rails_autoscale/adapter_api.rb:19:in `report_metrics'
Sep 27 01:30:46 app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/rails-autoscale-core-1.0.0/lib/rails_autoscale/reporter.rb:81:in `report'
Sep 27 01:30:46 app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/rails-autoscale-core-1.0.0/lib/rails_autoscale/reporter.rb:63:in `block in run_metrics_collection'
Sep 27 01:30:46 app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/rails-autoscale-core-1.0.0/lib/rails_autoscale/reporter.rb:92:in `log_exceptions'
Sep 27 01:30:46 app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/rails-autoscale-core-1.0.0/lib/rails_autoscale/reporter.rb:63:in `run_metrics_collection'
Sep 27 01:30:46 app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/rails-autoscale-core-1.0.0/lib/rails_autoscale/reporter.rb:49:in `block (2 levels) in run_loop'
Sep 27 01:30:46 app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/rails-autoscale-core-1.0.0/lib/rails_autoscale/reporter.rb:48:in `loop'
Sep 27 01:30:46 app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/rails-autoscale-core-1.0.0/lib/rails_autoscale/reporter.rb:48:in `block in run_loop'
Sep 27 01:30:46 app/web.1 [RailsAutoscale] /app/vendor/bundle/ruby/2.7.0/gems/logging-2.3.0/lib/logging/diagnostic_context.rb:474:in `block in create_with_logging_context'

One of these looked like a rails-autoscale-sidekiq error, which I didn't expect on a web dyno. Could that be related?

@adamlogic (Collaborator)

Thanks @jnebeker & @carrabi!

I'm still not sure what's going on, or why I'm not seeing the same issues in our own apps using rails-autoscale-web in Puma cluster mode, but I'll keep digging.

One of these looked like a rails-autoscale-sidekiq error which I didn't expect on a web dyno, could that be related?

That's normal. Web dynos report both web and worker metrics, so that we still get worker metrics if workers are scaled down to zero. Sometimes Redis encounters some transient errors when fetching Sidekiq metrics, so nothing to be concerned about unless you're seeing LOTS of them.

@jnebeker

That's normal. Web dynos report both web and worker metrics, so that we still get worker metrics if workers are scaled down to zero. Sometimes Redis encounters some transient errors when fetching Sidekiq metrics, so nothing to be concerned about unless you're seeing LOTS of them.

That makes sense! We're not seeing too many of those Redis errors, only a handful. 👍

@adamlogic (Collaborator)

I wanted to provide an update here, but unfortunately I don't have any progress to report. I'm not seeing this memory issue on the other Rails Autoscale customer apps I've looked at, and I haven't found any suspicious areas of the code.

I'll keep this issue open so others can provide more info if they experience the same thing. For anyone posting here with this issue, please include the following:

  • a screenshot of Heroku metrics for the past 3 days for web and worker dynos, indicating when you changed gems
  • similar screenshot for throughput (same screenshot is fine)
  • your WEB_CONCURRENCY and RAILS_MAX_THREADS settings
  • any other context that might be helpful

In the meantime, if you do see this issue, you are fine to continue using rails_autoscale_agent.

@Gambitboy

I've recently upgraded our app from Rails 6.1 to Rails 7.1 and Ruby 2.7 to Ruby 3.3, and I've been seeing very strange memory behaviour that wasn't present before the release. I actually had an issue where we had to roll back because I killed the memory: I had misconfigured the WEB_CONCURRENCY variable to an insane number through some weird auto-detect code. During the rollback period the memory stabilised again, and once we fixed it and redeployed, it started behaving strangely again.

Here are my stats for the last 3 days.

You can ignore the H12 codes, as those are due to an unstable 3rd-party provider; I'm working on a different solution to move that dependency to workers instead of user requests, since they often spike above 30-second response times.

But as you can see, the memory usage jumps quite a bit, especially in the 7-day screenshot.

I've not yet migrated over to the new judoscale-rails gem, as I only just discovered it; I'm planning on doing that now.
I'm hoping this gives some more insight. I'm happy to collaborate to resolve this, as I've been trying to figure out what could cause the memory issue, and this is the closest I've found.

My current WEB_CONCURRENCY is 1 and RAILS_MAX_THREADS is 5

[Screenshots: memory, events, response-time, throughput, dyno-load]

Here is my memory usage over the last 7 days

[Screenshot: memory-7-days]

@adamlogic (Collaborator)

Stairstep increases in memory like that are bloat, not a leak. In other words, you have an endpoint that's loading a bunch of objects into memory. I can't explain why the Rails/Ruby upgrades would have exacerbated that issue, but there's nothing here that a) shows a memory leak, or b) shows that it's related to Judoscale.

I'd recommend using an APM tool like Scout or Sentry to see if they provide insight into memory bloat.
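
To illustrate the distinction, here's a hypothetical endpoint (names made up, not from any app in this thread) that would produce that stairstep pattern, plus a lighter batched alternative:

# Hypothetical controller for illustration only.
class ReportsController < ApplicationController
  def index
    # Bloat: materializes every Order row as Ruby objects in a single request.
    # Heap usage jumps when this endpoint is hit and the process rarely gives it back.
    orders = Order.all.to_a
    render json: { count: orders.size }
  end

  # A lighter alternative: iterate in batches so only ~1,000 records are live at once.
  def export
    count = 0
    Order.find_each(batch_size: 1_000) { |_order| count += 1 }
    render json: { count: count }
  end
end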

@Gambitboy

Okay that makes sense, thanks for the insight.
