I found reap, “A tool for parsing #Ruby heap dumps by analyzing the reference graph,” and was able to generate some flame graphs from it. Here's a graph of a #Puma worker that was leaking memory. Only 2 or 3 of the 12 workers on the machine were leaking at the time; the others looked “normal.”
Now to figure out what a “normal” flame graph looks like for our #Rails app, and why the heck that Thread, and the ActiveSupport::Notifications::Event objects under it, are holding 1.75 GiB of memory.
The flame graph revealed a thread (:rimshot:) to tug on. It turns out that ActiveSupport::Notifications.subscribe blocks can leak AS::N::Event objects in Rails < 7.1 if an exception is raised while processing the Event. I spotted it in one of our flame graphs from a leaking Puma worker in production. I then used sheap to interrogate the leaked Event and figure out what sort of initial request was causing it to leak. Coincidentally, the bug had already been fixed in a dependency. (1/2)
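To make the mechanism concrete, here's a toy model in plain Ruby (no Rails needed). It's a hypothetical simplification, not the actual ActiveSupport::Notifications internals: `ToyInstrumenter`, its methods, and the URL-checking subscriber are all made up for illustration. The assumption it encodes is that in-flight events get pushed onto a per-thread stack and popped after subscribers run. If a subscriber raises before the pop, the event stays reachable from the thread.

```ruby
# Hypothetical toy model of the leak, NOT Rails' actual code.
# In-flight events live on a stack stored via Thread.current[...] (which is
# fiber-local storage), so anything left on it stays reachable from the
# worker thread's root fiber for as long as that fiber lives.
class ToyInstrumenter
  Event = Struct.new(:name, :payload)

  def initialize(key, &subscriber)
    @key = key               # storage key for this instrumenter's stack
    @subscriber = subscriber # called with the finished event; may raise
  end

  # Events currently held on this thread's (fiber's) stack.
  def stack
    Thread.current[@key] ||= []
  end

  # Leaky variant: if the block or the subscriber raises, the pop is skipped.
  def instrument_leaky(name, payload)
    stack.push(Event.new(name, payload))
    result = yield
    @subscriber.call(stack.last)
    stack.pop
    result
  end

  # Safe variant: `ensure` pops the event whether or not anything raised.
  def instrument_safe(name, payload)
    stack.push(Event.new(name, payload))
    begin
      result = yield
      @subscriber.call(stack.last)
      result
    ensure
      stack.pop
    end
  end
end

# Hypothetical subscriber standing in for a URL "cleaner" that raises on
# invalid input.
cleaner = proc do |event|
  raise ArgumentError, "invalid URL" if event.payload[:url].include?(" ")
end

leaky = ToyInstrumenter.new(:leaky_events, &cleaner)
safe  = ToyInstrumenter.new(:safe_events, &cleaner)
payload = { url: "https://example.com/a b" }

3.times do
  begin
    leaky.instrument_leaky("request", payload) { :ok }
  rescue ArgumentError
    # The request fails, but its Event is still on the thread's stack.
  end

  begin
    safe.instrument_safe("request", payload) { :ok }
  rescue ArgumentError
    # The request fails, and `ensure` already removed its Event.
  end
end

puts "leaky: #{leaky.stack.size} events retained" # prints "leaky: 3 events retained"
puts "safe: #{safe.stack.size} events retained"   # prints "safe: 0 events retained"
```

Moving the cleanup into `ensure` is the standard Ruby idiom for bookkeeping that must run even when an exception escapes. In a long-lived Puma thread, the leaky variant grows by one Event per failing request.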
The error-raising problem in that dependency came from a recent change that caused invalid URLs to raise an error while it tried to “clean” them. https://github.com/bugsnag/bugsnag-ruby/pull/811
Up in #Rails-land, the code path that was leaking the Event objects was removed entirely in Rails 7.1. So, yay! And thanks, @jhawthorn! https://github.com/rails/rails/pull/43390
Phew! What a trip.
@stevenharman Oh man, that reminds me of a time I tried to use AS::N as a generic pub/sub mechanism within an app.
One thing I learned is that exceptions were silently swallowed, and that overall it was just a Bad Idea. I reversed that design decision in fairly short order.
The behavior may be different today, but I'll stake out the position that it’s not a good idea to use it like that. It’s not what it’s for, even if it can look like it when you squint a little.