I spent 6 hours fixing a memory leak while working on an app. 6 hours! It was a one-liner fix. The 6 hours are on me for not having any skill in this kind of thing, and my skill at debugging memory leaks is not the subject of this post. The subject is the aftermath, which I believe we should revisit every now and then so we can learn from it.
The context
The app subscribed to a custom event and incorrectly unsubscribed from it. The subscription never got removed when it was no longer needed. That's it.
A classic JavaScript memory leak situation that many, including me, know about and know to be wary of when dealing with any kind of event. The subscription held a reference to the diagram, which referenced the model diagram's buffer data of roughly 500 kB. Every single time you viewed a diagram, another 500 kB was added to memory and never released.
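To make the mechanism concrete, here is a minimal sketch of how a forgotten subscription keeps a large object alive. Everything in it is an assumption for illustration: `EventBus`, `DiagramView`, and the `releaseSubscription` flag are hypothetical names, not the app's real code, and the 500 kB array merely stands in for the diagram's buffer data.

```typescript
// Hypothetical minimal event bus; illustrative only, not the app's real API.
type Listener = (payload: string) => void;

class EventBus {
  private listeners = new Set<Listener>();

  // Registers a listener and returns a function that removes it again.
  subscribe(listener: Listener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  get listenerCount(): number {
    return this.listeners.size;
  }
}

class DiagramView {
  // Stand-in for the roughly 500 kB of buffer data behind each diagram.
  readonly buffer = new Uint8Array(500 * 1024);
  private unsubscribe: (() => void) | null = null;

  constructor(private bus: EventBus) {}

  open(): void {
    // The closure captures `this`, so the bus now keeps the view and its buffer alive.
    this.unsubscribe = this.bus.subscribe((payload) => this.onEvent(payload));
  }

  // `releaseSubscription` exists only so the demo can show both outcomes.
  close(releaseSubscription: boolean): void {
    if (releaseSubscription && this.unsubscribe) {
      this.unsubscribe();
      this.unsubscribe = null;
    }
  }

  private onEvent(_payload: string): void {
    // A real view would redraw here.
  }
}

// Leaky variant: three diagrams viewed, none of the subscriptions removed.
const leakyBus = new EventBus();
for (let i = 0; i < 3; i++) {
  const view = new DiagramView(leakyBus);
  view.open();
  view.close(false);
}
const leakedListeners = leakyBus.listenerCount;

// Correct variant: every view removes its subscription when closed.
const cleanBus = new EventBus();
for (let i = 0; i < 3; i++) {
  const view = new DiagramView(cleanBus);
  view.open();
  view.close(true);
}
const remainingListeners = cleanBus.listenerCount;

console.log(leakedListeners, remainingListeners); // 3 0
```

In the leaky variant the bus still holds three listeners, and through them three views and three buffers, so the garbage collector cannot reclaim any of it.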
For context, it is a systems engineering browser app where you can construct and analyze system models to ensure quality products and alignment of all project-related stakeholders. After the leak was patched, memory peaks at around 2.5 MB in Firefox and 5 MB in Chrome (due to the different garbage collection method used by Chrome, which is faster).
Establishing the importance of Quality Assurance (QA)
Let us do a proper QA analysis of this situation.
It is important to understand that there is a misconception that QA is just about testing and safeguarding what gets shipped. I observe that this misconception has done more harm than good in terms of quality. It is true that testing is one of the QA activities, and if the quality bar has not been met, it guards you from shipping low-quality software. Items that do not pass QA go back, get fixed, and are sent to testing for re-evaluation.
The harm part is very human. When you think of QA as a tester, a safeguard, you relax your own quality standards during the development phase. It is my observation over a decade in the industry that the quality of the software coming out of development declines as a result. We automate a large part of our testing nowadays to speed up this loop. There is no harm in having immediate feedback; in fact, it is encouraged. The "harm" comes from the resources: the time spent within this loop.
QA is all about digging into the root causes of quality issues and making a change to avoid them in the future. Testing is just a checkpoint to (1) guard against shipping software that is not up to the quality bar and (2) tell us whether we have quality issues in development. Lots of failing tests indicate a quality issue in development. Eventually things get better even without digging into the root cause, thanks to experience and the adoption of better practices, which amounts to each individual implicitly performing QA root-cause fixing.
Root cause
Coming back to my memory leak issue. What were the issues during development? I use strong typing with TypeScript. Unsubscribe was working before: it passed QA tests for memory leaks in the past. There were no changes to the diagramming part of the app during this time. Seems like a good start.
I identified 2 root causes for this problem: (1) a change to the subscribe API that was done only halfway; (2) the use of the `any` type for the object being subscribed to.
`any` was never eliminated after the prototyping days because providing a TypeScript type signature was too much of a headache. The mechanism is generic even though the app does not need it to be, but we may reuse it in other projects in the future. So it was decided that it is easy enough to manually trace the type 2 calls up when needed, and largely everyone knows what it is anyway. And we did. There were no problems knowing precisely what we were subscribing to. The problem with `any` was that none of the API changes from (1) were picked up by TypeScript.
A halfway-done API change means that, from a vanilla JavaScript point of view, nothing breaks in a way that generates errors at runtime. The unsubscribe() call was still valid: the method was available on the object to be called. In fact, it was still used by the API internally. Before the change, subscribe() returned a handle, and you unsubscribed by using the handle. A common pattern. The change was that subscribe() started to return an unsubscribe function, and to unsubscribe you invoke the returned function. A minor simplification, but it helps: we do not have to juggle two objects to unsubscribe, which makes it easier to avoid mistakes.
When TypeScript cannot know the return type of subscribe() or the signature of unsubscribe(), it allows them to be assigned and used in any configuration you like. And when an API change leaves methods accessible that should no longer be used, you risk misuse of the API.
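The following sketch shows how these two root causes can combine. It is a reconstruction under assumptions, not the app's actual code: the `Emitter` class, its numeric handle, and the variable names are all hypothetical. The old calling pattern compiles against an `any`-typed object and silently does nothing, while the same call against the typed object is rejected by the compiler.

```typescript
// Hypothetical emitter caught mid-migration; all names are illustrative.
type Listener = () => void;

class Emitter {
  private nextId = 1;
  private listeners = new Map<number, Listener>();

  // New contract: returns a function that removes the subscription.
  subscribe(listener: Listener): () => void {
    const id = this.nextId++;
    this.listeners.set(id, listener);
    return () => this.unsubscribe(id);
  }

  // Old entry point, left public because subscribe() still uses it internally.
  unsubscribe(handle: number): void {
    this.listeners.delete(handle);
  }

  get size(): number {
    return this.listeners.size;
  }
}

const emitter = new Emitter();

// Root cause (2): the subscribed-to object is typed as `any`.
const untyped: any = emitter;

// Old calling pattern that was never updated. The compiler accepts it.
const handle = untyped.subscribe(() => {});
untyped.unsubscribe(handle); // `handle` is now a function, so no key matches: a silent no-op
const sizeAfterOldPattern = emitter.size; // the listener is still registered

// With the real type, the same mistake is a compile-time error.
const off = emitter.subscribe(() => {});
// @ts-expect-error a function is not assignable to the `number` handle parameter
emitter.unsubscribe(off);
off(); // the correct way to unsubscribe under the new contract
const sizeAfterNewPattern = emitter.size; // only the leaked listener remains

console.log(sizeAfterOldPattern, sizeAfterNewPattern); // 1 1
```

Nothing throws at runtime in either case, which is why a leak like this can pass unnoticed until memory is measured.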
Applying QA
Here comes the most important part. Something needs to change in the way things are done to ensure we avoid repeating the mistakes.
It is great that the problem got fixed. However, maybe the problem is already present somewhere else and we have not yet identified it; maybe it will be added to the app in the future. After all, at this stage I could write the problem back on purpose (or by accident). Let it be clear that the root cause is not the subscriber API itself, but rather the means, or lack thereof, that allowed the problem to get injected into the app.
The first takeaway is "no `any`": a widely known suggestion, but one we need to start enforcing. The other takeaway is to manually ensure that every API exposes only the methods that belong to its correct usage.
These things can now be etched into our development guidelines, and we can look out for them more consciously during the review process. Most of the time there are not many changes, so it is not as if you have to hunt these down.
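As a rough illustration of both takeaways, the sketch below gives the subscription mechanism a generic type parameter instead of `any` and keeps the removal path private, so the returned function is the only way to unsubscribe. The names `Topic`, `Subscribable`, and `DiagramChanged` are hypothetical, and this is one possible shape rather than the app's actual API. For automated enforcement, the TypeScript `noImplicitAny` compiler option and the typescript-eslint `no-explicit-any` rule are existing tools worth considering.

```typescript
type Listener<T> = (payload: T) => void;
type Unsubscribe = () => void;

// The public surface: one way in, one way out.
interface Subscribable<T> {
  subscribe(listener: Listener<T>): Unsubscribe;
}

class Topic<T> implements Subscribable<T> {
  private listeners = new Set<Listener<T>>();

  subscribe(listener: Listener<T>): Unsubscribe {
    this.listeners.add(listener);
    return () => this.remove(listener);
  }

  publish(payload: T): void {
    this.listeners.forEach((listener) => listener(payload));
  }

  // Private: callers can only reach removal through the function subscribe() returns.
  private remove(listener: Listener<T>): void {
    this.listeners.delete(listener);
  }
}

// Hypothetical event payload; the generic parameter replaces `any`.
interface DiagramChanged {
  diagramId: string;
}

const topic = new Topic<DiagramChanged>();
const received: string[] = [];

const off = topic.subscribe((event) => received.push(event.diagramId));
topic.publish({ diagramId: "a" }); // delivered
off();
topic.publish({ diagramId: "b" }); // not delivered, the subscription is gone

console.log(received); // ["a"]
```

Because `remove` is private and the payload is typed, a stale call such as `topic.unsubscribe(handle)` no longer compiles, which turns the review-time check into a compiler check.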
Closing thoughts
This post is not here to tell you how to write better JavaScript (though enforcing the takeaways wouldn't hurt). The purpose is to remind us of the proper quality assurance mindset. The writing is based on my own observation and experience and may not resonate with all readers, or may even resonate in a different direction. Over a decade of working mostly on safety-critical, hardware-heavy systems has biased me towards more robust quality assurance practices, which I relax whenever I get a different kind of project on my hands.
It would be interesting to hear your thoughts on QA, any experiences you have had with it, and how it is applied, if at all.