I’ve been using Heroku for about 10 months now, mostly with node.js. Recently one of our apps was using more web dynos than we thought it needed, so I looked into analyzing performance. I ended up writing my first node packaged module (npm) to solve my problem.
We have one Heroku app that is our main http server, using node.js, express, jade, less, and connect-assets to serve primarily html pages. This talks to a second Heroku app that we call our API server. The API server is also a node.js http server, using mongo as its database, serving json in standard RESTful way. The API server is fast – when the html server is under such load to need 10 web dynos, the api server can easily keep up with only 1 or 2 web dynos. My gut was asking me, are 10 web dynos necessary? And even with 10 web dynos, there were times when requests would timeout or other errors would trigger. Maybe some error or timeout has a huge ripple effect, slowing down things.
So the problem is not to just figure out where time is spent on any one particular request, but where time is spent on average across all requests and all web dynos for the html server. How much time is spent while communicating with the api server? doing some internal processing? or something else in the heroku black box?
The first step was educating myself on existing tools that could help me. We already use New Relic, which is awesome and I highly recommend it to anyone who uses Heroku. At the time New Relic support for node.js was still in beta (still is as of this writing), and one the features supported in other languages (like ruby) is the ability to use a New Relic api to track custom metrics. I thought this would be a great way to track down how much time, on average, is spent in various sections of our code. Too bad it doesn’t work with node.js.
I considered other tools (like these) but the only one worth mentioning was nodetime. For us, nodetime was somewhat useful in that it offered details at levels outside of the application, such as stats on cpu, memory, OS, and http routing. This did not appear to solve the problem (I admit i didn’t read all their docs), but did provide some insight and some validation that things are setup as they should be based on documentation from Heroku and Amazon (Heroku runs on Amazon EC2).
In node + express environment, it simply tracks the start and end times of an express request, allowing additional triggers to be added in between. All triggers, start, and end times are logged when response is sent.
Where this shines is in the processing of these logs. By reading all logs generated over many hours or days (and from multiple sources), times can be summed up and total time spent between triggers can be known. This is useful when trying to ballpark where most of node app time is spent.
For example, if 2 triggers were added, one before and one after a chunk of code, the total time spent in that chunk of code can be known. And more importantly, that time as percent of the total time is known, so it is possible to know how much time is actually being spent by a chunk of code before it is optimized.
In conclusiion, jade rendering was the culprit. More specifically, compiling less to css in jade on each request was really time consuming (around 200ms per request, which is HUGE). To summarize the jade issue:
link(rel='stylesheet', href='.../bureau.css', type='text/css' )
Overall I am impressed with Heroku and am quite pleased by how easy it is to creatie, deploy, and monitor apps. Most of my apps have been in node, but also have run php, ruby, and python on apps as well. Heroku is not perfect, I would not recommend for serious enterprise solutions (details on that in another post). It’s great for startups or small businesses where getting stuff up and running fast and cheap is key.