Requests Per Second

Posted by Koz Tuesday, January 06, 2009 07:12:00 GMT

One of the most harmful things about people discussing the performance of web applications is the key metric that we use. Requests per second seems like the obvious metric to use, but it has several insidious characteristics that poison the discourse and lead developers down ever deeper rabbit holes chasing irrelevant gains. The metric prevents us from doing A/B comparisons, or discussing potential improvements without doing some mental arithmetic which appears beyond the capabilities of most of us.

Instead of talking about requests per second, we should always be focussed on the duration of a given request. It’s what our users notice, and it’s the only thing which gives us an honest basis for comparing changes.

I should preface the remaining discussion here by saying that most of it does not apply to discussing performance problems at the scale of Facebook, Google or Yahoo. The thing is, statistically speaking, none of you are building applications that will operate at that scale. Sorry if I’m the one who broke this to you, but you’re not building the next Google :).

I should also state that requests per second is a really interesting metric when considering the throughput of a whole cluster. But throughput isn’t performance.

Diminishing marginal returns

The biggest problem I have with requests per second is the fact that developers seem incapable of knowing when to stop optimising their applications. As the requests per second get higher and higher, the improvements become less and less relevant. This lets us think we’ve defeated that Pareto guy, while we waste ever-larger amounts of our employers’ time.

Let’s take two different performance improvements and compare them using both duration and req/s.

Patch   Before        After         Improvement
A       120 req/s     300 req/s     180 req/s
B       3000 req/s    4000 req/s    1000 req/s

As you can see, when you use req/s as your metric, change B seems like a MUCH bigger saving. It improves performance by 1000 requests a second instead of that measly 180, give that guy a raise! But let’s see what happens when we switch to using durations:

Patch   Before     After      Improvement
A       8.33 ms    3.33 ms    5 ms
B       0.33 ms    0.25 ms    0.08 ms

You can see that the actual change in duration for B is vanishingly tiny: 8% of one millisecond! Odds are that the improvement will vanish into statistical noise when compared to the latency of your network, or your user’s internet connection.
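
To make the conversion concrete, here is a minimal Python sketch of the arithmetic behind the two tables. It assumes a single worker handling requests back to back, so that duration is simply the inverse of req/s; the `duration_ms` helper and the `patches` dictionary are names made up for this illustration.

```python
def duration_ms(requests_per_second: float) -> float:
    """Milliseconds per request for one worker serving requests back to back."""
    return 1000.0 / requests_per_second


# (before, after) throughput in req/s, taken from the tables above
patches = {"A": (120, 300), "B": (3000, 4000)}

for name, (before, after) in patches.items():
    saved = duration_ms(before) - duration_ms(after)
    print(f"Patch {name}: +{after - before} req/s, saves {saved:.2f} ms per request")
# Patch A: +180 req/s, saves 5.00 ms per request
# Patch B: +1000 req/s, saves 0.08 ms per request
```

The patch with the smaller req/s gain saves roughly sixty times more time per request.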

But when we use requests per second, that 1000 is so big and enticing that developers will do almost anything to get it. If they used durations as their metric, they’d probably have spent that time implementing a neat new feature, or responding to customer feedback.

Deltas become meaningless

A special case of my first complaint is that with requests per second the deltas aren’t meaningful without knowing the start and the finish points. As I showed above, a 1000 req/s change could be a tiny change, but it could also be an amazing performance coup. Take this next example:

Before     After         Diff
1 req/s    1001 req/s    1000 req/s

When expressed as durations you can see that it made a huge difference:

Before     After      Diff
1000 ms    1.00 ms    999.00 ms

So 1000 requests per second could either be irrelevant, or fantastic. Durations don’t have this problem at all. 0.08 ms is obviously questionable, and 999.00 ms is an obvious improvement.
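
As a rough sketch of why the delta alone tells you nothing, the snippet below applies the same 1000 req/s gain at three different starting points. The 100 req/s baseline is an extra data point I have added for illustration, and `saved_ms` is a hypothetical helper that again assumes duration is the inverse of req/s.

```python
def saved_ms(before_rps: float, after_rps: float) -> float:
    """Per-request time saved, in ms, assuming duration = 1000 / (req/s)."""
    return 1000.0 / before_rps - 1000.0 / after_rps


DELTA_RPS = 1000  # the same "improvement" applied at different starting points

for baseline_rps in (1, 100, 3000):
    saved = saved_ms(baseline_rps, baseline_rps + DELTA_RPS)
    print(f"{baseline_rps} -> {baseline_rps + DELTA_RPS} req/s saves {saved:.3f} ms")
# 1 -> 1001 req/s saves 999.001 ms
# 100 -> 1100 req/s saves 9.091 ms
# 3000 -> 4000 req/s saves 0.083 ms
```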

This problem most commonly expresses itself when people say “that changeset took 50 requests per second off my application”. Without the before and after numbers, we can’t tell if that’s a big deal, or if the guy needs to take a deep breath and get back to work.

The numbers don’t add up

Finally, requests per second don’t lend themselves nicely to arithmetic, and lead developers to make silly decisions. The most common case I see is when people compare web servers to put in front of their Rails applications. The reasoning goes something like this:

Nginx does over 9000 requests per second, and apache only does 6000 requests per second!! I’d better use nginx unless I want to pay a 3000 requests per second tax.

When people do this comparison they seem to believe that by switching to nginx from apache their application will go from 100 req/s to 3100 req/s. As always, durations tell us a different story.

Apache        Nginx         Diff
6000 req/s    9000 req/s    3000 req/s
0.17 ms       0.11 ms       0.06 ms

So we can see that odds are you’ll only gain about 6% of a millisecond per request when switching. Perhaps that improvement is worthwhile for your application, but is it worth the additional complexity?
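
Here is a small sketch of the same reasoning in Python. It rests on two simplifying assumptions of mine: that the 100 req/s figure was measured with Apache in front, and that web server time and application time simply add up within a request. All names are made up for the example.

```python
def duration_ms(requests_per_second: float) -> float:
    """Milliseconds per request, assuming duration = 1000 / (req/s)."""
    return 1000.0 / requests_per_second


APP_RPS = 100.0      # whole stack, measured behind Apache (assumption)
APACHE_RPS = 6000.0  # web server on its own
NGINX_RPS = 9000.0   # web server on its own

# Tempting but wrong: add the req/s difference to the application's figure.
naive_rps = APP_RPS + (NGINX_RPS - APACHE_RPS)

# Durations are what add up: take Apache's share out, put Nginx's share in.
saved_ms = duration_ms(APACHE_RPS) - duration_ms(NGINX_RPS)
new_duration_ms = duration_ms(APP_RPS) - saved_ms
actual_rps = 1000.0 / new_duration_ms

print(f"naive estimate:     {naive_rps:.0f} req/s")
print(f"per-request saving: {saved_ms:.3f} ms")
print(f"realistic estimate: {actual_rps:.1f} req/s")
```

Under those assumptions the application goes from 100 req/s to a little over 100.5 req/s, nowhere near 3100.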

Conclusion

Durations are a much more useful, and more honest, metric when comparing performance changes in your applications. Requests per second is too widespread for us to stop using it entirely, but please don’t use it when talking about performance of your web applications or libraries.

Comments

  1. josh susser, January 06, 2009 @ 07:52 AM

    Nicely put. But you have to remember that numbers like these aren’t meaningful without context. Yes, user perception of performance is what we as developers want to focus on. But the guys who provide the hardware to run our software have to do a little math to know how many servers or VPS slices to order. Yes, at the scale most of us develop on it rarely makes a significant difference, but for some businesses it can be a major factor. The difference between 100 req/sec and 110 req/sec for a node of an app that must process 1000 req/sec could save the company the cost of an entire node. Scale that up to Google’s size, and eking out a percent in performance can potentially save the amount of someone’s annual salary from your department. And now that I’ve set up that straw-man, I’ll knock it down by saying that’s the wrong thing for nearly all of us to be worrying about. At that point you’d probably be better off trying to standardize your power outlet plugs (or so I’ve heard).

  2. Andrew Timberlake, January 06, 2009 @ 08:14 AM

    Great article. The before/after context is extremely important.

    I notice it far more easily when used outside of a programming context like when my wife comes home from shopping and says she paid x less for y, “what a bargain” – I always tell her that is meaningless unless she tells me how much it was before. Did she save 5% or 50% – You only know with context!

  3. Koz, January 06, 2009 @ 08:23 AM

    @Josh: Indeed, if we’re talking about reducing hardware expenses then the throughput of a cluster is very interesting indeed. However, in my experience, hardware costs are a fraction of development costs for almost all of my customers. (cue smart joke about koz being expensive).

    Additionally, hardware costs aren’t continuous. So your 10% improvement in throughput is probably not saving you 10% on hardware. You’re either at the edge of your current capacity, in which case the savings are the cost of an additional server (considerable) or you’ve got spare capacity, in which case the savings are 0.
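
    A minimal sketch of that step function, using the numbers from Josh’s comment (1000 req/s of traffic, nodes that handle 100 or 110 req/s). It assumes identical nodes, perfectly even load balancing and no headroom; `nodes_needed` is a hypothetical helper, and the 112 and 125 req/s rows are extra data points added for illustration.

```python
import math


def nodes_needed(peak_rps: float, per_node_rps: float) -> int:
    """Whole nodes required to serve peak_rps; you cannot buy a fraction of a server."""
    return math.ceil(peak_rps / per_node_rps)


PEAK_RPS = 1000

for per_node_rps in (100, 110, 112, 125):
    print(f"{per_node_rps} req/s per node -> {nodes_needed(PEAK_RPS, per_node_rps)} nodes")
# 100 req/s per node -> 10 nodes
# 110 req/s per node -> 10 nodes
# 112 req/s per node -> 9 nodes
# 125 req/s per node -> 8 nodes
```

    With these particular numbers the 10% improvement saves nothing; the saving only appears once the per-node figure crosses the next whole-node boundary.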

  4. Sailor, January 06, 2009 @ 10:16 AM

    And what about applications which do have to scale? Is it impossible in Rails?

  5. john, January 06, 2009 @ 12:25 PM

    I’ll echo Sailor: is it possible for Rails to scale for big applications? Any hints about this? Can you suggest a good way to check requests per second or durations? Or is the only way to check the production log?

  6. Dan Kubb, January 06, 2009 @ 12:47 PM

    @Koz: It’s great that you bring this up, and it’s also interesting how so many people focus first on raw app performance rather than on optimizing for the end user experience, which IMHO is much more important for most apps.

    Yahoo! has been studying end user performance for a while and discovered that on their site only 5% of the time the user is waiting for the HTML page (the dynamic stuff we tend to focus on generating so quickly) and 95% of the time waiting for other components to download, render and execute in the case of JavaScript.

    If anyone is interested in this, I’d probably start by installing YSlow in Firefox and measuring your site and see where the end user spends their time waiting most. YSlow has a bunch of articles and decent docs, but if you want more info pick up Steve Souders’ book called “High Performance Web Sites”, which goes more in depth on the YSlow measurements used.

  7. DHH, January 06, 2009 @ 05:20 PM

    Sailor and John, I can’t tell if you’re serious or not. But just in case you are, see http://rubyonrails.org/applications. There are tons of really big sites running a high scale operation on Rails.

  8. Paul Campbell, January 06, 2009 @ 05:46 PM

    Neat point. I have to say I’ve definitely been guilty of the “I have so much power if I need it” line of thought.

    This post pairs nicely with DHH’s “good enough” post on SvN today. Often the “good is not good enough” argument causes blindness, and the pursuit of the “best” or the “highest” blurs out the notion of good enough or high enough.

  9. Devlin, January 06, 2009 @ 06:12 PM

    I think you bring up a lot of valid points, some common pitfalls in measuring performance.

    You can perform A/B testing using requests/second as the metric. A/B testing will show if there is a difference between two implementations and whether that difference is statistically significant. Determining if this difference is actually significant depends on a much broader context of the application.

    Condemning this particular metric is missing the mark. Your essay cleanly demonstrates that naive analysis leads to naive conclusions—the example metric is a straw man.

    What I get from the essay is not that requests/sec is evil, but that analysis should exhibit clear thinking and healthy doses of Amdahl’s Law.

  10. Bill Kayser, January 06, 2009 @ 06:34 PM

    Throughput by itself is not a sole measure of performance, but I don’t agree that response time as a single measure represents performance either, unless you are talking about one specific aspect of a system’s performance, the end user experience.

    Even when analyzing a system design with the primary objective of reducing latency, you can’t ignore throughput. For instance, you might optimize the client side performance with tools like YSlow, etc., but you might still notice that at a peak period or burst the response times spike dramatically because you reached the maximum throughput on your server. The only way to avoid those spikes in response time is to leave enough headroom on your servers to allow for the additional capacity, and the only way to do that is to measure the capacity in terms of requests per second.

    While you can’t evaluate in isolation any one of throughput, utilization, or response time, you can take a pretty good crack at estimating capacity empirically by generating an increasing load and measuring the maximum capacity. The maximum throughput will tell you what the load is that will saturate your system, which is about 10-20% past the point where your response times will go from flat to increasing linearly with load.

    Again, I’m not disagreeing with the idea that you can’t represent the end-user experience with throughput. But if you are a performance analyst concerned about a system architecture which provides the best user experience all of the time, you can’t ignore it either. And if you are a performance analyst concerned about capacity planning, scalability, and price/performance, then without looking at all the performance parameters you are flying blind.

  11. Corey, January 06, 2009 @ 06:36 PM

    Nice post, but I think any performance engineer with a bit of experience knows to investigate both throughput and response times.

  12. Koz, January 06, 2009 @ 07:35 PM

    I certainly don’t think that there’s no merit at all to throughput as a measure, or that it should never ever ever be used. All I’m trying to suggest is that it shouldn’t be the first tool you take out of the tool box.

    With some folks, it’s the ONLY tool they take out of the toolbox, and they make stupid mistakes as a result.

    Remember that for a given request the requests per second number is the inverse of the duration. So you’re measuring the same stuff, just doing it with a metric that doesn’t mislead you.

  13. Ryan Shriver, January 06, 2009 @ 09:27 PM

    Good post and agree that response time is more important than throughput. In my blog post here, I discuss the Top 3 performance metrics for web systems as:

    1. Availability
    2. Response Time
    3. Throughput

    In this order. But please note that it’s useless to discuss Response Times UNLESS you are also discussing the throughput of the system during the response time test. The response time for 1 user doing a function vs. the average number of users doing a function vs. the peak number of users doing a function will likely give you 3 different response times. So just getting the response time alone isn’t enough, you need to ensure the system is performing under a realistic load scenario to ensure the response times are applicable to your situation.

    -ryan

  14. Jason Watkins, January 06, 2009 @ 10:00 PM

    For end user performance definitely front end work matters most. Typically we shouldn’t optimize page rendering until we’ve fixed that, or if we’re running out of server resources.

    I think it’s best to establish a threshold latency such as 200ms, then in your log analysis tally up the sum of all time above the threshold for each page in the application. This hit list tells you exactly where to go to make the biggest impact on response times above your threshold. If all you care about is utilization, then just target by total service time per page.
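
    One possible reading of that hit list, sketched in Python: for every request slower than the threshold, add the time spent beyond the threshold to that page’s tally. The `(page, duration_ms)` input format, the function name and the sample data are all assumptions made for illustration; real log parsing is left out.

```python
from collections import defaultdict

THRESHOLD_MS = 200.0  # the example threshold mentioned above


def over_threshold_hit_list(requests, threshold_ms=THRESHOLD_MS):
    """Sum the time spent beyond threshold_ms per page, worst offender first.

    requests is an iterable of (page, duration_ms) pairs.
    """
    excess = defaultdict(float)
    for page, duration in requests:
        if duration > threshold_ms:
            excess[page] += duration - threshold_ms
    return sorted(excess.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample standing in for parsed log lines
sample = [
    ("/reports", 950.0),
    ("/reports", 400.0),
    ("/search", 260.0),
    ("/home", 40.0),
    ("/search", 180.0),
]

for page, excess_ms in over_threshold_hit_list(sample):
    print(f"{page}: {excess_ms:.0f} ms over threshold")
# /reports: 950 ms over threshold
# /search: 60 ms over threshold
```

    To target utilization instead, the same loop can sum the full duration of every request per page and skip the threshold test.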

    Also, Devlin nails it. Everyone should be familiar with Amdahl’s law. I’d also say that anyone doing performance analysis should know the basics of M/M/1 queues as well. Knowing that, you’d know that throughput is not 1/response time, and that what you’re measuring is actually service time, not response time.

    That said, the approximation is accurate enough for many situations. But it can keep you from insights about your system.

    For example: if load is the ratio of arrival rate to service rate, then the queue depth will be load / (1-load). A 50% utilized service will have a queue depth of one. A 95% utilized service will have a queue depth of 19. Response time of the latter will be far greater. Knowing this helps you understand why if you have a service time distribution with a bumped tail of very slow servicing, it has such a disproportionate impact on the perceived response time, and monstrously so if you’re approaching your peak capacity.
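
    A minimal sketch of those M/M/1 formulas in Python. It takes “queue depth” to mean the average number of requests in the system (waiting plus in service), and it uses the standard M/M/1 result that mean response time is service time divided by (1 - load). The 10 ms service time is a made-up figure.

```python
def queue_depth(load: float) -> float:
    """Mean number of requests in an M/M/1 system (waiting plus in service)."""
    if not 0.0 <= load < 1.0:
        raise ValueError("load must be in [0, 1) for the queue to be stable")
    return load / (1.0 - load)


def response_time_ms(service_time_ms: float, load: float) -> float:
    """Mean M/M/1 response time: service time stretched by 1 / (1 - load)."""
    if not 0.0 <= load < 1.0:
        raise ValueError("load must be in [0, 1) for the queue to be stable")
    return service_time_ms / (1.0 - load)


SERVICE_TIME_MS = 10.0  # hypothetical service time per request

for load in (0.5, 0.8, 0.95):
    depth = queue_depth(load)
    response = response_time_ms(SERVICE_TIME_MS, load)
    print(f"load {load:.0%}: depth {depth:.1f}, response {response:.0f} ms")
```

    The service time never changes in this model, yet the response time a user sees grows without bound as load approaches 100%.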

  15. Jesse Andrews, January 06, 2009 @ 11:03 PM

    Regarding Nginx, I think we have tended to use the performance comparison as another argument for choosing it over apache.

    Pre-Passenger, if you hadn’t been using apache for years, getting apache tuned was much more complicated than getting nginx setup (the enormity of /etc/httpd/* and non-standard locations in different distros, ...).

    The advantage for Nginx for me had always been faster/easier setup and “better response time than apache for static content”.

  16. W. Andrew Loe III, January 07, 2009 @ 05:08 AM

    @Paul Campbell:

    “Perfection is the enemy of good.”

  17. john, January 07, 2009 @ 11:57 AM

    @DHH: I know those applications, and I also use some of them. But it would be nice to know what hardware setup they use.

  18. ActsAsFlinn, January 08, 2009 @ 03:34 AM

    reqs/s is a pretty darn good indicator of inefficient code, db indexes, latency, etc. You can get quite a bit of traffic without being Google, at which point reqs/s starts to matter.

  19. Ben Bee, January 08, 2009 @ 09:03 AM

    I get the impression that whenever RoR doesn’t come off as #1 by some standard, the inner circle will just dispute the standard rather than be open about shortcomings. Not the most trust-inducing thing to do.

  20. Oracle, January 08, 2009 @ 10:56 PM

    @DHH: you can scale anything horizontally to any extent. I’m much more interested (in this day and age of green, low-power datacenters) in ultra-efficient vertical scaling.

  21. paul, January 09, 2009 @ 12:15 PM

    @Ben Bee, the post doesn’t have anything to do with rails.. it was just the example the author is using.. you should learn to read..

  22. Sebastian, January 10, 2009 @ 11:56 AM

    Requests per second is often a measure for admins. They like to say that a server can handle up to 1000 req/s.

  23. Brian Takita, January 11, 2009 @ 04:20 AM

    I think you make some excellent points. You point out that relative (percentage) gains should not be ignored, just like actual gains (reqs/sec or time per request). Like Ryan says, you have to consider 1. Availability 2. Response Time 3. Throughput. I’d also add operational burden.

    Response time is important for the end user experience. People will leave if pages take too long to load, after all. However, response time and throughput don’t always go hand-in-hand. Your throughput can be good on a single node, but if the nodes are not horizontally scalable, you are limited to how many users you can handle. I’m not saying that this should be a major obsession, but a point of consideration.

    Be careful about the whole hardware vs. developer cost argument. It’s not just hardware costs, but overall operations costs, which include developers building infrastructure to manage the additional servers, scaling the database (master/slave replication), fixing servers going down, etc. Servers are also not cheap when you need to scale.

    So, yes you do have a valid point about people ignoring important factors when thinking about performance, but at the same time, don’t ignore the factors that are commonly paid attention to, for good reason.

  24. Willem van Bergen, January 12, 2009 @ 06:12 PM

    I am one of the authors of request-log-analyzer, a performance analyzer for Rails applications (see http://github.com/wvanbergen/request-log-analyzer for more info). I too agree on your point about requests per second.

    I would use two metrics to determine what optimizations you need: average time per request, because this is what your visitors experience. The other metric is cumulative request time. Cumulative request time is the sum of all the time your server spends on a set of requests. This is what your server “experiences”: the load on your server.

    For a Rails application, you would normally group requests by their controller/action, and calculate those two metrics for these groups. (Shameless promotion: this is exactly what request-log-analyzer does.) Determine which actions you should optimize with this information. If you want to improve the visitor experience, focus on actions with a long average request time. If you want to improve the load on your server, focus on actions that have a high cumulative request time.
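
    A rough Python sketch of those two metrics, to show how they can point at different actions. This is not request-log-analyzer’s code or API; the `summarize` function, the `(action, duration_ms)` input format and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def summarize(requests):
    """Group (action, duration_ms) pairs by action.

    Returns {action: (count, average_ms, cumulative_ms)}.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for action, duration in requests:
        totals[action][0] += 1
        totals[action][1] += duration
    return {
        action: (count, total / count, total)
        for action, (count, total) in totals.items()
    }


# Hypothetical sample: one cheap, popular action and one slow, rare action
sample = [("PostsController#index", 40.0)] * 50 + [("ReportsController#show", 900.0)] * 2

stats = summarize(sample)
slowest = max(stats, key=lambda action: stats[action][1])   # highest average
heaviest = max(stats, key=lambda action: stats[action][2])  # highest cumulative

print(f"worst for visitors (average):      {slowest}")
print(f"worst for the server (cumulative): {heaviest}")
# worst for visitors (average):      ReportsController#show
# worst for the server (cumulative): PostsController#index
```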

  25. Rich Collins, January 15, 2009 @ 07:08 AM

    This completely ignores the complexity introduced when you go from having to manage one machine to two, then three … etc. If you want to stay lean, having an efficient server can make a big difference.

  26. William Louth, January 15, 2009 @ 05:44 PM

    Are you (all) stating that the duration (response time?) is the inverse of the requests per second? Do Rails apps only have a single request processing thread?

  27. Vesa Nieminen, January 17, 2009 @ 08:00 PM

    Thanks! Good article.

    Not all of us are performance engineers, and pointing out stuff like this is valuable to junior developers like me.

  28. Avishek, February 06, 2009 @ 07:17 AM

    Very useful article. I agree with your points.

    For my application I use a mix of requests/sec and response time. First I make sure that my application is able to handle X req/sec on certain hardware, then I measure that at a constant load of X req/sec, what is the response time. Then work to improve the response times. Then increase the load to X+d req/sec and measure the response time and so on.

    Since these two are related, if you improve one the other also improves (usually not by the same amount).

  29. Peter Booth, February 16, 2009 @ 07:15 AM

    A couple of people have replied to this thread along the lines “well that’s obvious” or have agreed, added a personal spin, but we have really ignored some pretty colossal “elephants in the room” : Most programmers have a muddled idea of what performance is, how it differs from capacity or why there may well be no relationship between response time and peak throughput. The vast majority of software has performance that sucks. Most of the advice that we can read about performance is foolish, well-intentioned nonsense.

    Scalability is irrelevant to most websites because they will never get the usage. When someone raises scalability in a technical discussion it’s commonly used as a FUD bear to try and secure an agreement

  30. Peter Booth, February 19, 2009 @ 12:49 PM

    [continued] ... When someone raises scalability in a technical discussion it’s commonly used as a FUD tactic to try and secure an agreement on a technical approach.

    Distinguishing throughput from performance is important, and if people are fuzzy thinkers they can conflate the two. But I’d question the assertion that developers don’t know when to stop optimizing. Much more common is developers not knowing how to optimize. Today we have broadband networks, high end multicore PCs, lots of RAM. Yet how often do you see a snappy website?

  31. Just Me, April 06, 2009 @ 09:09 PM

    Performance/load/scalability/etc testing is all about context. Context, context, context.

    RPS = Load
    Response Time = Performance

    True performance metrics should always be represented as “x response under y load across z time”. If you remove ANY of these three, you have just wasted your time and have provided very little true benefit.

    Anyone who tells you you do not need load to effectively profile performance is either ignorant of the fact or is a liar. The load provides the context. To say “the average page load time of our web app is < 1 second” means nothing if you do not provide the load the server[s] are under when providing this response time. Is the server responding to less than 1 request per second, or is the server responding to 50 requests per second, and is this load sustained?

    I think it is easy to see that load is as important (if not more important) than the response time numbers by themselves.

  32. Just Me, April 07, 2009 @ 04:30 PM

    Hmm, seems I upset the author. Why was my comment removed? I said nothing to offend anyone. Was it too much truth for you?

  33. Koz, April 07, 2009 @ 10:17 PM

    @Just Me: I’ve not deleted a comment here since the relaunch. If your comment doesn’t show up, it triggered the Akismet spam checks.

    However it sounds like you’re trying to troll, and please just go elsewhere. Reddit probably needs some page views and will welcome your contributions.
