Scalability - measuring load

I've been doing some scalability, availability, and reliability testing recently, using a web application and a test harness I developed to help me understand how these attributes behave in real life. Some of the test results have been non-intuitive, at least for me. The one I want to discuss today concerns scalability - more specifically, how predictably a system can absorb a given load.

If you can't predict a system's scalability in the face of increasing load, you'll struggle with choosing the right design for your system - whether that's design at a high level such as component separation and interfaces, or at a low level with algorithms and data structures. Inevitably you have to test and test again - it's very difficult to predict scalability by just looking at an application's design and code. And for testing, you need a good metric that will tell you about scaling characteristics.

A good way of expressing load (sometimes called traffic) is to use a dimensionless unit called the erlang. In the context of a web application, the load E in erlangs is the request arrival rate λ multiplied by the mean response time h, provided that λ and h are expressed using the same unit of time, e.g. milliseconds, seconds, or hours:

E = λ * h

For example, a website that receives 150 requests per second and has an average response time of 40 milliseconds is experiencing a load of 150 * (40 / 1000) = 6.00 erlangs.
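The same arithmetic can be written as a small helper. This is only a sketch: the function name erlangs and its parameter names are my own, and it assumes the arrival rate is given per second and the response time in milliseconds, so the response time is converted to seconds before multiplying.

```python
def erlangs(arrival_rate_per_s, mean_response_ms):
    """Load in erlangs: arrival rate multiplied by mean response time.

    Assumes requests per second and milliseconds, so the response time
    is divided by 1000 to put both values on the same unit of time.
    """
    return arrival_rate_per_s * mean_response_ms / 1000.0


# The example above: 150 requests per second at 40 ms each.
load = erlangs(150, 40)
print(f"{load:.2f} erlangs")  # prints "6.00 erlangs"
```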

Let's assume that we find an ingenious way to reduce the average response time dramatically, from 40 to 10 milliseconds. Then the same load calculation gives us 150 * (10 / 1000) = 1.50 erlangs. Looks great - even if we increase the number of requests from 150 to 400 per second, we're still only at 4.00 erlangs, significantly less than our original load. All is good!

But when we do some load testing, we see that going faster means that we reach any red traffic lights faster:

In the original case of 150 requests per second with an average response of 40 ms, if the system experiences a temporary increase in response time of 10 ms then the resulting load will go from 6.00 to 7.50 erlangs.

In the new situation of 400 requests per second with an average response of 10 ms, the same blip in response time means the load goes from 4.00 to 8.00 erlangs! That is an increase of 4.00 erlangs, compared with 1.50 in the original case, because the extra load caused by a delay is the arrival rate multiplied by that delay. So we can handle many more requests, but are much more sensitive to bottlenecks in the system. In hindsight, this is obvious - the more traffic we handle, the more careful we need to be about glitches. It's similar to a motorway running at full capacity - it only takes a minor hiccup in the flow of traffic to create a traffic jam.
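To make the comparison concrete, here is a minimal sketch that applies the same 10 ms blip to both scenarios. The helper names (erlangs, blip_impact) and the scenario labels are mine, purely for illustration, and the figures are the ones used in this post rather than measured results.

```python
def erlangs(arrival_rate_per_s, mean_response_ms):
    """Load in erlangs, given requests per second and response time in ms."""
    return arrival_rate_per_s * mean_response_ms / 1000.0


def blip_impact(arrival_rate_per_s, mean_response_ms, blip_ms):
    """Return (load before, load during, increase) for a response-time blip."""
    before = erlangs(arrival_rate_per_s, mean_response_ms)
    during = erlangs(arrival_rate_per_s, mean_response_ms + blip_ms)
    return before, during, during - before


# (requests per second, mean response time in ms) for the two cases discussed.
scenarios = {"original": (150, 40), "optimised": (400, 10)}

results = {
    name: blip_impact(rate, response_ms, blip_ms=10)
    for name, (rate, response_ms) in scenarios.items()
}

for name, (before, during, increase) in results.items():
    print(f"{name}: {before:.2f} -> {during:.2f} erlangs (+{increase:.2f})")
# original: 6.00 -> 7.50 erlangs (+1.50)
# optimised: 4.00 -> 8.00 erlangs (+4.00)
```

The increase depends only on the arrival rate and the size of the blip: every extra millisecond of response time adds 0.15 erlangs at 150 requests per second, but 0.40 erlangs at 400 requests per second.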

Erlang measurements are a useful tool to help engineers and technical architects understand load patterns in their networks. This is essential for a successful design of network resources and topology.

With that, have a happy New Year and I hope 2017 brings you success in reaching your goals.