Monday, January 8, 2018

Modeling Performance of API Servers

Most recently I've been looking at modeling the performance of my API Server.  There are lots of good reasons to have a model for your performance.  Some of them have to do with being able to predict behavior, and others with being able to see where things are going wrong.

Response time curves don't follow the usual normal distribution.  For one, all response times must be positive.  Common models used for response times include the Log-Normal and Erlang distributions.  More recently I've also been investigating the use of the log-logistic distribution.  You might recall my previous discussions of the logistic curve, and as it turns out, it has other uses.

Here are images of the three distributions (all from Wikipedia) in the order referenced:

[Images: Log-Normal, Erlang, and Log-Logistic distribution plots]
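
If you'd rather poke at these shapes directly than squint at pictures, here's a minimal sketch using Python with SciPy (only because it's convenient for illustration; it's not part of my actual tooling).  The parameter values are invented, and note that SciPy happens to call the log-logistic distribution "fisk".

```python
# Sketch: put the three candidate response-time distributions side by side.
# Requires Python 3 with scipy; the parameter values are invented purely for
# illustration and are not fitted to any real data.
from scipy import stats

# Log-normal: shape s is the sigma of the underlying normal, scale = exp(mu)
lognormal = stats.lognorm(s=0.6, scale=6.0)

# Erlang: integer shape k (stages in sequence), scale is the per-stage time
erlang = stats.erlang(a=4, scale=1.5)

# Log-logistic (SciPy calls it "fisk"): shape beta, scale alpha
loglogistic = stats.fisk(c=3.0, scale=6.0)

for name, dist in [("log-normal", lognormal),
                   ("erlang", erlang),
                   ("log-logistic", loglogistic)]:
    # Median, mean, and 90th percentile give a quick feel for body vs. tail
    print(f"{name:13s} median={dist.median():5.2f}  "
          f"mean={dist.mean():5.2f}  p90={dist.ppf(0.90):5.2f}")
```
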
Each of these distributions has two parameters governing the shape and position of the central hump.  Log-Normal just has a useful shape.  Erlang models a sequence of identical processes, which closely approximates what a computer is doing.  Andrew Charneski wrote a great blog post on estimating parameters for the Erlang and several other distributions, specifically as related to network latency, so I thought I would start there.

After struggling with the estimation process, I took a look at other distributions and found log-logistic to be more computationally tractable in Excel (I don't use R, and don't have Mathematica or other numeric analysis tools readily available, or general competence in them).  I think it also does a slightly better job for my uses, and that is backed up to some degree (see the Networking section in the Wikipedia article).  Dr. Mohan Pant wrote a great article on estimating the parameters of the log-logistic curve using what he calls the Method of Percentiles, which I used to validate the estimation formulas I had more or less independently rediscovered.  I was quite honestly glad to find it, because I wasn't quite sure I could get away with estimating my parameters as easily as I did.  The math is a bit dense, but between it and the Wikipedia articles (and perhaps some additional Googling), someone with a decent math background can work it out.
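
To make that a little more concrete, here's a bare-bones sketch of the percentile idea as I'm using it.  This is not Dr. Pant's full method, just the simplest version: the log-logistic quantile function inverts cleanly, so the sample median gives you the scale parameter directly and one more percentile pins down the shape.  The function name, the choice of the 90th percentile, and the sample data below are all my own invention for illustration.

```python
# Sketch: estimate log-logistic parameters from two observed percentiles.
# The quantile function is x_p = alpha * (p / (1 - p)) ** (1 / beta), so the
# median yields alpha directly and any second percentile determines beta.
# Illustrative only; the data here is synthetic.
import math

import numpy as np
from scipy import stats


def fit_loglogistic_from_percentiles(samples, p=0.90):
    """Estimate (alpha, beta) from the sample median and the p-th percentile."""
    alpha = np.percentile(samples, 50)        # median -> scale parameter
    x_p = np.percentile(samples, 100 * p)     # e.g. the 90th percentile
    beta = math.log(p / (1 - p)) / math.log(x_p / alpha)
    return alpha, beta


if __name__ == "__main__":
    # Fake "observed" response times drawn from a known log-logistic curve
    rng = np.random.default_rng(42)
    true_alpha, true_beta = 6.0, 3.0
    samples = stats.fisk(c=true_beta, scale=true_alpha).rvs(size=5000, random_state=rng)

    alpha_hat, beta_hat = fit_loglogistic_from_percentiles(samples)
    print(f"alpha ~ {alpha_hat:.2f} (true {true_alpha}), "
          f"beta ~ {beta_hat:.2f} (true {true_beta})")
```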

You might ask what having a model does for me, and that's a fair question.  One of the key benefits of having the model is that I can use it to compare two sets of performance results.  Estimating model parameters from performance results allows me to make comparisons about expected application behavior that just aren't feasible with the raw results.  It also lets me do a much better job of comparing the performance of two implementations.
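
As a rough illustration of what that comparison looks like (the two "runs" below are synthetic, not from any real test), fitting the same model to each pile of raw timings reduces it to a couple of parameters, and from those you can pull the mean and tail percentiles you actually want to compare.

```python
# Sketch: reduce two sets of raw timings to fitted model parameters and
# compare the quantities that matter (mean, tail percentiles).
# The two runs here are synthetic; in practice they'd come from load tests.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
run_a = stats.fisk(c=2.5, scale=5.5).rvs(size=3000, random_state=rng)  # implementation A
run_b = stats.fisk(c=4.0, scale=6.5).rvs(size=3000, random_state=rng)  # implementation B

for name, samples in [("A", run_a), ("B", run_b)]:
    # Fix the location at zero, since response times start at zero
    c, loc, scale = stats.fisk.fit(samples, floc=0)
    fitted = stats.fisk(c=c, loc=loc, scale=scale)
    print(f"run {name}: beta={c:.2f} alpha={scale:.2f} "
          f"mean={fitted.mean():.2f} p90={fitted.ppf(0.90):.2f} p99={fitted.ppf(0.99):.2f}")
```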

For example, presume that I have two choices about how to implement a particular resource.  Would you rather I give you a somewhat lower average response time with a longer, thicker tail, or would you prefer a lower 90th percentile response time?

Here's a made-up example of what I mean.  Consider the following two performance curves.
The red line on the left, or the green line to the right?  Both look like they do about the same amount of work under the large part of the curve before the tail, but the red line clearly completes most of its work sooner, so clearly it's the better one, right?  If you picked the red line, you missed the clues in the line colors.  The modal response time of the green curve is clearly worse than that of the red line, BUT it has a smaller tail.

Overall, the average response time of the green curve is about 6.6 time units, whereas the average response time of the red curve is about 6.2 time units.  Even so, the red curve's long tail is where the real cost lies: those slow requests tie up resources (for both of us) over the long haul.  If you look at the dashed lines on the graph, they show the cumulative completion rate at a particular time.  By the time the red algorithm has completed 90% of its requests (14 time units), the green line has completed 97.5% of its requests.  The red algorithm doesn't get that far until about 38 time units in.
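
If you want to run this sort of arithmetic yourself, the fitted model makes it trivial.  The two curves below are invented stand-ins (not the exact red and green curves plotted above); they just show the mechanics of pulling the mean, the 90th percentile, and a completion fraction out of the model.

```python
# Sketch: the same kind of comparison, computed from model parameters.  The
# parameter values are invented for illustration and do not reproduce the
# plotted curves; the point is the mechanics.
from scipy import stats

red = stats.fisk(c=2.2, scale=5.0)    # lower mode, heavier tail (illustrative)
green = stats.fisk(c=5.0, scale=6.0)  # higher mode, thinner tail (illustrative)

t_red_90 = red.ppf(0.90)  # time at which the red curve has finished 90% of requests
print(f"red   mean={red.mean():.2f}  p90={t_red_90:.2f}")
print(f"green mean={green.mean():.2f}  completed by t={t_red_90:.2f}: "
      f"{green.cdf(t_red_90) * 100:.1f}%")
```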

Why do these curves have such long tails?  Welcome to the real world.  A small number of patients have mountains of data to plow through, whereas most of the rest don't.  That's part of it.  In a computer network, also, the longer something takes, the more likely it is that something else will cause a further delay.  As an API consumer, you are at the end of a long stick through which multiple computer systems play a role.  Thread switching, process swapping, disk access, and network resources are constantly under contention.

     Keith





