User:Andy/Idea/95th percentile monitor

From Strugglers

A common charging method in the hosting business is known as 95th percentile billing. Usage is measured at regular intervals throughout the billing period. At the end of the period, when the bill is calculated, the highest 5% of the measurements are discarded, and the highest measurement left is the one that is charged for. So if a service is described as "1Mbit/sec 95th percentile", up to 5% of your measurements may be above 1Mbit/sec without any additional charges.
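A minimal sketch of the billing calculation, assuming the readings are already in Mbit/sec and that the number of discarded readings is rounded down. Providers may round differently, so the exact cut-off is an assumption, and `billed_rate` is a hypothetical name.

```python
def billed_rate(samples, discard_percent=5):
    """Return the reading charged for under percentile billing.

    samples: usage readings for the billing period (e.g. Mbit/sec).
    discard_percent: share of the highest readings thrown away (5 = 95th percentile).
    """
    if not samples:
        raise ValueError("no samples for the billing period")
    ordered = sorted(samples, reverse=True)          # highest reading first
    discard = len(ordered) * discard_percent // 100  # top readings that are ignored
    return ordered[discard]                          # highest reading left over


print(billed_rate(list(range(1, 101))))  # 95
```

With 100 readings, the top 5 are dropped and the sixth-highest reading is billed.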

The problem

Without some sort of control in place, it would be easy to use a lot of bandwidth and end up with a very large bill at the end of the month.

The first instinct might be to limit the interface to the bandwidth you wish to be charged for, e.g. 1Mbit/sec. This would work, but it gives you less performance than you could otherwise enjoy. Your network card is probably capable of 10 or 100Mbit/sec, and it would be nice to let it burst to that speed for short periods: the 5% of measurements that you are allowed above the limit.

A solution?

So why doesn't someone just write some scripts to fix this? Here's an idea off the top of my head:

  • poll the interface counters every minute
  • discard the top 4% or so
  • if the highest remaining reading is now above the limit, then you have already used 4 of your 5 percentage points of free burst. With only 1% remaining, the script could now lock your interface at 1Mbit/sec.
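The three steps in the list might look roughly like this. It is a sketch under several assumptions: Linux, where per-interface byte counters are exposed under `/sys/class/net/<iface>/statistics/`; outbound (`tx_bytes`) traffic being the billed direction; 64-bit counters; and a 1Mbit/sec limit. The constant and function names are hypothetical.

```python
LIMIT_MBIT = 1.0        # hypothetical committed rate (the "1Mbit/sec" above)
FREE_PERCENT = 4        # discard the top 4% instead of 5%, keeping 1% in reserve
COUNTER_MODULO = 2**64  # assumption: 64-bit interface byte counters


def read_tx_bytes(iface):
    """Read the transmitted-bytes counter for `iface` (Linux sysfs only)."""
    with open(f"/sys/class/net/{iface}/statistics/tx_bytes") as f:
        return int(f.read())


def rate_mbit(prev_bytes, cur_bytes, seconds):
    """Convert two counter readings taken `seconds` apart into Mbit/sec."""
    # The modulo keeps the delta correct across a single counter wrap;
    # a counter reset (e.g. after a reboot) would still give a bogus value.
    delta = (cur_bytes - prev_bytes) % COUNTER_MODULO
    return delta * 8 / seconds / 1_000_000


def should_shape(readings, limit=LIMIT_MBIT, free_percent=FREE_PERCENT):
    """True if the highest reading left after dropping the top free_percent is over the limit."""
    if not readings:
        return False
    ordered = sorted(readings, reverse=True)       # highest first
    discard = len(ordered) * free_percent // 100   # readings treated as free burst
    return ordered[discard] > limit
```

A polling loop would call `read_tx_bytes` once a minute, store `rate_mbit(previous, current, 60)`, and switch shaping on when `should_shape` returns True. Applying the shaping itself (for example with Linux `tc`) is left out.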

This would let you use your free burst without any possibility of incurring more charges than you want. Your ISP is probably polling every 5 minutes, so if you poll every minute and stop after 4% (not 5%) is gone, you could be pretty confident of never going over.
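One thing worth checking about the 1-minute versus 5-minute margin: a single busy minute can lift a whole 5-minute average over the limit (one minute at 100Mbit/sec followed by four idle minutes still averages 20Mbit/sec). A rough way to see usage as the ISP is assumed to see it is to average the 1-minute readings into 5-minute buckets. This sketch assumes the readings are contiguous and line up with the ISP's 5-minute intervals, which in practice they may not. `five_minute_averages` is a hypothetical helper.

```python
def five_minute_averages(minute_readings, bucket=5):
    """Average consecutive 1-minute readings into `bucket`-minute readings.

    Assumes contiguous readings aligned with the ISP's intervals;
    a trailing partial bucket is dropped.
    """
    usable = len(minute_readings) - len(minute_readings) % bucket
    return [
        sum(minute_readings[i:i + bucket]) / bucket
        for i in range(0, usable, bucket)
    ]


print(five_minute_averages([100.0, 0, 0, 0, 0, 0.5, 0.5, 0.5, 0.5, 0.5]))  # [20.0, 0.5]
```

Counting `sum(r > 1.0 for r in five_minute_averages(readings))` and comparing it with 5% of the number of buckets gives a rough idea of how close the ISP's view is to the threshold.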

A round-robin archive could be used to store 30 days' worth of measurements without any need to expire or rotate them. Once the 96th percentile reading drops below the limit again, the rate limit on the interface could be removed. This should keep the link's 96th percentile usage over the previous 30 days at the limit: as high as it can be without going over.
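In Python, `collections.deque` with a `maxlen` behaves much like a round-robin archive: once full, each append drops the oldest entry. A minimal in-memory sketch, assuming one reading per minute, a 30-day window and the 4% discard from the list above; `PercentileMonitor` is a hypothetical name, and nothing here persists across restarts.

```python
from collections import deque

MINUTES_IN_30_DAYS = 30 * 24 * 60  # 43200 one-minute readings


class PercentileMonitor:
    """Sliding 30-day window of readings with a shape/unshape decision."""

    def __init__(self, limit, free_percent=4, window=MINUTES_IN_30_DAYS):
        self.limit = limit
        self.free_percent = free_percent
        # deque(maxlen=...) acts as the round-robin archive: once full,
        # each append silently discards the oldest reading.
        self.readings = deque(maxlen=window)

    def percentile_reading(self):
        """Highest reading left after discarding the top free_percent."""
        ordered = sorted(self.readings, reverse=True)
        return ordered[len(ordered) * self.free_percent // 100]

    def add(self, reading):
        """Store a reading; return True while the interface should be shaped."""
        self.readings.append(reading)
        # Shape while the 96th percentile reading is over the limit; once it
        # falls back to or below the limit, shaping can be removed.
        return self.percentile_reading() > self.limit
```

While shaped, new readings stay at or below the limit, so the percentile only falls back once the old bursts age out of the window. Sorting 43200 readings once a minute should be cheap enough that a cleverer data structure is probably unnecessary.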

Note that there are 43200 minutes in 30 days; 5% of that is 2160 and 4% is 1728, so this method would allow 1728 over-limit measurements before shaping your interface down to 1Mbit/sec.
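The same arithmetic for other windows and polling intervals, as a small helper (`burst_budget` is a hypothetical name). The last line assumes the ISP polls every 5 minutes, as guessed above.

```python
def burst_budget(days, poll_seconds, percent):
    """Return (readings in the window, readings allowed over the limit)."""
    readings = days * 24 * 60 * 60 // poll_seconds
    return readings, readings * percent // 100


print(burst_budget(30, 60, 5))   # (43200, 2160)
print(burst_budget(30, 60, 4))   # (43200, 1728)
print(burst_budget(30, 300, 5))  # (8640, 432)
```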

Is there a better algorithm?