Winding down from an adrenaline high after spending the better part of 50 hours troubleshooting our ISP's network for them...
The problems - technical perspective:
- Routing decisions on the internet have a financial component - often, the lowest cost route is chosen (that's lowest cost to the carrier, not you!)
- Internet routing is brain-dead when it comes to performance; the quality of the route is not a significant variable in routing decisions, so long as the route is 'alive' (or still warm...)
- There are no ubiquitous quality-of-service standards - especially across carriers.
- The internet as a whole lacks diagnostic (and self-correcting) capabilities.
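To make the first two points concrete, here is a toy sketch in Python. It is not how any real router or routing protocol is implemented; the `Route` fields, the route names, the `loss_weight` factor and every number are made up purely for illustration. It only shows that picking the cheapest route that is merely alive can land you on a very different path than picking on quality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    name: str          # hypothetical label for a path through some carrier
    cost: float        # carrier's relative cost of using the path (made-up units)
    latency_ms: float  # round-trip latency the customer sees (made-up)
    loss_pct: float    # packet loss percentage (made-up)
    alive: bool        # whether the path is reachable at all


def pick_cheapest(routes):
    """Choose the cheapest route that is alive; quality is ignored entirely."""
    candidates = [r for r in routes if r.alive]
    return min(candidates, key=lambda r: r.cost)


def pick_by_quality(routes, loss_weight=50.0):
    """Choose the alive route with the lowest latency-plus-loss score.

    loss_weight is an arbitrary illustrative factor that converts each
    percentage point of packet loss into a latency-like penalty.
    """
    candidates = [r for r in routes if r.alive]
    return min(candidates, key=lambda r: r.latency_ms + loss_weight * r.loss_pct)


# Illustrative routes only; none of these figures are measurements.
routes = [
    Route("peer-A", cost=1.0, latency_ms=180.0, loss_pct=4.0, alive=True),
    Route("transit-B", cost=3.0, latency_ms=35.0, loss_pct=0.1, alive=True),
    Route("transit-C", cost=2.0, latency_ms=20.0, loss_pct=0.0, alive=False),
]

print("cheapest alive route:", pick_cheapest(routes).name)
print("best quality route:  ", pick_by_quality(routes).name)
```

With these invented numbers the cost-only choice is the slow, lossy path, because nothing in that decision ever looks at latency or loss.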
The problems - business perspective:
- The internet is becoming an increasingly critical service to businesses worldwide, yet little is invested in addressing quality concerns;
- Far too many people - including ISPs (big and small) - have a laissez-faire attitude to internet problems. The phrase "it's just the internet" allows ISPs to deliver shoddy service - imagine if your pilot said "it's just the engine". And ISPs aren't the only ones to blame - it saddens me that most people take this poor service for granted.
- The internal culture of ISPs reflects the above point - I was actually told yesterday, by a major backbone ISP, that "that's just the way the internet is". When I inquired about the escalation status of the issue, I was informed that the VP of Operations was aware of the problem. When I asked "what is he doing about it?", the response was less than encouraging - "we have to trust in our employees' evaluation of the situation" - what kind of escalation is this? How does this help the problem get resolved?
- ISPs' internal support protocols are flawed; they are self-serving and miss the target of actually resolving the problem. The image of "trained monkeys" came to mind during this exercise - they can do A, B, C, D - and if you don't gift wrap it and hand it to them on a silver platter (i.e. solve the problem for them) they can't deal with it. What happened to "taking a problem and running with it"? Perhaps if there were more emphasis on quality there would be fewer support incidents, reducing the number of support staff needed and allowing for a higher calibre of support (a vicious cycle...).
So what is the solution? Well, you could have multiple ISPs to mitigate the problem - but this seems to me to be spending money to solve a problem that shouldn't exist in the first place. Unfortunately, if you require a certain level of quality you are forced into this.
No, the real solution is to ensure your ISP feels the pain of poor quality every time it occurs. If you have a capable ISP, they will be an advocate for you with downstream problems.
Make sure that you:
- Have a well-defined escalation procedure, preferably worked out as part of the initial agreement;
- Have an SLA in place to cover the oft-neglected dimensions of performance and incident response time, in addition to availability;
- Work with other customers of the ISP - there is power in numbers;
- Don't be shy about keeping quality of service front and center - it should always be the primary goal;
- Stay away from shoddy service providers that claim to be Tier 1; they may have backbones, but can they keep them up and running?
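If you do get an SLA that covers performance and incident response as well as availability, it helps to check it yourself rather than wait for the ISP to tell you how they did. The following is a minimal Python sketch of such a check. The `SlaTargets` and `Incident` structures, the thresholds and the sample data are all hypothetical; your own SLA will define its own terms and measurement rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaTargets:
    min_availability_pct: float   # e.g. 99.9 (hypothetical target)
    max_p95_latency_ms: float     # performance dimension (hypothetical target)
    max_response_minutes: float   # incident response dimension (hypothetical target)


@dataclass(frozen=True)
class Incident:
    outage_minutes: float    # how long service was down
    response_minutes: float  # how long until the ISP actually engaged


def availability_pct(incidents, period_minutes):
    """Percentage of the period during which service was up."""
    down = sum(i.outage_minutes for i in incidents)
    return 100.0 * (period_minutes - down) / period_minutes


def p95(samples):
    """95th percentile by the nearest-rank method, using integer arithmetic."""
    ordered = sorted(samples)
    rank = max(1, (95 * len(ordered) + 99) // 100)  # ceil(0.95 * n)
    return ordered[rank - 1]


def sla_breaches(targets, incidents, latency_samples_ms, period_minutes):
    """Return the names of the SLA dimensions that were missed."""
    breaches = []
    if availability_pct(incidents, period_minutes) < targets.min_availability_pct:
        breaches.append("availability")
    if p95(latency_samples_ms) > targets.max_p95_latency_ms:
        breaches.append("performance")
    if any(i.response_minutes > targets.max_response_minutes for i in incidents):
        breaches.append("incident response")
    return breaches


# Invented example data for a 30-day month; not real measurements.
targets = SlaTargets(min_availability_pct=99.9,
                     max_p95_latency_ms=100.0,
                     max_response_minutes=60.0)
incidents = [Incident(outage_minutes=90, response_minutes=240),
             Incident(outage_minutes=30, response_minutes=20)]
latency_samples_ms = [30.0] * 18 + [250.0, 300.0]

print(sla_breaches(targets, incidents, latency_samples_ms, 30 * 24 * 60))
```

Keeping your own record like this gives you something concrete to put in front of the ISP when you escalate, instead of arguing over impressions.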
Additionally, it would be nice to see internet quality of service regulated - perhaps to the point where the internet is classified as an "essential service". The standards and enforcement associated with this would weed out the (far too many) deadbeat ISPs.