Thursday, October 25, 2018

Resolving Notworking Woes

Networking woes are the bane of any interface engineer, service representative, or IT help desk person.  Let's talk about the various ways our network connections go down:

  1. We screw something up in the URL we are requesting and get back an error we didn't expect.
  2. Someone fat-fingers the hostname or port number somewhere, and we cannot find the network endpoint.
  3. The hostname doesn't match the certificate associated with it.
  4. The certificate has expired.
  5. We don't like any of the certificates that have been offered because we don't trust one of the root CAs.
  6. The host is down.
  7. The website on that host is down (i.e., the host can be reached, but nothing is listening on the port).
  8. The proxy server is down.
  9. The system isn't configured with the correct proxy server.
  10. We cannot resolve the IP address of the proxy server.
  11. DNS used to resolve the proxy server address is down.
  12. DNS used to resolve the server hostname is down.
  13. We should/shouldn't be using a proxy.
  14. We haven't entered the proper credentials to authenticate to the proxy.
  15. The firewall doesn't like our URL for some reason (someone once reported to me that a URL was rejected by an overly sensitive firewall because it contained the word "sex").
  16. The VPN is down.
  17. The domain registration has expired.
I could go on for quite a bit longer.

The set of diagnostic tools we have is vast: ping, tracert, nslookup, ipconfig, Wireshark, openssl, and many more. But most of these are run from the command line, have lots of options, and require human interpretation.

DNS, proxy, host, web server, firewall.  By the time you have everything correct, at least 10 things have to be working.  If every one of them is running at five nines (99.999% availability), the chain as a whole is down to about four nines (99.99%).  If you have 1000 customers, you now have about a 1 in 10 chance that, for at least one of them, something is wrong in the notwork [sic].
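For anyone who wants to check that back-of-the-envelope math, here is a minimal sketch in Python. It assumes every component fails independently of the others, which real networks do not promise, and the function names are mine, made up for illustration.

```python
def chain_availability(per_component: float, components: int) -> float:
    """Availability of a chain in which every component must be up.

    Assumes independent failures, so the probabilities simply multiply.
    """
    return per_component ** components


def chance_any_customer_affected(availability: float, customers: int) -> float:
    """Probability that at least one of `customers` independent chains is down."""
    return 1.0 - availability ** customers


# Ten components, each at five nines.
chain = chain_availability(0.99999, 10)
print(f"chain availability: {chain:.6f}")  # about four nines

# One thousand customers, each depending on their own chain.
risk = chance_any_customer_affected(chain, 1000)
print(f"chance someone is affected: {risk:.3f}")  # a bit under 0.1
```

Under those assumptions the chain comes out just above 99.99%, and the chance that at least one of 1000 customers has a problem lands a little under 10%, which is where the "1 in 10" comes from.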

Why do we do this to ourselves?  Wouldn't it be good if our platform software could tell us IN DETAIL exactly what is wrong when something doesn't work the way we expect it to?  Wouldn't it be even better if the platform told you what could be done to fix it?  Wouldn't it be absolutely awesome if the platform could actually take its own advice and fix the problem?
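To make the first of those wishes a little more concrete, here is a rough sketch of what "tell us in detail" might look like, using only Python's standard socket and ssl modules. The function names `explain` and `diagnose` are hypothetical, the suggested fixes are my own guesses, and it only covers a handful of the failures listed earlier (DNS, closed port, timeout, certificate trouble). Proxies, VPNs, and firewalls are not detected directly.

```python
import socket
import ssl


def explain(exc: Exception) -> str:
    """Turn a low-level network exception into a plain-language diagnosis."""
    # Certificate errors are checked first: they are a subclass of ssl.SSLError.
    if isinstance(exc, ssl.SSLCertVerificationError):
        reason = getattr(exc, "verify_message", str(exc))
        return (f"Certificate rejected: {reason}. "
                "Check the hostname, the expiry date, and the trusted root CAs.")
    if isinstance(exc, ssl.SSLError):
        return f"TLS handshake failed: {exc}. Is this port really speaking TLS?"
    if isinstance(exc, socket.gaierror):
        return "DNS lookup failed. Check the hostname spelling and the DNS server."
    if isinstance(exc, ConnectionRefusedError):
        return "The host answered, but nothing is listening on that port."
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return ("Connection timed out. The host, a firewall, a VPN, "
                "or a proxy may be down or dropping traffic.")
    return f"Unclassified network error: {exc!r}"


def diagnose(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Resolve, connect, and verify the certificate, reporting the first failure."""
    try:
        context = ssl.create_default_context()
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host):
                return f"OK: {host}:{port} resolved, connected, and is trusted."
    except OSError as exc:
        # socket and ssl errors all derive from OSError.
        return explain(exc)
```

Calling something like `print(diagnose("example.com"))` gives one sentence instead of a stack trace. It is nowhere near a platform that fixes itself, but it shows that the raw information needed for a useful message is often already sitting in the exception.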


