Post-Mortem 2016-11-20 Network Config Changes
Authors: Steffen Gebert, Michael Stucki
On two occasions in the past days, service availability was impaired after we changed
/etc/network/interfaces to add our VPN interfaces:
- Saturday, Nov 12th: backup server
- Friday, Nov 16th: physical host server ms06
In both cases, service networking restart resulted in a permanent loss of connectivity. As we were locked out of the running servers, externally triggered reboots were required.
Side note: We do not manually re-configure our servers, but use Chef instead. However, IP address configuration is not part of the Chef setup.
While a syntax error was at least partially the cause in one case, we experienced the same connectivity issue when running service networking restart with a correct configuration.
The syntactically correct config file only took effect after the reboot.
Resolution and Recovery
In the first occurrence, we contacted the organization hosting our backup server and asked for a reboot. In the second case, we had the means to execute a remote reset.
Corrective and Preventative Measures
All changes to the network configuration should be backed by an automatic revert procedure that kicks in unless the operator, who is still connected, disables it.
According to this issue,
service networking restart should not be used. Instead, use
( ifdown iface; ifup iface ) &
However, we are not certain that this is sufficient in all cases.
Regardless of how networking is restarted, the following procedure should be triggered automatically to prevent further lockouts:
- After 1 minute: Revert the configuration file change and restart networking
- After 5 minutes: Reboot the server
The following gist can be used, assuming that a backup has been created in
curl https://gist.githubusercontent.com/StephenKing/83fedc56137f5640de929b4430f1b653/raw/24a7536bc074b575af55e667ccde0a4f3668fd21/reset.sh > reset.sh
bash reset.sh
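As a rough illustration of the two-stage fallback described above, the following shell sketch reverts the configuration after a first delay and reboots after a second one, unless the operator creates a marker file in the meantime. It is not the content of the linked gist: the backup path, the interface name, the marker file and all function names are assumptions made for this example.

```shell
#!/bin/bash
# Sketch of an automatic revert for network configuration changes.
# Every path and name below is hypothetical; adjust them to your setup.

CONFIG=${CONFIG:-/etc/network/interfaces}
BACKUP=${BACKUP:-/root/interfaces.bak}             # hypothetical backup location
IFACE=${IFACE:-eth0}                               # hypothetical interface name
CANCEL_FILE=${CANCEL_FILE:-/tmp/network-change-ok} # hypothetical marker file

# Copy the saved configuration back over the edited one.
restore_config() {
    cp -- "$BACKUP" "$CONFIG"
}

# Restart a single interface instead of the whole networking service.
restart_iface() {
    ifdown "$IFACE"
    ifup "$IFACE"
}

revert_and_restart() {
    restore_config
    restart_iface
}

reboot_server() {
    reboot
}

# guard [REVERT_AFTER] [REBOOT_AFTER]  (seconds, counted from the start)
# Reverts after REVERT_AFTER seconds and reboots after REBOOT_AFTER seconds.
# Each step is skipped, and the guard returns, if CANCEL_FILE exists by then.
guard() {
    local revert_after=${1:-60} reboot_after=${2:-300}

    sleep "$revert_after"
    [ -e "$CANCEL_FILE" ] && return 0
    revert_and_restart

    sleep "$((reboot_after - revert_after))"
    [ -e "$CANCEL_FILE" ] && return 0
    reboot_server
}

# Possible usage, as root:
#   cp "$CONFIG" "$BACKUP"     # save the working config before editing
#   rm -f "$CANCEL_FILE"       # clear a marker left over from an earlier run
#   guard 60 300 &             # arm the fallback
#   ... edit the config, restart the interface ...
#   touch "$CANCEL_FILE"       # still connected: cancel the fallback
```

The guard has to outlive a dropped SSH session, so it would need to run detached from the login shell, for example inside screen or tmux. Creating the marker file after the revert but before the reboot also cancels the reboot, which covers the case where the reverted configuration restores connectivity.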