Post-Mortem 2016-12-01-SOLR-Search-typo3org

Authors: Steffen Gebert, Michael Stucki

Issue Summary

The SOLR search on typo3.org (including the TER listing) was unavailable because a faulty ACL had been deployed.

Timeline

- 13:48 Jochen Roth mentioned in Slack that the TER was unavailable
- 14:37 Michael Stucki noticed the message and raised it in the #typo3-server-team channel
- 14:51 Steffen Gebert started investigating the issue
- 14:59 Deployment of the corrective fix started
- 15:00 Search is available again

Root Cause

- a recent cookbook upload caused a new, breaking version (>4.0) of the ohai Chef cookbook to be deployed; this version changes how the path to plugins is configured
- as a result, our plugin that fixes IPv6 address detection on OpenVZ was no longer applied
- without it, the solr cookbook, which looks up the typo3.org server via Chef search, no longer saw that server's ip6address
- the solr cookbook therefore deployed Tomcat ACLs that did not include the IPv6 address of the typo3.org server
- as a result, the typo3.org server could no longer connect to the SOLR server
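The chain above hinges on the ACL being rendered from whatever Chef search returns, so a missing address silently narrows the ACL instead of failing. The following Python sketch models that behaviour. The `Node` structure and both functions are hypothetical illustrations, not the solr cookbook's actual code; only the attribute names `ipaddress` and `ip6address` mirror the ohai attributes. The strict variant shows one possible way to make the run fail loudly when an expected IPv6 address is missing.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a node returned by Chef search."""
    name: str
    ipaddress: Optional[str]   # IPv4 address as detected by ohai
    ip6address: Optional[str]  # IPv6 address; None if detection failed


def build_allow_list(nodes: List[Node]) -> List[str]:
    """Collect client addresses for the ACL.

    A node whose ip6address is missing silently contributes only its
    IPv4 address, which mirrors the failure mode described above.
    """
    allowed = []
    for node in nodes:
        for addr in (node.ipaddress, node.ip6address):
            if addr:
                allowed.append(addr)
    return allowed


def build_allow_list_strict(nodes: List[Node]) -> List[str]:
    """Refuse to render an ACL when an expected IPv6 address is missing."""
    missing = [n.name for n in nodes if not n.ip6address]
    if missing:
        raise ValueError(f"ip6address missing for: {', '.join(missing)}")
    return build_allow_list(nodes)


if __name__ == "__main__":
    # Documentation-range address; IPv6 detection is broken for this node.
    web = Node("typo3.org", "192.0.2.10", None)
    print(build_allow_list([web]))  # prints ['192.0.2.10']
    try:
        build_allow_list_strict([web])
    except ValueError as err:
        print(f"refusing to deploy ACL: {err}")
```

Failing the run keeps the previously deployed, working ACL in place, at the cost of a failed Chef run that someone has to notice.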

Resolution and Recovery

A change to the Chef environment now enforces the use of the up-to-date version of the t3-openvz Chef cookbook.
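Why a freshly uploaded cookbook version reaches the nodes, and how an environment constraint limits that, can be illustrated with a simplified model of version selection. The Python sketch below is an assumption-laden illustration, not Chef's resolver: it handles a single cookbook, supports only the comparison operators and the pessimistic `~>` operator, and ignores dependency solving. The version numbers in the usage example are made up.

```python
import operator
from typing import Iterable, Optional, Tuple

Version = Tuple[int, ...]

_OPS = {
    "=": operator.eq,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def parse(version: str) -> Version:
    """Turn '4.0' or '4.0.1' into a comparable (major, minor, patch) tuple."""
    parts = [int(p) for p in version.split(".")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported version: {version}")
    return tuple(parts + [0] * (3 - len(parts)))


def satisfies(version: str, constraint: str) -> bool:
    """Check a version against a constraint such as '~> 3.2' or '= 3.2.0'."""
    op, target = constraint.split()
    v, t = parse(version), parse(target)
    if op == "~>":
        # Pessimistic operator: '~> 3.2' means >= 3.2.0 and < 4.0.0,
        # '~> 3.2.1' means >= 3.2.1 and < 3.3.0.
        n = len(target.split("."))
        if n < 2:
            raise ValueError("'~>' needs at least major.minor in this sketch")
        upper = list(t[: n - 1])
        upper[-1] += 1
        return t <= v < tuple(upper + [0] * (3 - len(upper)))
    return _OPS[op](v, t)


def pick(available: Iterable[str], constraint: Optional[str] = None) -> str:
    """Return the newest available version that satisfies the constraint."""
    candidates = [a for a in available if constraint is None or satisfies(a, constraint)]
    if not candidates:
        raise LookupError(f"no version satisfies {constraint!r}")
    return max(candidates, key=parse)


if __name__ == "__main__":
    uploaded = ["3.2.0", "4.0.0", "4.1.0"]  # hypothetical versions on the Chef server
    print(pick(uploaded))            # prints 4.1.0: unconstrained, newest wins
    print(pick(uploaded, "~> 3.2"))  # prints 3.2.0: the constraint excludes 4.x
```

Without a constraint the newest upload always wins, which is how a breaking major release can reach every node at once; an explicit constraint turns that into a deliberate change to the environment.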

Corrective and Preventative Measures

- monitoring checks were updated to catch this error. Although we were already monitoring both the search function and the TER listing, the strings the checks look for apparently still appear on the page when an error occurs, so the defect went undetected.
- we are about to upgrade our infrastructure to the newer ohai cookbook while maintaining compatibility with our plugins.
- we will evaluate Chef's Policyfile feature to see whether it helps us both to know with more confidence which cookbook versions are in use and to update platform cookbooks without touching every application stack.
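The monitoring gap in the first item (the expected strings were still present while the search was broken) suggests checking for evidence of actual results rather than for a string that an error page may echo back. The following Python sketch evaluates an already-fetched response and returns Nagios-style exit codes (0 = OK, 2 = CRITICAL), assuming a Nagios-compatible check runner. The function name, the result marker and the error text are hypothetical and would have to be taken from the real typo3.org markup.

```python
from typing import Iterable, Tuple

OK, CRITICAL = 0, 2  # Nagios plugin exit codes


def evaluate_search_page(
    status: int,
    body: str,
    result_marker: str,
    min_results: int = 1,
    error_markers: Iterable[str] = (),
) -> Tuple[int, str]:
    """Judge a search page by its result entries, not by an echoed search term."""
    if status != 200:
        return CRITICAL, f"CRITICAL - HTTP {status}"
    for marker in error_markers:
        if marker in body:
            return CRITICAL, f"CRITICAL - error text found: {marker!r}"
    hits = body.count(result_marker)
    if hits < min_results:
        return CRITICAL, f"CRITICAL - {hits} result(s), expected at least {min_results}"
    return OK, f"OK - {hits} result(s)"


if __name__ == "__main__":
    # Hypothetical error page that still contains the search term "news".
    broken = "<h1>Search results for news</h1><p>Search is not available.</p>"
    print("news" in broken)  # prints True: a plain substring check would pass
    # Returns CRITICAL because the page contains no result entries.
    print(evaluate_search_page(200, broken, 'class="result"'))
```

Requiring a minimum number of result entries also covers error pages that return HTTP 200, which a status-code check alone would miss.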