Post-Mortem 2016-12-22-Gerrit-SSH-Hang

Authors: Steffen Gebert

Issue Summary

Connections to Gerrit SSH (port 29418) were hanging intermittently (roughly every third attempt).

Timeline

- Wed, 17:26: Stephan Großberndt reports the problem to the #typo3-server-team channel
- Thu, 6:59: Investigation of the issue starts
- Thu, 7:45: Gerrit restart resolves the problem

Root Cause

Gerrit did not respond to the client's SSH connection initialization (see the attached wireshark.png).

The exact reason is still unknown. There are several threads on the Internet about hanging Gerrit SSH connections. Most of them name database.poolLimit as a possible limiting factor. We have set it to 36, which "should be sufficient".
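To double-check which limits are actually configured, the values can be read straight from the configuration file. The following is only a minimal sketch and was not part of the original investigation. The path /var/gerrit/review/etc/gerrit.config is an assumption derived from the log directory named in this post-mortem, the function name show_gerrit_limits is hypothetical, and comparing sshd.threads and httpd.maxThreads with database.poolLimit is a guess at which settings are worth looking at.

```shell
# Print selected limits from gerrit.config (which uses the git config format).
# ASSUMPTION: the site lives in /var/gerrit/review, so the config file is
# etc/gerrit.config below it. Pass another path as the first argument if not.
show_gerrit_limits() {
    local config="${1:-/var/gerrit/review/etc/gerrit.config}"
    local key value
    for key in database.poolLimit sshd.threads httpd.maxThreads; do
        # git config exits non-zero when the key is missing from the file.
        value=$(git config -f "$config" --get "$key" || echo "(not set, Gerrit default applies)")
        printf '%s = %s\n' "$key" "$value"
    done
}

# Example (run on the Gerrit host):
#   show_gerrit_limits
```

A key reported as "not set" means Gerrit falls back to its built-in default, which should be looked up in the documentation of the installed Gerrit version.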

Resolution and Recovery

Running systemctl restart gerrit resolved the problem.
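Because the hang only hit some of the connection attempts, a single successful login after the restart does not prove much. A repeated probe along the following lines could be used to confirm the fix. This is a sketch under assumptions: the function name probe_gerrit_ssh is hypothetical, GNU coreutils timeout is assumed to be installed, and the calling user needs an SSH key registered with Gerrit. Host and port are the ones used in the commands of this post-mortem.

```shell
# Open several SSH sessions in a row and count the ones that hang.
# ASSUMPTION: GNU coreutils `timeout` is available; it exits with status 124
# when the time limit is reached, which is counted here as a hang.
probe_gerrit_ssh() {
    local host="$1" port="$2" tries="$3" limit="${4:-10}"
    local i=1 hung=0 failed=0 rc
    while [ "$i" -le "$tries" ]; do
        rc=0
        # `gerrit version` is a cheap command; BatchMode avoids password prompts.
        timeout "$limit" ssh -n -p "$port" -o BatchMode=yes "$host" gerrit version \
            >/dev/null 2>&1 || rc=$?
        if [ "$rc" -eq 124 ]; then
            hung=$((hung + 1))        # no answer within the time limit
        elif [ "$rc" -ne 0 ]; then
            failed=$((failed + 1))    # other error, e.g. authentication
        fi
        i=$((i + 1))
    done
    echo "tries=$tries hung=$hung failed=$failed"
}

# Example: 15 attempts with a 10 second limit each
#   probe_gerrit_ssh review.typo3.org 29418 15 10
```

With a failure rate of about one in three, a run of 15 attempts without a single hang is a reasonable sign that the restart helped.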

Corrective and Preventative Measures

The next time this issue occurs, the following should be checked:

- open tasks: ssh review.typo3.org -p 29418 gerrit show-queue -w
- open connections: ssh review.typo3.org -p 29418 gerrit show-connections
- if needed, change the log level using ssh review.typo3.org -p 29418 gerrit logging (the SSH log file is /var/gerrit/review/logs/sshd_log)
- thread dump: jstack <pid> as the gerrit user (the pid is shown by systemctl status gerrit)

Then post this information to the Gerrit mailing list.
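The checks listed above could be bundled into one helper so that nothing is forgotten before Gerrit gets restarted. This is a rough sketch, not an existing script: the function name collect_gerrit_diagnostics and the output file names are hypothetical, and the use of sudo to become the gerrit user as well as GNU coreutils timeout (so the collection itself cannot hang) are assumptions. The pid has to be looked up beforehand with systemctl status gerrit.

```shell
# Collect the diagnostics named in the checklist into one directory.
# ASSUMPTIONS: `timeout` (GNU coreutils) and `sudo` are available, and the
# caller is allowed to run jstack as the gerrit user.
collect_gerrit_diagnostics() {
    local host="$1" port="$2" pid="$3" outdir="$4"
    mkdir -p "$outdir" || return 1

    # Open tasks and open connections. The timeout matters because SSH is
    # the very thing that may be hanging.
    timeout 30 ssh -n -p "$port" "$host" gerrit show-queue -w \
        > "$outdir/show-queue.txt" 2>&1 || true
    timeout 30 ssh -n -p "$port" "$host" gerrit show-connections \
        > "$outdir/show-connections.txt" 2>&1 || true

    # Thread dump of the Gerrit JVM, taken as the gerrit user.
    sudo -u gerrit jstack "$pid" > "$outdir/jstack.txt" 2>&1 || true

    # SSH log file, path as given in the checklist.
    cp /var/gerrit/review/logs/sshd_log "$outdir/" 2>/dev/null \
        || echo "sshd_log not readable, skipped" >&2

    echo "Diagnostics written to $outdir"
}

# Example (run on the Gerrit host, with the pid from systemctl status gerrit):
#   collect_gerrit_diagnostics review.typo3.org 29418 <pid> /tmp/gerrit-hang
```

Failures of the individual commands are recorded in the respective output files instead of aborting the run, since a timed-out show-queue call is itself useful information for the mailing list.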

Attachment: wireshark.png (95.4 KB), Steffen Gebert, 2016-12-22 08:39