Post-Mortem 2015-01-12-Gerrit-Database

Author: Steffen Gebert
Original URL:

Today from approx. 18:20 to 21:00 CEST, Gerrit returned HTTP 500 errors
when accessing changes because the MySQL service crashed repeatedly.
Earlier today, the server had been hard reset twice due to
data center power outages.

The immediate cause was that the "changes" MySQL table was corrupted,
and InnoDB halts the server as soon as it detects a checksum mismatch.

How to fix:

  • start the server with innodb_force_recovery=1, so that it is
    read-only but does not halt on read errors
  • dump the table
  • restart without the recovery flag
  • drop the original table
  • import the dump
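
The steps above can be sketched as a small script. This is only a rough
outline, not the exact commands that were run: the database name
"reviewdb", the dump path, and "service mysql restart" are assumptions,
and the mysql/mysqldump credentials are assumed to come from ~/.my.cnf.
By default the script only prints the commands (DRY_RUN=1).

```shell
#!/bin/sh
# Sketch of the recovery steps. Assumptions (not from the original report):
# database name "reviewdb", the dump path, and "service mysql" as init command.
DB="${DB:-reviewdb}"                 # assumed Gerrit database name
TABLE="${TABLE:-changes}"            # the corrupted table
DUMP="${DUMP:-/root/${TABLE}.sql}"   # assumed dump location
DRY_RUN="${DRY_RUN:-1}"              # 1 = only print the commands

run() {
    # Print the command; execute it only when DRY_RUN is not 1.
    echo "+ $*"
    [ "$DRY_RUN" = "1" ] || sh -c "$*"
}

recover_table() {
    echo "# add 'innodb_force_recovery = 1' under [mysqld] in my.cnf"
    run "service mysql restart"
    run "mysqldump $DB $TABLE > $DUMP"
    echo "# remove the innodb_force_recovery line from my.cnf again"
    run "service mysql restart"
    run "mysql $DB -e 'DROP TABLE $TABLE'"
    run "mysql $DB < $DUMP"
}

recover_table
```

Check the printed commands first and confirm that the dump file is
complete before running with DRY_RUN=0, since the DROP TABLE step cannot
be undone.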

What also helped me verify which table was corrupted:

# for i in /var/lib/mysql/*/*.ibd; do innochecksum -v "$i"; done