Bug #61814
Updated by Patrick Broens about 10 years ago
When using the database backend in the caching framework, having a large website, will have the garbage collection scheduler task take forever. This is related to a few issues:
The scheduler task will call collectGarbage() in Typo3DatabaseBackend. In this method the first query will get the identifiers for records, f.i. cf_cache_hash, where the "expires" field is lower than the current timestamp. The expires field is indexed, but not in the right way. It is a multiple column index, cache_id (identifier,expires). MySQL cannot use the index to perform lookups if the columns do not form a leftmost prefix of the index. Since the SELECT statement only has the "expires" field in the WHERE, the index is not used. When having a large amount of records in CF tables, like cf_cache_hash, this will take a long time.
The query above will get the identifiers for all records found. In a huge installation this can be a large amount of identifiers. These are all put in an IN() clause to delete tag rows connected to expired cache entries. In this case the InnoDB buffer pool can be too small or it's possible you do not have enough physical memory to hold your working set. This can be avoided by using a multiple DELETE by using joins.
The solution would be to optimize the method collectGarbage(), combining the queries into one multi table DELETE statement using joins and redefining the keys for the caching tables. Furthermore performance tests should be done with large amounts of records in the caching tables.
In my case the site is holding over 100.000 pages, having a default caching time of 2 hours. A side effect of the above is that the cache is filled faster than the garbage collection task can delete, getting an ever growing database.