Bug #67289
closedProblems with cf_cache_imagesizes in TYPO3 7.2
100%
Description
In TYPO3 7.2 the old cache_imagesizes
has been replaced by a caching framework implementation (#28484).
I see issues with this both in the behaviour as well as by examining TYPO3\CMS\Core\Imaging\GraphicalFunctions
Previously, the key to identify cache_imagesizes
entries was "md5filename
", built as the name says from md5 of filename.
(see the WHERE part in INSERT and DELETE queries of old implementation)
Now the cache identifier is $statusHash
which is sha1() of file mtime and filesize.
(see method generateCacheKeyForImageFile
)
Observations:
1. Problem with cache invalidation.
If the file is overwritten, the $statushash
almost always changes.
Invalid Cache entry can't be identified anymore and removal is not possible.
In the previous implementation, file overwrite would lead to successful removal of old entries, because the cache key was not based on parameters that change when a file is overwritten but on filename hash. It would only fail on collisions of hash algorithm.
2. Ambiguous identifier
If two files have the same mtime and filesize, but different dimensions, their cache identifiers collide.
This is unlikely but can absolutely happen (I could demonstrate with test files)
This did not happen with the previous implementation (though md5 collisions were possible).
Since filemtime is a very easily changed parameter (might be the same for 1000s of files after an automatic import, or many files are touch()'ed for some reason) and filesize is not required to be unique inside a directory tree, the cache identifier is ambiguous.
3. Management hard to handle for third-party code
In the previous implementation cache_imagesizes
was easily manageable:
If a script changed an image file, it could clear out any existing cached dimensions for the file by deleting the entries for that filename. In fact the file name was even kept in cleartext in the rows.
In the current implementation there is no direct access, one can try to access by capturing the previous filesize and mtime before changing the file but it is uncertain whether these haven't been changed already...
Also, $cache->get()
and $cache->set()
use $statusHash
as identifier but $cache->remove
tries to use $filePath
Shouldn't these operations use a consistent identifier for the cache?
It looks so when examining the methods in \TYPO3\CMS\Core\Cache\Backend\Typo3DatabaseBackend
I would suggest the following changes
- separate
$identifier
and$statusHash
- continue to use
$statusHash
for comparison against "hash
" in stored data (change detection) $identifier
can be just sha1() of $filePath- use same
$identifier
for get, set, and remove, when we have found that statusHash is changed for the file
Then we should have neither the md5 collisions of old implementation, nor the filesize collisions of the new.
Also invalidation of outdated entries should work again.
Files
Updated by Markus Klein over 9 years ago
- Status changed from New to Accepted
- Priority changed from Should have to Must have
- Target version changed from next-patchlevel to 7.3 (Packages)
- Is Regression changed from No to Yes
- Sprint Focus set to Stabilization Sprint
Sounds very reasonable!
Thanks for this extensive report. Can you provide a patch?
Updated by Christian Reiter over 9 years ago
- Get two images that have the same file size but different width + height (examples attached)
- give them the same mtime by "touch()" (you may have to "clearstatcache()")
- the example images have 13877 bytes but one is 390x390 and the other is 370x331
- Create a page that displays these two images with a normal content element
- Give it a Typoscript PAGE object that just renders only tt_content so you don't output any other images (logos etc)
- Make sure you don't have a setup that is so responsive that it doesn't use imagesizes anywhere... then you won't get any cahced imagedimensions ;)
- Flush page cache, truncate cf_cache_imagesizes and sys_file_processedfile
- View the page
- Check the cf_cached_imagesizes
Result with current implementation:
Files:
13877 Jun 05 2015 13:11:12 1.gif 13877 Jun 05 2015 13:11:12 2.gif
cf_cache_imagesizes
+----+------------------------------------------+------------+--------------------------------------------------------------------------------------------------------------------+ | id | identifier | expires | content | +----+------------------------------------------+------------+--------------------------------------------------------------------------------------------------------------------+ | 1 | 3b8f080adc6199b4f655fc1d9825d9a7255a36fa | 2145909600 | a:3:{s:4:"hash";s:40:"3b8f080adc6199b4f655fc1d9825d9a7255a36fa";s:10:"imagewidth";i:390;s:11:"imageheight";i:390;} | +----+------------------------------------------+------------+--------------------------------------------------------------------------------------------------------------------+
So you get only one entry with 390x390 for both images.
This is the collision due to ambiguous identifier.
If you clear everyting again, resort the images, and view the page again, then there will be only the entry for 370x331 - whatever gets rendered first enters the cache.
Now if we flush pagecache again, truncate sys_file_processedfile again, (also clear browser cache),
then change the mtimes of both files to be something different for each,
then view the page again,
Files:
13877 Jun 05 2015 13:24:51 1.gif 13877 Jun 05 2015 13:25:42 2.gif
cf_cache_imagesizes
+----+------------------------------------------+------------+--------------------------------------------------------------------------------------------------------------------+ | id | identifier | expires | content | +----+------------------------------------------+------------+--------------------------------------------------------------------------------------------------------------------+ | 1 | 3b8f080adc6199b4f655fc1d9825d9a7255a36fa | 2145909600 | a:3:{s:4:"hash";s:40:"3b8f080adc6199b4f655fc1d9825d9a7255a36fa";s:10:"imagewidth";i:390;s:11:"imageheight";i:390;} | | 2 | 8aae8a74252883528986dfa42f6caf1ba95db0c4 | 2145909600 | a:3:{s:4:"hash";s:40:"8aae8a74252883528986dfa42f6caf1ba95db0c4";s:10:"imagewidth";i:390;s:11:"imageheight";i:390;} | | 3 | c2a278a388e742664fa3186bd69ff643e1413f12 | 2145909600 | a:3:{s:4:"hash";s:40:"c2a278a388e742664fa3186bd69ff643e1413f12";s:10:"imagewidth";i:370;s:11:"imageheight";i:331;} | +----+------------------------------------------+------------+--------------------------------------------------------------------------------------------------------------------+
Now since they have different mtimes, we get two new entries for the correct imagedimensions.
However the single old entry can't be deleted because the cacheing implementation couldn't realize that it corresponded to the same files - since the cache identifier is based on mtime.
This is the failure to invalidate cache.
A patch should pass these tests with different results.
- Test 1:
- Before patch - 1 entry
- After patch - should be 2 entries
- Test 2:
- Before patch - 3 entries
- After patch - should be 2 NEW entries, OLD entries should be gone, but identifier hash should be identical with those generated in Test 1
Updated by Gerrit Code Review over 9 years ago
- Status changed from Accepted to Under Review
Patch set 1 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/39979
Updated by Christian Reiter over 9 years ago
Repeating the tests described above with the patch
Test 1
Caching dimensions for files with identical size and mtime.
Files:
13877 Jun 05 2015 13:33:59 1.gif 13877 Jun 05 2015 13:33:59 2.gif
cf_cache_imagesizes:
+----+------------------------------------------+------------+--------------------------------------------------------------------------------------------------------------------+ | id | identifier | expires | content | +----+------------------------------------------+------------+--------------------------------------------------------------------------------------------------------------------+ | 1 | 547f8f1673ce12fafe77fc012a9910f9f2912d59 | 2145909600 | a:3:{s:4:"hash";s:40:"bedd7bd17f3856b97516ee9c6ce06cb14731d961";s:10:"imagewidth";i:390;s:11:"imageheight";i:390;} | | 2 | 2316b5bf0756526bad6a06011c23faa20d0397b9 | 2145909600 | a:3:{s:4:"hash";s:40:"bedd7bd17f3856b97516ee9c6ce06cb14731d961";s:10:"imagewidth";i:370;s:11:"imageheight";i:331;} | +----+------------------------------------------+------------+--------------------------------------------------------------------------------------------------------------------+
Result: Cache collision and ambiguity of identifier appears resolved.
As expected, we get 2 different entries because the filemitme and filesize are really irrelevant.
Test 2
Change the files, old cache should be invalidated snce it must be recalculated. We can't know beforehand whether the result is going to be the same or different - since the whole point of this cache is to minimize direct calls to getimagesize, imagemagick identify etc.
Files:
13877 Jun 05 2015 13:45:06 1.gif 13877 Jun 05 2015 13:45:33 2.gif
cf_cache_imagesizes:
+----+------------------------------------------+------------+--------------------------------------------------------------------------------------------------------------------+ | id | identifier | expires | content | +----+------------------------------------------+------------+--------------------------------------------------------------------------------------------------------------------+ | 3 | 547f8f1673ce12fafe77fc012a9910f9f2912d59 | 2145909600 | a:3:{s:4:"hash";s:40:"055dd4e3a69c5d4818d81950fb547e8adfcc65ba";s:10:"imagewidth";i:390;s:11:"imageheight";i:390;} | | 4 | 2316b5bf0756526bad6a06011c23faa20d0397b9 | 2145909600 | a:3:{s:4:"hash";s:40:"920d7e7e71bfde2663a53b7124fd7f7668c12954";s:10:"imagewidth";i:370;s:11:"imageheight";i:331;} | +----+------------------------------------------+------------+--------------------------------------------------------------------------------------------------------------------+
Result: Failure of invalidation appears resolved
The old cache IDs 1,2 are gone, replaced by 2 new ones with 3,4.
The identifier remains constant, the status hash changes. Previously they would both change (as they were the same thing)
Updated by Gerrit Code Review over 9 years ago
Patch set 2 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/39979
Updated by Gerrit Code Review over 9 years ago
Patch set 3 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/39979
Updated by Gerrit Code Review over 9 years ago
Patch set 4 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/39979
Updated by Christian Reiter over 9 years ago
- Status changed from Under Review to Resolved
- % Done changed from 0 to 100
Applied in changeset 5834ed21c8d5df6747581c9b7d259a710fa7bdf5.
Updated by Riccardo De Contardi about 7 years ago
- Status changed from Resolved to Closed