Project

General

Profile

Actions

Bug #67289

closed

Problems with cf_cache_imagesizes in TYPO3 7.2

Added by Christian Reiter almost 9 years ago. Updated over 6 years ago.

Status:
Closed
Priority:
Must have
Assignee:
-
Category:
Caching
Target version:
Start date:
2015-06-04
Due date:
% Done:

100%

Estimated time:
TYPO3 Version:
7
PHP Version:
5.5
Tags:
Complexity:
Is Regression:
Yes
Sprint Focus:
Stabilization Sprint

Description

In TYPO3 7.2 the old cache_imagesizes has been replaced by a caching framework implementation (#28484).

I see issues with this both in the behaviour as well as by examining TYPO3\CMS\Core\Imaging\GraphicalFunctions

Previously, the key to identify cache_imagesizes entries was "md5filename", built as the name says from md5 of filename.
(see the WHERE part in INSERT and DELETE queries of old implementation)

Now the cache identifier is $statusHash which is sha1() of file mtime and filesize.
(see method generateCacheKeyForImageFile)

Observations:
1. Problem with cache invalidation.
If the file is overwritten, the $statushash almost always changes.
Invalid Cache entry can't be identified anymore and removal is not possible.
In the previous implementation, file overwrite would lead to successful removal of old entries, because the cache key was not based on parameters that change when a file is overwritten but on filename hash. It would only fail on collisions of hash algorithm.

2. Ambiguous identifier
If two files have the same mtime and filesize, but different dimensions, their cache identifiers collide.
This is unlikely but can absolutely happen (I could demonstrate with test files)
This did not happen with the previous implementation (though md5 collisions were possible).
Since filemtime is a very easily changed parameter (might be the same for 1000s of files after an automatic import, or many files are touch()'ed for some reason) and filesize is not required to be unique inside a directory tree, the cache identifier is ambiguous.

3. Management hard to handle for third-party code
In the previous implementation cache_imagesizes was easily manageable:
If a script changed an image file, it could clear out any existing cached dimensions for the file by deleting the entries for that filename. In fact the file name was even kept in cleartext in the rows.
In the current implementation there is no direct access, one can try to access by capturing the previous filesize and mtime before changing the file but it is uncertain whether these haven't been changed already...

Also,
$cache->get() and $cache->set() use $statusHash as identifier but $cache->remove tries to use $filePath
Shouldn't these operations use a consistent identifier for the cache?
It looks so when examining the methods in \TYPO3\CMS\Core\Cache\Backend\Typo3DatabaseBackend

I would suggest the following changes

  • separate $identifier and $statusHash
  • continue to use $statusHash for comparison against "hash" in stored data (change detection)
  • $identifier can be just sha1() of $filePath
  • use same $identifier for get, set, and remove, when we have found that statusHash is changed for the file

Then we should have neither the md5 collisions of old implementation, nor the filesize collisions of the new.
Also invalidation of outdated entries should work again.


Files

1.gif (13.6 KB) 1.gif Christian Reiter, 2015-06-05 13:50
2.gif (13.6 KB) 2.gif Christian Reiter, 2015-06-05 13:50

Related issues 1 (0 open1 closed)

Related to TYPO3 Core - Feature #28484: t3lib_stdGraphic->getCachedImageDimensions() should leverage caching backendClosed2011-07-26

Actions
Actions #1

Updated by Markus Klein almost 9 years ago

  • Status changed from New to Accepted
  • Priority changed from Should have to Must have
  • Target version changed from next-patchlevel to 7.3 (Packages)
  • Is Regression changed from No to Yes
  • Sprint Focus set to Stabilization Sprint

Sounds very reasonable!

Thanks for this extensive report. Can you provide a patch?

Updated by Christian Reiter almost 9 years ago

To verify a patch here is my test procedure:
  • Get two images that have the same file size but different width + height (examples attached)
    • give them the same mtime by "touch()" (you may have to "clearstatcache()")
    • the example images have 13877 bytes but one is 390x390 and the other is 370x331
  • Create a page that displays these two images with a normal content element
    • Give it a Typoscript PAGE object that just renders only tt_content so you don't output any other images (logos etc)
    • Make sure you don't have a setup that is so responsive that it doesn't use imagesizes anywhere... then you won't get any cahced imagedimensions ;)
  • Flush page cache, truncate cf_cache_imagesizes and sys_file_processedfile
  • View the page
  • Check the cf_cached_imagesizes

Result with current implementation:

Files:

13877 Jun 05 2015 13:11:12 1.gif
13877 Jun 05 2015 13:11:12 2.gif

cf_cache_imagesizes

+----+------------------------------------------+------------+--------------------------------------------------------------------------------------------------------------------+
| id | identifier                               | expires    | content                                                                                                            |
+----+------------------------------------------+------------+--------------------------------------------------------------------------------------------------------------------+
|  1 | 3b8f080adc6199b4f655fc1d9825d9a7255a36fa | 2145909600 | a:3:{s:4:"hash";s:40:"3b8f080adc6199b4f655fc1d9825d9a7255a36fa";s:10:"imagewidth";i:390;s:11:"imageheight";i:390;} |
+----+------------------------------------------+------------+--------------------------------------------------------------------------------------------------------------------+

So you get only one entry with 390x390 for both images.
This is the collision due to ambiguous identifier.

If you clear everyting again, resort the images, and view the page again, then there will be only the entry for 370x331 - whatever gets rendered first enters the cache.

Now if we flush pagecache again, truncate sys_file_processedfile again, (also clear browser cache),
then change the mtimes of both files to be something different for each,
then view the page again,

Files:

13877 Jun 05 2015 13:24:51 1.gif
13877 Jun 05 2015 13:25:42 2.gif

cf_cache_imagesizes

+----+------------------------------------------+------------+--------------------------------------------------------------------------------------------------------------------+
| id | identifier                               | expires    | content                                                                                                            |
+----+------------------------------------------+------------+--------------------------------------------------------------------------------------------------------------------+
|  1 | 3b8f080adc6199b4f655fc1d9825d9a7255a36fa | 2145909600 | a:3:{s:4:"hash";s:40:"3b8f080adc6199b4f655fc1d9825d9a7255a36fa";s:10:"imagewidth";i:390;s:11:"imageheight";i:390;} |
|  2 | 8aae8a74252883528986dfa42f6caf1ba95db0c4 | 2145909600 | a:3:{s:4:"hash";s:40:"8aae8a74252883528986dfa42f6caf1ba95db0c4";s:10:"imagewidth";i:390;s:11:"imageheight";i:390;} |
|  3 | c2a278a388e742664fa3186bd69ff643e1413f12 | 2145909600 | a:3:{s:4:"hash";s:40:"c2a278a388e742664fa3186bd69ff643e1413f12";s:10:"imagewidth";i:370;s:11:"imageheight";i:331;} |
+----+------------------------------------------+------------+--------------------------------------------------------------------------------------------------------------------+

Now since they have different mtimes, we get two new entries for the correct imagedimensions.
However the single old entry can't be deleted because the cacheing implementation couldn't realize that it corresponded to the same files - since the cache identifier is based on mtime.
This is the failure to invalidate cache.

A patch should pass these tests with different results.

  • Test 1:
    • Before patch - 1 entry
    • After patch - should be 2 entries
  • Test 2:
    • Before patch - 3 entries
    • After patch - should be 2 NEW entries, OLD entries should be gone, but identifier hash should be identical with those generated in Test 1
Actions #3

Updated by Gerrit Code Review almost 9 years ago

  • Status changed from Accepted to Under Review

Patch set 1 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/39979

Actions #4

Updated by Christian Reiter almost 9 years ago

Repeating the tests described above with the patch

Test 1
Caching dimensions for files with identical size and mtime.

Files:

13877 Jun 05 2015 13:33:59 1.gif
13877 Jun 05 2015 13:33:59 2.gif

cf_cache_imagesizes:
+----+------------------------------------------+------------+--------------------------------------------------------------------------------------------------------------------+
| id | identifier                               | expires    | content                                                                                                            |
+----+------------------------------------------+------------+--------------------------------------------------------------------------------------------------------------------+
|  1 | 547f8f1673ce12fafe77fc012a9910f9f2912d59 | 2145909600 | a:3:{s:4:"hash";s:40:"bedd7bd17f3856b97516ee9c6ce06cb14731d961";s:10:"imagewidth";i:390;s:11:"imageheight";i:390;} |
|  2 | 2316b5bf0756526bad6a06011c23faa20d0397b9 | 2145909600 | a:3:{s:4:"hash";s:40:"bedd7bd17f3856b97516ee9c6ce06cb14731d961";s:10:"imagewidth";i:370;s:11:"imageheight";i:331;} |
+----+------------------------------------------+------------+--------------------------------------------------------------------------------------------------------------------+

Result: Cache collision and ambiguity of identifier appears resolved.
As expected, we get 2 different entries because the filemitme and filesize are really irrelevant.

Test 2
Change the files, old cache should be invalidated snce it must be recalculated. We can't know beforehand whether the result is going to be the same or different - since the whole point of this cache is to minimize direct calls to getimagesize, imagemagick identify etc.

Files:

13877 Jun 05 2015 13:45:06 1.gif
13877 Jun 05 2015 13:45:33 2.gif

cf_cache_imagesizes:

+----+------------------------------------------+------------+--------------------------------------------------------------------------------------------------------------------+
| id | identifier                               | expires    | content                                                                                                            |
+----+------------------------------------------+------------+--------------------------------------------------------------------------------------------------------------------+
|  3 | 547f8f1673ce12fafe77fc012a9910f9f2912d59 | 2145909600 | a:3:{s:4:"hash";s:40:"055dd4e3a69c5d4818d81950fb547e8adfcc65ba";s:10:"imagewidth";i:390;s:11:"imageheight";i:390;} |
|  4 | 2316b5bf0756526bad6a06011c23faa20d0397b9 | 2145909600 | a:3:{s:4:"hash";s:40:"920d7e7e71bfde2663a53b7124fd7f7668c12954";s:10:"imagewidth";i:370;s:11:"imageheight";i:331;} |
+----+------------------------------------------+------------+--------------------------------------------------------------------------------------------------------------------+

Result: Failure of invalidation appears resolved
The old cache IDs 1,2 are gone, replaced by 2 new ones with 3,4.
The identifier remains constant, the status hash changes. Previously they would both change (as they were the same thing)

Actions #5

Updated by Gerrit Code Review almost 9 years ago

Patch set 2 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/39979

Actions #6

Updated by Gerrit Code Review almost 9 years ago

Patch set 3 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/39979

Actions #7

Updated by Gerrit Code Review almost 9 years ago

Patch set 4 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/39979

Actions #8

Updated by Christian Reiter almost 9 years ago

  • Status changed from Under Review to Resolved
  • % Done changed from 0 to 100
Actions #9

Updated by Riccardo De Contardi over 6 years ago

  • Status changed from Resolved to Closed
Actions

Also available in: Atom PDF