Bug #91768

open

Race condition while caching data using SimpleFileBackend

Added by Michael Stucki almost 4 years ago. Updated 3 months ago.

Status: New
Priority: Should have
Assignee: -
Category: Caching
Target version: -
Start date: 2020-07-08
Due date:
% Done: 0%
Estimated time:
TYPO3 Version: 10
PHP Version: 7.2
Tags:
Complexity:
Is Regression:
Sprint Focus:

Description

When two requests run at the same time:
- request A clears a cache (e.g. cache_core)
- request B tries to write into the same cache

In this situation, request B may fail because the parent folder is gone:

[06-Jul-2020 17:54:02] WARNING: [pool www] child 4978 said into stderr: "NOTICE: PHP message: https://example.host/ - core: Core: Error handler (FE): PHP Warning: file_put_contents(/var/www/html/html/typo3temp/var/Cache/Data/l10n/5f03491aac1d8038441429.temp): failed to open stream: No such file or directory in /var/www/html/vendor/typo3/cms/typo3/sysext/core/Classes/Cache/Backend/SimpleFileBackend.php line 236" 
[06-Jul-2020 17:54:02] WARNING: [pool www] child 4978 said into stderr: "NOTICE: PHP message: https://example.host/ - Core: Exception handler (WEB): Uncaught TYPO3 Exception: #1334756737: The temporary cache file "/var/www/html/html/typo3temp/var/Cache/Data/l10n/5f03491aac1d8038441429.temp" could not be written. | TYPO3\CMS\Core\Cache\Exception thrown in file /var/www/html/vendor/typo3/cms/typo3/sysext/core/Classes/Cache/Backend/SimpleFileBackend.php in line 239. Requested URL: https://example.host/home/" 

This seems to happen more often on non-local filesystems because they are slower. However, it could also happen when using a local temp folder.
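
The failure window can be reproduced deterministically in a single script. The following is only a minimal sketch: the directory name is a hypothetical stand-in for the real cache folder, and the two "requests" are simulated one after the other instead of running concurrently. It assumes that the flush moves the directory away and re-creates it afterwards, as quoted from SimpleFileBackend::flush() further down in this thread.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a cache directory such as typo3temp/var/Cache/Data/l10n/.
$base = sys_get_temp_dir() . '/race-demo-' . bin2hex(random_bytes(4));
$cacheDirectory = $base . '/';
mkdir($base, 0777, true);

// Request B has already decided where to write its temporary file.
$temporaryFile = $cacheDirectory . uniqid('', true) . '.temp';

// Request A starts flushing: the directory is moved away ...
$movedAway = $base . '.removed';
rename($base, $movedAway);

// ... and request B writes inside that window. The parent folder is gone, so the write fails.
$resultDuringFlush = @file_put_contents($temporaryFile, 'payload');

// Request A finishes: the directory is re-created and the old copy is deleted.
mkdir($base, 0777, true);
rmdir($movedAway);

// The same write succeeds once the directory exists again.
$resultAfterFlush = @file_put_contents($temporaryFile, 'payload');

var_dump($resultDuringFlush); // bool(false)
var_dump($resultAfterFlush);  // int(7)

// Clean up the demo files.
unlink($temporaryFile);
rmdir($base);
```

On a slow or remote filesystem the gap between moving the directory away and re-creating it is presumably wider, which would match the observation that the error shows up more often there.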


Related issues 4 (2 open, 2 closed)

Related to TYPO3 Core - Bug #87174: .... typo3temp/var/cache/code/cache_core/site-configuration.php): Access is denied (New, 2018-12-16)
Related to TYPO3 Core - Task #88927: The temporary cache file ... could not be written (Closed, 2019-08-06)
Related to TYPO3 Core - Bug #99821: cache:warmup always recreates core code cache files even if they exist and the content will not change (Rejected, 2023-02-04)
Related to TYPO3 Core - Bug #100123: Regular exceptions due to dependency injection (di) cache (New, 2023-03-09)

Actions #1

Updated by Michael Stucki almost 4 years ago

I spent a lot of time analyzing this problem, and my conclusion is that the error should be ignored by TYPO3:

If one request clears the cache while another request tries to write to it, the failed write can simply be ignored. The result is then not cached, but the page can still be generated. The cache will be filled by one of the next requests as soon as the temp folder exists again...

This is my current proposal to solve / ignore this error:

diff --git a/typo3/sysext/core/Classes/Cache/Backend/SimpleFileBackend.php b/typo3/sysext/core/Classes/Cache/Backend/SimpleFileBackend.php
index d2dbb371fb..09e5f15f25 100644
--- a/typo3/sysext/core/Classes/Cache/Backend/SimpleFileBackend.php
+++ b/typo3/sysext/core/Classes/Cache/Backend/SimpleFileBackend.php
@@ -226,13 +226,19 @@ class SimpleFileBackend extends AbstractBackend implements PhpCapableBackendInte
             throw new \InvalidArgumentException('The specified entry identifier must not be empty.', 1334756736);
         }
         $temporaryCacheEntryPathAndFilename = $this->cacheDirectory . StringUtility::getUniqueId() . '.temp';
-        $result = file_put_contents($temporaryCacheEntryPathAndFilename, $data);
+        $result = @file_put_contents($temporaryCacheEntryPathAndFilename, $data);
         GeneralUtility::fixPermissions($temporaryCacheEntryPathAndFilename);
         if ($result === false) {
-            throw new Exception('The temporary cache file "' . $temporaryCacheEntryPathAndFilename . '" could not be written.', 1334756737);
+            // This operation may fail when another request is clearing the cache (by removing and re-creating $this->cacheDirectory) in the same moment.
+            // Ignore this error and return without storing the result. A future request will come back here and try again...
+            return;
         }
         $cacheEntryPathAndFilename = $this->cacheDirectory . $entryIdentifier . $this->cacheEntryFileExtension;
-        rename($temporaryCacheEntryPathAndFilename, $cacheEntryPathAndFilename);
+        $result = @rename($temporaryCacheEntryPathAndFilename, $cacheEntryPathAndFilename);
+        if ($result === false) {
+            // This may fail for the same reason as above.
+            return;
+        }
         if ($this->cacheEntryFileExtension === '.php') {
             GeneralUtility::makeInstance(OpcodeCacheService::class)->clearAllActive($cacheEntryPathAndFilename);
         }

I'm not 100% happy with this approach, but what are the alternatives?

  • Wait some milliseconds and try again?
  • Use locking to gain exclusive access to the SimpleFileBackend (keep in mind that this should work over multiple hosts)
  • Stop clearing the cache by removing the whole folder
  • ...
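
For comparison, the first alternative (wait a few milliseconds and try again) could look roughly like the sketch below. writeCacheEntryWithRetry() is a hypothetical helper and not part of TYPO3; the attempt count and delay are arbitrary assumptions. Note that a later comment in this thread reports that creating the cache directory before writing did not help in one deployment, so this is an illustration of the idea and not a verified fix.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, not part of TYPO3: write $data to $directory/$fileName
 * via a temporary file and retry briefly if the directory has vanished.
 * Returns true on success, false if every attempt failed.
 */
function writeCacheEntryWithRetry(
    string $directory,
    string $fileName,
    string $data,
    int $maxAttempts = 3,
    int $delayMicroseconds = 5000
): bool {
    $directory = rtrim($directory, '/') . '/';
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        // Re-create the directory if a concurrent flush removed it.
        if (!is_dir($directory) && !@mkdir($directory, 0777, true) && !is_dir($directory)) {
            usleep($delayMicroseconds);
            continue;
        }
        $temporaryFile = $directory . uniqid('', true) . '.temp';
        if (@file_put_contents($temporaryFile, $data) !== false
            && @rename($temporaryFile, $directory . $fileName)
        ) {
            return true;
        }
        // Remove a leftover temporary file, if any, then wait before the next attempt.
        @unlink($temporaryFile);
        usleep($delayMicroseconds);
    }
    // Same outcome as the patch above: the entry is simply not cached this time.
    return false;
}

// Usage: the target directory does not exist yet and is created on the first attempt.
$demoDirectory = sys_get_temp_dir() . '/retry-demo-' . bin2hex(random_bytes(4));
var_dump(writeCacheEntryWithRetry($demoDirectory, 'entry.txt', 'payload'));
```

Retrying only narrows the window. If the flush runs again between the directory check and the write, the attempt still fails, so the caller has to tolerate a false return value either way.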

Let me know what you think!

Actions #2

Updated by Michael Stucki almost 4 years ago

  • Related to Bug #87174: .... typo3temp/var/cache/code/cache_core/site-configuration.php): Access is denied added
Actions #3

Updated by Mathias Brodala over 3 years ago

  • Related to Task #88927: The temporary cache file ... could not be written added
Actions #4

Updated by Mathias Brodala over 3 years ago

As mentioned in Slack, the change suggested by Michael here was the only thing that allowed me to complete my deployment (switch from TYPO3v8 to TYPO3v9). Everything else I tried before (manually creating the cache directory, creating the cache directory in the SimpleFileBackend before writing the file, switching to FileBackend) didn't help.

Actions #5

Updated by ondro no-lastname-given over 3 years ago

We struggled with the same problem with TYPO3 hosted on a Kubernetes cluster (with multiple pods) or on multiple servers (active/active behind a load balancer) that share cache files via a persistent volume (in the case of k8s) or NFS/Ceph file systems. It happens regardless of the TYPO3 version (8/9/10).

We partially solved it by moving caches to Redis, but the 'core' cache cannot be configured to use Redis ... :(

Have you found a solution for that?
Thanks

Actions #6

Updated by Michael Stucki over 3 years ago

Did you try my patch from above? Thanks to this my websites run fine in Kubernetes with multiple pods. Feel free to ping me on Slack if you need more infos.

Actions #7

Updated by ondro no-lastname-given over 3 years ago

Michael Stucki wrote in #note-6:

Did you try my patch from above? Thanks to this my websites run fine in Kubernetes with multiple pods. Feel free to ping me on Slack if you need more infos.

Hi Michael, we will try your patch for sure, although it's quite a dirty workaround and there should be a nicer way to handle this ...

Actions #8

Updated by Michael Stucki over 3 years ago

Feel free to add suggestions on how this could be improved.

Actions #9

Updated by ondro no-lastname-given over 3 years ago

Michael Stucki wrote in #note-6:

Did you try my patch from above? Thanks to this my websites run fine in Kubernetes with multiple pods. Feel free to ping me on Slack if you need more infos.

I've tried your patch, but unfortunately it doesn't help :(
I've also tried other workarounds (ignoring errors etc.), but nothing helps here ...

Actions #10

Updated by Mathias Brodala almost 2 years ago

Here's another patch variant which we use in production. It tones down the exception to a warning and logs another warning if the rename fails.

Actions #11

Updated by Sybille Peters about 1 year ago

"In this situation, request B may fail because the parent folder is gone:"
"Stop clearing the cache by removing the whole folder"

Is it really necessary to remove all directories when clearing the cache? Actually, in SimpleFileBackend::flush(), the directory is first renamed and then removed, with the comment "This way directories can be flushed faster to prevent race conditions on concurrent processes accessing the same directory." That does make sense if all files must be flushed and recreated at once: a mv is usually faster than an rm -rf.

So keeping the directory, removing the files one by one and then creating them again would most likely make the problem worse.

SimpleFileBackend::flush

if (rename($directory, $temporaryDirectory)) {
    GeneralUtility::mkdir($directory);
    clearstatcache();
    GeneralUtility::rmdir($temporaryDirectory, true);
}

What might be a different approach? Not removing all files, but only writing files when necessary (e.g. when the content changes).

This would however extend the process of recreating cache files. How do we know we should check and recreate? Can we do a warmup of all files at once using this approach?

Such as the files in:

var/cache/code/
var/cache/code/fluid_template
var/cache/code/di
var/cache/code/core
var/cache/code/news
var/cache/code/static_info_tables

Do a "refresh" instead of a flush, i.e. a flush+warmup on a file-by-file basis:
1. Save a timestamp at the beginning (timestamp_start).
2. Create the necessary files. If an identical file already exists (with the same content), only "touch" it (update the file modification time).
3. If a file does not exist or its content has changed, write it (remove and create).
4. Afterwards, remove all files which were not rewritten (where the timestamp is before timestamp_start).
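
The steps above can be sketched as follows. refreshCacheDirectory() is a hypothetical helper and not a TYPO3 API. It assumes that the complete set of expected entries (file name => content) is known up front. As a deliberate simplification, step 4 removes files by name (everything not in the expected set) instead of comparing modification times with timestamp_start, which avoids depending on the timestamp resolution of the filesystem.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, not a TYPO3 API: bring $directory in line with $entries
 * (file name => content) without ever removing the directory itself.
 * Returns the names of written, touched and removed files.
 */
function refreshCacheDirectory(string $directory, array $entries): array
{
    $directory = rtrim($directory, '/') . '/';
    $report = ['written' => [], 'touched' => [], 'removed' => []];

    foreach ($entries as $fileName => $content) {
        $path = $directory . $fileName;
        if (is_file($path) && file_get_contents($path) === $content) {
            // Step 2: an identical file exists, so only update its modification time.
            touch($path);
            $report['touched'][] = $fileName;
            continue;
        }
        // Step 3: missing or changed, so write a temporary file and rename it into place.
        $temporaryFile = $directory . uniqid('', true) . '.temp';
        file_put_contents($temporaryFile, $content);
        rename($temporaryFile, $path);
        $report['written'][] = $fileName;
    }

    // Step 4 (simplified): remove every file that is not part of the expected set.
    foreach (scandir($directory) as $fileName) {
        if ($fileName === '.' || $fileName === '..' || substr($fileName, -5) === '.temp') {
            continue; // skip directory entries and temporary files of concurrent writers
        }
        if (!array_key_exists($fileName, $entries) && is_file($directory . $fileName)) {
            unlink($directory . $fileName);
            $report['removed'][] = $fileName;
        }
    }

    return $report;
}
```

Because the directory itself is never moved or deleted, a concurrent writer does not lose its parent folder. Whether this scales to large caches, and how the expected set would be computed for each cache, is left open here.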

(I am not sure whether this will improve the process or introduce new problems, but it might be worth looking into. I use a similar approach in other areas, such as brofix. In linkvalidator, when checking for links via the scheduler - which might run for hours - all broken link records are removed and then everything is created again from scratch. In brofix, I do not truncate the entire database at the beginning, but update existing records with the timestamp and, at the end of the process, remove all records with a timestamp before the beginning of the process. For long-running tasks, the flush-all-and-then-create approach is not feasible. With concurrent processes it also has some drawbacks.)

Actions #12

Updated by Sybille Peters about 1 year ago

  • Related to Bug #99821: cache:warmup always recreates core code cache files even if they exist and the content will not change added
Actions #13

Updated by Benjamin Franzke about 1 year ago

Two suggestions:

a) Do not share code caches between workers/nodes (or whatever you wanna call them). The code cache is there to make the system fast; if you share it, you might as well disable the caches altogether and be faster ;)
…and you do not need/want to be able to flush system caches from the backend on HA systems, so there is really no need/desire for shared code caches.
Really, NFS is a bad idea for sharing code at all: you want to share DATA, not CODE…

b) Consider not flushing (all) caches after/during deployments at all
I implemented cache:warmup with the intention to be used in line with grouped flushes.
See https://docs.typo3.org/c/typo3/cms-core/main/en-us/Changelog/11.4/Feature-90197-IntroduceCacheFlushConsoleCommand.html#impact
In short:
1) cache:warmup
2) symlink switch
3) cache:flush -g pages

With that you never have to flush but simply warm up. (Oh, and BTW I wouldn't even share code caches between releases – don't share var/, share only shared things like var/session, var/log and var/lock (locally! not via NFS), which makes this approach even more straightforward.)

Actions #14

Updated by Stephan Großberndt 3 months ago

  • Related to Bug #100123: Regular exceptions due to dependency injection (di) cache added
Actions #15

Updated by Stephan Großberndt 3 months ago

I have the same issue, often a combination of "The temporary cache file /var/cache/code/di/xxx could not be written" followed by "Class 'DependencyInjectionContainer_yyy' not found".

Wed, 17 Jan 2024 06:39:29 +0100 [WARNING] request="af0b4c82ead3b" component="TYPO3.CMS.Core.Error.ErrorHandler": 
Core: Error handler (BE): PHP Warning: file_put_contents(/var/www/domain/releases/185/var/cache/code/di/65a7681100488496629287.temp): 
failed to open stream: No such file or directory in /var/www/domain/releases/185/private/typo3/sysext/core/Classes/Cache/Backend/SimpleFileBackend.php line 229

Wed, 17 Jan 2024 06:39:29 +0100 [CRITICAL] request="af0b4c82ead3b" component="TYPO3.CMS.Core.Error.ProductionExceptionHandler":
Core: Exception handler (WEB: BE): TYPO3\CMS\Core\Cache\Exception, code #1334756737, file /var/www/domain/releases/185/private/typo3/sysext/core/Classes/Cache/Backend/SimpleFileBackend.php, line 232: The temporary cache file "/var/www/domain/releases/185/var/cache/code/di/65a7681100488496629287.temp" could not be written. - {"mode":"WEB","application_mode":"BE","exception_class":"TYPO3\\CMS\\Core\\Cache\\Exception","exception_code":1334756737,"file":"/var/www/domain/releases/185/private/typo3/sysext/core/Classes/Cache/Backend/SimpleFileBackend.php","line":232,"message":"The temporary cache file \"/var/www/domain/releases/185/var/cache/code/di/65a7681100488496629287.temp\" could not be written.","request_url":"https://xxx","exception":null}

Wed, 17 Jan 2024 06:49:40 +0100 [WARNING] request="1a56033cd0ac5" component="TYPO3.CMS.Core.Error.ErrorHandler": Core: Error handler (BE): PHP Warning: rename(/var/www/domain/releases/185/var/cache/code/di/65a76a7496f6d604833762.temp,/var/www/domain/releases/185/var/cache/code/di/DependencyInjectionContainer_f256ee52bc9d8ee1296f6d6a4fd948bd1efbc7c3.php): No such file or directory in /var/www/domain/releases/185/private/typo3/sysext/core/Classes/Cache/Backend/SimpleFileBackend.php line 235

Wed, 17 Jan 2024 06:49:40 +0100 [CRITICAL] request="1a56033cd0ac5" component="TYPO3.CMS.Core.Error.ProductionExceptionHandler": Core: Exception handler (WEB: BE): Error, code #0, file /var/www/domain/releases/185/private/typo3/sysext/core/Classes/DependencyInjection/ContainerBuilder.php, line 98: Class 'DependencyInjectionContainer_f256ee52bc9d8ee1296f6d6a4fd948bd1efbc7c3' not found - {"mode":"WEB","application_mode":"BE","exception_class":"Error","exception_code":0,"file":"/var/www/domain/releases/185/private/typo3/sysext/core/Classes/DependencyInjection/ContainerBuilder.php","line":98,"message":"Class 'DependencyInjectionContainer_f256ee52bc9d8ee1296f6d6a4fd948bd1efbc7c3' not found","request_url":"https://xxx","exception":null}

This also happens regularly without new deployments or manual clearing of caches, on systems which do not share any caches but have large cache directories and apparently comparatively slow disk access:

user@host:/var/www/domain/releases/185/var/cache/code/di$ dd if=/dev/zero of=/tmp/test.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 1.31762 s, 389 kB/s