Bug #106568
Duplicate sys_file entries in database due to interfering file operations
Description
There are a few other issues that might be related, but the error scenario is usually different, hence this new issue.
While this ticket specifies TYPO3 v11, as that is the version in which we discovered the issue, it also happens in v13, and it is safe to assume that v12 is affected as well, as are many if not all previous versions of TYPO3.
Scenario
A client of ours wants to re-organize their fileadmin content, which has grown very large over the years in a TYPO3 v11 system. This client has multiple editors who move single files or entire folders from one location to another within the TYPO3 backend Filelist module. Everything is done purely inside the backend. On this client's website a TYPO3\CMS\Scheduler\Task\FileStorageIndexingTask is defined and runs every so often.
After starting the re-organization, our client reported that references to moved files were broken on the website (think PDF downloads, or news images). These files are also displayed in the Filelist as having 0 references. Not all of the moved files, but many.
Observed problem
The problem we observed is that the file storage indexing task runs while editors are actively moving files around inside the Filelist module. This leads the indexing task to create new sys_file records in the database for files that are being moved but have not yet been added/updated by the move process.
As a side effect, these duplicated entries usually get new metadata records as well instead of keeping the previous ones. Neither sys_file record is marked as missing in this context, but if they were, we would lose references and metadata in the long run.
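The core of the problem is a classic check-then-insert race. The following is a minimal Python sketch (not TYPO3 code) of what happens when two processes, such as the indexing task and a move operation, both check for an existing record before either has written one; a barrier stands in for unlucky timing:

```python
import threading

# Stand-in for the sys_file table; both "processes" index the same file.
sys_file = []
barrier = threading.Barrier(2)  # forces both past the existence check first

def index_file(identifier):
    # SELECT ... WHERE identifier = ? -- both see "no record yet"
    exists = identifier in sys_file
    barrier.wait()  # simulates the unlucky interleaving
    if not exists:
        sys_file.append(identifier)  # INSERT INTO sys_file ...

threads = [
    threading.Thread(target=index_file, args=("locationB/testfolder/file.pdf",))
    for _ in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sys_file)  # two identical rows -> the duplicate described above
```

Without a lock or a unique constraint spanning the whole check-and-insert, both writers pass the check and both insert, which mirrors the "exact same duplicate" case observed in the target folder.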
Attempted solution
The first thing to check is sys_refindex, as that is usually the first recommendation when dealing with sys_file issues, but updating it changes nothing. The issue persists.
The other obvious solution would be to disable the file storage indexing task while editors are moving files around, or to let it run at times when the editors are not working. We attempted this, and the issue decreased in frequency but did not stop.
Further investigation
We dug through the TYPO3 core code to understand what is happening and found that the file storage indexing task is not aware of any running Filelist module operations. That is why it creates sys_file database records with new sys_file_metadata records. Similarly, the move operations are not aware of any other Filelist operations, such as listing folder contents. That is why the issue persisted while multiple editors clicked around in the same folder(s) at the same time as one or more of them were moving files.
Test system
To be sure that this issue has nothing to do with the project itself, we created a fresh TYPO3 v13 installation and tried to reproduce it.
For simplicity we did not use the storage indexing task, as race conditions are hard to provoke that way. But manually moving files around in the Filelist module while listing folder contents in another tab will lead to duplicated sys_file records.
This is the test scenario with which we could reproduce the issue in a WSL2+Docker+ddev environment:
1. Open two tabs in the backend file list module
2. In one tab move a folder to another location (must be sufficiently big to take a few seconds for you to be able to click around)
3. In the other tab, list the contents of either folder while the move is in progress:
3a) the source folder you are moving files from (i.e. you move locationA/testfolder -> locationB/testfolder, and you open locationA/testfolder), or
3b) the target folder you are moving files into (you open locationB/testfolder)
4. Wait for everything to be loaded/moved
5. Check sys_file records in the database for duplicates
Depending on whether you look into the source or target folder the result is different:
Source: You end up with sys_file duplicates that share sha1, creation date and folder_hash, but have different identifiers (one with locationA/testfolder/... and one with locationB/testfolder/...).
Target: The duplicates are exactly the same, including the identifier.
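For step 5, the two duplicate patterns can be told apart by grouping on the checksum. The sketch below uses an in-memory SQLite table as a heavily simplified stand-in for sys_file (only storage, identifier and sha1 are modelled, and the sample rows are illustrative); the same GROUP BY idea applies to the real table:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE sys_file (uid INTEGER PRIMARY KEY, storage INT, "
    "identifier TEXT, sha1 TEXT)"
)
db.executemany(
    "INSERT INTO sys_file (storage, identifier, sha1) VALUES (?, ?, ?)",
    [
        (1, "/locationA/testfolder/a.pdf", "abc"),  # stale source-side row
        (1, "/locationB/testfolder/a.pdf", "abc"),  # row created by the move
        (1, "/locationB/testfolder/b.pdf", "def"),  # target-side duplicate pair
        (1, "/locationB/testfolder/b.pdf", "def"),
    ],
)

# Same sha1 twice with 2 distinct identifiers -> "source" case;
# with 1 distinct identifier -> "target" case (row duplicated verbatim).
dupes = db.execute(
    """SELECT sha1, COUNT(*) AS cnt, COUNT(DISTINCT identifier) AS idents
       FROM sys_file
       GROUP BY storage, sha1
       HAVING COUNT(*) > 1
       ORDER BY sha1"""
).fetchall()
print(dupes)  # [('abc', 2, 2), ('def', 2, 1)]
```

The `idents` column then directly distinguishes the source-folder pattern (different identifiers) from the target-folder pattern (identical rows).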
This leads to broken references despite a seemingly up-to-date sys_refindex table, and to an incorrect reference display in the Filelist module ("-" instead of the actual number).
We tried this setup in a macOS environment as well, but there the files are moved around instantly, so we were not able to reproduce the issue.
Conclusion
It seems that TYPO3 needs some kind of locking mechanism to prevent moving/copying Filelist operations from interfering with one another when they run at the same time.
We were able to find similar issues with other clients as well, but to a much lesser extent. The issue does not lead to exceptions and is therefore not obvious, but it seems to be a common problem within TYPO3, and we are at a loss as to how best to solve or prevent this ourselves.
Updated by network.publishing GmbH 14 days ago
- PHP Version changed from 8.3 to 8.0
- Update PHP version used with v11
Updated by Garvin Hicking 14 days ago
· Edited
- Status changed from New to Accepted
- Target version set to Candidate for Major Version
- TYPO3 Version changed from 11 to 13
- Tags set to Ux-decision
- Complexity changed from hard to nightmare
Thanks for this well-crafted issue report. Much appreciated.
I agree that some kind of locking needs to be set up. The problem, however, is that a running task may take a long time, and another process should not "pile up" and leave the connection open, because that might lead to a spiral of death.
If we make the first operation blocking, all subsequent calls would need some kind of UI feedback like "this process is currently unavailable, wait for completion".
The lock would need a timeout, because if anything fails it should not remain locked forever. But what is a good timeout setting? For some it's 2 minutes, but moving huge chunks could take 15 minutes or more. In case of failures, should actions really be blocked that long? Would it need to be configurable? Would it need a backend like the scheduler to see "running" tasks?
That would mean another write operation after every single moved file to indicate progress, which would slow things down even more.
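To make the trade-off concrete, here is a minimal sketch of such a timeout-bounded lock, assuming a single lock row per storage acquired in one atomic upsert. The table and column names are purely illustrative, not TYPO3 API; the open question of what LOCK_TTL should be is exactly the one raised above:

```python
import sqlite3
import time

LOCK_TTL = 120  # seconds; "what is a good timeout?" remains the open question

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE storage_lock (storage INT PRIMARY KEY, locked_until REAL)")

def acquire(storage, now=None):
    """Take the per-storage lock if it is free or its TTL has expired."""
    now = time.time() if now is None else now
    # Single atomic statement: insert, or steal the lock only when expired,
    # so a crashed process cannot hold it forever.
    cur = db.execute(
        """INSERT INTO storage_lock (storage, locked_until) VALUES (?, ?)
           ON CONFLICT(storage) DO UPDATE SET locked_until = excluded.locked_until
           WHERE storage_lock.locked_until < ?""",
        (storage, now + LOCK_TTL, now),
    )
    db.commit()
    return cur.rowcount == 1  # 0 rows touched -> someone else holds the lock

print(acquire(1))  # True: lock was free
print(acquire(1))  # False: still held -> UI must say "busy, please wait"
print(acquire(1, now=time.time() + LOCK_TTL + 1))  # True: TTL expired, taken over
```

The sketch shows both sides of the dilemma: a second caller gets an immediate, non-blocking refusal (needing UI feedback), while the TTL means a long-running but healthy move can lose its lock mid-operation if the timeout is chosen too low.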
I guess this all leads to the conclusion that locking remains unstable. Instead we would probably need a queueing system that decouples the queueing action from the processing. But if multiple queues are stacked and operate on files that have since been moved, what happens then? The operation would need to fail, but how do we notify editors of that failure in a decoupled process? Via mail? Via workers? What if the editor has logged out in the meantime?
Sadly I can only provide questions and not solutions here and I think this needs a UX/UI decision.
(I'm raising the TYPO3 version to v13 here; a fix for v12 is unlikely, and v11 updates are not covered by this issue tracker.)
Updated by network.publishing GmbH 6 days ago
· Edited
Thanks, Garvin, for the reply and for taking the time to look into the problem!
Not wanting to over-complicate things, but since the complexity is very high anyway... we have identified another manifestation of the problem: the race condition/duplication of sys_file records also occurs when files are moved outside of TYPO3 (e.g. on NFS/SMB shares mounted as a TYPO3 storage location) while the storage is being processed by the Storage Index Update task at the same time.
That scenario is even harder to deal with, since neither TYPO3 nor the web server knows about underlying file system operations, and I cannot think of a way to implement queueing/locking in a safe and platform-agnostic way. The only solution would be to remember to turn off the Storage Index Update task in the scheduler (and of course to tell BE users not to click around in the file tree) while moving data on externally mounted file server shares. In practice, in our client's scenario, this has proven not to be doable.
Maybe TYPO3 could somehow detect whether SMB/NFS/... files are being used and take that into account in whatever decisions need to be made?
Updated by Garvin Hicking 5 days ago
A general solution for this might not happen quickly.
On your server, are you able to run a Symfony Messenger bus daemon (a bit like CQRS), for example?
You could then create a custom task plus hook into the move process. An idea could be:
- When moving, do nothing immediately (prevent the actual move) - this should hopefully be possible with a hook/event?
- Instead: create a queue of files to be moved, for example via a new sys_file.targetDir column. Update all records to set it to the target directory.
- If a record already has this flag set, send a mail/notification/UI feedback (the latter is harder to patch into the core on your own).
- Second component: a messenger worker querying these records. Iterate through them (maybe in batches of 10/50/100/...), move each file individually and clear its targetDir column.
- The DB would be updated first, so a new file never shows up to the storage index task without an accompanying sys_file entry.
- Failure to move a file would need some kind of reporting and resetting.
- This is performed every few seconds by the messenger, so a queue should be processed quickly. It might need its own lock-like system to tell the daemon that it only needs to check the database after a move has been initiated, with the lock being removed once no more records are found.
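The queue idea above can be sketched roughly as follows. This is a toy model, not TYPO3 or Symfony Messenger code: an in-memory SQLite table stands in for sys_file, targetDir is the hypothetical extra column, and `move_file` is a placeholder for the real filesystem move:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE sys_file (uid INTEGER PRIMARY KEY, identifier TEXT, targetDir TEXT)"
)
db.executemany(
    "INSERT INTO sys_file (identifier, targetDir) VALUES (?, ?)",
    [
        ("/locationA/testfolder/a.pdf", "/locationB/testfolder/"),
        ("/locationA/testfolder/b.pdf", "/locationB/testfolder/"),
        ("/locationA/other.pdf", None),  # not queued
    ],
)

def enqueue_move(uid, target_dir):
    """Record the intended move; refuse if the file is already queued."""
    cur = db.execute(
        "UPDATE sys_file SET targetDir = ? WHERE uid = ? AND targetDir IS NULL",
        (target_dir, uid),
    )
    db.commit()
    return cur.rowcount == 1  # False -> notify the editor instead

def process_queue(batch_size=50, move_file=lambda src, dst: None):
    """Worker pass: move one batch, DB row first, then the real file."""
    rows = db.execute(
        "SELECT uid, identifier, targetDir FROM sys_file "
        "WHERE targetDir IS NOT NULL LIMIT ?",
        (batch_size,),
    ).fetchall()
    for uid, identifier, target_dir in rows:
        new_identifier = target_dir + identifier.rsplit("/", 1)[1]
        # 1) DB first, so the index task sees the new identifier's sys_file row
        db.execute(
            "UPDATE sys_file SET identifier = ?, targetDir = NULL WHERE uid = ?",
            (new_identifier, uid),
        )
        db.commit()
        # 2) then the actual move; a failure here needs reporting + resetting
        move_file(identifier, new_identifier)
    return len(rows)

print(enqueue_move(1, "/elsewhere/"))  # False: uid 1 is already queued
print(process_queue())                 # 2 queued files processed
```

The "already queued" refusal maps to the notification point above, and ordering the DB write before the filesystem move is the step meant to keep the index task from ever seeing an unaccounted-for file.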
Updated by Garvin Hicking 5 days ago
(Detecting nfs/smbfs/... would need either manual attribution or very low-level system access that PHP does not necessarily have, to query mounts and so on. A symlink can point to the same or to a different filesystem; mounts can use wrappers and meta file storages. I think this cannot be reliably auto-detected, so you would need to flag such storages manually, similar to the "remote" attribution, e.g. with a "slowStorage" boolean flag or so. But that would not solve the underlying problem for such storages...)