Bug #103707
openDuplicate entries in sys_file table
0%
Description
We have encountered an issue where there are duplicate entries in the sys_file
table pointing to the same physical file. This results in the Filelist module not displaying references to those files correctly. Consequently, these files can be deleted even though there are references pointing to another sys_file
entry that shares the same file in the filesystem.
To check for duplicates, you can run:
SELECT COUNT(*), `identifier` FROM `sys_file` GROUP BY `identifier` HAVING COUNT(*) > 1;
Here is an example of a duplicate entry:
SELECT * FROM `sys_file` where identifier like "/user_upload/Bilder/Wirtschaftspruefer/Veranstaltung_2024/mbs_04.jpg"; +------+-----+------------+--------------+---------+---------+------+----------+----------------------------------------------------------------------+------------------------------------------+------------------------------------------+-----------+------------+------------+------------------------------------------+--------+---------------+-------------------+ | uid | pid | tstamp | last_indexed | missing | storage | type | metadata | identifier | identifier_hash | folder_hash | extension | mime_type | name | sha1 | size | creation_date | modification_date | +------+-----+------------+--------------+---------+---------+------+----------+----------------------------------------------------------------------+------------------------------------------+------------------------------------------+-----------+------------+------------+------------------------------------------+--------+---------------+-------------------+ | 2792 | 0 | 1711288107 | 1711288107 | 0 | 1 | 2 | 0 | /user_upload/Bilder/Wirtschaftspruefer/Veranstaltung_2024/mbs_04.jpg | ed3516d8ff67f42ae9a0b4ca9443989421b21324 | 928bfe2c54bf0575ae5ad65b55cb3deee6aaa658 | jpg | image/jpeg | mbs_04.jpg | 61782595dddae96224e259cf7170dd646fb4e3d1 | 562574 | 1711288107 | 1711288107 | | 2794 | 0 | 1712041049 | 1712041049 | 0 | 1 | 2 | 0 | /user_upload/Bilder/Wirtschaftspruefer/Veranstaltung_2024/mbs_04.jpg | ed3516d8ff67f42ae9a0b4ca9443989421b21324 | 928bfe2c54bf0575ae5ad65b55cb3deee6aaa658 | jpg | image/jpeg | mbs_04.jpg | 61782595dddae96224e259cf7170dd646fb4e3d1 | 562574 | 1712041049 | 1711288107 |
In another instance, there are multiple sys_file
records pointing to files outside the fileadmin
directory. This might indicate that the problem is not related to the Filelist module and user uploads, but could also be rooted in page rendering.
| uid | pid | tstamp | last_indexed | missing | storage | type | metadata | identifier | identifier_hash | folder_hash | extension | mime_type | name | sha1 | size | creation_date | modification_date | +------+-----+------------+--------------+---------+---------+------+----------+---------------------------------------------------------------------+------------------------------------------+------------------------------------------+-----------+---------------+-------------+------------------------------------------+------+---------------+-------------------+ | 2754 | 0 | 1680786128 | 0 | 0 | 0 | 2 | 0 | /typo3conf/ext/sitepackage/Resources/Public/Icons/Flags/flag_at.svg | aa8d82349adeaa66fa834c0bab77a507d74bbd3a | 15b08915cf0ec6b4ffef65022ba6219d4111ee18 | svg | image/svg+xml | flag_at.svg | 87fcd7ee2c2fe0e53935bd952efdb97dde1b3c66 | 226 | 1680786117 | 1680786007 | | 2755 | 0 | 1680786128 | 0 | 0 | 0 | 2 | 0 | /typo3conf/ext/sitepackage/Resources/Public/Icons/Flags/flag_at.svg | aa8d82349adeaa66fa834c0bab77a507d74bbd3a | 15b08915cf0ec6b4ffef65022ba6219d4111ee18 | svg | image/svg+xml | flag_at.svg | 87fcd7ee2c2fe0e53935bd952efdb97dde1b3c66 | 226 | 1680786117 | 1680786007 |
This issue appears to be at least four years old, as i found the extension https://extensions.typo3.org/package/elementareteilchen/unduplicator, which was created to find and fix such duplicates. The extension dates back to 2020.
We found these problems in many other projects. However, we do not have a TYPO3 v12 or v13 instance that is not an upgrade from TYPO3 v11, so we are currently unsure if this issue affects TYPO3 v12 or v13 as well.
Updated by Christian Kuhn 9 months ago · Edited
Thanks Ulrich :)
Great report!
I'll try to fire that query onto the one or the other b13 project as well, to see if we experience similar issues.
Some things come to my mind:- Fixing currently broken DB status could be a job for dbdoctor, I'm not sure about details at the moment, though.
- If the issue persists, we need to get some clue which action in the core creates those dupes.
- We might think about adding a 'unique' key - maybe on the combination of the two hash fields? When done, we should probably see bug reports (with backtraces) when core tries to insert a dupe, which will help us to trace the issue.
- I wonder this hasn't been reported, yet?!
Updated by Mario Lubenka 6 months ago
We have a similar problem in TYPO3 12. But also were not able to pinpoint the issue.
Just noticed that in the database.
Updated by michael kettel 6 months ago · Edited
- TYPO3 Version changed from 11 to 12
- Complexity set to hard
We have the same issue with multiple sys_file entries for the same physical file. We tested this in TYPO3 11 and TYPO3 12 and it is the exact same behaviour. I try to discribe it as good as possible. Hope this is the right place.
We have a TYPO3 setup with multiple File Storages. E.g. fileadmin (ID=1) and storage1 (ID=3) which share almost the same physical folder tree. Some TYPO3 editors have access to all folders, some have only access to a subpart of the folders.
We use the default structure (/fileadmin/user_upload/)
file storage "fileadmin (ID = 1)" - folder1 - folder2 - folder3 - folder4 file storage "storage1 (ID = 3)" (limited access to the filetree for some editors) - folder3 - folder4
Now if an editor uploads a file using the filelist module into folder3 (using fileadmin or storage1), the file gets uploaded normaly and is physically visible in the filesystem on the server.
As soon as any editor opens the same folder in another File Storage, a new sys_file record gets created with the corresponding storage id. We stumbled across this issue because editors where complaining that the Metadata they entered for a file was overitten or did not exist at all if they or someone else opened the same file from another storage. This leads to lots of confusions because they think that they edit the same file (and physically it is the same file because it only exists once on the server), but in reality they edit two (or more) different sys_file records pointing to the same file.
This can be tested like this
1.
Upload a file using one of the File Storages and check the database
SELECT * FROM `sys_file` where identifier like '%mytestfile.png';
This returns one entry with the storage id set to the id of the storage that was beeing used to initially upload the file
2.
Now click the folder the file was uploaded to in the filelist module using another File Storage and run the query again
SELECT * FROM `sys_file` where identifier like '%mytestfile.png';
Now the query returns two entries, one for each storage. The tstamp of the second entry is the exact time the folder was clicked in the filelist module.
So right now we have two sys_file records pointing to the same physical file.
As far as I understand this is a bug. Because the file exists only once, there should be only one sys_file entry for this file and not one sys_file entry for each storage this file was opened in. Especially if multiple people edit the metadata and they all think that they work on the same file but in reality the work on different sys_file entries pointing to the same file.
P.S. (edited)
If we delete the file in one file storage, the physical file on the server gets deleted but only one of the sys_file entries. The other remains in the database, even if the file does not exist anymore.
Updated by Ulrich Mathes 4 months ago
I think #note-4 is a different issue as the duplicates also occurs in instances with only one file storage.
Updated by Ulrich Mathes 2 months ago
It happend again and i can share some information:
- The duplicated
sys_file
records are completely identical exceptuid
- Each of the
sys_file
records do have asys_file_metadata
record- The
sys_file_metadata
record for the lower idsys_file
has acrdate
one second before thecrdate
of the referencedsys_file
- The
- No errors were logged
sys_log
only contains one entry for Uploading file "{identifier}" to "{destination}"- I think the access log shows that the file was uploaded only one time?
Insights (some information is replaced with ###):
sys_file (tstamp equals 17:24:51)
sys_file_metadataUpdated by Ulrich Mathes about 2 months ago
I added a unique key in some of my projects. Till now, no errors appeared.
CREATE TABLE sys_file (
UNIQUE KEY unique_storage_identifier_folder_sha1 (storage, identifier_hash, folder_hash, sha1)
);