Bug #57134

closed

Duplicate files results in different Metadata

Added by Stefan Froemken about 10 years ago. Updated 6 months ago.

Status: Closed
Priority: Should have
Assignee: -
Category: File Abstraction Layer (FAL)
Target version:
Start date: 2014-03-21
Due date:
% Done: 0%
Estimated time:
TYPO3 Version: 6.2
PHP Version: 5.4
Tags:
Complexity:
Is Regression: No
Sprint Focus:

Description

Hello Core-Team,

we have duplicate entries in sys_file. I know they are hard to remove, but as long as duplicate entries exist in sys_file, we should always return the same file.

If you have duplicate entries in sys_file, go to the Filelist module and open a folder with images. In my example I see 10 images, but there are 20 records in the database. As you can see, the duplicate entries were merged. This is because you're using exec_SELECTgetRows with $uidIndexField="identifier", so only the LAST record found for each identifier is returned!
Please enable "extended view" and click the yellow pen to edit the metadata. Add a subheader or anything else and save.
Now click the little icon in front of the image, open the click menu, and choose edit. Now you have opened a different metadata record! But why? This is because here you call findOneByStorageUidAndIdentifierHash() instead, and this method returns the FIRST record found.

In my opinion BOTH methods should ALWAYS return the SAME record.
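
The mismatch described above can be modelled outside TYPO3. The following is only a sketch in Python, not TYPO3 code: the rows and the helper names (index_by_identifier, find_first, find_canonical) are hypothetical stand-ins for the two lookups named in the report, and the lowest-uid tie-break is just one possible rule, not something the report prescribes.

```python
# Hypothetical rows standing in for duplicate sys_file records.
rows = [
    {"uid": 11, "storage": 1, "identifier": "/images/a.jpg"},
    {"uid": 42, "storage": 1, "identifier": "/images/a.jpg"},  # duplicate entry
    {"uid": 12, "storage": 1, "identifier": "/images/b.jpg"},
]


def index_by_identifier(rows):
    """Key rows by identifier; a later row overwrites an earlier one."""
    indexed = {}
    for row in rows:
        indexed[row["identifier"]] = row
    return indexed


def find_first(rows, storage, identifier):
    """Return the first row matching storage and identifier, or None."""
    for row in rows:
        if row["storage"] == storage and row["identifier"] == identifier:
            return row
    return None


def find_canonical(rows, storage, identifier):
    """One possible shared tie-break: always pick the lowest uid."""
    matches = [
        r for r in rows
        if r["storage"] == storage and r["identifier"] == identifier
    ]
    return min(matches, key=lambda r: r["uid"]) if matches else None


list_view = index_by_identifier(rows)["/images/a.jpg"]
click_menu = find_first(rows, 1, "/images/a.jpg")
canonical = find_canonical(rows, 1, "/images/a.jpg")
print(list_view["uid"], click_menu["uid"], canonical["uid"])
```

With the sample rows, the indexed lookup ends up on the later duplicate while the first-match lookup ends up on the earlier one. If both code paths used the same tie-break, they would agree.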

Stefan

Actions #1

Updated by Gerrit Code Review about 10 years ago

  • Status changed from New to Under Review

Patch set 1 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/28593

Actions #2

Updated by Frans Saris about 10 years ago

  • Status changed from Under Review to Needs Feedback

Hi Stefan,

after cleaning up your index, is this still an issue?

gr. Frans

Actions #3

Updated by Stefan Froemken about 10 years ago

Hello Frans,

you can close this ticket now.

For anyone else with similar problems, I have created an SQL query which may help you. Please back up your database before executing it!

The following query removes all duplicate entries in sys_file. It is fast (3 seconds for 60,000 entries), but it does NOT check whether one of the duplicate entries already has a relation in sys_file_reference:

DELETE FROM sys_file
USING sys_file, sys_file as tmp_table
WHERE (sys_file.uid > tmp_table.uid)
AND (sys_file.storage = tmp_table.storage AND sys_file.identifier_hash = tmp_table.identifier_hash);
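
Before running a delete like this, it may help to see how many rows it would affect. The sketch below is an assumption-laden dry run: it uses Python's sqlite3 module with an in-memory table that has only the three columns the query touches, and the sample rows are made up. The GROUP BY query itself is plain SQL and should carry over to MySQL, but that is not verified here.

```python
import sqlite3

# In-memory stand-in for sys_file with only the columns the query uses.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sys_file ("
    "uid INTEGER PRIMARY KEY, storage INTEGER, identifier_hash TEXT)"
)
conn.executemany(
    "INSERT INTO sys_file VALUES (?, ?, ?)",
    [(1, 1, "aaa"), (2, 1, "bbb"), (3, 1, "aaa"), (4, 2, "aaa"), (5, 1, "aaa")],
)

# One result row per (storage, identifier_hash) group with more than one record.
# MIN(uid) is the record the delete would keep, because the delete removes
# every row that has a same-group row with a lower uid.
duplicate_groups = conn.execute(
    """
    SELECT storage, identifier_hash, COUNT(*) AS copies, MIN(uid) AS kept_uid
    FROM sys_file
    GROUP BY storage, identifier_hash
    HAVING COUNT(*) > 1
    ORDER BY storage, identifier_hash
    """
).fetchall()

rows_to_delete = sum(copies - 1 for _, _, copies, _ in duplicate_groups)
print(duplicate_groups, rows_to_delete)
```

The same hash in a different storage (uid 4 in the sample data) is not counted as a duplicate, which matches the storage condition in the delete.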

The second query adds the check missing from the query above. It removes duplicate entries from sys_file only if they have no relation in sys_file_reference. Before executing it, you should create an index on the column sys_file_reference.uid_local so that the query finishes within 50 minutes (60,000 entries). Otherwise it takes 6 hours or more:

DELETE sys_file
FROM sys_file
LEFT JOIN sys_file_reference
ON sys_file.uid = sys_file_reference.uid_local, sys_file as tmp_table
WHERE sys_file.uid > tmp_table.uid
AND sys_file.storage = tmp_table.storage
AND sys_file.identifier_hash = tmp_table.identifier_hash
AND sys_file_reference.uid IS NULL;
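
To make the effect of this query easier to follow, here is a small sketch using Python's sqlite3 module. It is not the query above: SQLite has no multi-table DELETE, so the same rule is rewritten with EXISTS and NOT EXISTS. The reduced table layout, the sample rows and the index name idx_ref_uid_local are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sys_file (
        uid INTEGER PRIMARY KEY, storage INTEGER, identifier_hash TEXT
    );
    CREATE TABLE sys_file_reference (
        uid INTEGER PRIMARY KEY, uid_local INTEGER
    );
    -- Index on the lookup column, as recommended in the comment above.
    CREATE INDEX idx_ref_uid_local ON sys_file_reference (uid_local);

    INSERT INTO sys_file VALUES
        (1, 1, 'aaa'), (2, 1, 'aaa'), (3, 1, 'aaa'), (4, 1, 'bbb');
    -- Only the duplicate with uid 3 is referenced.
    INSERT INTO sys_file_reference VALUES (10, 3);
    """
)

# Delete a row only if a same-group row with a lower uid exists
# and nothing in sys_file_reference points at the row.
conn.execute(
    """
    DELETE FROM sys_file
    WHERE EXISTS (
        SELECT 1 FROM sys_file AS tmp_table
        WHERE tmp_table.storage = sys_file.storage
          AND tmp_table.identifier_hash = sys_file.identifier_hash
          AND tmp_table.uid < sys_file.uid
    )
    AND NOT EXISTS (
        SELECT 1 FROM sys_file_reference
        WHERE sys_file_reference.uid_local = sys_file.uid
    )
    """
)

remaining = [row[0] for row in conn.execute("SELECT uid FROM sys_file ORDER BY uid")]
print(remaining)
```

In this sample the unreferenced duplicate (uid 2) is removed, while the referenced duplicate (uid 3) stays next to the original (uid 1). So duplicates that are referenced remain in sys_file after this query and still need separate handling.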

Stefan

Actions #4

Updated by Nicole Cordes almost 10 years ago

  • Status changed from Needs Feedback to Closed

This one is closed as asked by the issue author.

Actions #5

Updated by Florian Seirer 6 months ago

For anyone coming across this years later like I did: There's a handy script on https://github.com/ElementareTeilchen/unduplicator that fixes this issue.
