Bug #82929

sys_file_references filled with duplicate/multiple entries

Added by Sigfried Arnold about 2 years ago. Updated 9 months ago.

Status:
Needs Feedback
Priority:
Should have
Assignee:
-
Category:
DataHandler aka TCEmain
Target version:
-
Start date:
2017-11-06
Due date:
% Done:

0%

TYPO3 Version:
8
PHP Version:
7.0
Tags:
Complexity:
Is Regression:
Sprint Focus:

Description

On multiple TYPO3 instances running on TYPO3 8.7.8 (some of them were clean installed with different Versions of the TYPO3 8 branche since the release and now updated to the latest Version) we get multiple entries for the same file.

They are not shown in the backend but are visible in the database and are taken into account while pages are being generated -

SELECT pid, fieldname, COUNT(*) as count FROM sys_file_reference GROUP BY pid,fieldname order by count DESC;

The worst case i found so far looks like this:

50    image    214572
232    assets    352
9    assets    98
9    image    96
103    image    75
138    image    72
138    assets    70
94    assets    42
109    assets    41
235    assets    40

The result is, that the page generation for those pages takes ages (or till max_execution_time is reached in this example, where 200k images are present).

This issue also might be connected to #80804 since the blocked apache processes sometimes do not unlock the lockfiles properly and then apache gets jammed.

Currently i have not figured out yet, what causes the duplication.

Also i found an older TYPO3 7.6 Instance, which does the same.


Related issues

Related to TYPO3 Core - Bug #80804: Apache precesses overload because of typo3temp/locks/flock_ Closed 2017-04-11
Related to TYPO3 Core - Bug #87640: Keep language reference for associated l10n_mode=exclude children Closed 2019-02-04

History

#1 Updated by Frans Saris about 2 years ago

Are you using multiple languages and or workspaces?

#2 Updated by Sigfried Arnold about 2 years ago

On the 7.6 instance we used workspaces and multiple languages, but now workspaces are disabled and only languages are in use

On the 8.7 instances none uses workspacese but all of them use at least one additional language, one uses 9 languages - but there is no corellation between language count and duplicates - the worst case only got 2 languages

#3 Updated by Frans Saris about 2 years ago

Can you show some more column data. What fields are/are not identical

#4 Updated by Sigfried Arnold about 2 years ago

identical fields are the following:

sys_language_uid, l10n_parent, l10n_diffsource, uid_local, uid_foreign, tablenames, fieldname (also title, description, alternative and so on are identical)

here is an example from a web without workspaces/versioning - so i removed all t3ver_* fields (they just contain 0)

This is an example of "bad" case - this is taken from a bigger website with currently 9 languages in total - as you can see those records are inserted the same time - sometimes only a duplicate, somtimes a huge amount of multiple entries - this page only has one image in the media field - i've also provided a grouping by language - as you can see, it inserted 7 copies on a few languages, only L=16 worked properly and L=1 & 2 really went nuts.

COUNT(*) sys_language_uid
1        0
18038    1
18046    2
7        3
7        4
7        5
7        11
1        15
1        16
uid     pid  FROM_UNIXTIME(tstamp)   FROM_UNIXTIME(crdate)  sorting   deleted        l10n_parent         uid_foreign   fieldname
                                                                           hidden           l10n_diffsource  tablenames
                                                                                sys_language_uid   uid_local
212      7    2017-05-11 16:43:40    2016-03-15 09:05:27    1031168   0    0    0    0      ...    18    7    pages    media
39902    7    2017-06-07 17:31:16    2017-06-07 17:31:16    285952    0    0    1    212    ...    18    7    pages    media
39903    7    2017-06-07 17:31:16    2017-06-07 17:31:16    512       0    0    1    212    ...    18    7    pages    media
39936    7    2017-06-20 18:40:26    2017-06-20 18:40:26    472576    0    0    1    212    ...    18    7    pages    media
39937    7    2017-06-20 18:40:26    2017-06-20 18:40:26    905728    0    0    1    212    ...    18    7    pages    media
39938    7    2017-06-20 18:40:26    2017-06-20 18:40:26    188160    0    0    1    212    ...    18    7    pages    media
39934    7    2017-06-20 18:40:26    2017-06-20 18:40:26    718080    0    0    1    212    ...    18    7    pages    media
39935    7    2017-06-20 18:40:26    2017-06-20 18:40:26    596992    0    0    1    212    ...    18    7    pages    media
39964    7    2017-07-07 08:34:50    2017-07-07 08:34:50    83456     0    0    1    212    ...    18    7    pages    media
39965    7    2017-07-07 08:34:50    2017-07-07 08:34:50    801024    0    0    1    212    ...    18    7    pages    media
39966    7    2017-07-07 08:34:50    2017-07-07 08:34:50    676608    0    0    1    212    ...    18    7    pages    media
39967    7    2017-07-07 08:34:50    2017-07-07 08:34:50    555520    0    0    1    212    ...    18    7    pages    media
39968    7    2017-07-07 08:34:50    2017-07-07 08:34:50    988672    0    0    1    212    ...    18    7    pages    media
39969    7    2017-07-07 08:34:50    2017-07-07 08:34:50    926464    0    0    1    212    ...    18    7    pages    media
39970    7    2017-07-07 08:34:50    2017-07-07 08:34:50    493312    0    0    1    212    ...    18    7    pages    media
39971    7    2017-07-07 08:34:50    2017-07-07 08:34:50    617728    0    0    1    212    ...    18    7    pages    media
39972    7    2017-07-07 08:34:50    2017-07-07 08:34:50    738816    0    0    1    212    ...    18    7    pages    media
39973    7    2017-07-07 08:34:50    2017-07-07 08:34:50    146688    0    0    1    212    ...    18    7    pages    media
39974    7    2017-07-07 08:34:50    2017-07-07 08:34:50    864256    0    0    1    212    ...    18    7    pages    media
39975    7    2017-07-07 08:34:50    2017-07-07 08:34:50    21248     0    0    1    212    ...    18    7    pages    media
39976    7    2017-07-07 08:34:50    2017-07-07 08:34:50    431104    0    0    1    212    ...    18    7    pages    media

#5 Updated by Frans Saris about 2 years ago

Interesting thing is that a lot of timestamps are the same and looking at the sorting field it looks like they are somehow all linked to the same field.

Is there some custom tca added there? Can you see what action triggers the creation of all the extra rows in the DB.

#6 Updated by Sigfried Arnold about 2 years ago

In most cases there was Grid Elements and DCE present, that was our fist guess - but we have a few occurrences without any additional fields.

So far we have not isolated the issue yet.

#7 Updated by Markus Opahle about 2 years ago

I don't know if this is related to this issue but I found a similar issue on several 8.7 instances. The only difference here is that the duplicates marked as deleted right after creation. This happens when you add l10n_mode=exclude to an inline relation to sys_file_references. The duplicates appear when you translate a record and they depend on the number of translations exist before (so you actually have to translate the record twice to reproduce the issue).
It seems that an inline relationship to a translatable table and with l10n_mode=exclude is not handled well by \TYPO3\CMS\Core\DataHandling\Localization\DataMapProcessor which creates the duplicates in my case.
Hope this helps

#8 Updated by Susanne Moog almost 2 years ago

  • Category set to DataHandler aka TCEmain

#9 Updated by Sigfried Arnold over 1 year ago

Just got noticed by a customer with a TYPO3 7.6 instance with multiple languages with this issue - this time a few hundred copies. The instance runs on Azure with Ubuntu and MariaDB - and has a different architecure from our other instances i've seen so far. Others usually used CentOS with MySQL or RHEL with MySQL.

Sadly this enviroment has no transaction log in MariaDB activated, so i cannot narrow down, what created those duplicates.

Is there any new information available on this issue?

#10 Updated by Riccardo De Contardi over 1 year ago

  • Related to Bug #80804: Apache precesses overload because of typo3temp/locks/flock_ added

#11 Updated by Oliver Hader 10 months ago

Having a sequence of recreated records (e.g. 20 deleted, one visible record, all having same information) might the result of a flow in l10n_mode=exclude in combination with inline records that has been addressed in issue #87640 - maybe give it a try in order to avoid these re-synchronization problems.

#12 Updated by Oliver Hader 10 months ago

  • Status changed from New to Needs Feedback

#13 Updated by Oliver Hader 10 months ago

  • Related to Bug #87640: Keep language reference for associated l10n_mode=exclude children added

#14 Updated by Sybille Peters 9 months ago

Oliver, the patch introduced for #87640 was only merged into TYPO3 9, but this bug here was reported for 8 and 7.

#15 Updated by Sybille Peters 9 months ago

Sigfried, I don't quite see how the same fieldname and pid occuring multiple times would automatically point to a problem in the query you suggested:

SELECT pid, fieldname, COUNT(*) as count FROM sys_file_reference GROUP BY pid,fieldname order by count DESC;

But this query would with a higher probability, I think: It checks if there are duplicates with the same relation:

SELECT pid, tablenames,uid_foreign,table_local,uid_local,COUNT(*) as count FROM sys_file_reference GROUP BY pid,tablenames,uid_foreign,table_local,uid_local order by count DESC;

I do get hits for that (but not such huge amounts, all less than 10).

Also available in: Atom PDF