Bug #91168

Design flaw in FAL with indexing of files after a move

Added by Xavier Perseguers 9 months ago. Updated 9 months ago.

Status:
New
Priority:
Must have
Assignee:
-
Category:
File Abstraction Layer (FAL)
Target version:
Start date:
2020-04-22
Due date:
% Done:

0%

Estimated time:
TYPO3 Version:
6.2
PHP Version:
Tags:
Complexity:
Is Regression:
Sprint Focus:

Description

This problem was reported by myself in the context of the TYPO3 extension extractor: https://github.com/xperseguers/t3ext-extractor/issues/25

Context

When moving a file in fileadmin, FAL is invoking extractors to extract metadata again which is totally wrong, it should be done during upload (if configured so for the underlying storage) but never afterwards.

Problem is that if a user changes the metadata by editing them, they will get overridden whenever the file is moved around!

\TYPO3\CMS\Core\Resource\ResourceStorage::moveFile() line 2020 is doing that:

$this->getIndexer()->updateIndexEntry($file);

which in turn does that:

$metaData = $this->extractRequiredMetaData($fileObject);

if ($this->storage->autoExtractMetadataEnabled()) {
    $metaData = array_merge($metaData, $this->getExtractorService()->extractMetaData($fileObject));
}
$fileObject->getMetaData()->add($metaData)->save();

This has the effect of possibly overriding changes made by the user and leads to loss of information!

How to reproduce: Start

  • Take a vanilla v6.2, v7, v8, v9 or v10 TYPO3 installation
  • Download and install EXT:extractor (https://extensions.typo3.org/extension/extractor or packagist causal/extractor)
  • Configure the extension by just enabling native PHP extraction (default settings, nothing to change, no need to configure additional external extraction services)
  • Upload a file containing metadata. As an example I attach a sample PDF (sample.pdf) coming from EXT:extractor itself to this ticket
  • Click the file name to edit the metadata, you get something like that:

  • Now change some metadata manually, like the title or the description. Save
  • You can edit again to check, your changes have naturally been persisted.

How to reproduce: DESIGN FLAW!

  • Move the file to another directory
  • Edit the metadata... BANG! your metadata have been reset to the content of the 3rd-party extractor services

Solution

  • Check if reindexing the file after moving it is actually required by FAL
  • In any case, DO NOT INVOKE metadata extraction again!

Files

sample.pdf (23.5 KB) sample.pdf Xavier Perseguers, 2020-04-22 10:59
metadata.png (114 KB) metadata.png Xavier Perseguers, 2020-04-22 11:02
#1

Updated by Xavier Perseguers 9 months ago

This is just a (quick) suggestion:

diff --git a/typo3/sysext/core/Classes/Resource/Service/ExtractorService.php b/typo3/sysext/core/Classes/Resource/Service/ExtractorService.php
index 92229d3581..a4a05a75dc 100644
--- a/typo3/sysext/core/Classes/Resource/Service/ExtractorService.php
+++ b/typo3/sysext/core/Classes/Resource/Service/ExtractorService.php
@@ -80,6 +80,16 @@ class ExtractorService
      */
     private function isFileTypeSupportedByExtractor(File $file, ExtractorInterface $extractor): bool
     {
+        if ($file->getMetaData()->count() > 0) {
+            // There's a design flaw in FAL, moving a file should not reindex it
+            // because it will effectively lead to overriding any user-defined
+            // value by metadata extracted by any extractor again.
+            // We will do our small job to refuse to extract metadata if metadata
+            // are already present.
+            // See: https://forge.typo3.org/issues/91168
+            return false;
+        }
+
         $isSupported = true;
         $fileTypeRestrictions = $extractor->getFileTypeRestrictions();
         if (!empty($fileTypeRestrictions) && !in_array($file->getType(), $fileTypeRestrictions, true)) {

Something that could logically be moved even more near the actual call (currently trying to find a way to prevent this flaw in my ext until TYPO3 is fixed).

Also available in: Atom PDF