http://forge.typo3.org/http://forge.typo3.org/themes/typo3_forge/favicon/favicon.png?17058661692020-04-22T11:32:43ZTYPO3 ForgeTYPO3 Core - Bug #91168: Design flaw in FAL with indexing of files after a movehttp://forge.typo3.org/issues/91168?journal_id=4237752020-04-22T11:32:43ZXavier Perseguersxavier@typo3.org
<ul></ul><p>This is just a (quick) suggestion:</p>
<pre>
diff --git a/typo3/sysext/core/Classes/Resource/Service/ExtractorService.php b/typo3/sysext/core/Classes/Resource/Service/ExtractorService.php
index 92229d3581..a4a05a75dc 100644
--- a/typo3/sysext/core/Classes/Resource/Service/ExtractorService.php
+++ b/typo3/sysext/core/Classes/Resource/Service/ExtractorService.php
@@ -80,6 +80,16 @@ class ExtractorService
*/
private function isFileTypeSupportedByExtractor(File $file, ExtractorInterface $extractor): bool
{
+ if ($file->getMetaData()->count() > 0) {
+ // There's a design flaw in FAL, moving a file should not reindex it
+ // because it will effectively lead to overriding any user-defined
+ // value by metadata extracted by any extractor again.
+ // We will do our small job to refuse to extract metadata if metadata
+ // are already present.
+ // See: https://forge.typo3.org/issues/91168
+ return false;
+ }
+
$isSupported = true;
$fileTypeRestrictions = $extractor->getFileTypeRestrictions();
if (!empty($fileTypeRestrictions) && !in_array($file->getType(), $fileTypeRestrictions, true)) {
</pre>
<p>Something that could logically be moved even more near the actual call (currently trying to find a way to prevent this flaw in my ext until TYPO3 is fixed).</p> TYPO3 Core - Bug #91168: Design flaw in FAL with indexing of files after a movehttp://forge.typo3.org/issues/91168?journal_id=4944582023-06-14T06:49:55ZSybille Peterssypets@gmx.de
<ul></ul><p>Hi Xavier, I am testing your extension and have noticed this issue (again).</p>
<p>Would you consider creating a patch for your suggested change?</p>
<p>I think metadata extraction on moving files does not make sense at all.</p>
<p>(I have not checked in later versions if this will still happen, maybe we should check that first).</p> TYPO3 Core - Bug #91168: Design flaw in FAL with indexing of files after a movehttp://forge.typo3.org/issues/91168?journal_id=4944592023-06-14T06:58:36ZSybille Peterssypets@gmx.de
<ul></ul><p>In the extension, this is now implemented like this, comparing tstamp and crdate in sys_file_metadata:</p>
<pre><code class="php syntaxhl" data-language="php"><span class="k">public</span> <span class="k">function</span> <span class="n">canProcess</span><span class="p">(</span><span class="kt">File</span> <span class="nv">$file</span><span class="p">)</span>
<span class="p">{</span>
<span class="nv">$typo3Branch</span> <span class="o">=</span> <span class="nb">class_exists</span><span class="p">(</span><span class="nc">\TYPO3\CMS\Core\Information\Typo3Version</span><span class="o">::</span><span class="n">class</span><span class="p">)</span>
<span class="o">?</span> <span class="p">(</span><span class="k">new</span> <span class="nc">\TYPO3\CMS\Core\Information\Typo3Version</span><span class="p">())</span><span class="o">-></span><span class="nf">getBranch</span><span class="p">()</span>
<span class="o">:</span> <span class="n">TYPO3_branch</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="nb">version_compare</span><span class="p">(</span><span class="nv">$typo3Branch</span><span class="p">,</span> <span class="s1">'10.0'</span><span class="p">,</span> <span class="s1">'<'</span><span class="p">))</span> <span class="p">{</span>
<span class="nv">$metadata</span> <span class="o">=</span> <span class="nv">$file</span><span class="o">-></span><span class="nf">_getMetaData</span><span class="p">();</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="nv">$metadata</span> <span class="o">=</span> <span class="nv">$file</span><span class="o">-></span><span class="nf">getMetaData</span><span class="p">();</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="k">empty</span><span class="p">(</span><span class="nv">$metadata</span><span class="p">)</span> <span class="o">&&</span> <span class="nv">$metadata</span><span class="p">[</span><span class="s1">'crdate'</span><span class="p">]</span> <span class="o">!==</span> <span class="nv">$metadata</span><span class="p">[</span><span class="s1">'tstamp'</span><span class="p">])</span> <span class="p">{</span>
<span class="c1">// There's a design flaw in FAL, moving a file should not reindex it</span>
<span class="c1">// because it will effectively lead to overriding any user-defined</span>
<span class="c1">// value by metadata extracted by any extractor again.</span>
<span class="c1">// We will do our small job to refuse to extract metadata if metadata</span>
<span class="c1">// modification timestamp is different from creation timestamp</span>
<span class="c1">// See: https://forge.typo3.org/issues/91168</span>
<span class="k">return</span> <span class="kc">false</span><span class="p">;</span>
<span class="p">}</span>
</code></pre>
<p><a href="https://github.com/xperseguers/t3ext-extractor/blob/353b262ec94e29a8b3bbc8cde33665ed4715db1e/Classes/Service/Extraction/AbstractExtractionService.php" class="external">link to latest commit in master branch</a></p> TYPO3 Core - Bug #91168: Design flaw in FAL with indexing of files after a movehttp://forge.typo3.org/issues/91168?journal_id=4949712023-06-19T10:50:08ZSybille Peterssypets@gmx.de
<ul></ul><p>I am not sure this is a design flaw. I think what happened is that probably at the beginning the metadata was in sys_file too and then this was separated and the extractors added and the file index (sys_file) and metadata (sys_file_metadata) has not been entirely and cleanly separated.</p>
<p>This is visible in the core and documentation.</p>
<p>I think to fix is good not to fix or workaround this one case but decide when sys_file should be updated and metadata extracted and separate these 2 from each other where it makes sense. Also, it is not so easy to fix and I suggest to write tests first to make sure not to break anything.</p>
<p>...</p>
<p>Clearly separated:</p>
<ul>
<li>2 separate scheduler tasks</li>
<li>automatic metadata extraction can be turned off in storage</li>
</ul>
<p>Not clearly separated:</p>
<ul>
<li>in core there is a function Indexer::updateIndexEntry, this updates the sys_file index and starts metdata extraction. In some cases it is good to do both (e.g. when a new file is uploaded or a file copied), in some cases it does not make sense to do metadata extraction (again), e.g. on file move, but you do need the sys_file index updating.</li>
</ul>
<pre>
public function updateIndexEntry(File $fileObject): File
{
$updatedInformation = $this->gatherFileInformationArray($fileObject->getIdentifier());
$fileObject->updateProperties($updatedInformation);
$this->getFileIndexRepository()->update($fileObject);
$metaData = $this->extractRequiredMetaData($fileObject);
if ($this->storage->autoExtractMetadataEnabled()) {
$metaData = array_merge($metaData, $this->getExtractorService()->extractMetaData($fileObject));
}
$fileObject->getMetaData()->add($metaData)->save();
return $fileObject;
}
</pre>
<p>documentation:</p>
<ul>
<li>in the documentation, in some places the distinction between the file index (sys_file) and metadata extraction is not clear</li>
</ul>
<p>e.g.</p>
<blockquote>
<p>Indexing a file creates a database record for the file, containing meta-information both about the file (file-system properties) and from the file (e.g. EXIF information for images).</p>
</blockquote>
<p><a class="external" href="https://docs.typo3.org/m/typo3/reference-coreapi/12.4/en-us/ApiOverview/Fal/Architecture/Components.html#the-file-index">https://docs.typo3.org/m/typo3/reference-coreapi/12.4/en-us/ApiOverview/Fal/Architecture/Components.html#the-file-index</a></p>
<p>That is kind of correct if with "indexing" you mean updating sys_file <strong>and</strong> extracting metadata and with "database record" you mean sys_file and sys_file_metadata. In any case, I find it misleading.</p>
<p>Otherwise, it is not correct, because the exif metadata is written to sys_file_metdata</p>
<ul>
<li>also, the metadata extraction is not documented at all (e.g. how to add your own extractors)</li>
</ul>