Feature #56726: Trigger MetaDataExtraction after file upload - TYPO3 Core - TYPO3 Forge

Feature #56726

closed

Trigger MetaDataExtraction after file upload

Added by Frans Saris over 10 years ago. Updated almost 6 years ago.

Status: Closed
Priority: Should have
Assignee: -
Category: File Abstraction Layer (FAL)
Target version: -
Start date: 2014-03-10
Due date: -
% Done: 100%
Estimated time: -
PHP Version: -
Tags: -
Complexity: easy
Sprint Focus: -

Description

Currently, metadata extraction is only called through the scheduler task. So when an editor uploads a new file, they have to wait until the scheduler task runs again.

For most file storage types it is no problem to trigger metadata extraction directly after a file upload. Only in some special situations is it undesirable to run metadata extraction directly after uploading or adding a file to the storage. To still support these use cases, a flag needs to be added to the storage so the integrator can disable automatic metadata extraction for those cases.
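
The proposed flag can be illustrated with a minimal sketch. This is a language-neutral Python model, not TYPO3 code: `Storage`, `auto_extract_metadata`, and `add_file` are hypothetical names, and the flag is assumed to default to enabled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Storage:
    """Hypothetical stand-in for a file storage record."""
    name: str
    auto_extract_metadata: bool = True  # assumed opt-out flag, enabled by default
    files: Dict[str, dict] = field(default_factory=dict)


def add_file(storage: Storage, filename: str,
             extract: Callable[[str], dict]) -> dict:
    """Add a file and, if the storage allows it, extract metadata right away."""
    record = {"metadata": None}
    if storage.auto_extract_metadata:
        record["metadata"] = extract(filename)
    # With the flag off, the record stays without metadata until the
    # scheduler task processes it later.
    storage.files[filename] = record
    return record
```

With the flag off, the upload path behaves as it does today and the scheduler task remains the only place where extraction happens.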

Related issues 2 (0 open — 2 closed)

Updated by Steffen Ritter over 10 years ago

Status changed from New to Needs Feedback

metadata vs. indexing

Metadata extraction should always be asynchronous because it can be very heavy.

Updated by Frans Saris over 10 years ago

Category set to File Abstraction Layer (FAL)

I know it could be heavy, but I guess for one file at a time it should not be a problem.

Heavy services could provide their own check in canProcess() so that they are only executed in CLI context, etc.
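
A rough sketch of that idea, written as a Python model rather than against the real TYPO3 extractor interface. The class names, the `context` argument, and the `"cli"`/`"web"` values are assumptions made for illustration only.

```python
from typing import List


class LightExtractor:
    """Hypothetical cheap extractor that may run in any context."""
    name = "light"

    def can_process(self, filename: str, context: str) -> bool:
        return True


class HeavyExtractor:
    """Hypothetical expensive extractor that refuses to run during a web request."""
    name = "heavy"

    def can_process(self, filename: str, context: str) -> bool:
        # Only run from the command line (e.g. the scheduler), never on upload.
        return context == "cli"


def applicable_extractors(filename: str, context: str, extractors: list) -> List[str]:
    """Return the names of all extractors willing to handle the file in this context."""
    return [e.name for e in extractors if e.can_process(filename, context)]
```

During an upload (`context="web"`) only the light extractor would run; the heavy one would still be picked up by the scheduled run.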

Updated by Steffen Ritter over 10 years ago

Frans Saris wrote:

Heavy services could provide their own check in canProcess() so that they are only executed in CLI context, etc.

No, that's exactly why this "processing" has been detached from the indexing process (although it was part of the old indexer).

Updated by Alexander Opitz about 10 years ago

Hi,

what's the state of this issue?

Updated by Xavier Perseguers about 10 years ago

It was done on purpose, so this should not be changed.

If you really want to index right away, EXT:extractor lets you do that.

Updated by Fabien Udriot about 10 years ago

+1 for metadata extraction upon upload. The current situation is not satisfying, IMO: users do not want to wait until the next cron run.

If there is a fear of overloading the system, a threshold (number of files per upload) could be added above which metadata extraction is disabled. However, I believe that in the majority of cases that won't be a problem.
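
The threshold idea could look roughly like this. It is only a sketch: `plan_extraction` and the default of 20 files are made up for illustration, and it assumes that a batch larger than the threshold is deferred as a whole.

```python
from typing import List, Tuple


def plan_extraction(uploaded: List[str],
                    threshold: int = 20) -> Tuple[List[str], List[str]]:
    """Return (extract_now, defer_to_scheduler) for one upload batch."""
    if len(uploaded) > threshold:
        # Large batch: skip inline extraction and leave everything to the scheduler.
        return [], list(uploaded)
    # Small batch: extract metadata for every file right after upload.
    return list(uploaded), []
```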

Updated by Xavier Perseguers about 10 years ago

Just to be complete here: automatic metadata extraction is not only a problem of overloading the system; it also slows down the upload itself a lot if you rely on binaries such as Tika (Java-based). Test it yourself and you'll see.

Updated by Fabien Udriot about 10 years ago

(By overloading the system, I meant slowing down the upload, just for the sake of clarity.)

By far not everyone has Tika deployed; it is reserved for some advanced setups. Furthermore, PHP-based metadata extraction is quite fast in my experience.

Could we make it an opt-out option, with indexing after upload enabled by default and a configuration setting to disable it? This would be a compromise. Again, as a user it is unsatisfying to have to wait for the next cron cycle to get the metadata.

Updated by Ingo Renner about 10 years ago

FWIW: Tika can also be run in server mode, which saves the start-up time of the JVM and makes it a lot faster. It's just that EXT:tika does not support server mode (yet).
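
To make the difference concrete, here is a small sketch of the two ways of asking Tika for metadata. It only builds the command and the HTTP request without running either. The jar file name and the server address `http://localhost:9998` are assumptions about a local setup, and the helper names are hypothetical.

```python
import urllib.request
from typing import List


def tika_app_command(path: str, jar: str = "tika-app.jar") -> List[str]:
    """Command line for the standalone app: starts a new JVM for every file."""
    return ["java", "-jar", jar, "--metadata", "--json", path]


def tika_server_request(content: bytes,
                        base_url: str = "http://localhost:9998") -> urllib.request.Request:
    """HTTP request for an already running Tika server: no JVM start-up per file."""
    return urllib.request.Request(
        base_url + "/meta",
        data=content,
        method="PUT",
        headers={"Accept": "application/json"},
    )
```

The first variant pays the JVM start-up cost on every call, while the second reuses a long-running process, which is where the speed-up comes from.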

#10

Updated by Frans Saris almost 10 years ago

Maybe we can add a checkbox to the storage settings to enable automatic metadata extraction for that storage?

#11

Updated by Alexander Opitz over 9 years ago

Status changed from Needs Feedback to New

#12

Updated by Frans Saris almost 9 years ago

Tracker changed from Bug to Feature
Subject changed from MetaDataExtraction isn't triggerd after file is uploaded to Trigger MetaDataExtraction after file upload
Description updated (diff)

#13

Updated by Gerrit Code Review almost 9 years ago

Status changed from New to Under Review

Patch set 5 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/41800

#14

Updated by Gerrit Code Review almost 9 years ago

Patch set 1 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/43059

#15

Updated by Gerrit Code Review almost 9 years ago

Patch set 2 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/43059

#16

Updated by Gerrit Code Review almost 9 years ago

Patch set 3 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/43059

#17

Updated by Gerrit Code Review almost 9 years ago

Patch set 4 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/43059

#18

Updated by Frans Saris almost 9 years ago

Status changed from Under Review to Resolved
% Done changed from 0 to 100

Applied in changeset 99de17ab9db52d349a3e1eff62d5ffe5bb3577fb.

#19

Updated by Xavier Perseguers almost 9 years ago

A tiny follow-up here in case someone is reading: EXT:extractor 1.0.0 now natively supports Tika server and, as Ingo suggested, this is tremendously quicker than using the standalone application JAR. Using external tools such as pdfinfo or exiftool is really quick as well and, as Fabien noticed, PHP-based extraction, although really poor in terms of supported file formats, is really fast too.

Thanks for having implemented that in Core.

#20

Updated by Benni Mack almost 6 years ago

Status changed from Resolved to Closed
