Feature #56726

Trigger MetaDataExtraction after file upload

Added by Frans Saris over 5 years ago. Updated about 1 year ago.

Status: Closed
Priority: Should have
Assignee: -
Category: File Abstraction Layer (FAL)
Target version: -
Start date: 2014-03-10
Due date:
% Done: 100%
PHP Version:
Tags:
Complexity: easy
Sprint Focus:

Description

Currently, metadata extraction is only triggered by a scheduler task. So when an editor uploads a new file, they have to wait until the scheduler task runs again.

For most file storage types it is no problem to trigger metadata extraction directly after a file upload. Only in some special situations is it undesirable to run metadata extraction directly after uploading or adding a file to the storage. To still support these use cases, a flag needs to be added to the storage so the integrator can disable automatic metadata extraction for their special use case.
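As a rough illustration of the proposed behaviour, here is a minimal sketch in Python (not the PHP used by TYPO3, and not the actual Core implementation). The `Storage` and `UploadHandler` classes and the `auto_extract_metadata` flag name are hypothetical; the sketch only shows how a per-storage flag could decide between immediate extraction and leaving the file for the scheduler task.

```python
from dataclasses import dataclass, field


@dataclass
class Storage:
    """Hypothetical stand-in for a file storage record."""
    name: str
    # Assumed flag name; an integrator would switch it off for special storages.
    auto_extract_metadata: bool = True


@dataclass
class UploadHandler:
    """Extracts right after upload unless the storage has opted out."""
    extracted: list = field(default_factory=list)
    deferred: list = field(default_factory=list)

    def add_file(self, storage: Storage, file_name: str) -> None:
        if storage.auto_extract_metadata:
            # Extract immediately so the editor does not have to wait.
            self.extracted.append(file_name)
        else:
            # Leave the file for the scheduler task to pick up later.
            self.deferred.append(file_name)


handler = UploadHandler()
handler.add_file(Storage("fileadmin"), "photo.jpg")
handler.add_file(Storage("remote", auto_extract_metadata=False), "scan.pdf")
```

With the default left on, only storages that explicitly opt out fall back to the scheduler-only behaviour.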


Related issues

Duplicated by TYPO3 Core - Task #57546: Call ExtractionService on new files and not only Indexer::createIndexEntry() Closed 2014-04-02
Duplicates TYPO3 Core - Bug #57408: Call of the meta extractor services for local storage possible Closed 2014-03-28
Precedes Metadata and content analysis service - Bug #70860: Metadata extraction may be called twice Closed 2015-10-20

Associated revisions

Revision 99de17ab (diff)
Added by Frans Saris about 4 years ago

[FEATURE] Trigger metadata extraction after file upload

Releases: master
Resolves: #56726
Change-Id: I8f08403aca72bc9ca3f37dec6f98bf016c79a9ee
Reviewed-on: http://review.typo3.org/43059
Reviewed-by: Wouter Wolters
Tested-by: Wouter Wolters
Reviewed-by: Georg Ringer
Tested-by: Georg Ringer

History

#1 Updated by Steffen Ritter over 5 years ago

  • Status changed from New to Needs Feedback

metadata vs. indexing

Metadata extraction should always be asynchronous because it can be very heavy.

#2 Updated by Frans Saris over 5 years ago

  • Category set to File Abstraction Layer (FAL)

I know it could be heavy, but I guess for one file at a time it should not be a problem.

Heavy services could provide their own check in canProcess() so that they are only executed in CLI context, etc.
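A minimal sketch of that idea, written in Python for illustration rather than in TYPO3's PHP. Only the canProcess() name comes from the comment above; the `HeavyExtractor` class, the `is_cli` flag, and the PDF restriction are assumptions made for the example.

```python
class HeavyExtractor:
    """Hypothetical extractor that is too slow to run during a web upload."""

    def __init__(self, is_cli: bool) -> None:
        # Assumed to be set by the caller depending on the execution context.
        self.is_cli = is_cli

    def can_process(self, file_name: str) -> bool:
        # Decline outside the CLI so an upload request is never slowed down.
        return self.is_cli and file_name.lower().endswith(".pdf")


# During a web upload the extractor declines; from the scheduler (CLI) it runs.
web_result = HeavyExtractor(is_cli=False).can_process("report.pdf")
cli_result = HeavyExtractor(is_cli=True).can_process("report.pdf")
```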

#3 Updated by Steffen Ritter over 5 years ago

Frans Saris wrote:

Heavy services could provide their own check in canProcess() so that they are only executed in CLI context, etc.

No, that's exactly why this "processing" has been detached from the indexing process (even though it was part of the old indexer).

#4 Updated by Alexander Opitz over 5 years ago

Hi,

what's the state of this issue?

#5 Updated by Xavier Perseguers over 5 years ago

It was done on purpose, so this should not be changed.

If you really want to index right away, EXT:extractor lets you do that.

#6 Updated by Fabien Udriot over 5 years ago

+1 for metadata extraction upon upload. The current situation is not satisfying, IMO: users do not want to wait until the next cron run.

If there is a fear of overloading the system, a threshold (number of files per upload) could be added above which metadata extraction is disabled. However, I believe that in the majority of cases that won't be a problem.
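The threshold idea could look roughly like this. It is a Python sketch for illustration only; the function name and the default of 10 files are made up, not values proposed in this issue.

```python
def should_extract_on_upload(file_count: int, threshold: int = 10) -> bool:
    """Return True when the upload is small enough to extract immediately.

    Uploads above the (hypothetical) threshold are left to the scheduler task.
    """
    return file_count <= threshold


single_upload = should_extract_on_upload(1)
bulk_upload = should_extract_on_upload(250)
```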

#7 Updated by Xavier Perseguers over 5 years ago

Just to be complete here: automatic metadata extraction is not only a problem of overloading the system, it also slows down the upload itself a lot if you are relying on binaries such as Tika (Java-based). Test for yourself, you'll see.

#8 Updated by Fabien Udriot about 5 years ago

(By overloading the system, I meant slowing down the upload <-- just for the sake of clarity.)

Far from everyone has Tika deployed; it is reserved for some advanced set-ups. Furthermore, PHP-based metadata extraction is quite fast in my experience.

Could we make it an opt-out option: by default, index after upload, with the possibility to disable this through some configuration? This would be a compromise. Again, as a user it feels unsatisfying to have to wait for the next cron cycle to get the metadata.

#9 Updated by Ingo Renner about 5 years ago

FWIW: Tika can also be run in server mode, which saves the start-up time of the JVM and makes it a lot faster. It's just that EXT:tika does not support server mode (yet).

#10 Updated by Frans Saris about 5 years ago

Maybe we can add a checkbox to the storage settings to enable automatic metadata extraction for that storage?

#11 Updated by Alexander Opitz almost 5 years ago

  • Status changed from Needs Feedback to New

#12 Updated by Frans Saris about 4 years ago

  • Tracker changed from Bug to Feature
  • Subject changed from MetaDataExtraction isn't triggerd after file is uploaded to Trigger MetaDataExtraction after file upload
  • Description updated (diff)

#13 Updated by Gerrit Code Review about 4 years ago

  • Status changed from New to Under Review

Patch set 5 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/41800

#14 Updated by Gerrit Code Review about 4 years ago

Patch set 1 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/43059

#15 Updated by Gerrit Code Review about 4 years ago

Patch set 2 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/43059

#16 Updated by Gerrit Code Review about 4 years ago

Patch set 3 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/43059

#17 Updated by Gerrit Code Review about 4 years ago

Patch set 4 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/43059

#18 Updated by Frans Saris about 4 years ago

  • Status changed from Under Review to Resolved
  • % Done changed from 0 to 100

#19 Updated by Xavier Perseguers almost 4 years ago

A tiny follow-up here in case someone is reading. EXT:extractor 1.0.0 now natively supports Tika server and, as Ingo suggested, this is tremendously quicker than using the standalone application JAR. Using external tools such as pdfinfo or exiftool is really quick as well, and as Fabien noticed, PHP-based extraction, although really poor in terms of supported file formats, is really fast as well.

Thanks for having implemented that in Core.

#20 Updated by Benni Mack about 1 year ago

  • Status changed from Resolved to Closed
