Feature #56726

closed

Trigger MetaDataExtraction after file upload

Added by Frans Saris almost 11 years ago. Updated about 6 years ago.

Status: Closed
Priority: Should have
Assignee: -
Category: File Abstraction Layer (FAL)
Target version: -
Start date: 2014-03-10
Due date:
% Done: 100%
Estimated time:
PHP Version:
Tags:
Complexity: easy
Sprint Focus:

Description

Currently the metadataExtraction is only called through the scheduler task. So when an editor uploads a new file, they have to wait until the scheduler task is triggered again.

For most file storage types it is no problem to trigger the metadataExtraction directly after file upload. Only in some special situations is it undesirable to run the metadataExtraction directly after uploading or adding a file to the storage. To still support these use cases, a flag needs to be added to the storage so the integrator can disable the automatic metadataExtraction for their special use case.
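
The proposed per-storage flag could look roughly like the sketch below. It is written in Python for illustration only and is not TYPO3 code: `Storage`, `auto_extract_metadata`, `extract_metadata`, and `add_file` are hypothetical names, and the extractor is a placeholder that only reads the file extension.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Storage:
    """Hypothetical stand-in for a file storage record."""
    name: str
    # Assumed name for the proposed flag; enabled by default.
    auto_extract_metadata: bool = True
    files: Dict[str, dict] = field(default_factory=dict)


def extract_metadata(path: str) -> dict:
    """Placeholder extractor: derives only the lower-cased file extension."""
    _, dot, ext = path.rpartition(".")
    return {"extension": ext.lower() if dot else ""}


def add_file(storage: Storage, path: str, pending: List[str]) -> None:
    """Register a file; extract right away only if the storage allows it."""
    storage.files[path] = {}
    if storage.auto_extract_metadata:
        storage.files[path] = extract_metadata(path)
    else:
        # Left for a later asynchronous run, e.g. the scheduler task.
        pending.append(path)
```

With the flag switched off, the upload returns immediately and the file stays in the pending list until the scheduled extraction picks it up.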


Related issues 2 (0 open, 2 closed)

Has duplicate TYPO3 Core - Task #57546: Call ExtractionService on new files and not only Indexer::createIndexEntry() (Closed, 2014-04-02)

Is duplicate of TYPO3 Core - Bug #57408: Call of the meta extractor services for local storage possible (Closed, 2014-03-28)

Actions #1

Updated by Steffen Ritter almost 11 years ago

  • Status changed from New to Needs Feedback

metadata vs. indexing

Metadata extraction should always be asynchronous because it can be very heavy.

Actions #2

Updated by Frans Saris almost 11 years ago

  • Category set to File Abstraction Layer (FAL)

I know it could be heavy, but I guess for one file at a time it should not be a problem.

Heavy services could provide their own check in canProcess() so that they are only executed in CLI context, etc.
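
A minimal sketch of that idea, in Python for illustration only: `HeavyExtractor`, `cli_only`, and the `is_cli` argument are hypothetical and do not mirror the real canProcess() signature in TYPO3.

```python
from dataclasses import dataclass


@dataclass
class HeavyExtractor:
    """Hypothetical extractor that is too slow to run during an upload request."""
    cli_only: bool = True

    def can_process(self, filename: str, is_cli: bool) -> bool:
        # Decline outside the CLI so the upload request is not slowed down.
        if self.cli_only and not is_cli:
            return False
        # Assumed for the example: this extractor handles PDF files only.
        return filename.lower().endswith(".pdf")
```

Under this assumption the extractor is skipped during a web upload and runs only when the extraction is started from the command line, for example by the scheduler.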

Actions #3

Updated by Steffen Ritter almost 11 years ago

Frans Saris wrote:

Heavy services could provide their own check in canProcess() so that they are only executed in CLI context, etc.

No, that's exactly why this "processing" has been detached from the indexing process (even though it was part of the old indexer).

Actions #4

Updated by Alexander Opitz over 10 years ago

Hi,

What's the state of this issue?

Actions #5

Updated by Xavier Perseguers over 10 years ago

It was done on purpose, so this should not be changed.

If you really want to index right away, EXT:extractor lets you do that.

Actions #6

Updated by Fabien Udriot over 10 years ago

+1 for metadata extraction upon upload. The current situation is not satisfying, IMO: users do not want to wait until the next cron run.

If there is a fear of overloading the system, a threshold (number of files per upload) could be added above which metadata extraction is disabled. However, I believe that in the majority of cases that won't be a problem.
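
The threshold idea can be sketched as follows, in Python for illustration only. `handle_upload`, `should_extract_on_upload`, and the default limit of 10 files are assumptions made for the example, not values from this discussion or from TYPO3.

```python
from typing import Callable, Dict, List, Tuple


def should_extract_on_upload(file_count: int, threshold: int = 10) -> bool:
    """Extract immediately only when the batch is at or below the threshold."""
    return file_count <= threshold


def handle_upload(
    paths: List[str],
    extract: Callable[[str], dict],
    threshold: int = 10,
) -> Tuple[Dict[str, dict], List[str]]:
    """Return (metadata extracted now, paths deferred to the next cron run)."""
    if should_extract_on_upload(len(paths), threshold):
        return {path: extract(path) for path in paths}, []
    # Batch is too large: skip extraction and leave everything to the scheduler.
    return {}, list(paths)
```

A single uploaded file gets its metadata at once, while a bulk upload above the limit is deferred as a whole.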

Actions #7

Updated by Xavier Perseguers over 10 years ago

Just to be complete here: automatic metadata extraction is not only a problem of overloading the system; it also slows down the upload itself a lot if you rely on binaries such as Tika (Java-based). Test for yourself, you'll see.

Actions #8

Updated by Fabien Udriot over 10 years ago

(By overloading the system, I meant slowing down the upload <-- just for the sake of clarity.)

Far from everyone has Tika deployed; it is reserved for some advanced setups. Furthermore, PHP-based metadata extraction is quite fast in my experience.

Could we make it an opt-out option, so that indexing happens after upload by default but can be disabled by some configuration? This would be a compromise. Again, as a user it is unsatisfying to have to wait for the next cron cycle to get the metadata.

Actions #9

Updated by Ingo Renner over 10 years ago

FWIW: Tika can also be run in server mode, which saves the startup time of the JVM and makes it a lot faster. It's just that EXT:tika does not support server mode (yet).

Actions #10

Updated by Frans Saris over 10 years ago

Maybe we can add a checkbox to the storage settings to enable auto metadata extraction for that storage?

Actions #11

Updated by Alexander Opitz about 10 years ago

  • Status changed from Needs Feedback to New
Actions #12

Updated by Frans Saris over 9 years ago

  • Tracker changed from Bug to Feature
  • Subject changed from MetaDataExtraction isn't triggerd after file is uploaded to Trigger MetaDataExtraction after file upload
  • Description updated (diff)
Actions #13

Updated by Gerrit Code Review over 9 years ago

  • Status changed from New to Under Review

Patch set 5 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/41800

Actions #14

Updated by Gerrit Code Review over 9 years ago

Patch set 1 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/43059

Actions #15

Updated by Gerrit Code Review over 9 years ago

Patch set 2 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/43059

Actions #16

Updated by Gerrit Code Review over 9 years ago

Patch set 3 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/43059

Actions #17

Updated by Gerrit Code Review over 9 years ago

Patch set 4 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/43059

Actions #18

Updated by Frans Saris over 9 years ago

  • Status changed from Under Review to Resolved
  • % Done changed from 0 to 100
Actions #19

Updated by Xavier Perseguers about 9 years ago

A tiny follow-up here in case someone is reading. EXT:extractor 1.0.0 now natively supports Tika server and, as Ingo suggested, this is tremendously quicker than using the standalone application jar. Using external tools such as pdfinfo or exiftool is really quick as well and, as Fabien noticed, PHP-based extraction is really fast too, although really poor in terms of supported file formats.

Thanks for having implemented that in Core.

Actions #20

Updated by Benni Mack about 6 years ago

  • Status changed from Resolved to Closed