Trigger MetaDataExtraction after file upload
Currently the metadataExtraction is only called through a scheduler task. So when an editor uploads a new file, they have to wait until the scheduler task is triggered again.
For most file storage types it is no problem to trigger the metadataExtraction directly after file upload. Only in some special situations is it undesirable to run the metadataExtraction directly after a file upload or after adding a file to the storage. To still support these use cases, a flag needs to be added to the storage so the integrator can disable the automatic metadataExtraction for their special use case.
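A rough sketch of the proposed behaviour, in Python purely for illustration. Everything here is hypothetical: `Storage`, the `auto_extract_metadata` flag, `extract_metadata` and `add_file` are made-up names and do not reflect the actual Core API. It only shows the intended control flow: extract on upload by default, skip when the storage has opted out.

```python
from dataclasses import dataclass, field


@dataclass
class Storage:
    """Hypothetical stand-in for a file storage record."""
    name: str
    # Hypothetical flag: an integrator sets this to False to opt out.
    auto_extract_metadata: bool = True
    files: dict = field(default_factory=dict)


def extract_metadata(file_name: str, content: bytes) -> dict:
    """Placeholder extractor: only records the size and the file extension."""
    extension = file_name.rsplit(".", 1)[-1].lower() if "." in file_name else ""
    return {"size": len(content), "extension": extension}


def add_file(storage: Storage, file_name: str, content: bytes) -> dict:
    """Store a file and, unless the storage opted out, extract metadata right away."""
    record = {"content": content, "metadata": None}
    if storage.auto_extract_metadata:
        record["metadata"] = extract_metadata(file_name, content)
    # With the flag off, metadata stays None until the scheduler task runs.
    storage.files[file_name] = record
    return record
```

With the flag left at its default, an uploaded file has its metadata immediately; with the flag disabled, the file is stored without metadata and is left for the scheduler task to pick up.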
[FEATURE] Trigger metadata extraction after file upload
Reviewed-by: Wouter Wolters <firstname.lastname@example.org>
Tested-by: Wouter Wolters <email@example.com>
Reviewed-by: Georg Ringer <firstname.lastname@example.org>
Tested-by: Georg Ringer <email@example.com>
#6 Updated by Fabien Udriot over 6 years ago
+1 for metadata extraction upon upload. The current situation is not satisfying, IMO -> users do not want to wait until the next cron run.
If there is a fear of overloading the system, a threshold (number of files per upload) could be added above which metadata extraction is disabled. However, I believe that in the majority of cases that won't be a problem.
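To make the threshold idea concrete, here is a minimal sketch, in Python only for illustration. The function name `handle_upload`, the `max_immediate` parameter and its default of 20 are assumptions made up for this example, not existing settings: small batches are extracted immediately, larger ones are deferred to the scheduler.

```python
from typing import Callable, Dict, List, Tuple


def handle_upload(
    file_names: List[str],
    extract: Callable[[str], dict],
    max_immediate: int = 20,  # hypothetical threshold, not an existing setting
) -> Tuple[Dict[str, dict], List[str]]:
    """Return (metadata extracted now, file names deferred to the scheduler)."""
    if len(file_names) > max_immediate:
        # Batch is too large: skip extraction so the upload is not slowed down.
        return {}, list(file_names)
    # Small batch: extract metadata for every file right away.
    return {name: extract(name) for name in file_names}, []
```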
#8 Updated by Fabien Udriot over 6 years ago
(By overloading the system, I meant slowing down the upload <-- just for the sake of clarity.)
By far not everyone has Tika deployed; it is reserved for some advanced set-ups. Furthermore, PHP-based metadata extraction is quite fast in my experience.
Could we make it an opt-out option: by default, indexing happens after upload, and it can be disabled by some configuration. This would be a compromise. Again, as a user it is unsatisfying to have to wait for the next cron cycle to get the metadata.
#19 Updated by Xavier Perseguers about 5 years ago
A tiny follow-up here in case someone is reading. EXT:extractor 1.0.0 now natively supports Tika server and, as Ingo suggested, this is tremendously quicker than using the standalone application jar. Using external tools such as pdfinfo or exiftool is really quick as well and, as Fabien noticed, PHP-based extraction, although really poor in terms of supported file formats, is really fast too.
Thanks for having implemented that in Core.