Task #56691

closed

Rework FAL indexer registration mechanism

Added by Xavier Perseguers about 10 years ago. Updated about 10 years ago.

Status: Closed
Priority: Should have
Assignee: -
Category: File Abstraction Layer (FAL)
Target version:
Start date: 2014-03-09
Due date:
% Done: 0%
Estimated time:
TYPO3 Version: 6.2
PHP Version: 5.4
Tags:
Complexity:
Sprint Focus:

Description

FAL has introduced a mechanism to register indexers. It is primarily intended to let third-party extensions populate FAL metadata.

However, this is not compatible with the concept of "services" that has been available in TYPO3 for ages and is used by indexed_search to extract text and metadata.

DAM (let's remember that it is FAL's father in some way) relied on that services mechanism back then to extract metadata from files.

Instead of keeping our reinvented wheel, we should get rid of it ASAP and use TYPO3 services instead.

TYPO3 services are described here: http://docs.typo3.org/typo3cms/Typo3ServicesReference/ and metadata extraction has already been implemented for many file formats in the famous cc_meta* extensions (later replaced by svmetaextract) as well as (at least) tika.

Actions #1

Updated by Steffen Ritter about 10 years ago

I highly object! Will veto as long as I'm involved with FAL.

  1. Services do not allow enforcing an interface
  2. Services are not "FAL-aware": OS checks are not important, driver-based checks are
  3. Services do all their registration/configuration on the public TYPO3_CONF_VARS arrays
  4. Even if services were used for metadata extraction, the DAM metadata extractors are not compatible with or usable for FAL; they expect DAM objects
  5. .....
Actions #2

Updated by Jigal van Hemert about 10 years ago

1. The services might need an overhaul, but this can be done later
2. Not necessary IMO. If FAL needs data from a file, then whatever the extraction method, the file needs to be on the local file system, even temporarily. If FAL wants to use PHP functions to get information from a file, it needs to be downloaded too (if it is not already available locally). Using existing services for the actual extraction of information makes use of existing extensions.
3. There is a lot that still uses that array. Maybe something for a future change, but not enough reason to re-invent the wheel.
4. The extraction services have nothing to do with DAM. EXT:solr uses these services without DAM being installed (so no "DAM-objects" on the system). EXT:tika provides such extraction services for EXT:solr, but it also works with EXT:dam and any other extension that uses the extraction services.

In short: the moment FAL uses a local (copy of the) file to extract information from, the existing extraction services should be used. There are extensions available which provide metadata / text extraction services; some of them support more file types than most people know.

I will refrain from making any comments about the threat to veto the use of an existing API.

Actions #3

Updated by Frans Saris about 10 years ago

The existing services cannot be used directly, as they all depend on the file being present on the local filesystem.
Yes, we could copy all files from remote storage to local, but that would not make sense for large file sets.

We want/need to pass the file object to the extracting service, not only the file path.
So to have a generic way in core to process files, we need a different service than the existing one (which lives in third-party extensions, not in core).

Maybe we should create a wrapper that uses the existing metaExtract services, but I'm not sure this should be part of core.
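
To make the wrapper idea more concrete, here is a minimal sketch in Python. It is not TYPO3 code: `FileObject`, `legacy_path_extractor` and `extract_metadata` are hypothetical names invented for illustration, and the "existing service" is reduced to a function that takes a local path. The point it shows is that the caller passes a file object, and a temporary local copy is made only when the file is not already local.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class FileObject:
    """Hypothetical stand-in for a FAL file object (not the TYPO3 class)."""
    identifier: str                   # path for local files, opaque id otherwise
    is_local: bool
    read_bytes: Callable[[], bytes]   # how the storage driver fetches the content


def legacy_path_extractor(path: str) -> Dict[str, int]:
    """Stand-in for an existing path-based service; it only reports the size."""
    return {"size": os.path.getsize(path)}


def extract_metadata(
    file: FileObject,
    path_extractor: Callable[[str], Dict[str, int]] = legacy_path_extractor,
) -> Dict[str, int]:
    """Accept a file object and hand a local path to the path-based service."""
    if file.is_local:
        # Assumption: for local storage the identifier is a usable path.
        return path_extractor(file.identifier)

    # Remote storage: keep a temporary copy only for the duration of the call.
    fd, tmp_path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(file.read_bytes())
        return path_extractor(tmp_path)
    finally:
        os.remove(tmp_path)


remote_file = FileObject("remote:/docs/a.txt", False, lambda: b"hello")
remote_metadata = extract_metadata(remote_file)
```

The copy still happens per file, so the concern about large remote file sets remains; the sketch only shows where such a bridge would sit.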

Actions #4

Updated by Jigal van Hemert about 10 years ago

As far as I understand, there are two slightly different, but related concepts.

FAL extractor.
Supplied by the driver. Can get metadata for a file in its storage.

Core metadata extraction service.
Implemented by extensions. Can get metadata from a physical file.

Many installations will mainly use the Local storage and driver. This is the only driver which is shipped by the core.
Given that,
- there are many possible file types to get information from (which often lack proper documentation)
- it's a gigantic amount of work to support all these file types
- there are existing extensions which supply this feature through the core metadata extraction service
it would be logical that the Local driver uses such extensions if they are available.
The core could supply a very basic (low priority) metadata extraction service to read the width/height of a few image types (as is needed so desperately by the core). If a more advanced service extension is available it will automatically supply the information for the file types it supports.
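
A rough sketch of that priority idea, written in Python rather than as TYPO3 code: `ExtractionService`, `ServiceRegistry` and the service names and priority values are all hypothetical and only illustrate the selection rule, i.e. the highest-priority registered service that supports the file type wins, and the basic core service is the fallback.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class ExtractionService:
    """Hypothetical description of one registered extraction service."""
    name: str
    priority: int                 # higher value wins
    extensions: FrozenSet[str]    # file extensions the service supports
    extract: Callable[[str], Dict[str, object]]


class ServiceRegistry:
    """Keeps registered services and picks the best one per file type."""

    def __init__(self) -> None:
        self._services: List[ExtractionService] = []

    def register(self, service: ExtractionService) -> None:
        self._services.append(service)

    def find(self, extension: str) -> Optional[ExtractionService]:
        # Only services that support the extension are candidates.
        candidates = [s for s in self._services if extension.lower() in s.extensions]
        # Highest priority wins; None if nothing supports this file type.
        return max(candidates, key=lambda s: s.priority, default=None)


registry = ServiceRegistry()

# Basic, low-priority service shipped with the core (image dimensions only).
registry.register(ExtractionService(
    "core_basic", 10, frozenset({"jpg", "png"}),
    lambda path: {"source": "core_basic"},
))

# More advanced service from an extension; it takes over for the types it supports.
registry.register(ExtractionService(
    "advanced_extension", 50, frozenset({"jpg", "pdf"}),
    lambda path: {"source": "advanced_extension"},
))
```

With these registrations, `registry.find("jpg")` selects the extension's service, `registry.find("png")` falls back to the basic core service, and an unsupported type yields `None`.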

Other FAL drivers, for example for remote storages, have to implement their own extractors. If they cannot get the metadata information from the remote source, they could decide to download the file and use the core metadata extraction service to get the information; this is up to the author of the driver.

Olivier Dobberkau has already offered to sponsor the feature of supporting extensions which supply (core) metadata extraction services.

Actions #5

Updated by Steffen Ritter about 10 years ago

  • Status changed from New to Needs Feedback

Can this be closed?

Actions #6

Updated by Xavier Perseguers about 10 years ago

  • Status changed from Needs Feedback to Closed

Yes, after trying, it is indeed not possible to (even "somehow") cleanly use TYPO3 services. So I created this bridge extension: http://typo3.org/extensions/repository/view/extractor
