Feature #36743

Use text extraction services to get file content

Added by Ingo Renner over 9 years ago. Updated almost 4 years ago.

Status: Closed
Priority: Must have
Assignee:
Category: File Abstraction Layer (FAL)
Target version:
Start date: 2012-05-01
Due date:
% Done: 100%
Estimated time:
PHP Version:
Tags:
Complexity:
Sprint Focus:

Description

Currently, FAL simply uses file_get_contents() in its local driver to extract a file's content. This is fine for plain text files, but it does not work for binary formats such as Office documents or PDF files.

TYPO3 already provides a services infrastructure that allows different text extractors to be registered. Use the textExtract service to read file contents.
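A language-neutral sketch of that idea follows. It is written in Python purely for illustration and is not TYPO3's PHP service API: `TextExtractor`, `PlainTextExtractor`, `ExtractorRegistry` and the numeric priority are hypothetical names. The registry asks each registered extractor whether it can handle a file and uses the highest-priority match, so a plain read (which is what file_get_contents() amounts to) is only used for text formats, and an unsupported type such as PDF yields no text instead of raw bytes.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from pathlib import Path


class TextExtractor(ABC):
    """Hypothetical interface for a pluggable text extractor (not a TYPO3 class)."""

    priority = 0  # higher wins when several extractors accept the same file

    @abstractmethod
    def can_process(self, path: Path) -> bool: ...

    @abstractmethod
    def extract_text(self, path: Path) -> str: ...


class PlainTextExtractor(TextExtractor):
    """Equivalent of a raw read: only sensible for text-based formats."""

    priority = 10
    extensions = {".txt", ".csv", ".html"}

    def can_process(self, path: Path) -> bool:
        return path.suffix.lower() in self.extensions

    def extract_text(self, path: Path) -> str:
        return path.read_text(encoding="utf-8", errors="replace")


class ExtractorRegistry:
    """Hypothetical registry that dispatches a file to the best extractor."""

    def __init__(self) -> None:
        self._extractors: list[TextExtractor] = []

    def register(self, extractor: TextExtractor) -> None:
        self._extractors.append(extractor)

    def get_text(self, path: Path) -> str | None:
        # Collect every extractor that claims this file type.
        candidates = [e for e in self._extractors if e.can_process(path)]
        if not candidates:
            return None  # e.g. a PDF while no PDF extractor is registered
        # Use the highest-priority candidate.
        best = max(candidates, key=lambda e: e.priority)
        return best.extract_text(path)
```

Supporting PDF or Office files would then mean registering one more extractor, for example one that wraps an external converter, without changing the driver itself.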

#1

Updated by Gerrit Code Review over 9 years ago

  • Status changed from New to Under Review

Patch set 1 for branch master has been pushed to the review server.
It is available at http://review.typo3.org/10916

#2

Updated by Gerrit Code Review over 9 years ago

Patch set 2 for branch master has been pushed to the review server.
It is available at http://review.typo3.org/10916

#3

Updated by Alexander Opitz about 8 years ago

  • Assignee changed from Ingo Renner to Andreas Wolf
  • Target version deleted (6.0.0)

What is the state of the text extraction services? You mentioned in Gerrit that there are other plans to implement this.

#4

Updated by Alexander Opitz over 6 years ago

  • Target version set to 7.1 (Cleanup)
  • Sprint Focus set to On Location Sprint

#5

Updated by Alexander Opitz over 6 years ago

  • Category set to File Abstraction Layer (FAL)

#6

Updated by Frans Saris over 6 years ago

  • Status changed from Under Review to Needs Feedback

You can create your own extractor service that processes a file and returns its readable content, just as is already possible for metadata.

In your extractor, call $file->getForLocalProcessing() to get the path to the real file (or a temporary local copy of it), then do your magic to extract the text.
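The local-copy step can be modelled in the same language-neutral way. The Python below is a hypothetical stand-in, not the FAL implementation: `StoredFile`, `for_local_processing()` and `extract_text()` are invented names that only mirror the idea of getForLocalProcessing(), handing out the real path for files on local disk and a temporary copy (deleted afterwards) for files held elsewhere.

```python
from __future__ import annotations

import os
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


class StoredFile:
    """Hypothetical stand-in for a FAL file object."""

    def __init__(self, name: str, local_path: Path | None = None, data: bytes = b"") -> None:
        self.name = name
        self.local_path = local_path  # set when the storage is on local disk
        self.data = data  # content as fetched from a non-local storage


@contextmanager
def for_local_processing(file: StoredFile) -> Iterator[Path]:
    """Yield a path an external tool can read (models the getForLocalProcessing() idea)."""
    if file.local_path is not None:
        # Local storage: hand out the real file; nothing to clean up.
        yield file.local_path
        return
    # Non-local storage: write a temporary copy, keeping the extension
    # for tools that detect the format from the file name.
    fd, tmp = tempfile.mkstemp(suffix=Path(file.name).suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(file.data)
        yield Path(tmp)
    finally:
        os.unlink(tmp)  # remove the copy even if extraction raised


def extract_text(file: StoredFile) -> str:
    with for_local_processing(file) as path:
        # "Do your magic" here; a real extractor would run a PDF/Office converter on `path`.
        return path.read_text(encoding="utf-8", errors="replace")
```

Using a context manager keeps the cleanup of the temporary copy in one place, even when the extraction step fails.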

#7

Updated by Fabien Udriot over 6 years ago

A source of inspiration could be EXT:metadata, where we retrieve custom metadata for images and PDFs.

Can we close the ticket?

#8

Updated by Mathias Schreiber over 6 years ago

  • Status changed from Needs Feedback to Accepted

#9

Updated by Gerrit Code Review over 6 years ago

  • Status changed from Accepted to Under Review

Patch set 1 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/36556

#10

Updated by Gerrit Code Review over 6 years ago

Patch set 2 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/36556

#11

Updated by Gerrit Code Review over 6 years ago

Patch set 3 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/36556

#12

Updated by Gerrit Code Review over 6 years ago

Patch set 4 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/36556

#13

Updated by Gerrit Code Review over 6 years ago

Patch set 5 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/36556

#14

Updated by Gerrit Code Review over 6 years ago

Patch set 6 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/36556

#15

Updated by Gerrit Code Review over 6 years ago

Patch set 7 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/36556

#16

Updated by Gerrit Code Review over 6 years ago

Patch set 8 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/36556

#17

Updated by Ingo Renner over 6 years ago

  • Status changed from Under Review to Resolved
  • % Done changed from 0 to 100

#18

Updated by Anja Leichsenring over 5 years ago

  • Sprint Focus deleted (On Location Sprint)

#19

Updated by Riccardo De Contardi almost 4 years ago

  • Status changed from Resolved to Closed
