Bug #80085
closed
Extraction of metadata in PDF documents does not recognize Unicode characters
Added by Gerhard Rupp over 7 years ago.
Updated almost 2 years ago.
Description
If the metadata of a PDF document contains, for example, German umlauts, the field value is cut off.
To fix this, in "FileContentParser.php", function "splitPdfInfo", line 796 (TYPO3 7.6), the line
$res[strtolower(trim($parts[0]))] = trim($parts[1]);
has to be replaced by
$res[strtolower(trim($parts[0]))] = utf8_encode(trim($parts[1]));
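To illustrate what the proposed change does, here is a minimal, hypothetical PHP sketch of a splitPdfInfo-style parser with the patch applied. It assumes the input is an array of "Key: value" lines as printed by pdfinfo and that the values arrive in ISO-8859-1 (Latin-1). The names splitPdfInfoSketch() and latin1ToUtf8() are illustrative only, not TYPO3 API. latin1ToUtf8() reproduces by hand what utf8_encode() does, since utf8_encode() is deprecated as of PHP 8.2.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: converts an ISO-8859-1 (Latin-1) string to UTF-8
// byte by byte, which is what utf8_encode() does.
function latin1ToUtf8(string $s): string
{
    $out = '';
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($s[$i]);
        if ($b < 0x80) {
            $out .= $s[$i]; // ASCII bytes are identical in both encodings
        } else {
            // Code points 0x80-0xFF need a two-byte UTF-8 sequence
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

// Hypothetical splitPdfInfo()-style parser with the proposed patch applied:
// every "Key: value" line becomes $res['key'] = 'value' (value converted to UTF-8).
function splitPdfInfoSketch(array $pdfInfoLines): array
{
    $res = [];
    foreach ($pdfInfoLines as $line) {
        $parts = explode(':', $line, 2);
        if (count($parts) > 1 && trim($parts[0]) !== '') {
            $res[strtolower(trim($parts[0]))] = latin1ToUtf8(trim($parts[1]));
        }
    }
    return $res;
}

// Usage: a Latin-1 title "Größe" (ö = 0xF6, ß = 0xDF)
$info = splitPdfInfoSketch(["Title:   Gr\xF6\xDFe", "Pages: 3"]);
var_dump($info['title'] === "Gr\u{00F6}\u{00DF}e"); // bool(true)
```

Note that the conversion is unconditional, so it is only correct if pdfinfo really emits Latin-1.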
- % Done changed from 0 to 100
- Status changed from New to Under Review
- Status changed from Under Review to Needs Feedback
The patch is not available.
Does the issue still exist?
- Status changed from Needs Feedback to New
- % Done changed from 100 to 0
Can anyone tell me what the problem with this trivial patch is? It's so frustrating that it takes ages for even such obvious and already proven fixes to be recognized by the core team. This makes updating TYPO3 unnecessarily time-consuming.
- TYPO3 Version changed from 7 to 8
After nearly two years, this issue still hasn't been fixed, even in recent versions (8.7, 9.5), although an (obviously) working patch set was published a long time ago. Frustrating ...
- Status changed from New to Under Review
- Status changed from Under Review to Resolved
- % Done changed from 0 to 100
- Status changed from Resolved to Closed
- Related to Bug #99352: PDF Metadata double-encoded by index-search indexer with poppler-utils pdfinfo added
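The title of the related bug points at the opposite failure: if pdfinfo already emits UTF-8 (poppler-utils' pdfinfo presumably does by default; treat this as an assumption), an unconditional Latin-1-to-UTF-8 conversion encodes the text twice. A minimal, hypothetical PHP sketch of that effect and of one possible guard, converting only when the input is not already valid UTF-8, follows. The names toUtf8Once() and latin1ToUtf8() are illustrative, and this is not the fix applied in TYPO3 for #99352.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: ISO-8859-1 (Latin-1) to UTF-8, same effect as utf8_encode().
function latin1ToUtf8(string $s): string
{
    $out = '';
    for ($i = 0, $len = strlen($s); $i < $len; $i++) {
        $b = ord($s[$i]);
        $out .= $b < 0x80
            ? $s[$i]
            : chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
    }
    return $out;
}

// Hypothetical guard: convert only if the input is not already valid UTF-8.
// preg_match() with the /u modifier returns false for malformed UTF-8.
function toUtf8Once(string $s): string
{
    if (preg_match('//u', $s) === 1) {
        return $s; // valid UTF-8 (plain ASCII included): leave it untouched
    }
    return latin1ToUtf8($s);
}

$utf8 = "Gr\u{00F6}\u{00DF}e";  // "Größe" as UTF-8 output
$latin1 = "Gr\xF6\xDFe";        // "Größe" as Latin-1 output

// Unconditional conversion double-encodes UTF-8 input: "ö" turns into "Ã¶".
var_dump(latin1ToUtf8($utf8) === "Gr\u{00C3}\u{00B6}\u{00C3}\u{009F}e"); // bool(true)
// The guarded version yields the same UTF-8 result for both inputs.
var_dump(toUtf8Once($utf8) === $utf8);   // bool(true)
var_dump(toUtf8Once($latin1) === $utf8); // bool(true)
```

The check is a heuristic: a Latin-1 string whose bytes happen to form valid UTF-8 would be left unconverted.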