Bug #80085

closed

Extraction of metadata in PDF-documents does not recognize unicode characters

Added by Gerhard Rupp over 7 years ago. Updated almost 2 years ago.

Status: Closed
Priority: Must have
Assignee: -
Category: Indexed Search
Target version: -
Start date: 2017-03-01
Due date:
% Done: 100%
Estimated time:
TYPO3 Version: 8
PHP Version: 7.0
Tags:
Complexity:
Is Regression: No
Sprint Focus:

Description

If the metadata of a PDF document contains non-ASCII characters, for example German umlauts, the field value is cut off at the first such character.

Therefore, in "FileContentParser.php", in the function "splitPdfInfo", line 796 (TYPO3 7.6), the statement

$res[strtolower(trim($parts[0]))] = trim($parts[1]);

has to be replaced by

$res[strtolower(trim($parts[0]))] = utf8_encode(trim($parts[1]));
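One caveat about the proposed change: utf8_encode() always treats its input as ISO-8859-1, so a value that is already UTF-8 would be encoded a second time, and the function is deprecated as of PHP 8.2. A more defensive variant could convert only values that are not yet valid UTF-8. The following is a minimal sketch, not the core implementation. The names pdfInfoValueToUtf8() and splitPdfInfoSketch() are hypothetical. It assumes that the input is a list of "Key: value" lines, that every value which is not valid UTF-8 is ISO-8859-1, and that the mbstring extension is available.

```php
<?php

/**
 * Hypothetical helper (not part of TYPO3): return a metadata value as UTF-8.
 * Assumption: any value that is not valid UTF-8 is ISO-8859-1 (Latin-1).
 */
function pdfInfoValueToUtf8(string $value): string
{
    $value = trim($value);
    // Leave values that are already valid UTF-8 untouched to avoid double encoding.
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    // Replacement for utf8_encode(), which is deprecated as of PHP 8.2.
    return mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
}

/**
 * Sketch of a splitPdfInfo()-like parser: turns "Key: value" lines
 * into an array indexed by the lowercased key.
 */
function splitPdfInfoSketch(array $pdfInfoLines): array
{
    $res = [];
    foreach ($pdfInfoLines as $line) {
        // Split at the first colon only, so values may contain colons.
        $parts = explode(':', $line, 2);
        if (count($parts) > 1 && trim($parts[0]) !== '') {
            $res[strtolower(trim($parts[0]))] = pdfInfoValueToUtf8($parts[1]);
        }
    }
    return $res;
}
```

For example, splitPdfInfoSketch(["Title: Pr\xFCfbericht", "Author: J\xC3\xB6rg"]) converts the Latin-1 title to UTF-8 and leaves the author, which is already UTF-8, unchanged. The check is a heuristic: a Latin-1 byte sequence that happens to be valid UTF-8 would not be converted.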

Related issues 1 (0 open, 1 closed)

Related to TYPO3 Core - Bug #99352: PDF Metadata double-encoded by index-search indexer with poppler-utils pdfinfo (Closed, 2022-12-13)
