Bug #99352
Closed
PDF Metadata double-encoded by index-search indexer with poppler-utils pdfinfo
100%
Description
pdfinfo version 21.08.0
Copyright 2005-2021 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011 Glyph & Cog, LLC
There are different versions of pdfinfo available in the wild.
Debian/Fedora use pdfinfo (>v20) from the poppler-utils package.
Good hosters such as Hetzner also use this version.
This tool defaults to UTF-8 output for metadata:
pdfinfo umlauts-metadata.pdf | grep Title
Title: Test æ ø å ü ö ä
On the other hand, there are hosters like Mittwald and Domainfactory that use the older v3 of pdfinfo, which defaults to Latin1 output.
pdfinfo -v
pdfinfo version 3.02
Copyright 1996-2007 Glyph & Cog, LLC
This tool produces Latin1 output by default:
pdfinfo umlauts-metadata.pdf | grep Title
Title: Test � � � � � �
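The replacement characters above are what a UTF-8 terminal typically shows when it receives Latin1 bytes. A minimal Python sketch of that effect, assuming the title string from the example and no involvement of pdfinfo itself (the variable names are illustrative only):

```python
# Assumption: this string stands in for the Title field of umlauts-metadata.pdf.
title = "Test æ ø å ü ö ä"

# pdfinfo 3.02 would emit these characters as single Latin1 bytes.
latin1_bytes = title.encode("latin-1")

# A UTF-8 consumer cannot interpret those bytes and substitutes U+FFFD for each.
shown_in_utf8_terminal = latin1_bytes.decode("utf-8", errors="replace")

print(shown_in_utf8_terminal)
```

Decoding the same bytes as Latin1 recovers the original title, so the data is intact and only the interpretation differs.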
Both versions support an -enc UTF-8 option. TYPO3 should use it to circumvent the differences between these tools, instead of always assuming that v3 is used and forcefully converting from ISO-8859-1 to UTF-8, as added in https://review.typo3.org/c/Packages/TYPO3.CMS/+/76861, which leads to double-encoding with the poppler-utils pdfinfo variant.
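The double-encoding can be reproduced without pdfinfo. The following Python sketch is an illustration under stated assumptions, not the TYPO3 implementation: forced_latin1_to_utf8, pdfinfo_command and extract_title are hypothetical helper names, and only the -enc UTF-8 option is taken from the report.

```python
import subprocess
from typing import List


def forced_latin1_to_utf8(raw: bytes) -> str:
    """Mimic an unconditional ISO-8859-1 to UTF-8 conversion of tool output."""
    return raw.decode("latin-1")


def pdfinfo_command(path: str) -> List[str]:
    """Build a pdfinfo call that requests UTF-8 output explicitly."""
    return ["pdfinfo", "-enc", "UTF-8", path]


def extract_title(path: str) -> str:
    """Hypothetical helper: read the Title field, decoding strictly as UTF-8."""
    result = subprocess.run(pdfinfo_command(path), capture_output=True, check=True)
    for line in result.stdout.decode("utf-8").splitlines():
        if line.startswith("Title:"):
            return line[len("Title:"):].strip()
    return ""


title = "Test æ ø å ü ö ä"

# Latin1 input (pdfinfo 3.02 default): the forced conversion is correct.
from_v3 = forced_latin1_to_utf8(title.encode("latin-1"))

# UTF-8 input (poppler-utils default): every umlaut becomes two characters.
from_poppler = forced_latin1_to_utf8(title.encode("utf-8"))

print(from_v3)
print(from_poppler)
```

With -enc UTF-8 passed on every call, the output encoding no longer depends on which pdfinfo variant is installed, so a single strict UTF-8 decode is enough and no conversion step is needed.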
Updated by Benjamin Franzke almost 2 years ago
- Related to Bug #80085: Extraction of metadata in PDF-documents does not recognize unicode characters added
Updated by Gerrit Code Review almost 2 years ago
- Status changed from New to Under Review
Patch set 1 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/77074
Updated by Gerrit Code Review almost 2 years ago
Patch set 2 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/77074
Updated by Gerrit Code Review almost 2 years ago
Patch set 1 for branch 11.5 of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/77081
Updated by Gerrit Code Review almost 2 years ago
Patch set 1 for branch 10.4 of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/77082
Updated by Benjamin Franzke almost 2 years ago
- Status changed from Under Review to Resolved
- % Done changed from 0 to 100
Applied in changeset 0196f562a165b94ba54d3079a427ce72238a21e5.
Updated by Gerrit Code Review almost 2 years ago
- Status changed from Resolved to Under Review
Patch set 1 for branch 12.1 of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/77113
Updated by Gerrit Code Review almost 2 years ago
Patch set 2 for branch 12.1 of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/77113
Updated by Gerrit Code Review almost 2 years ago
Patch set 3 for branch 12.1 of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/77113
Updated by Benjamin Franzke almost 2 years ago
- Status changed from Under Review to Resolved
Applied in changeset a909535cda035016bd97d524d13f013006799c86.
Updated by Benni Mack almost 2 years ago
- Status changed from Resolved to Closed