Bug #93883

closed

Transliteration of German umlauts fails partly on file upload for files created on Mac

Added by Christoph Lehmann about 3 years ago. Updated over 1 year ago.

Status: Closed
Priority: Should have
Assignee: -
Category: File Abstraction Layer (FAL)
Target version: -
Start date: 2021-04-08
Due date:
% Done: 100%
Estimated time:
TYPO3 Version: 10
PHP Version:
Tags:
Complexity:
Is Regression:
Sprint Focus:

Description

How to reproduce on Mac:

(i) OK

Create a file named test-ö-ä-ü.txt with touch in your favourite terminal and upload it in the file list.

Result: The file is renamed to test-oe-ae-ue.txt

The hex representation of äöü in this form (one precomposed code point per umlaut) is

0000000 c3 a4 c3 b6 c3 bc

You can get it with PHP's bin2hex().
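A minimal PHP 7+ sketch of that check. The umlauts are written as \u{...} escapes, an assumption made so that the bytes do not depend on the encoding the snippet itself is saved in:

```php
<?php
// Precomposed (NFC) umlauts: U+00E4 ä, U+00F6 ö, U+00FC ü.
$precomposed = "\u{00E4}\u{00F6}\u{00FC}";

// Each precomposed umlaut is a single code point encoded as two UTF-8 bytes.
echo bin2hex($precomposed), PHP_EOL; // prints c3a4c3b6c3bc
echo strlen($precomposed), PHP_EOL;  // prints 6
```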

(ii) Not OK

Save a file with TextEdit, or simply create a new file in Finder, named test-ö-ü-ä.txt and upload it in the file list.

Result: The file is renamed to test-o__u__a__.txt

The transliteration of äöü fails (or, let's say, is incomplete) when they are stored in decomposed form, i.e. as the base letter followed by U+0308 COMBINING DIAERESIS:

0000000 61 cc 88 6f cc 88 75 cc 88
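The decomposed form can be reproduced in PHP the same way. This is a small sketch, again using \u{...} escapes as an assumption, that shows the bytes and that a byte-wise comparison treats the two forms as different strings even though both render as "äöü":

```php
<?php
// Decomposed (NFD) umlauts: base letter followed by U+0308 COMBINING DIAERESIS.
$decomposed  = "a\u{0308}o\u{0308}u\u{0308}";
$precomposed = "\u{00E4}\u{00F6}\u{00FC}";

echo bin2hex($decomposed), PHP_EOL; // prints 61cc886fcc8875cc88

// Same visible text, different bytes: string comparison is byte-wise.
var_dump($decomposed === $precomposed); // bool(false)
```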


German umlauts have more than one representation in Unicode, and therefore in UTF-8: precomposed as in (i) and decomposed as in (ii). The decomposed form does not seem to be handled correctly by \TYPO3\CMS\Core\Resource\Driver\LocalDriver::sanitizeFileName() or by \TYPO3\CMS\Core\Charset\CharsetConverter.
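The following PHP sketch uses a heavily simplified, hypothetical sanitizer (simplifiedSanitize() is not TYPO3's actual implementation) to illustrate one plausible mechanism. It assumes the sanitizer first maps known precomposed characters to ASCII and then replaces every remaining byte outside a small ASCII whitelist with an underscore. Under those assumptions a decomposed ö (bytes 6f cc 88) keeps its base letter and the two bytes of U+0308 become two underscores, which would match the o__, u__ and a__ parts of the result in (ii):

```php
<?php
// Hypothetical, simplified sanitizer for illustration only (not TYPO3 code).
function simplifiedSanitize(string $fileName): string
{
    // Step 1: transliterate known precomposed characters (tiny sample map).
    $map = [
        "\u{00E4}" => 'ae', "\u{00F6}" => 'oe', "\u{00FC}" => 'ue',
        "\u{00C4}" => 'Ae', "\u{00D6}" => 'Oe', "\u{00DC}" => 'Ue',
    ];
    $fileName = strtr($fileName, $map);

    // Step 2: without the /u modifier the pattern works byte by byte, so every
    // remaining byte outside the allowed ASCII set becomes one underscore.
    return preg_replace('/[^.a-zA-Z0-9_-]/', '_', $fileName);
}

$nfc = "test-\u{00F6}.txt";  // ö as one code point (c3 b6)
$nfd = "test-o\u{0308}.txt"; // o + COMBINING DIAERESIS (6f cc 88)

echo simplifiedSanitize($nfc), PHP_EOL; // prints test-oe.txt
echo simplifiedSanitize($nfd), PHP_EOL; // prints test-o__.txt
```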


Related issues: 3 (2 open, 1 closed)

Related to TYPO3 Core - Bug #20612: scandinavian letters are translittered wrong (Needs Feedback, 2009-06-12)
Related to TYPO3 Core - Bug #93764: SlugHelper can create bad urls (Closed, 2021-03-17)
Related to TYPO3 Core - Feature #57695: Implement unicode normalization in TYPO3 Core's charset conversion routines, especially for filepaths in TYPO3 FAL's LocalDriver. (Needs Feedback, 2014-04-06)

#1

Updated by Christoph Lehmann about 3 years ago

  • Related to Bug #20612: scandinavian letters are translittered wrong added
#2

Updated by Christoph Lehmann about 3 years ago

A very simple solution for this issue is to use

\Normalizer::normalize()

(which normalizes to NFC by default) in, or before calling,

\TYPO3\CMS\Core\Charset\CharsetConverter::specCharsToASCII()
#3

Updated by Martin Kutschker about 3 years ago

This will fail if the "intl" extension is not enabled, but that can be checked. Better to use it when it is available than not to use it at all.
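The normalization idea from #2, combined with the availability check suggested here, could be sketched as follows. normalizeToNfc() is a hypothetical helper name, and the intl-free fallback that only composes the six German umlauts is an assumption for illustration, not a full Unicode normalization:

```php
<?php
// Hypothetical helper: normalize a file name to NFC before transliteration.
function normalizeToNfc(string $value): string
{
    // Preferred path: the intl extension provides \Normalizer.
    if (class_exists(\Normalizer::class)) {
        $normalized = \Normalizer::normalize($value, \Normalizer::FORM_C);
        // normalize() returns false on invalid input; keep the original then.
        return $normalized === false ? $value : $normalized;
    }

    // Fallback without intl (assumption): compose only the German umlauts.
    return strtr($value, [
        "a\u{0308}" => "\u{00E4}", "o\u{0308}" => "\u{00F6}", "u\u{0308}" => "\u{00FC}",
        "A\u{0308}" => "\u{00C4}", "O\u{0308}" => "\u{00D6}", "U\u{0308}" => "\u{00DC}",
    ]);
}

$fromFinder = "test-o\u{0308}-u\u{0308}-a\u{0308}.txt";
var_dump(normalizeToNfc($fromFinder) === "test-\u{00F6}-\u{00FC}-\u{00E4}.txt"); // bool(true)
```

After this step the existing transliteration of precomposed umlauts can apply unchanged, so both uploads in the description would be expected to end up with oe, ue and ae.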

#5

Updated by Martin Kutschker about 3 years ago

A brute force removal of ALL nonspacing marks:

Transliterator::create('Any-NFD; [\p{Mn}] Remove; Any-NFC')->transliterate($subject);

https://www.compart.com/en/unicode/category/Mn

The character set [\p{Mn}] should probably be narrowed to list only Latin combining marks.
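A sketch of this brute-force approach in PHP, using the commonly used compound ID 'NFD; [:Nonspacing Mark:] Remove; NFC' with Transliterator::create(). The helper name stripNonspacingMarks() and the PCRE fallback for systems without intl are assumptions for illustration:

```php
<?php
// Hypothetical helper: remove nonspacing marks (Unicode category Mn).
function stripNonspacingMarks(string $value): string
{
    if (class_exists(\Transliterator::class)) {
        // Decompose, drop every nonspacing mark, recompose what is left.
        $transliterator = \Transliterator::create('NFD; [:Nonspacing Mark:] Remove; NFC');
        if ($transliterator !== null) {
            $result = $transliterator->transliterate($value);
            if ($result !== false) {
                return $result;
            }
        }
    }

    // Fallback without intl: PCRE matches \p{Mn} in UTF-8 mode, but this only
    // catches marks that are already decomposed (e.g. file names from Finder).
    return preg_replace('/\p{Mn}+/u', '', $value) ?? $value;
}

echo stripNonspacingMarks("test-o\u{0308}-u\u{0308}-a\u{0308}.txt"), PHP_EOL; // prints test-o-u-a.txt
```

Note that this drops the diaeresis entirely, so ö ends up as o rather than the oe produced for the precomposed file in (i); normalizing to NFC first, as proposed in #2, keeps the German oe, ae and ue.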

#6

Updated by Martin Kutschker almost 3 years ago

  • Related to Bug #93764: SlugHelper can create bad urls added
#7

Updated by Gerrit Code Review almost 3 years ago

  • Status changed from New to Under Review

Patch set 1 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/69144

#8

Updated by Gerrit Code Review almost 3 years ago

Patch set 2 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/69144

#9

Updated by Gerrit Code Review almost 3 years ago

Patch set 3 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/69144

#10

Updated by Gerrit Code Review almost 3 years ago

Patch set 4 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/69144

#11

Updated by Anonymous almost 3 years ago

  • Status changed from Under Review to Resolved
  • % Done changed from 0 to 100
#12

Updated by Gerrit Code Review over 2 years ago

  • Status changed from Resolved to Under Review

Patch set 1 for branch 10.4 of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/70255

#13

Updated by Gerrit Code Review over 2 years ago

Patch set 2 for branch 10.4 of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/70255

#14

Updated by Riccardo De Contardi over 1 year ago

  • Status changed from Under Review to Closed
  • Target version deleted (Candidate for patchlevel)

Closed as requested by the reporter.

If you think that this is the wrong decision, please reopen it or open a new issue with a reference to this one.

Thank you.

#15

Updated by Benni Mack 8 months ago

  • Related to Feature #57695: Implement unicode normalization in TYPO3 Core's charset conversion routines, especially for filepaths in TYPO3 FAL's LocalDriver. added