Bug #20612
Scandinavian letters are transliterated wrong
Description
The Scandinavian letters ä, ö and å are transliterated incorrectly, for example by realurl: ä becomes ae and ö becomes oe. Instead, ä should become a, ö should become o, and å should become a.
This should be added to the file typo3_src-X/t3lib/unidata/Translit.txt:
- scandinavian
00e4; 0061; LATIN SMALL LETTER A WITH UMLAUTS => a (finnish)
00c4; 0041; LATIN CAPITAL LETTER A WITH UMLAUTS => A (finnish)
00f6; 006f; LATIN SMALL LETTER O WITH UMLAUTS => o (finnish)
00d6; 004f; LATIN CAPITAL LETTER O WITH UMLAUTS => O (finnish)
00e5; 0061; LATIN SMALL LETTER SWEDISH A (å) => a (finnish)
00c5; 0041; LATIN CAPITAL LETTER SWEDISH A (Å) => A (finnish)
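To illustrate the effect of these entries, here is a minimal sketch in Python (not TYPO3 code). It assumes each data line has the form `source code point; replacement code point(s); comment`, as in the lines above. The helper names `parse_translit` and `transliterate` are hypothetical, and the German-style sample entries are an assumption about what the existing table does, not a copy of it.

```python
def parse_translit(text):
    """Parse 'XXXX; YYYY [ZZZZ ...]; comment' lines into a char -> replacement map."""
    table = {}
    for line in text.splitlines():
        parts = [p.strip() for p in line.split(";")]
        if len(parts) < 2 or not parts[0] or not parts[1]:
            continue  # skip headings, comments and blank lines
        try:
            source = chr(int(parts[0], 16))
            target = "".join(chr(int(cp, 16)) for cp in parts[1].split())
        except ValueError:
            continue  # not a hex code point line
        table[source] = target
    return table


def transliterate(value, table):
    """Replace every character that has an entry in the table."""
    return "".join(table.get(ch, ch) for ch in value)


# Assumed German-style entries (illustrative only).
GERMAN_STYLE = """\
00e4; 0061 0065; a with umlaut => ae
00c4; 0041 0065; A with umlaut => Ae
"""

# Entries in the style proposed in this report.
PROPOSED = """\
- scandinavian
00e4; 0061; a with umlaut => a
00c4; 0041; A with umlaut => A
"""

word = "\u00c4\u00e4ni"  # "Ääni"
print(transliterate(word, parse_translit(GERMAN_STYLE)))
print(transliterate(word, parse_translit(PROPOSED)))
```

With the German-style entries the Finnish word keeps an extra "e" after each vowel; with the proposed entries the umlauts are simply dropped.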
(issue imported from #M11322)
Updated by Martin Kutschker over 11 years ago
Unfortunately the transliteration is currently language independent. The vowels with umlauts are transliterated according to German custom, which is fine for the huge German user base.
As a workaround you may change Translit.txt and delete all files in typo3temp/cs.
Updated by Alexander Opitz almost 8 years ago
- Status changed from New to Needs Feedback
- Target version deleted (0)
- TYPO3 Version set to 4.2
The issue is very old. Does it still exist in newer versions of TYPO3 CMS (4.5 or 6.1)?
Updated by Katja Lampela almost 8 years ago
Yes it still exists in 4.5-4.7. Haven't tried 6, but I suspect nothing has been done there either.
Updated by Mathias Schreiber about 6 years ago
- Target version set to 7.2 (Frontend)
- Is Regression set to No
Updated by Riccardo De Contardi almost 6 years ago
Still present in 6.2.11 I guess, the file /typo3/sysext/core/Resources/Private/Charsets/unidata/Translit.txt does not contain a "Scandinavian" section.
What about adding an option (in Install tool?) where to specify a custom file instead of modifying the original in the core?
Updated by Benni Mack over 5 years ago
- Target version changed from 7.2 (Frontend) to 7.4 (Backend)
Updated by Joonas Kauhanen over 5 years ago
Riccardo De Contardi wrote:
Still present in 6.2.11 I guess, the file /typo3/sysext/core/Resources/Private/Charsets/unidata/Translit.txt does not contain a "Scandinavian" section.
What about adding an option (in Install tool?) where to specify a custom file instead of modifying the original in the core?
This would be very handy. Currently we have to update the Translit.txt file manually every time we update the TYPO3 source. As many of the developers are from central Europe, they don't understand the issue. And I think the issue is not limited to Scandinavians, as the transliterations should really be language specific, not system wide.
Hoping this will be part of 7 LTS.
Updated by Gerrit Code Review over 5 years ago
- Status changed from New to Under Review
Patch set 1 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/44115
Updated by Riccardo De Contardi almost 5 years ago
- Target version changed from 7 LTS to Candidate for Major Version
Updated by Bart Lammers over 4 years ago
This still seems to be the case. For a client this is a big issue in regards to speaking URLs that have (in their eyes) wrong letters in them.
The solution as provided earlier by Jan Helke seems like a nice solution, using the Install Tool to override the default transliteration files.
Back then the reason for not accepting this was the plan to rework the CharsetConverter completely, but all through versions 4.x, 6.2 LTS, 7.6 LTS and now 8.x this has not been addressed.
Is it possible to reconsider the given solution?
Updated by Benni Mack almost 4 years ago
Hey Bart,
maybe we can use a PHP library to do the work now that we have cleaned up CharsetConverter big time. Do you know any good places to look?
Updated by Susanne Moog over 3 years ago
- Category set to Localization
- Assignee deleted (Jan Helke)
Updated by Markus Klein about 3 years ago
- Has duplicate Bug #83438: Respect suomi in specCharsToASCII conversion method added
Updated by Christian Kuhn about 3 years ago
- Related to Task #83546: Unit test CharsetConverter::specCharsToASCII() added
Updated by Benni Mack over 2 years ago
- Status changed from New to Needs Feedback
Hey,
this issue should be fixed with 9 LTS and site handling. Please let us know if the new version will solve your issue, otherwise we'll close this ticket in the next weeks.
Benni.
Updated by Joonas Kauhanen over 2 years ago
We have now tested the new 9 LTS with Site Handling and Url Routing. Unfortunately the issue is still not completely resolved, but we are getting near!
Let me go through the issue by example:
- Editor creates a page in Finnish language, titled "Ääni" (that is "Sound" in Finnish)
- In the Page properties, TYPO3 shows generated URL Segment "/aeaeni"
- This is incorrect for Finnish (and Swedish, Danish, etc.), so the editor must manually override the URL Segment with "/aani"
For pages, manually correcting all URL segments is manageable, if somewhat irritating. However, if the URL contains content from other records, such as News or an integrated product database, there may not even be an opportunity to enter correct URL segments, and all the generated slugs are transliterated incorrectly.
I have tracked the URL generation down to `sysext/core/Classes/DataHandling/SlugHelper.php`, where the function `sanitize` is used to make slugs URL compatible. This function uses `CharsetConverter->specCharsToASCII` to convert extended letters to ASCII characters. No locale or language parameters are provided, and the `CharsetConverter` uses a hardcoded path to `Resources/Private/Charsets/unidata/Translit.txt`, where the character replacements are defined. This file is only compatible with the German way of spelling umlaut characters.
Do you have ideas how this could be improved so that we could somehow provide conversion tables for languages other than German? Essentially we still have to edit `Resources/Private/Charsets/unidata/Translit.txt` manually after updating TYPO3 and remember to clear the `typo3temp` cache after that.
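One possible shape for this, sketched in Python rather than TYPO3 PHP: keep the existing language-independent table as the default and let a per-language table override single characters. Everything named here (`DEFAULT_TABLE`, `LANGUAGE_OVERRIDES`, `sanitize`) is hypothetical and only mirrors the idea of passing a language down to the conversion step; the table contents are a small illustrative subset, not the contents of Translit.txt.

```python
import unicodedata

# Assumed language-independent default, following German custom.
DEFAULT_TABLE = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}

# Hypothetical per-language overrides, keyed by language code.
LANGUAGE_OVERRIDES = {
    "fi": {"ä": "a", "ö": "o", "å": "a"},
    "sv": {"ä": "a", "ö": "o", "å": "a"},
}


def sanitize(value, language=None):
    """Build a lowercase URL segment, preferring language-specific replacements."""
    # Overrides win over the default; unknown languages fall back to the default.
    table = {**DEFAULT_TABLE, **LANGUAGE_OVERRIDES.get(language, {})}
    # Normalize so that decomposed input (a + combining diaeresis) also matches.
    value = unicodedata.normalize("NFC", value).lower()
    return "/" + "".join(table.get(ch, ch) for ch in value)


print(sanitize("Ääni"))        # no language given: German-style default
print(sanitize("Ääni", "fi"))  # Finnish override
```

Falling back to the default table keeps today's behaviour for every language without an override, so existing German slugs would not change.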
Updated by Susanne Moog almost 2 years ago
Some research notes:
- PHP has a Transliterator in the `intl` extension, but we'd need to provide custom rules there, too.
- Our Symfony `intl` polyfill does not polyfill the Transliterator meaning we'd have to introduce `intl` as hard dependency
- There seems to be no decent PHP transliteration solution covering our use cases out of the box
May be possible:
- Provide the possibility to register own translit.txt file in localconf to allow integrators to use custom transliterations
Updated by Alexander Schnitzler about 1 year ago
Susanne Moog wrote:
Some research notes:
- PHP has a Transliterator in the `intl` extension, but we'd need to provide custom rules there, too.
- Our Symfony `intl` polyfill does not polyfill the Transliterator meaning we'd have to introduce `intl` as hard dependency
- There seems to be no decent PHP transliteration solution covering our use cases out of the box
May be possible:
- Provide the possibility to register own translit.txt file in localconf to allow integrators to use custom transliterations
Just a short note that symfony/string, which was released just a couple of days ago, does a really great job of normalizing and converting strings while respecting locales properly:
(new AsciiSlugger('de'))->slug('Näe ja koe')->toString(); // Naee-ja-koe
(new AsciiSlugger('dk'))->slug('Näe ja koe')->toString(); // Nae-ja-koe
We should evaluate how much own logic we actually need if using this library.
Could replace our current slugger and parts of the CharsetConverter (maybe even all of it).
Updated by Mathias Bolt Lesniak 6 months ago
- File transliterate-norwegian.png added
- File transliterate-english.png added
- File transliterate-german.png added
- File transliterate-swedish.png added
Alexander Schnitzler wrote:
We should evaluate how much own logic we actually need if using this library.
Could replace our current slugger and parts of the CharsetConverter (maybe even all of it).
I made a first implementation using the AsciiSlugger class in symfony/string. It's working OK, but I notice that TYPO3 doesn't transliterate e.g. Chinese and Hindi characters, while AsciiSlugger does.
This implementation uses the page language to determine how to transliterate the string, so & can be transliterated as "og" in Norwegian and "und" in German.
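The language-dependent handling of symbols can be pictured with a small Python sketch. This is not the code from the implementation linked below; the `SYMBOLS` table, the language codes and the `slugify` helper are hypothetical, and letter transliteration is left out to keep the example short.

```python
import re

# Hypothetical per-language symbol map.
SYMBOLS = {
    "no": {"&": "og"},
    "de": {"&": "und"},
    "en": {"&": "and"},
}


def slugify(value, language):
    """Spell out known symbols for the language, then join words with hyphens."""
    for symbol, word in SYMBOLS.get(language, {}).items():
        value = value.replace(symbol, f" {word} ")
    # Anything that is not an ASCII letter or digit becomes a separator.
    return re.sub(r"[^a-z0-9]+", "-", value.lower()).strip("-")


print(slugify("Salt & Pepper", "no"))
print(slugify("Salt & Pepper", "de"))
```

For a language without an entry in the map, the symbol is simply treated as a separator and dropped from the slug.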
Implementation: https://github.com/pixelant/transliterator