Bug #20612
open
scandinavian letters are transliterated wrong
Description
The scandinavian letters ä, ö and å are transliterated incorrectly, for example by realurl: ä becomes ae and ö becomes oe. Instead, ä should become a, ö should become o and å should become a.
This should be added to the file typo3_src-X/t3lib/unidata/Translit.txt:
- scandinavian
00e4; 0061; LATIN SMALL LETTER A WITH UMLAUTS => a (finnish)
00c4; 0041; LATIN CAPITAL LETTER A WITH UMLAUTS => A (finnish)
00f6; 006f; LATIN SMALL LETTER O WITH UMLAUTS => o (finnish)
00d6; 004f; LATIN CAPITAL LETTER O WITH UMLAUTS => O (finnish)
00e5; 0061; LATIN SMALL LETTER SWEDISH A (å) => a (finnish)
00c5; 0041; LATIN CAPITAL LETTER SWEDISH A (Å) => A (finnish)
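To illustrate what such a mapping does, here is a minimal Python sketch (not TYPO3 code) that parses lines in the `source; target; comment` layout shown above and applies them to a string. The names `parse_translit` and `transliterate` are hypothetical, and the parser assumes every line follows exactly the layout used in this description, which may differ from the real Translit.txt syntax.

```python
import unicodedata

# The mapping proposed in this issue, in the layout used in the description.
PROPOSED = """\
- scandinavian
00e4; 0061; LATIN SMALL LETTER A WITH UMLAUTS => a (finnish)
00c4; 0041; LATIN CAPITAL LETTER A WITH UMLAUTS => A (finnish)
00f6; 006f; LATIN SMALL LETTER O WITH UMLAUTS => o (finnish)
00d6; 004f; LATIN CAPITAL LETTER O WITH UMLAUTS => O (finnish)
00e5; 0061; LATIN SMALL LETTER SWEDISH A => a (finnish)
00c5; 0041; LATIN CAPITAL LETTER SWEDISH A => A (finnish)
"""


def parse_translit(text):
    """Parse 'source; target[ target ...]; comment' lines into a translate table."""
    table = {}
    for line in text.splitlines():
        parts = [part.strip() for part in line.split(";")]
        if len(parts) < 2:
            continue  # blank lines and headings such as "- scandinavian"
        try:
            source = int(parts[0], 16)
            target = "".join(chr(int(cp, 16)) for cp in parts[1].split())
        except ValueError:
            continue  # not a hexadecimal mapping line
        table[source] = target
    return table


def transliterate(text, table):
    """Apply the table; NFC normalization ensures precomposed letters match."""
    return unicodedata.normalize("NFC", text).translate(table)


table = parse_translit(PROPOSED)
print(transliterate("Ääni", table))  # Aani
```

Characters without an entry in the table pass through unchanged, so a real implementation would still need a fallback for everything else outside ASCII.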
(issue imported from #M11322)
Updated by Martin Kutschker over 15 years ago
Unfortunately the transliteration is currently language independent. The vowels with umlauts are transliterated according to German custom. Which is fine for the huge German user base.
As a workaround you may change Translit.txt and delete all files in typo3temp/cs.
Updated by Alexander Opitz over 11 years ago
- Status changed from New to Needs Feedback
- Target version deleted (0)
- TYPO3 Version set to 4.2
The issue is very old. Does it still exist in newer versions of TYPO3 CMS (4.5 or 6.1)?
Updated by Katja Lampela over 11 years ago
Yes it still exists in 4.5-4.7. Haven't tried 6, but I suspect nothing has been done there either.
Updated by Alexander Opitz over 11 years ago
- Status changed from Needs Feedback to New
Updated by Mathias Schreiber almost 10 years ago
- Target version set to 7.2 (Frontend)
- Is Regression set to No
Updated by Riccardo De Contardi over 9 years ago
Still present in 6.2.11 I guess, the file /typo3/sysext/core/Resources/Private/Charsets/unidata/Translit.txt does not contain a "Scandinavian" section.
What about adding an option (in Install tool?) where to specify a custom file instead of modifying the original in the core?
Updated by Benni Mack over 9 years ago
- Target version changed from 7.2 (Frontend) to 7.4 (Backend)
Updated by Susanne Moog over 9 years ago
- Target version changed from 7.4 (Backend) to 7.5
Updated by Joonas Kauhanen over 9 years ago
Riccardo De Contardi wrote:
Still present in 6.2.11 I guess, the file /typo3/sysext/core/Resources/Private/Charsets/unidata/Translit.txt does not contain a "Scandinavian" section.
What about adding an option (in Install tool?) where to specify a custom file instead of modifying the original in the core?
This would be very handy. Currently we have to manually update the Translit.txt file every time we update the TYPO3 source. As many of the developers are from central Europe, they don't understand the issue. I also think this is not strictly a Scandinavian issue, as transliteration should really be language specific, not system wide.
Hoping this will be part of 7 LTS.
Updated by Benni Mack about 9 years ago
- Target version changed from 7.5 to 7 LTS
Updated by Gerrit Code Review about 9 years ago
- Status changed from New to Under Review
Patch set 1 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/44115
Updated by Christian Kuhn about 9 years ago
- Status changed from Under Review to New
Updated by Riccardo De Contardi over 8 years ago
- Target version changed from 7 LTS to Candidate for Major Version
Updated by Bart Lammers about 8 years ago
This still seems to be the case. For a client this is a big issue in regards to speaking URLs that have (in their eyes) wrong letters in them.
The solution as provided earlier by Jan Helke seems like a nice solution, using the Install Tool to override the default transliteration files.
Back then, the reason for not accepting this was that the CharsetConverter was to be reworked completely, but throughout versions 4.x, 6.2 LTS, 7.6 LTS and now 8.x this has not been addressed.
Is it possible to reconsider the given solution?
Updated by Benni Mack over 7 years ago
Hey Bart,
maybe we can use a PHP library to do the work now that we cleaned up CharsetConverter big time. Do you know any good places to look?
Updated by Susanne Moog about 7 years ago
- Category set to Localization
- Assignee deleted (Jan Helke)
Updated by Markus Klein almost 7 years ago
- Has duplicate Bug #83438: Respect suomi in specCharsToASCII conversion method added
Updated by Christian Kuhn almost 7 years ago
- Related to Task #83546: Unit test CharsetConverter::specCharsToASCII() added
Updated by Benni Mack about 6 years ago
- Status changed from New to Needs Feedback
Hey,
this issue should be fixed with 9 LTS and site handling. Please let us know if the new version will solve your issue, otherwise we'll close this ticket in the next weeks.
Benni.
Updated by Joonas Kauhanen about 6 years ago
We have now tested the new 9 LTS with Site Handling and Url Routing. Unfortunately the issue is still not completely resolved, but we are getting near!
Let me go through the issue by example:
- Editor creates a page in Finnish language, titled "Ääni" (that is "Sound" in Finnish)
- In the Page properties, TYPO3 shows generated URL Segment "/aeaeni"
- This is incorrect for Finnish (and Swedish, Danish, etc) language, so the editor must manually override the URL Segment as "/aani"
For pages, manually correcting all URL segments is manageable, if somewhat irritating. However, if the URL contains content from other records such as News or an integrated Product database, there may not even be an opportunity to enter correct URL segments, and all the generated slugs are transliterated incorrectly.
I have tracked the URL generation down to sysext/core/Classes/DataHandling/SlugHelper.php, where the function sanitize is used to make slugs URL compatible. This function uses CharsetConverter->specCharsToASCII to convert extended letters to ASCII characters. No locale or language parameters are provided, and the CharsetConverter uses a hardcoded path to Resources/Private/Charsets/unidata/Translit.txt, where the character replacements are defined. This file is only compatible with the German way of spelling umlaut characters.
Do you have ideas how this could be improved so that we could somehow provide conversion tables for languages other than German? Essentially we still have to edit Resources/Private/Charsets/unidata/Translit.txt manually after updating TYPO3 and remember to clear the typo3temp cache after that.
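To make the request concrete, here is a rough, language-neutral sketch in Python of what a language-aware sanitize step could look like. It is not the SlugHelper or CharsetConverter API. The names `DEFAULT_TABLE`, `OVERRIDES` and `slugify` are made up for illustration, and the default table only covers a few German-style replacements rather than the full contents of Translit.txt.

```python
import re
import unicodedata

# Illustrative German-style defaults (assumption: a tiny subset, lowercase only).
DEFAULT_TABLE = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}

# Hypothetical per-language overrides, keyed by a language code.
OVERRIDES = {
    "fi": {"ä": "a", "ö": "o", "å": "a"},
}


def slugify(title, language=None):
    """Build a URL segment, preferring language-specific replacements."""
    table = dict(DEFAULT_TABLE)
    table.update(OVERRIDES.get(language, {}))
    # Lowercase first so the tables only need lowercase entries.
    text = unicodedata.normalize("NFC", title).lower()
    text = "".join(table.get(ch, ch) for ch in text)
    # Collapse everything that is still not a-z or 0-9 into single hyphens.
    text = re.sub(r"[^a-z0-9]+", "-", text).strip("-")
    return "/" + text


print(slugify("Ääni"))        # /aeaeni  (language-independent default)
print(slugify("Ääni", "fi"))  # /aani    (Finnish override)
```

The point of the sketch is only that the language has to reach the conversion step somehow; where the overrides live (a custom file, site configuration, or something else) is a separate question.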
Updated by Susanne Moog almost 6 years ago
Some research notes:
- PHP has a Transliterator in the `intl` extension, but we'd need to provide custom rules there, too.
- Our Symfony `intl` polyfill does not polyfill the Transliterator meaning we'd have to introduce `intl` as hard dependency
- There seems to be no decent PHP transliteration solution covering our use cases out of the box
May be possible:
- Provide the possibility to register own translit.txt file in localconf to allow integrators to use custom transliterations
Updated by Alexander Schnitzler almost 5 years ago
Susanne Moog wrote:
Some research notes:
- PHP has a Transliterator in the `intl` extension, but we'd need to provide custom rules there, too.
- Our Symfony `intl` polyfill does not polyfill the Transliterator meaning we'd have to introduce `intl` as hard dependency
- There seems to be no decent PHP transliteration solution covering our use cases out of the box
May be possible:
- Provide the possibility to register own translit.txt file in localconf to allow integrators to use custom transliterations
Just a short note that symfony/string, which was released just a couple of days ago, does a really great job of normalizing and converting strings while respecting locales properly:
(new AsciiSlugger('de'))->slug('Näe ja koe')->toString(); // Naee-ja-koe
(new AsciiSlugger('dk'))->slug('Näe ja koe')->toString(); // Nae-ja-koe
We should evaluate how much own logic we actually need if using this library.
Could replace our current slugger and parts of the CharsetConverter (maybe even all of it).
Updated by Mathias Bolt Lesniak about 4 years ago
- File transliterate-norwegian.png added
- File transliterate-english.png added
- File transliterate-german.png added
- File transliterate-swedish.png added
Alexander Schnitzler wrote:
We should evaluate how much own logic we actually need if using this library.
Could replace our current slugger and parts of the CharsetConverter (maybe even all of it).
I made a first implementation using the AsciiSlugger class in symfony/string. It's working OK, but I notice that TYPO3 doesn't transliterate e.g. Chinese and Hindi characters, while AsciiSlugger does.
This implementation uses the page language to determine how to transliterate the string, so & can be transliterated as "og" in Norwegian and "und" in German.
Implementation: https://github.com/pixelant/transliterator
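The per-language symbol handling can be pictured roughly like this. This is a simplified Python sketch and not the code from the linked repository. The `SYMBOLS` table and the `replace_symbols` function are hypothetical, and only the two replacements mentioned above are included.

```python
import re

# Hypothetical symbol table keyed by language code.
SYMBOLS = {
    "nb": {"&": "og"},
    "de": {"&": "und"},
}


def replace_symbols(text, language):
    """Spell out symbols in the given language before slugging."""
    for symbol, word in SYMBOLS.get(language, {}).items():
        text = text.replace(symbol, f" {word} ")
    # Normalize the extra spaces introduced by the replacement.
    return re.sub(r"\s+", " ", text).strip()


print(replace_symbols("Kaffe & te", "nb"))    # Kaffe og te
print(replace_symbols("Kaffee & Tee", "de"))  # Kaffee und Tee
```

For a language without an entry the symbol is left as it is, so a later sanitizing step would still have to deal with it.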
Updated by Christoph Lehmann over 3 years ago
- Related to Bug #93883: Transliteration of german umlauts fails partly on file upload for files created on mac added
Updated by Martin Kutschker over 3 years ago
Today IMHO all the transliteration is not needed at all. All the world is happy with Unicode.
Updated by Joonas Kauhanen over 3 years ago
Martin Kutschker wrote in #note-26:
Today IMHO all the transliteration is not needed at all. All the world is happy with Unicode.
That is a great solution, if we can use Unicode in URLs.
Modern browsers seem to support it, but to be sure the URLs should still probably be percent-encoded?
Updated by Mathias Bolt Lesniak over 3 years ago
Just for the sake of completeness, transliteration doesn't only apply to slugs and URLs in TYPO3. It also applies to file names, where UTF-8 encoding may still not be working.
Updated by Paul Hansen about 1 year ago · Edited
After some research, it looks like modern browsers will seamlessly handle percent-encoded URLs for more thorough l10n.
Symfony supported this as of version 6.1 in 2022: https://symfony.com/blog/new-in-symfony-6-1-improved-routing-requirements-and-utf-8-parameters
Just look around Wikipedia and you'll see this in action. This doc provides some examples: https://developers.google.com/search/docs/crawling-indexing/url-structure
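As a small illustration of what percent-encoding does to such a path, here is a Python sketch using only the standard library. It is unrelated to TYPO3 or Symfony internals, and the helper name `encode_path` is made up for this example.

```python
from urllib.parse import quote, unquote


def encode_path(path):
    """Percent-encode a UTF-8 path, leaving the slashes intact."""
    return quote(path, safe="/")


encoded = encode_path("/ääni")
print(encoded)           # /%C3%A4%C3%A4ni
print(unquote(encoded))  # /ääni
```

Browsers typically display the decoded form in the address bar while sending the encoded one over the wire.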
Updated by Mathias Bolt Lesniak about 1 year ago
Paul Hansen wrote in #note-29:
After some research, it looks like modern browsers will seamlessly handle percent-encoded URLs for more thorough l10n.
Yes that is absolutely true. At the same time, it doesn't remove the need for transliteration — especially where typing is involved. "That accented letter isn't on my keyboard" or "how do I type 漢?"
Mathias
Updated by Garvin Hicking 5 months ago
- Subject changed from scandinavian letters are translittered wrong to scandinavian letters are transliterated wrong
- Tags set to translit,transliteration,transliterate,slugs,utf8,latin
Updated by Gerrit Code Review 4 months ago
- Status changed from Needs Feedback to Under Review
Patch set 1 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/85574
Updated by Gerrit Code Review 4 months ago
Patch set 2 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/85574
Updated by Gerrit Code Review 4 months ago
Patch set 3 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/85574
Updated by Gerrit Code Review 4 months ago
Patch set 4 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/85574
Updated by Gerrit Code Review 4 months ago
Patch set 5 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/85574
Updated by Gerrit Code Review 4 months ago
Patch set 6 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/85574
Updated by Gerrit Code Review 4 months ago
Patch set 7 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/85574
Updated by Gerrit Code Review 4 months ago
Patch set 8 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/85574
Updated by Gerrit Code Review 4 months ago
Patch set 9 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/85574
Updated by Gerrit Code Review 4 months ago
Patch set 10 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/85574
Updated by Gerrit Code Review 3 months ago
Patch set 11 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/85574
Updated by Gerrit Code Review 3 months ago
Patch set 12 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/85574
Updated by Gerrit Code Review 3 months ago
Patch set 13 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/85574