Bug #20612

open

scandinavian letters are translittered wrong

Added by Katja Lampela almost 15 years ago. Updated 5 months ago.

Status: Needs Feedback
Priority: Should have
Assignee: -
Category: Localization
Start date: 2009-06-12
Due date:
% Done: 0%
Estimated time:
TYPO3 Version: 4.2
PHP Version:
Tags:
Complexity:
Is Regression: No
Sprint Focus:

Description

The Scandinavian letters ä, ö and å are transliterated incorrectly, for example by RealURL: ä becomes ae and ö becomes oe. Instead, ä should become a, ö should become o, and å should become a.

This should be added to the file typo3_src-X/t3lib/unidata/Translit.txt:

    # scandinavian
    00e4; 0061; LATIN SMALL LETTER A WITH UMLAUTS => a (finnish)
    00c4; 0041; LATIN CAPITAL LETTER A WITH UMLAUTS => A (finnish)
    00f6; 006f; LATIN SMALL LETTER O WITH UMLAUTS => o (finnish)
    00d6; 004f; LATIN CAPITAL LETTER O WITH UMLAUTS => O (finnish)
    00e5; 0061; LATIN SMALL LETTER SWEDISH A (å) => a (finnish)
    00c5; 0041; LATIN CAPITAL LETTER SWEDISH A (Å) => A (finnish)
    (issue imported from #M11322)
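
To illustrate how entries like these act as a lookup table, here is a minimal, language-neutral sketch in Python. It is not TYPO3's PHP code. It assumes the semicolon-separated "source; target; comment" layout shown above, with hexadecimal code points and `#` comment lines; the function names `parse_translit` and `transliterate` are hypothetical.

```python
def parse_translit(text):
    """Parse 'SOURCE; TARGET [TARGET ...]; comment' lines into a dict."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        source, target, *_ = (field.strip() for field in line.split(";"))
        # A target may consist of several code points separated by spaces.
        table[chr(int(source, 16))] = "".join(
            chr(int(code_point, 16)) for code_point in target.split()
        )
    return table


def transliterate(value, table):
    """Replace every character found in the table; keep all others."""
    return "".join(table.get(char, char) for char in value)


# The six proposed entries, with shortened comments.
SAMPLE = """\
# scandinavian
00e4; 0061; a with umlauts => a
00c4; 0041; A with umlauts => A
00f6; 006f; o with umlauts => o
00d6; 004f; O with umlauts => O
00e5; 0061; a with ring => a
00c5; 0041; A with ring => A
"""

table = parse_translit(SAMPLE)
print(transliterate("Ääni", table))  # Aani
print(transliterate("Åbo", table))   # Abo
```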

Files

transliterate-norwegian.png (63.7 KB) Norwegian transliteration, Mathias Bolt Lesniak, 2020-09-10 14:49
transliterate-english.png (55.3 KB) English transliteration, Mathias Bolt Lesniak, 2020-09-10 14:49
transliterate-german.png (63.9 KB) German transliteration, Mathias Bolt Lesniak, 2020-09-10 14:49
transliterate-swedish.png (63.8 KB) Swedish transliteration, Mathias Bolt Lesniak, 2020-09-10 14:49

Related issues 4 (0 open, 4 closed)

Related to TYPO3 Core - Bug #67187: recursiveFileListSortingHelper natural sorting isn't locale aware (Resolved, 2015-05-29)
Related to TYPO3 Core - Task #83546: Unit test CharsetConverter::specCharsToASCII() (Closed, 2018-01-12)
Related to TYPO3 Core - Bug #93883: Transliteration of german umlauts fails partly on file upload for files created on mac (Closed, 2021-04-08)
Has duplicate TYPO3 Core - Bug #83438: Respect suomi in specCharsToASCII conversion method (Closed, 2017-12-28)

Actions #1

Updated by Martin Kutschker almost 15 years ago

Unfortunately the transliteration is currently language independent. The vowels with umlauts are transliterated according to German custom. Which is fine for the huge German user base.

As a workaround you may change Translit.txt and delete all files in typo3temp/cs.

Actions #2

Updated by Alexander Opitz almost 11 years ago

  • Status changed from New to Needs Feedback
  • Target version deleted (0)
  • TYPO3 Version set to 4.2

The issue is very old. Does it still exist in newer versions of TYPO3 CMS (4.5 or 6.1)?

Actions #3

Updated by Katja Lampela almost 11 years ago

Yes it still exists in 4.5-4.7. Haven't tried 6, but I suspect nothing has been done there either.

Actions #4

Updated by Alexander Opitz almost 11 years ago

  • Status changed from Needs Feedback to New
Actions #5

Updated by Mathias Schreiber over 9 years ago

  • Target version set to 7.2 (Frontend)
  • Is Regression set to No
Actions #6

Updated by Riccardo De Contardi about 9 years ago

Still present in 6.2.11 I guess, the file /typo3/sysext/core/Resources/Private/Charsets/unidata/Translit.txt does not contain a "Scandinavian" section.

What about adding an option (in Install tool?) where to specify a custom file instead of modifying the original in the core?

Actions #7

Updated by Benni Mack almost 9 years ago

  • Target version changed from 7.2 (Frontend) to 7.4 (Backend)
Actions #8

Updated by Susanne Moog over 8 years ago

  • Target version changed from 7.4 (Backend) to 7.5
Actions #9

Updated by Joonas Kauhanen over 8 years ago

Riccardo De Contardi wrote:

Still present in 6.2.11 I guess, the file /typo3/sysext/core/Resources/Private/Charsets/unidata/Translit.txt does not contain a "Scandinavian" section.

What about adding an option (in Install tool?) where to specify a custom file instead of modifying the original in the core?

This would be very handy. Now we have to manually update the Translit.txt file every time after updating the TYPO3 source. As many of the developers are from central Europe, they don't understand the issue. And I think it's not an issue strictly for Scandinavians, as the transliterations should really be language specific, not system wide.

Hoping this will be part of 7 LTS.

Actions #10

Updated by Benni Mack over 8 years ago

  • Target version changed from 7.5 to 7 LTS
Actions #11

Updated by Jan Helke over 8 years ago

  • Assignee set to Jan Helke
Actions #12

Updated by Gerrit Code Review over 8 years ago

  • Status changed from New to Under Review

Patch set 1 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/44115

Actions #13

Updated by Christian Kuhn over 8 years ago

  • Status changed from Under Review to New
Actions #14

Updated by Riccardo De Contardi about 8 years ago

  • Target version changed from 7 LTS to Candidate for Major Version
Actions #15

Updated by Bart Lammers over 7 years ago

This still seems to be the case. For a client this is a big issue in regards to speaking URLs that have (in their eyes) wrong letters in them.
The solution as provided earlier by Jan Helke seems like a nice solution, using the Install Tool to override the default transliteration files.

Back then, the reason for not accepting this was that the CharsetConverter was to be reworked completely, but all through versions 4.x, 6.2 LTS, 7.6 LTS and now 8.x this has not been addressed.

Is it possible to reconsider the given solution?

Actions #16

Updated by Benni Mack almost 7 years ago

Hey Bart,

maybe we can use a PHP library to do the work now that we cleaned up CharsetConverter big time. Do you know any good places to look?

Actions #17

Updated by Susanne Moog over 6 years ago

  • Category set to Localization
  • Assignee deleted (Jan Helke)
Actions #18

Updated by Markus Klein over 6 years ago

  • Has duplicate Bug #83438: Respect suomi in specCharsToASCII conversion method added
Actions #19

Updated by Christian Kuhn over 6 years ago

  • Related to Task #83546: Unit test CharsetConverter::specCharsToASCII() added
Actions #20

Updated by Benni Mack over 5 years ago

  • Status changed from New to Needs Feedback

Hey,

this issue should be fixed with 9 LTS and site handling. Please let us know if the new version will solve your issue, otherwise we'll close this ticket in the next weeks.

Benni.

Actions #21

Updated by Joonas Kauhanen over 5 years ago

We have now tested the new 9 LTS with Site Handling and Url Routing. Unfortunately the issue is still not completely resolved, but we are getting near!

Let me go through the issue by example:

- Editor creates a page in Finnish language, titled "Ääni" (that is "Sound" in Finnish)
- In the Page properties, TYPO3 shows generated URL Segment "/aeaeni"
- This is incorrect for Finnish (and Swedish, Danish, etc) language, so the editor must manually override the URL Segment as "/aani"

For pages, manually correcting all URL segments is manageable, but somewhat irritating. However, if the URL contains content from other records such as News or an integrated Product database, there may not even be an opportunity to input correct URL segments, and all the generated slugs are transliterated wrong.

I have tracked the URL generation down to sysext/core/Classes/DataHandling/SlugHelper.php where the function sanitize is used to make slugs URL compatible. This function uses CharsetConverter->specCharsToASCII to convert extended letters to ASCII characters. No locale or language parameters are provided, and the CharsetConverter uses a hardcoded path to Resources/Private/Charsets/unidata/Translit.txt where the character replacements are defined. This file is only compatible with German way of spelling umlaut characters.

Do you have ideas how this could be improved so that we could somehow provide conversion tables for languages other than German? Essentially we still have to edit Resources/Private/Charsets/unidata/Translit.txt manually after updating TYPO3 and remember to clear typo3temp cache after that.
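
As a rough illustration of what a locale parameter could change, here is a minimal Python sketch. It is not TYPO3 code: the tables, the `LOCALE_OVERRIDES` name and the `slugify` function are hypothetical, and the German-style defaults only cover the umlauts discussed in this thread.

```python
import re

# German-style defaults, as the current language-independent table behaves.
GERMAN_STYLE = {"ä": "ae", "ö": "oe", "ü": "ue"}

# Hypothetical per-language overrides layered on top of the defaults.
LOCALE_OVERRIDES = {
    "fi": {"ä": "a", "ö": "o", "å": "a"},
    "sv": {"ä": "a", "ö": "o", "å": "a"},
}


def slugify(title, locale=None):
    """Build a URL segment, preferring the locale's table when one exists."""
    table = {**GERMAN_STYLE, **LOCALE_OVERRIDES.get(locale, {})}
    ascii_text = "".join(table.get(char, char) for char in title.lower())
    # Collapse everything that is not a-z or 0-9 into single hyphens.
    return "/" + re.sub(r"[^a-z0-9]+", "-", ascii_text).strip("-")


print(slugify("Ääni"))        # /aeaeni
print(slugify("Ääni", "fi"))  # /aani
```

Without a locale the sketch reproduces the "/aeaeni" segment from the example above; with "fi" it yields "/aani", which is what the editor currently has to type by hand.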

Actions #22

Updated by Susanne Moog about 5 years ago

Some research notes:

- PHP has a Transliterator in the `intl` extension, but we'd need to provide custom rules there, too.
- Our Symfony `intl` polyfill does not polyfill the Transliterator meaning we'd have to introduce `intl` as hard dependency
- There seems to be no decent PHP transliteration solution covering our use cases out of the box

May be possible:
- Provide the possibility to register own translit.txt file in localconf to allow integrators to use custom transliterations

Actions #23

Updated by Alexander Schnitzler over 4 years ago

Susanne Moog wrote:

Some research notes:

- PHP has a Transliterator in the `intl` extension, but we'd need to provide custom rules there, too.
- Our Symfony `intl` polyfill does not polyfill the Transliterator meaning we'd have to introduce `intl` as hard dependency
- There seems to be no decent PHP transliteration solution covering our use cases out of the box

May be possible:
- Provide the possibility to register own translit.txt file in localconf to allow integrators to use custom transliterations

Just a short notice that symfony/string, which has been released just a couple of days ago, does a really great job, normalizing and converting string, respecting locales properly:

(new AsciiSlugger('de'))->slug('Näe ja koe')->toString(); // Naee-ja-koe
(new AsciiSlugger('dk'))->slug('Näe ja koe')->toString(); // Nae-ja-koe

We should evaluate how much own logic we actually need if using this library.
Could replace our current slugger and parts of the CharsetConverter (maybe even all of it).

Actions #24

Updated by Mathias Bolt Lesniak over 3 years ago

Alexander Schnitzler wrote:

We should evaluate how much own logic we actually need if using this library.
Could replace our current slugger and parts of the CharsetConverter (maybe even all of it).

I made a first implementation using the AsciiSlugger class in symfony/string. It's working OK, but I notice that TYPO3 doesn't transliterate e.g. Chinese and Hindi characters, while AsciiSlugger does.

This implementation uses the page language to determine how to transliterate the string, so & can be transliterated as "og" in Norwegian and "und" in German.

Implementation: https://github.com/pixelant/transliterator
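
The idea of language-dependent symbol replacement can be sketched as follows. This is a simplified Python illustration, not the code from the linked repository; the `SYMBOLS` table, the language codes used as keys and the function name are assumptions, and only ASCII input is handled.

```python
import re

# Hypothetical per-language symbol words.
SYMBOLS = {
    "no": {"&": "og"},
    "de": {"&": "und"},
}


def slug_with_symbols(text, language):
    """Spell out known symbols for the language, then hyphenate the rest."""
    for symbol, word in SYMBOLS.get(language, {}).items():
        text = text.replace(symbol, f" {word} ")
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


print(slug_with_symbols("Salt & Pepper", "no"))  # salt-og-pepper
print(slug_with_symbols("Salt & Pepper", "de"))  # salt-und-pepper
```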

Actions #25

Updated by Christoph Lehmann about 3 years ago

  • Related to Bug #93883: Transliteration of german umlauts fails partly on file upload for files created on mac added
Actions #26

Updated by Martin Kutschker about 3 years ago

Today IMHO all the transliteration is not needed at all. All the world is happy with Unicode.

Actions #27

Updated by Joonas Kauhanen about 3 years ago

Martin Kutschker wrote in #note-26:

Today IMHO all the transliteration is not needed at all. All the world is happy with Unicode.

That is a great solution, if we can use Unicode in URLs.
Modern browsers seem to support it, but to be sure the URLs should still probably be percent-encoded?
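
For reference, this is what percent-encoding of such a path looks like. The sketch uses Python's standard `urllib.parse` purely as an illustration; it encodes the UTF-8 bytes of each non-ASCII character and leaves the slash untouched.

```python
from urllib.parse import quote, unquote

path = "/ääni"

# quote() encodes as UTF-8 by default and treats "/" as safe.
encoded = quote(path)
print(encoded)  # /%C3%A4%C3%A4ni

# Decoding restores the original Unicode path.
print(unquote(encoded) == path)  # True
```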

Actions #28

Updated by Mathias Bolt Lesniak over 2 years ago

Just for the sake of completeness, transliteration doesn't only apply to slugs and URLs in TYPO3. It also applies to file names, where UTF-8 encoding may still not be working.

Actions #29

Updated by Paul Hansen 5 months ago · Edited

After some research, it looks like modern browsers will seamlessly handle percent-encoded URLs for more thorough l10n.

Symfony supported this as of version 6.1 in 2022: https://symfony.com/blog/new-in-symfony-6-1-improved-routing-requirements-and-utf-8-parameters

Just look around Wikipedia and you'll see this in action. This doc provides some examples: https://developers.google.com/search/docs/crawling-indexing/url-structure

Actions #30

Updated by Mathias Bolt Lesniak 5 months ago

Paul Hansen wrote in #note-29:

After some research, it looks like modern browsers will seamlessly handle percent-encoded URLs for more thorough l10n.

Yes that is absolutely true. At the same time, it doesn't remove the need for transliteration — especially where typing is involved. "That accented letter isn't on my keyboard" or "how do I type 漢?"

Mathias
