Bug #87295

Chinese Language url not working with TYPO3 9.5

Added by bharat parmar over 2 years ago. Updated over 2 years ago.

Status:
Closed
Priority:
Must have
Assignee:
-
Category:
Link Handling, Site Handling & Routing
Start date:
2018-12-26
Due date:
% Done:

100%

Estimated time:
TYPO3 Version:
9
PHP Version:
7.2
Tags:
slug
Complexity:
Is Regression:
Sprint Focus:

Description

I have a site with Chinese language content on TYPO3 9.5, but Chinese page titles in the URL are replaced with a string like e5b7a5e4bd9ce4b88ee8818ce4b89a.

With realurl the URL is /cn/应用/, and with the new TYPO3 9 slug it is /cn/e5ba94e794a8.

So is Chinese supported in TYPO3 9 slugs?

According to the documentation at https://docs.typo3.org/typo3cms/extensions/core/Changelog/9.4/Feature-84729-NewTCATypeSlug.html, Unicode is supported.

It seems that line 126 of \TYPO3\CMS\Core\DataHandling\SlugHelper->sanitize() encodes it:
$slug = rawurlencode($slug);
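
For illustration, a minimal sketch of how the reported string can arise from the title. It only uses PHP built-ins; the step that drops the percent signs and lowercases the result is an assumption made to match the observed URL, not something confirmed from the TYPO3 source.

```php
<?php
// Title taken from the report above; the file must be saved as UTF-8.
$title = '应用';

// rawurlencode() percent-encodes every byte of the UTF-8 string.
$encoded = rawurlencode($title);
echo $encoded, PHP_EOL; // %E5%BA%94%E7%94%A8

// Assumption: a later step removes "%" and lowercases the remainder.
$observed = strtolower(str_replace('%', '', $encoded));
echo $observed, PHP_EOL; // e5ba94e794a8

// The result is identical to a plain hex dump of the UTF-8 bytes.
var_dump($observed === bin2hex($title)); // bool(true)
```

The output matches the /cn/e5ba94e794a8 segment from the report, which suggests the slug is the hex form of the encoded title rather than random data.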


Files

slug.png (182 KB) bharat parmar, 2018-12-26 06:47
#1

Updated by Riccardo De Contardi over 2 years ago

  • Category set to Link Handling, Site Handling & Routing
#2

Updated by Ricky Mathew over 2 years ago

  • Priority changed from Should have to Must have
  • Target version set to Candidate for patchlevel

Any updates?

#3

Updated by Lars Peter Søndergaard over 2 years ago

The rawurlencode line can probably be removed.

A few lines earlier there is the regular expression:

$slug = preg_replace('/[^\p{L}0-9\/' . preg_quote($fallbackCharacter) . ']/u', '', $slug);

That line removes anything that is not a Letter (using unicode properties), not a digit (0-9), not a forward-slash and not the fallback character.
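
A small sketch of what that expression keeps and removes. The function name stripNonSlugChars and the default fallback character "-" are hypothetical and chosen only for this example; the pattern itself is the one quoted above.

```php
<?php
// Hypothetical helper wrapping the quoted regular expression.
// Assumption: "-" stands in for the configured fallback character.
function stripNonSlugChars(string $slug, string $fallbackCharacter = '-'): string
{
    // Keep Unicode letters, digits 0-9, "/" and the fallback character.
    $pattern = '/[^\p{L}0-9\/' . preg_quote($fallbackCharacter) . ']/u';
    return preg_replace($pattern, '', $slug);
}

echo stripNonSlugChars('应用'), PHP_EOL;        // 应用
echo stripNonSlugChars('مرحبا'), PHP_EOL;       // مرحبا
echo stripNonSlugChars('a?b&c"d'), PHP_EOL;     // abcd
echo stripNonSlugChars('foo/bar-baz'), PHP_EOL; // foo/bar-baz
```

Chinese and Arabic letters pass through unchanged, while '?', '&' and '"' are dropped before rawurlencode() is ever reached.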

Within ASCII, the only letters are a-z and A-Z. Any other letter has a code point of U+0080 or above, outside of ASCII, and those characters should not cause trouble in URLs. None of them are used as URL or HTML delimiters, as far as I know.

Letters, digits and slashes are safe in the path segments. Only the fallback character might be unsafe if it is defined as something unusual.

The only characters that could cause trouble in the path segment of a URL are '?', '&', '"' and "'" (for HTML).
Those characters however, are properly encoded by the symfony UrlGenerator class:

Symfony\Component\Routing\Generator\UrlGenerator::doGenerate

It uses rawurlencode when building the url, but decodes a set of characters predefined as protected $decodedChars.
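
The encode-then-restore approach described here can be sketched in plain PHP as follows. The two-entry map is a hypothetical subset for illustration only, not the actual content of $decodedChars, and encodePath is a made-up name rather than a Symfony method.

```php
<?php
// Hypothetical sketch: percent-encode everything, then restore a fixed
// set of characters that are safe in a path.
// Assumption: this map is an illustrative subset, not Symfony's real list.
function encodePath(string $path): string
{
    $decodedChars = [
        '%2F' => '/',
        '%40' => '@',
    ];
    return strtr(rawurlencode($path), $decodedChars);
}

echo encodePath('/cn/应用/a?b'), PHP_EOL;
// /cn/%E5%BA%94%E7%94%A8/a%3Fb
```

In this sketch the slashes survive, '?' stays encoded as %3F, and the Chinese characters become a valid percent-encoded sequence that browsers display in readable form.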

I only followed the case where a Site configuration is available, so I wouldn't know what happens in "traditional" setups, or if those slugs are even relevant in that case.

Greetings.

#4

Updated by Ricky Mathew over 2 years ago

I also think there is no need for rawurlencode(), as the preceding preg_replace() already handles everything. If rawurlencode() can't be avoided, I have a workaround: call rawurldecode() at the end of sanitize(), provided sanitize() is only used for URL generation.
What do others think? I think this must be considered for the next patch level, as the TYPO3 URL system currently doesn't support Asian languages at all.

#5

Updated by Sven Juergens over 2 years ago

Same problem here with Arabic.

#6

Updated by Gerrit Code Review over 2 years ago

  • Status changed from New to Under Review

Patch set 1 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/59796

#7

Updated by Gerrit Code Review over 2 years ago

Patch set 2 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/59796

#8

Updated by Gerrit Code Review over 2 years ago

Patch set 1 for branch 9.5 of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/59822

#9

Updated by Guido Schmechel over 2 years ago

  • Status changed from Under Review to Resolved
  • % Done changed from 0 to 100
#10

Updated by Benni Mack over 2 years ago

  • Status changed from Resolved to Closed
