Bug #87295

closed

Chinese Language url not working with TYPO3 9.5

Added by bharat parmar almost 6 years ago. Updated over 5 years ago.

Status: Closed
Priority: Must have
Assignee: -
Category: Site Handling, Site Sets & Routing
Start date: 2018-12-26
Due date:
% Done: 100%
Estimated time:
TYPO3 Version: 9
PHP Version: 7.2
Tags: slug
Complexity:
Is Regression:
Sprint Focus:

Description

I have a site with Chinese as a language on TYPO3 9.5, but Chinese page titles in the URL are replaced with strings like e5b7a5e4bd9ce4b88ee8818ce4b89a.

With realurl the URL is /cn/应用/, and with the new TYPO3 9 slug it is /cn/e5ba94e794a8.

So is Chinese supported in TYPO3 9 slugs?

According to the documentation at https://docs.typo3.org/typo3cms/extensions/core/Changelog/9.4/Feature-84729-NewTCATypeSlug.html, Unicode is supported.

It seems that line 126 of \TYPO3\CMS\Core\DataHandling\SlugHelper->sanitize() encodes it:
$slug = rawurlencode($slug);
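To make the symptom concrete, here is a minimal standalone sketch (not TYPO3 code). The helper name hexLikeSlug() is hypothetical. It only shows that the reported string matches the rawurlencode() output of the page title with the percent signs dropped and the hex digits lowercased; where exactly that stripping happens inside the core is not established by this report.

```php
<?php
// Hypothetical helper for illustration only; this is not part of TYPO3.
// It reproduces the shape of the reported slug from a page title.
function hexLikeSlug(string $title): string
{
    // rawurlencode() turns every non-ASCII byte into a %XX sequence,
    // e.g. "应用" becomes "%E5%BA%94%E7%94%A8".
    $encoded = rawurlencode($title);

    // Assumption: the percent signs get lost somewhere afterwards and the
    // result is lowercased, which leaves only the hex digits of the UTF-8 bytes.
    return strtolower(str_replace('%', '', $encoded));
}

echo rawurlencode('应用'), PHP_EOL; // %E5%BA%94%E7%94%A8
echo hexLikeSlug('应用'), PHP_EOL;  // e5ba94e794a8
```

The second line of output is the same string that appears in the /cn/e5ba94e794a8 URL above.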


Files

slug.png (182 KB) bharat parmar, 2018-12-26 06:47
Actions #1

Updated by Riccardo De Contardi almost 6 years ago

  • Category set to Site Handling, Site Sets & Routing
Actions #2

Updated by Ricky Mathew almost 6 years ago

  • Priority changed from Should have to Must have
  • Target version set to Candidate for patchlevel

Any updates?

Actions #3

Updated by Lars Peter Søndergaard almost 6 years ago

The rawurlencode line can probably be removed.

A few lines earlier there is the regular expression:

$slug = preg_replace('/[^\p{L}0-9\/' . preg_quote($fallbackCharacter) . ']/u', '', $slug);

That line removes anything that is not a letter (using Unicode properties), not a digit (0-9), not a forward slash and not the fallback character.

Letters within ASCII are only a-z and A-Z. Any other letter has a code point of U+0080 or higher, outside of ASCII, and those characters should not cause trouble for URLs. None of them are used as URL or HTML delimiters, as far as I know.

Letters, digits and slashes are safe in the path segments. Only the fallback character might be unsafe if it is set to something unusual.

The only characters that could cause trouble in the path segment of a URL are '?', '&', '"' and "'" (for HTML).
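As a quick check of that reading, the following standalone sketch applies the same regular expression outside of TYPO3. The function name stripInvalidSlugCharacters() and the sample input are assumptions made for illustration; the default fallback character '-' is also assumed here.

```php
<?php
// Hypothetical wrapper around the regular expression quoted above.
function stripInvalidSlugCharacters(string $slug, string $fallbackCharacter = '-'): string
{
    // Keep Unicode letters (\p{L}), digits, slashes and the fallback character;
    // everything else is removed. The /u modifier makes PCRE treat the
    // subject as UTF-8, so CJK and Arabic letters count as letters.
    $result = preg_replace(
        '/[^\p{L}0-9\/' . preg_quote($fallbackCharacter) . ']/u',
        '',
        $slug
    );

    // preg_replace() returns null on error (e.g. invalid UTF-8 input).
    return $result ?? '';
}

echo stripInvalidSlugCharacters('/cn/应用?a=1&b="x"'), PHP_EOL; // /cn/应用a1bx
```

The Chinese letters and slashes survive, while '?', '=', '&' and '"' are already gone before rawurlencode() runs.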
Those characters however, are properly encoded by the symfony UrlGenerator class:

Symfony\Component\Routing\Generator\UrlGenerator::doGenerate

It uses rawurlencode when building the url, but decodes a set of characters predefined as protected $decodedChars.
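The encode-then-selectively-decode pattern described here can be sketched in plain PHP without Symfony. This is not the UrlGenerator implementation: the function name encodePathKeepingSafeChars() is hypothetical, and the map below is only an illustrative subset, not the actual contents of $decodedChars.

```php
<?php
// Hypothetical sketch of "rawurlencode everything, then decode a whitelist".
function encodePathKeepingSafeChars(string $path): string
{
    // Illustrative subset only; the real list in Symfony is longer.
    $decodedChars = [
        '%2F' => '/',
        '%40' => '@',
        '%3A' => ':',
    ];

    // First encode every reserved and non-ASCII byte, then turn the
    // whitelisted sequences back into their literal characters.
    return strtr(rawurlencode($path), $decodedChars);
}

echo encodePathKeepingSafeChars('/cn/应用?x&y'), PHP_EOL;
// /cn/%E5%BA%94%E7%94%A8%3Fx%26y
```

In this sketch the slashes stay readable, while '?' and '&' remain percent-encoded, so a slug that still contains raw Unicode would be encoded at URL generation time anyway.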

I only followed the case where a Site configuration is available, so I wouldn't know what happens in "traditional" setups, or if those slugs are even relevant in that case.

Greetings.

Actions #4

Updated by Ricky Mathew almost 6 years ago

I also think there is no need for rawurlencode(), as the preg_replace() before it handles everything perfectly. If rawurlencode() can't be avoided, then I have a workaround: call rawurldecode() at the end of the sanitize() function, provided sanitize() is only used for URL generation.
What are others' thoughts? I think this must be considered for the next patch level, as the TYPO3 URL system isn't supporting Asian languages at all!
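A minimal sketch of that workaround idea, outside of TYPO3: decoding at the end simply undoes the earlier encoding step. The function name roundTripSlug() is hypothetical, and the sketch assumes the slug has already been stripped of reserved characters by the preceding preg_replace().

```php
<?php
// Hypothetical illustration of "encode, then decode again at the end".
function roundTripSlug(string $slug): string
{
    // Stand-in for the rawurlencode() call inside sanitize().
    $encoded = rawurlencode($slug);

    // Proposed workaround: decode again before returning the slug.
    return rawurldecode($encoded);
}

var_dump(roundTripSlug('应用') === '应用'); // bool(true)
```

Since the round trip returns the input unchanged, the workaround has the same effect as removing the rawurlencode() call.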

Actions #5

Updated by Sven Juergens almost 6 years ago

Same problem here with the Arabic language.

Actions #6

Updated by Gerrit Code Review over 5 years ago

  • Status changed from New to Under Review

Patch set 1 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/59796

Actions #7

Updated by Gerrit Code Review over 5 years ago

Patch set 2 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/59796

Actions #8

Updated by Gerrit Code Review over 5 years ago

Patch set 1 for branch 9.5 of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/59822

Actions #9

Updated by Guido Schmechel over 5 years ago

  • Status changed from Under Review to Resolved
  • % Done changed from 0 to 100
Actions #10

Updated by Benni Mack over 5 years ago

  • Status changed from Resolved to Closed