Bug #15704

closed

Getting strange chars when fetching UTF-8 strings from locallang.xml

Added by John Angel about 18 years ago. Updated about 17 years ago.

Status: Closed
Priority: Should have
Category: -
Target version: -
Start date: 2006-02-23
Due date:
% Done: 0%
Estimated time:
TYPO3 Version: 4.0
PHP Version: 4
Tags:
Complexity:
Is Regression:
Sprint Focus:

Description

I get strange chars when fetching UTF-8 strings from locallang.xml.

- locallang.xml is in UTF-8
- locallang.xml has

Everything else is displayed fine (using forceCharset = utf-8).

(issue imported from #M2673)


Files

locallang.xml (30.4 KB), Administrator Admin, 2006-03-12 14:21
locallang3.xml (10.1 KB), Administrator Admin, 2006-09-14 16:07

Related issues: 1 (0 open, 1 closed)

Related to TYPO3 Core - Bug #16301: scrambled umlauts from locallang.xml (Closed, Christian Kuhn, 2006-06-28)

Actions #1

Updated by John Angel about 18 years ago

The same problem persists in v4 beta 3.

This is a very important issue: UTF-8 chars are garbled!

Actions #2

Updated by Michael Stucki about 18 years ago

To me your problem looks like it is caused by garbled XML data. So it is not TYPO3 that displays the data wrongly; rather, the data has been entered wrongly.

Please tell me the steps to reproduce your problem.

Actions #3

Updated by John Angel about 18 years ago

Steps to reproduce the problem:

1. Put the attached locallang.xml in the fileadmin folder.

2. Use the following TS code:

lib.test = TEXT
lib.test.data = LLL:fileadmin/locallang.xml:test

page = PAGE
page.typeNum = 0
page.10 < lib.test

3. It displays garbled two-byte characters instead of normal UTF-8 characters.

Actions #4

Updated by John Angel about 18 years ago

Michael, please change the Status to an appropriate value and the Severity to major.

I cannot finish the site without fixing this bug.

Actions #5

Updated by Michael Stucki about 18 years ago

I currently have no time left, so I cannot take care of this until someone provides a fix for the problem. Can you do that, please?

Actions #6

Updated by Kasper Skårhøj about 18 years ago

This will be fixed with my next CVS update, but let me remind you that in locallang-xml files you are NOT supposed to put anything but English (and therefore ASCII) into the default language! You have done that, and that is why you get garbled chars. You would not have if you had put your chars into the "ru" section or similar.
(The issue was that the default language was expected to be ASCII and not to contain UTF-8 chars, so they were converted incorrectly.)
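The following minimal Python sketch shows one common way a false conversion produces garbled two-byte characters. It is an illustration only: TYPO3 itself is PHP, and `double_encode` is a hypothetical name. The sketch assumes that the UTF-8 bytes of a label are read as ISO-8859-1 and then encoded to UTF-8 again for output.

```python
def double_encode(label: str) -> str:
    """Simulate UTF-8 label bytes being misread as ISO-8859-1 text.

    Every byte of the UTF-8 encoding becomes a separate Latin-1
    character, so a two-byte character such as 'é' (bytes C3 A9)
    turns into the two characters 'Ã' and '©'.
    """
    utf8_bytes = label.encode("utf-8")
    return utf8_bytes.decode("iso-8859-1")


print(double_encode("café"))  # cafÃ©
print(double_encode("test"))  # test (plain ASCII is unaffected)
```

ASCII labels pass through unchanged. This is consistent with English-only default labels never showing the problem.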

Actions #7

Updated by John Angel almost 18 years ago

I still get question marks instead of normal chars (official v4).

Is it possible to get normal letters when the default language is not English?

Actions #8

Updated by John Angel almost 18 years ago

Here is the code that produces question marks instead of UTF-8 chars. Removing it solves the problem:

class.t3lib_div.php:3537

// Converting charset of default language from utf-8 to iso-8859-1 (since that is what the system would expect for default langauge in the core due to historical reasons)
// This conversion is unneccessary for 99,99% of all default labels since they are in english, therefore ASCII.
// However, an extension like TemplaVoila uses an extended character in its name, even in Default language. To accommodate that (special chars for default) this conversion must be made.
// Since the output from this function is probably always cached it is considered insignificant to do this conversion.
// - kasper

if (is_array($LOCAL_LANG['default'])) {
    foreach($LOCAL_LANG['default'] as $labelKey => $labelValue) {
        //$LOCAL_LANG['default'][$labelKey] = $csConvObj->utf8_decode($labelValue,'iso-8859-1');
    }
}
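In the code above, the commented-out line converts each default label from UTF-8 to ISO-8859-1. The Python sketch below (not TYPO3 code) reproduces that round trip. It assumes that characters ISO-8859-1 cannot hold are replaced by '?', which matches the question marks reported in this thread. `via_latin1` is a hypothetical helper name.

```python
def via_latin1(label: str) -> str:
    """Round-trip a label through ISO-8859-1, as the default-language
    conversion effectively does.

    str.encode(..., errors="replace") substitutes '?' for every
    character that has no ISO-8859-1 code point.
    """
    latin1_bytes = label.encode("iso-8859-1", errors="replace")
    return latin1_bytes.decode("iso-8859-1")


for label in ("café", "Привет", "ěščřžý"):
    print(label, "->", via_latin1(label))
# café -> café
# Привет -> ??????
# ěščřžý -> ?????ý
```

Western European letters survive the round trip, but Cyrillic and most Czech letters are lost. Only a few Czech letters, such as 'ý' (U+00FD), exist in ISO-8859-1.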

Actions #9

Updated by Joen Weidemann almost 18 years ago

I am experiencing the same problem. The locallang of an extension (sr_feuser_register) is output in the FE with ? instead of special characters like æøåéè. The file is UTF-8 and the XML prologue declares UTF-8. TYPO3 4.0.

Actions #10

Updated by Martin Kutschker over 17 years ago

One possible issue is that the XML file contains a byte order mark (BOM). This usually happens when you use a text editor on Windows. To add BOM detection, add this code to t3lib_div::xml2array():

if (substr($string,0,3)=="\xEF\xBB\xBF") {
    // A UTF-8 BOM is present: make the parser output UTF-8
    xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, 'utf-8');
} elseif ((double)phpversion()>=5) {
    ..
}
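The same check can be sketched in Python. This is an illustration that assumes the file is read as raw bytes, and `detect_utf8_bom` is a hypothetical name. The function compares the first three bytes with the UTF-8 BOM (EF BB BF) and strips them before parsing.

```python
import codecs


def detect_utf8_bom(data: bytes) -> tuple[bool, bytes]:
    """Return (had_bom, data) with a leading UTF-8 BOM removed.

    codecs.BOM_UTF8 holds the bytes EF BB BF, the same three bytes
    the PHP snippet compares with substr($string, 0, 3).
    """
    if data.startswith(codecs.BOM_UTF8):
        return True, data[len(codecs.BOM_UTF8):]
    return False, data


raw = b"\xef\xbb\xbf<?xml version=\"1.0\" encoding=\"utf-8\"?>"
had_bom, body = detect_utf8_bom(raw)
print(had_bom, body[:5])  # True b'<?xml'
```

If decoding to text first is acceptable, Python's "utf-8-sig" codec also drops a leading BOM.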

Actions #11

Updated by Zak over 17 years ago

BOM is not the issue. I have the very same problem. I just edited locallang.xml in the tx_indexedsearch extension and made a translation to Czech, and the special characters turn into question marks. It doesn't matter whether I change the default section or create a new one.

Actions #12

Updated by Martin Kutschker over 17 years ago

Zak, if the output gets garbled AFTER you have edited something, then obviously something has been done to the file. Please recheck whether the file is still UTF-8 and contains no BOM.

Plus: you may NOT use accents in the default language!

Actions #13

Updated by Zak over 17 years ago

The only thing I did was add some special Czech characters to the locallang.xml file of the tx_indexedsearch extension (i.e., I created a translation). Before that, the file contained only English (ASCII) characters, so no wonder it was displayed properly. The file was and still is UTF-8 and contains no BOM (I am really sure about that). The special characters got messed up even in the non-default languageKey.
I am not a very experienced TYPO3 user, so maybe there is another (better?) way to display the indexed search form in different languages?

I have TYPO3 4.0.1, if it helps.

Actions #14

Updated by Martin Kutschker over 17 years ago

Zak, maybe you could attach your file here.

Actions #15

Updated by Zak over 17 years ago

I had to rename the file to locallang3.xml since a locallang.xml is already uploaded here. The special Czech characters are only in the "en" languageKey section, not in the default one. The button label and the searchFor label (in the "en" version) should look something like ěščěčšřžžý, but they show mostly question marks. Thanks for your help.

Actions #16

Updated by Zak over 17 years ago

It may be useful to mention that I use forceCharset UTF-8, and both the BE and FE are displayed correctly in UTF-8.

Actions #17

Updated by Martin Kutschker over 17 years ago

Zak, there is no language key "en". "default" is English. You have to use "cz" for Czech contents.

For internal reasons it does matter: TYPO3 still knows that the Czech locallang.php files are in windows-1250, and it will actually USE this charset for some internal processing.

AFAIK the problem is that the unknown language key triggers a fallback to iso-8859-1, which is bound to cause trouble. Maybe it makes sense to make utf-8 the fallback for data from XML files.
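The fallback Martin describes can be sketched as a lookup with a default value. This Python illustration is not TYPO3 code. The table is a hypothetical, heavily reduced stand-in for t3lib_cs's charSetArray. Its entries are assumptions, except the windows-1250 charset mentioned above for Czech ("cz"). Any key missing from the table, such as "en", falls back to iso-8859-1.

```python
# Hypothetical, reduced stand-in for t3lib_cs's charSetArray;
# the real table has many more language keys.
CHARSET_BY_LANG_KEY = {
    "de": "iso-8859-1",
    "cz": "windows-1250",
}


def labels_charset(lang_key: str, fallback: str = "iso-8859-1") -> str:
    """Return the charset assumed for the labels of lang_key.

    Unknown keys get the fallback, mirroring the PHP ternary
    charSetArray[$lang] ? charSetArray[$lang] : 'iso-8859-1'.
    """
    return CHARSET_BY_LANG_KEY.get(lang_key) or fallback


print(labels_charset("cz"))                    # windows-1250
print(labels_charset("en"))                    # iso-8859-1
print(labels_charset("en", fallback="utf-8"))  # utf-8
```

Making utf-8 the fallback would only change the result for keys missing from the table. Labels under known keys would still have to fit their language's charset.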

Actions #18

Updated by Michael Stucki over 17 years ago

So can we close this issue? As far as I can see, the only problem is that the reporter used the "default" language key when he should have used the "cz" key.

Actions #19

Updated by Martin Kutschker over 17 years ago

No, he used "en" where he should have used "cz". But I think we should wait to see whether he is really successful after switching to "cz".

Actions #20

Updated by Zak over 17 years ago

Wow, I changed the languageKey to "cz" and it works. My apologies. But this raises a question, maybe not the right one to ask on the bug tracker, but I will give it a try :) Isn't it strange that I can use any UTF-8 characters when creating page content in the default language (even Czech characters, for example), but I can't translate the extensions to Czech as the default language? Is the default language in TYPO3 always supposed to be English? What if someone wants to create, e.g., a Czech-only page?
Thank you very much for your help, and respect for your work.

Actions #21

Updated by Martin Kutschker over 17 years ago

"default" as a key in a localization file means "English". That has nothing to do with the default language of your site.

Of course you could, as you have done, add non-English content to the "default" parts, but this is not what the system expects. Maybe "default" will accept UTF-8 in the future, maybe not. So don't count on it, and always put your strings into the correct language key section (see t3lib_cs for the allowed keys).

Actions #22

Updated by John Angel over 17 years ago

Why does TYPO3 expect English as the default language? Here is a solution that resolves the problem:


typo3/sysext/cms/tslib/class.tslib_fe.php:3964

CHANGE FROM:

$this->labelsCharset = $this->csConvObj->parse_charset($this->csConvObj->charSetArray[$this->lang] ? $this->csConvObj->charSetArray[$this->lang] : 'iso-8859-1');

TO:

$this->labelsCharset = $this->csConvObj->parse_charset($this->csConvObj->charSetArray[$this->lang] ? $this->csConvObj->charSetArray[$this->lang] : 'utf-8');


t3lib/class.t3lib_div.php:3581

CHANGE FROM:

$origCharset = $csConvObj->parse_charset($csConvObj->charSetArray[$langKey] ? $csConvObj->charSetArray[$langKey] : 'iso-8859-1');

TO:

$origCharset = $csConvObj->parse_charset($csConvObj->charSetArray[$langKey] ? $csConvObj->charSetArray[$langKey] : 'utf-8');


t3lib/class.t3lib_div.php:3610

CHANGE FROM:

if (is_array($LOCAL_LANG['default'])) {
    foreach($LOCAL_LANG['default'] as $labelKey => $labelValue) {
        $LOCAL_LANG['default'][$labelKey] = $csConvObj->utf8_decode($labelValue,'iso-8859-1');
    }
}

TO:

/*
if (is_array($LOCAL_LANG['default'])) {
    foreach($LOCAL_LANG['default'] as $labelKey => $labelValue) {
        $LOCAL_LANG['default'][$labelKey] = $csConvObj->utf8_decode($labelValue,'iso-8859-1');
    }
}
*/


Actions #23

Updated by John Angel over 17 years ago

I suggest getting rid of 'iso-8859-1' everywhere in Typo3 code, in favor of 'utf-8'.

Actions #24

Updated by Martin Kutschker over 17 years ago

John, as far as the configuration is concerned, English is the default language. Additionally, it has to be configured using the key "en" and not "default", which leads to all this confusion.

There is no intention to change this.

Actions #25

Updated by John Angel over 17 years ago

C’mon, not everybody is building websites with English as default language.

Please consult other users before closing this ticket right away.

As far as I can see, we are talking about changing 'iso-8859-1' to 'utf-8', nothing else.

Actions #26

Updated by Martin Kutschker over 17 years ago

John, making UTF-8 the BE default is under way. Still, this has nothing to do with the language stored under the key "default" in TYPO3's l10n structures. That language is English, in iso-8859-1.

I have explained how you, as an (extension) developer, must enter strings for other languages.

Do not confuse the language key "default" with the language of your Website. The language of the Website is determined by mapping the CGI parameter L to a system language record (to be added by the admin in root) and a BE language. The language keys are fixed (see t3lib_cs).

Please stop reopening this bug until you understand the current localization concept of TYPO3.

The fine point is that although locallang.xml is in UTF-8, all the characters of each language must be present in the charset used in locallang.php for that language. So, e.g., for German I cannot enter Cyrillic strings, and for default=English I cannot have Chinese glyphs. This may change in the future, but it is not a bug, only unexpected behaviour.
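This rule can be checked mechanically: a label is safe under a language key only if every character can be encoded in that key's charset. The following is a minimal Python sketch. It assumes iso-8859-1 for German and for default=English, and cp1250 (Windows-1250) for Czech. `fits_charset` is a hypothetical helper name.

```python
def fits_charset(label: str, charset: str) -> bool:
    """Return True if every character of label exists in charset."""
    try:
        label.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


print(fits_charset("Grüße", "iso-8859-1"))   # True: German fits
print(fits_charset("Привет", "iso-8859-1"))  # False: Cyrillic does not
print(fits_charset("中文", "iso-8859-1"))     # False: no Chinese glyphs
print(fits_charset("ěščřžý", "cp1250"))      # True: Czech fits Windows-1250
```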

Actions #27

Updated by John Angel over 17 years ago

Martin, when you create a new page, a new record is inserted into the 'pages' table with the 'default' language: sys_language_uid=0. Translations of pages are inserted into the 'pages_language_overlay' table, with different sys_language_uid values.

Meaning: sys_language_uid=0 is always the default language. As far as I know, the default language cannot be sys_language_uid=5.

If we can tell TYPO3 which sys_language_uid should be the default language for the 'pages' table (instead of 0), then we have resolved this matter.

If we cannot, then this bug should be reopened.

Actions #28

Updated by John Angel over 17 years ago

I am looking at the BE 'Website Language' section and cannot find an option to find or change the language with ID=0.

Do not confuse CGI parameter L with the complete story, it has nothing to do with it.

Actions #29

Updated by Michael Stucki over 17 years ago

The language labels of TYPO3 are not to be mixed up with any sys_language_uid related stuff. These can be configured individually, but language labels and their respective identifiers can't.

Since the default language of TYPO3 is English, the identifier for this language is "default". It is reserved for English labels, and therefore the character set for these labels has been set to "iso-8859-1". As Martin mentioned, this might change soon with an overall migration to UTF-8, but it is still technically wrong to use the "default" key for languages other than English.

I will hopefully close this issue for the last time now. If you are still not convinced or do not understand our explanations clearly, please raise a discussion on the typo3-dev list. Other people might be more successful in convincing you... :-)
