Bug #14264

closed

getUpdateJS: broken when using UTF-8 and IE

Added by Simon Ihmig over 20 years ago. Updated over 18 years ago.

Status: Closed
Priority: Must have
Category: Frontend
Target version: -
Start date: 2004-08-05
Due date:
% Done: 0%
Estimated time:
TYPO3 Version: 3.8.0
PHP Version: 4
Tags:
Complexity:
Is Regression:
Sprint Focus:

Description

The function getUpdateJS in class.tslib_content.php, used by feAdmin_Lib, does not work correctly when using a UTF-8 encoded site and Microsoft Internet Explorer.

The part in question is where it outputs: unescape('".rawurlencode($value)."')

The JS escape/unescape functions under IE behave differently from PHP's urlencode functions:
escape('ü') -> %FC (ü is the German u umlaut; on a UTF-8 page it is stored as the two bytes C3 BC)
rawurlencode('ü') -> %C3%BC

When the JS function unescape is executed with the value of rawurlencode('ü'), the browser shows 'Ã¼' instead of 'ü'.

With Netscape 7.1 it works ok.
See the attached utftest.php for a demonstration.
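
A minimal JavaScript sketch of the mismatch described above. It assumes a current engine such as Node.js, not the IE versions from this report, and the variable names are illustrative only. It decodes the output of rawurlencode('ü') on a UTF-8 site once with unescape() and once with decodeURIComponent():

```javascript
// Percent-encoded UTF-8 bytes for "ü", as rawurlencode() emits them on a UTF-8 site.
const encoded = '%C3%BC';

// unescape() turns every %XX into one character with that code,
// so the two UTF-8 bytes come back as two separate characters.
const viaUnescape = unescape(encoded);

// decodeURIComponent() interprets the %XX sequence as UTF-8 bytes.
const viaDecodeURIComponent = decodeURIComponent(encoded);

console.log(viaUnescape.length);           // 2 (U+00C3 followed by U+00BC)
console.log(viaDecodeURIComponent.length); // 1 (U+00FC)
```

The two-character result is the garbled value that ends up in the form field and later in the database.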

When using feAdmin_lib or 'sr_feuser_register', this problem leads to strange characters in the database when entering non-ASCII characters, e.g. German umlauts.

I temporarily solved this problem by not using unescape/rawurlencode in getUpdateJS, but I don't know whether this leads to other problems elsewhere.

(issue imported from #M277)


Files

0000277-utftest.php (602 Bytes), Administrator Admin, 2004-08-05 13:14
0000277-getUpdateJSModified.txt (2.05 KB), Administrator Admin, 2004-08-26 20:37
tslib_cObj.patch (2.01 KB), Administrator Admin, 2005-04-12 13:22
T3X_htutf8js-0_0_1-z-200508081249.t3x (9.05 KB), Administrator Admin, 2005-08-08 12:45
T3X_htutf8js-0_0_1-200508111048.t3x (37.9 KB), Administrator Admin, 2005-08-11 10:46
utf8_JS_2005-10-26.patch (5.83 KB), Administrator Admin, 2005-10-26 17:56
utf8_JS_2006-01-17.patch (5.85 KB), Administrator Admin, 2006-01-17 13:51
utf8_JS_2005-12-13.patch (7.56 KB), Administrator Admin, 2006-01-17 21:39

Related issues 4 (0 open, 4 closed)

Related to TYPO3 Core - Bug #14699: JSMENU broken when using UTF-8 (Closed, Bernhard Kraft, 2005-04-22)
Related to TYPO3 Core - Bug #14984: Editpanel confirm dialogs (del/hide) don't display umlauts/etc (Closed, Bernhard Kraft, 2005-09-21)
Related to TYPO3 Core - Bug #14853: Japanese language extension does not work with UTF-8 (Closed, Bernhard Kraft, 2005-07-10)
Related to TYPO3 Core - Bug #17226: JS escape/unescape functions are differently than PHPs urlencode functions when using utf8 in mysql DB (Closed, 2007-04-20)

Actions #1

Updated by Ingmar Schlecht over 20 years ago

Assigning the bug to our charset hero Masi.

Actions #2

Updated by Martin Kutschker over 20 years ago

I am not sure if I understand everything from the bug, but this is what I just experienced in a different setup involving JS escape() and IE 5.5 (with security patches on Win98):

Some IEs (I cannot believe the newer ones do that) convert UTF-8 strings into iso-8859-1/windows-1252 before performing the actual escaping.

This will also break popup wizards in the BE! What's more: any code that relies on JS TBE_EDITOR_rawurlencode() in the BE will get wrong results.

So what needs to be done is this: check which versions of IE are affected, then decide whether we should introduce a workaround for these versions. If we need one, we devise one.

Actions #3

Updated by Martin Kutschker over 20 years ago

More info:

The MS docs say that Unicode chars shall be returned as %uxxxx. Unfortunately I cannot test right now on a newer IE. But obviously any decoding routine of Typo3 must be aware of this format.

PS: %FC is iso-8859-1 for German umlaut ü, so the implicit conversion is obvious.
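
For reference, a small JavaScript sketch of the output formats mentioned here. It shows how a current engine such as Node.js behaves, which is an assumption of this sketch and not a test of IE 5.5. The variable names are illustrative:

```javascript
// escape() works on UTF-16 code units, not on the bytes of the page encoding.
const latin1 = escape('\u00FC'); // "ü", code unit below 256 -> %XX
const beyond = escape('\u20AC'); // "€", code unit 256 or above -> %uXXXX

// encodeURIComponent() always percent-encodes the UTF-8 bytes,
// which matches what PHP's rawurlencode() produces for a UTF-8 string.
const utf8 = encodeURIComponent('\u00FC');

console.log(latin1, beyond, utf8);
```

Any server-side decoding routine therefore has to tell apart three shapes: %XX as a Latin-1 code, %uXXXX as a Unicode code unit, and %XX%XX as UTF-8 bytes.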

Actions #4

Updated by Stanislas Rolland about 20 years ago

I modified getUpdateJS for extension sr_feuser_register:
if (window.decodeURIComponent) { unesc = decodeURIComponent('".rawurlencode($convValue)."') } else { unesc = unescape('".rawurlencode($Nvalue)."') };

where $convValue is the $Nvalue converted to utf-8.

The form now works correctly with utf-8 in Mozilla 1.5 and IE6 and in any other encoding if utf-8 conversion is available (TYPO3 3.6+ or mbstring or utf8_encode).
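
The feature test above can be sketched as a standalone JavaScript function. The name decodeFormValue is hypothetical and not part of sr_feuser_register. The sketch assumes the server always sends percent-encoded UTF-8, as in the modified getUpdateJS:

```javascript
// Hypothetical helper mirroring the feature test from the modified getUpdateJS.
function decodeFormValue(encoded) {
  if (typeof decodeURIComponent === 'function') {
    // Preferred path: the input is interpreted as percent-encoded UTF-8.
    return decodeURIComponent(encoded);
  }
  // Fallback for engines without decodeURIComponent;
  // this is only correct when the input is percent-encoded ISO-8859-1.
  return unescape(encoded);
}

const restored = decodeFormValue('M%C3%BCller%20%26%20S%C3%B6hne');
console.log(restored);
```

In the real code the fallback branch receives the unconverted value ($Nvalue), which is why the two branches in the PHP snippet encode different variables.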

edited on: 11.03.05 08:01

Actions #5

Updated by Martin Kutschker about 20 years ago

OK, as far as we know now there are two solutions to solve it, and both require browser sniffing. In one case we determine if escape() is "old style" (charset transparent) or "new style" (Unicode), in the other case we test for the availability of encodeURIComponent().

As this behaviour affects more code than just getUpdateJS() I vote to wait for a general solution.

Actions #6

Updated by Bernhard Kraft over 19 years ago

I replaced the "rawurlencode" call with a call to "addslashes". This should be sufficient, as the string which gets encoded is quoted. Every character is allowed in the string except the quotation mark.

This problem also occurred for me with Mozilla Firefox 0.9.

Actions #7

Updated by Thorsten Kahler over 19 years ago

I just found this comment in class language:
<cite>
Converts the input string to a JavaScript function returning the same string, but charset-safe.
Used for confirm and alert boxes where we must make sure that any string content does not break the script AND want to make sure the charset is preserved.
Originally I used the JS function unescape() in combination with PHP function rawurlencode() in order to pass strings in a safe way. This could still be done for iso-8859-1 charsets but now I have applied the same method here for all charsets.
</cite>
It describes function JScharCode($str) which makes use of t3lib_cs::utf8_to_numberarray(). So it seems Masi already has found a solution but doesn't remember ;-)

I adapted JScharCode() for use in class tslib_cObj and it works for me. The attached file is a patch file for version 3.6.1. Perhaps Masi can port this patch for 3.8.0?
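
The idea behind JScharCode() can be sketched in JavaScript as follows. This illustrates the technique only; it is not the TYPO3 implementation or the attached patch, and toCharCodeExpression is a hypothetical name. The string is turned into a String.fromCharCode(...) expression, so the generated script source contains only ASCII and does not depend on the page charset:

```javascript
// Hypothetical helper: turn a string into a JavaScript expression that
// rebuilds it from numeric code units, so the generated source is pure ASCII.
function toCharCodeExpression(text) {
  const codes = [];
  for (let i = 0; i < text.length; i++) {
    codes.push(text.charCodeAt(i));
  }
  return 'String.fromCharCode(' + codes.join(',') + ')';
}

// "Grüße" written with escapes so this sketch itself stays ASCII.
const expression = toCharCodeExpression('Gr\u00FC\u00DFe');

// Evaluate the generated source the way a browser would when running the page script.
const rebuilt = new Function('return ' + expression + ';')();

console.log(expression);
```

Because the result is an expression and not a string literal, it must be embedded in the generated script without surrounding quotes.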

Actions #8

Updated by Daniel Gercke over 19 years ago

Hi,

in JScharCode() at this line: $GLOBALS['TSFE']->siteCharset;

What is siteCharset? I didn't find it anywhere else in the source. Is it metaCharset now?

I also think that we must modify the line:

$validateForm=' onsubmit="return validateForm(\''.$formname.'\',\''.implode(',',$fieldlist).'\',\''.rawurlencode($conf['goodMess']).'\',\''.rawurlencode($conf['badMess']).'\',\''.rawurlencode($conf['emailMess']).'\')"';

to be now:

$validateForm=' onsubmit="return validateForm(\''.$formname.'\',\''.implode(',',$fieldlist).'\','.$this->JScharCode($conf['goodMess']).','.$this->JScharCode($conf['badMess']).','.$this->JScharCode($conf['emailMess']).')"';

to support Validation Messages.

Actions #9

Updated by Thorsten Kahler over 19 years ago

The patch I provided is for version 3.6.1 (as mentioned).

In class tslib_fe (which is instantiated in TYPO3 as $GLOBALS['TSFE'] normally) the variable siteCharset was replaced by renderCharset and metaCharset (see TSRef [1]) in version 3.7.0. The right substitution for siteCharset would be renderCharset for newer versions [2]. So if someone wants to use the patch on 3.7 or 3.8, "siteCharset" has to be changed to "renderCharset" in line 37 of the patch file.

IMHO Masi should provide an equivalent function in t3lib_cs, that could be used in BE and FE. Of course it's possible to apply my patch to other parts of the core (as suggested by haustier). But that's just a workaround for urgent problems and not a solution to the underlying issue.

There are many places in the core where it's necessary to have a function at hand that provides valid UTF-8 text. tslib_cObj is IMHO not the right place for such a function, as it's not directly concerning content as such. There's already at least one other bug report (#14699 [3]), that applies to this issue.

[1] http://typo3.org/documentation/document-library/doc_core_tsref/quot_CONFIG_quot/
[2] the split took place in revision 1.33 of class.tslib_fe.php
[3] http://bugs.typo3.org/view.php?id=1030

Actions #10

Updated by Thorsten Kahler over 19 years ago

Hi Masi,

could you please take a look at this issue again? IMHO it needs a general solution, as JS functionality is used in TYPO3 everywhere. I think you're more familiar with that charset and multilanguage stuff than anyone else.

Thanx
Thorsten

Actions #11

Updated by Daniel Gercke over 19 years ago

The extension is for TYPO3 3.8.0. It also fixes problems with these kinds of strings in form validation output.

Feel free to try.

Actions #12

Updated by Martin Kutschker over 19 years ago

T3X_htutf8js-0_0_0.t3x contains absolutely no code whatsoever.

Anyway, I think the best solution would be to drop the whole encoding stuff completely. The second best is to use it only for iso-8859-1.

The rationale is that modern browsers don't need the encoding (which is untrue for really old browsers of the bad old days) and it's now broken (since IE 5.5 and Mozilla 1.3) for all charsets except iso-8859-1. Old browsers didn't have support for UTF-8, so there's little loss.

Actions #13

Updated by Daniel Gercke over 19 years ago

Sorry.

I uploaded the wrong t3x. The new one is now there and all files should be in it.

Please let me know if you find any problems.

Thanks

Actions #14

Updated by Gideon So over 19 years ago

Does it solve the UTF-8 character problem in JSMENU? I installed the extension and found that all Chinese text in UTF-8 in JSMENU is still broken.

Gideon

Actions #15

Updated by Daniel Gercke over 19 years ago

No, it doesn't solve this problem. But if you look into class.ux_tslib_content.php, there is a function "JScharCode".
In TYPO3 3.8, class.tslib_menu.php has at line 2710:

$title=rawurlencode($data['title']);

this must be replaced with a call like:

$title=JScharCode($data['title']);

and line 2722:

$codeLines.="\n".$var.$count."=".$menuName.".add(".$parent.",".$prev.",0,'".$title."','".$GLOBALS['TSFE']->baseUrlWrap($url)."','".$target."');";

like this:
$codeLines.="\n".$var.$count."=".$menuName.".add(".$parent.",".$prev.",0,".$title.",'".$GLOBALS['TSFE']->baseUrlWrap($url)."','".$target."');";

i think.

No guarantee. If it works, please post it here.

Actions #16

Updated by Gideon So over 19 years ago

If I replace $title=rawurlencode($data['title']); with $title=JScharCode($data['title']); it shows me a blank page.

and

What's the difference between

$codeLines.="\n".$var.$count."=".$menuName.".add(".$parent.",".$prev.",0,'".$title."','".$GLOBALS['TSFE']->baseUrlWrap($url)."','".$target."');";
$codeLines.="\n".$var.$count."=".$menuName.".add(".$parent.",".$prev.",0,".$title.",'".$GLOBALS['TSFE']->baseUrlWrap($url)."','".$target."');";

Gideon

Actions #17

Updated by Daniel Gercke over 19 years ago

I have added a new version of the extension which also handles UTF-8 chars in JSMENU.

Greetings to gideonso :-)

$codeLines question:

if you look at the part :

... 0,'".$title."','".$GLOBALS['TSFE'] ...

That is the difference: only the two ' around $title must be deleted.
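
Why the two quotes matter can be sketched in JavaScript. The variable names are hypothetical stand-ins for the PHP variables, and the sketch assumes $title already holds a String.fromCharCode(...) expression of the kind JScharCode() is described as producing:

```javascript
// Stand-in for $title after the JScharCode() call: an expression for "Über".
const title = 'String.fromCharCode(220,98,101,114)';

// With quotes around $title, the expression text itself becomes the menu title.
const quoted = new Function("return '" + title + "';")();

// Without quotes, the expression is evaluated and yields the real title.
const unquoted = new Function('return ' + title + ';')();

console.log(quoted);   // the literal text of the expression
console.log(unquoted); // the decoded title
```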

Actions #18

Updated by Gideon So over 19 years ago

It works!! It works!! It works!!!

Hooray!! Hooray!! Hooray!!

Gideon

Actions #19

Updated by Bernhard Kraft about 19 years ago

OK.

I already posted this note but nobody seemed to notice it. So here it is again, as I want to solve this problem now: it is one of the gremlins we are supposed to fix during our bugfixing session.

Could anybody enlighten me as to what the point of this is:

$codeLines.="\n".$menuName.".defTopTitle[".$count."] = unescape('".rawurlencode($levelConf['firstLabel'])."');";

Obviously someone hasn't read an HTML or XHTML DTD (Document Type Definition). Otherwise it would be clear that CDATA sections don't need to be escaped! And everything inside a <script> tag is CDATA by default (no matter whether you write the CDATA opening and closing tags).

Of course the content of an "onClick" event isn't CDATA but PCDATA ... This means HTML entities get resolved (&uuml; becomes ü). So you would also have to escape the & character when you use a PHP variable inside an onclick event.

So in my opinion all those rawurlencode and unescape things are unnecessary and
a simple
addcslashes($str, '\'')
would do it. Just escape all ' if you use them as start and end delimiters for your JS strings. No other characters need to be escaped.

Maybe someone thought generating entities is required in the whole HTML file (but in fact it is not required for the CDATA type).

At least this is true for modern browsers (IE, FF, Safari, Opera). Maybe some older ones (Netscape 4.x :) complain about it.

I attach a .diff to this bug (utf8_JS.patch). Could somebody with access to an older browser test it?

Actions #20

Updated by Martin Kutschker about 19 years ago

Bernhard, you're too young :-)

Old implementations of JavaScript had all kinds of problems with non-ASCII characters. But you're right, all this stuff is not needed for today's browsers.

Actions #21

Updated by Daniel Gercke about 19 years ago

Hi Bernd,

your patch didn't work for me. I'm using Firefox 1.0.7.
When I look into the source code (HTML) there are no & entities, there are only UTF-8 codes.

Actions #22

Updated by Bernhard Kraft about 19 years ago

Well ... I explained it. There is no need for entities!

Masi already summed it up: entities in JavaScript strings are required for pre-4.x browsers (do you still remember the turning-wheel logo of Netscape 3.x? :)

You can only check whether it works by inserting an FE user registration form (or a newsletter subscription form) and filling it out incompletely. You will get to the same page again with error messages. If special characters in the fields are then garbled, it doesn't work. If "Ü" stays as it is, it works fine.

You can also test it with a JSMENU and page titles containing special characters, Japanese, or something like that.

Actions #23

Updated by Daniel Gercke about 19 years ago

Hi Bernd,

here is a piece of source code:

<----snip----->
onsubmit="return validateForm('a7867b54b21bdc5f71467146bd2208bbf','Nachname,Nachname%2A,Vorname,Vorname%2A,Strasse,Strasse%2A,PLZ_und_Ort,PLZ%2FOrt%2A,eMail,eMail%2A','','Sie%20haben%20nicht%20alle%20ben%C3%B6tigten%20Felder%20ausgef%C3%BCllt.%20Bitte%20%C3%BCberpr%C3%BCfen%20Sie%20Ihre%20Eingaben.','')">
<---snap----->

This is how it looks before AND after your patch.
Neither worked for me.

Tested with a mailform with required fields left empty.

Did I make a mistake?

Actions #24

Updated by Bernhard Kraft about 19 years ago

The current patch:
utf8_JS_2005-10-22.patch
fixes the following problems:

Mailform goodMess, badMess, emailMess with utf-8 entities. (No bug #)
FE-User admin/Direct Mail subscription utf-8 entities in values (Bug #14264)
JSMENU with utf8 (Bug #14699)
Editpanel confirm dialogs with utf8 (Bug #14984)

Please try it out and let me know whether your experience was positive or negative.

If this works fine and no problems come up, it will get added to the core and be contained in T3 4.0.0.

Actions #25

Updated by Daniel Gercke about 19 years ago

Great work Bernd,

it works fine in our projects.

Thanks

Actions #26

Updated by Sebastian Kurfuerst about 19 years ago

Hi,
it would be great if you could check whether your patch also fixes bug 1270. I think it might be the same issue, but I am not sure. Greets, Sebastian

Actions #27

Updated by Stanislas Rolland about 19 years ago

With Bernhard's solution, with a field of type textarea containing a linebreak, I get a javascript error "unterminated string literal" in updateForm. If the field does not contain a linebreak, I do not get the error.

I suggest replacing line
$value = addcslashes($value, '\'');
in function t3lib_div::quoteJSvalue
by line
$value = addcslashes($value, '\''.chr(10).chr(13));
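
The effect of the extended escaping can be sketched in JavaScript. This is an illustration under stated assumptions, not the code of t3lib_div::quoteJSvalue. The name quoteJsValue is hypothetical, and the backslash handling is an addition that the thread does not discuss:

```javascript
// Hypothetical counterpart of the quoting idea: produce a single-quoted
// JavaScript string literal that survives quotes and line breaks.
function quoteJsValue(value) {
  const escaped = value
    .replace(/\\/g, '\\\\') // backslash first (assumption, not part of the thread)
    .replace(/'/g, "\\'")   // the string delimiter
    .replace(/\n/g, '\\n')  // chr(10): a raw line feed would end the literal
    .replace(/\r/g, '\\r'); // chr(13): same for a carriage return
  return "'" + escaped + "'";
}

// A textarea-like value with a Windows line break, a quote and an umlaut.
const original = "Zeile 1\r\nO'Brien \u00FC";
const literal = quoteJsValue(original);

// Evaluate the literal the way the generated updateForm script would.
const roundTrip = new Function('return ' + literal + ';')();

console.log(literal);
```

Without the two line-break replacements, the generated literal spans two source lines and the engine reports an unterminated string literal, which matches the error described above.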

Actions #28

Updated by Karsten Dambekalns almost 19 years ago

I tried (on 3.8.1) Bernhard's patch utf8_JS_2005-10-26.patch, which works fine together with the small change suggested by Stanislas in comment 3773.

Great!

Actions #29

Updated by Karsten Dambekalns almost 19 years ago

I just uploaded a new patch that combines Bernhard's patch with the change suggested by Stanislas in comment 3773.

Actions #30

Updated by Bernhard Kraft almost 19 years ago

Hello,

I uploaded a patch "utf8_JS_2005-12-13.patch" which seems to be earlier than Karsten's, but it incorporates the suggestion from Stanislas from 08-11-2005 (ID 0003773).

It probably also fixes the problem currently being discussed on the dev list (I made this change for Michael Stucki and he reported that it works):
http://lists.netfielders.de/pipermail/typo3-dev/2006-January/014832.html

It was also already sent to the core list and approved by Kasper (:
http://lists.netfielders.de/pipermail/typo3-team-core/2005-December/000806.html

I will put it into CVS now.
