Bug #16178

UTF-8: Broken special characters on cached pages

Added by Jan Wulff over 12 years ago. Updated almost 7 years ago.

Status:
Closed
Priority:
Should have
Assignee:
-
Category:
-
Target version:
-
Start date:
2006-05-24
Due date:
% Done:

0%

TYPO3 Version:
4.0
PHP Version:
4.3
Tags:
Complexity:
Is Regression:
Sprint Focus:

Description

I believe this is closely related to #0003303, but it is not exactly the same bug. I'm using TYPO3 with UTF-8. In the frontend, uncached pages are displayed fine, but as soon as pages are cached, some of them are displayed with '?' instead of the umlauts. Strangely, this doesn't happen with all pages, just with two or three out of around 20. When I clear the FE cache, all pages are displayed fine. As soon as the pages are cached, some of them are displayed wrong again. The error occurs on different pages each time the cache is cleared.

I'm using these options:
Typoscript:
config.renderCharset = utf-8
config.metaCharset = utf-8
config.additionalHeaders = Content-Type:text/html;charset=utf-8

Typo3ConfVars:
[forceCharset] = utf-8
[multiplyDBfieldSize] = 2
[SYS][setDBinit] = SET NAMES utf8 SET CHARACTER SET utf8
[FE][tidy] = 0

MySQL database:
MySQL version: 5.0.21
MySQL charset: UTF-8 Unicode (utf8)
MySQL connection collation: utf8_general_ci

PHP version: 4.4.2
(issue imported from #M3547)

0003547.diff (7.83 KB) Administrator Admin, 2010-06-29 09:40


Related issues

Related to TYPO3 Core - Bug #16069: Pages loaded from cache show special characters (like umlaut) wrong Closed 2006-04-20
Related to TYPO3 Core - Feature #17503: BE should check Mysql charset settings Closed 2007-08-07
Related to TYPO3 Core - Feature #18501: Enable UTF-8 by default Closed 2008-03-26

History

#1 Updated by Martin Kutschker over 12 years ago

Should not matter, but your TS is overcomplete. config.renderCharset = utf-8 is all you need. metaCharset defaults to renderCharset and the HTTP header is also set automatically.

#2 Updated by Valery Romanchev over 12 years ago

I have the same problem on TYPO3 4.0. MySQL 4.1

I get it after implementing
$TYPO3_CONF_VARS['SYS']['setDBinit'] = 'SET NAMES utf8'.chr(10).'SET CHARACTER SET utf8';

see http://bugs.typo3.org/view.php?id=1262

The default charset in my.cnf is latin1.
The database is in utf8.

Important:
1) When pconnect is disabled in localconf.php, this problem appears all the time.
2) I got this problem once when updating a guestbook record in the TYPO3 List module. So it looks like a MySQL+PHP update problem.

Now I am trying to solve this with changes in my.cnf.
I commented out log-bin and disabled the query cache.
So far the site works OK.

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
# Default to using old password format for compatibility with mysql 3.x
# clients (those using the mysqlclient10 compatibility package).
old_passwords=1
default-character-set = latin1
# log-bin
query_cache_limit = 2M # default was 1M
# query_cache_size = 64M # default was 0
query_cache_size = 0 # default was 0
query_cache_type = 0 # was 1
table_cache = 256 # default was 64
key_buffer_size = 64M # default was 8M

max_allowed_packet = 16M
max_connections = 300

[mysql.server]
user=mysql
basedir=/var/lib

[mysqld_safe]
err-log=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

#3 Updated by Valery Romanchev over 12 years ago

Unfortunately, disabling log-bin and the query cache does not solve the problem 100%. Yesterday the site worked fine (after clearing the cache, after a lot of insert/update operations, etc.).
But just now I got the home page with only "????????" instead of all Russian letters.
Reloading the page solved the problem... But this unstable behavior is even worse than not working at all!
On the same server I have many 4.0 sites without "SET NAMES utf8.... " and all of them work fine every time (of course without proper sorting and search, because there is no utf8_general_ci connection collation).

#4 Updated by Valery Romanchev over 12 years ago

Now I use:
$TYPO3_CONF_VARS['SYS']['setDBinit'] = 'SET NAMES utf8;';
As I understand from http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html
we need only this.
Site works now.

Finally I use:
$TYPO3_CONF_VARS['SYS']['setDBinit'] = 'SET character_set_connection = utf8;';

It looks like the minimum setting, and it works for me.
Please test this if you can.

#5 Updated by Jan Wulff over 12 years ago

I'm using
$TYPO3_CONF_VARS['SYS']['setDBinit'] = 'SET character_set_connection = utf8;';
now and so far everything is working again. I supposed that it had something to do with the database, because the error seemingly occurred at random.
Thanks for the info, great work.

P.S.: I'm a bit overcautious with Typoscript when searching for a bug, but I appreciate the info, Martin.

#6 Updated by Valery Romanchev over 12 years ago

I get random problems on update with
$TYPO3_CONF_VARS['SYS']['setDBinit'] = 'SET character_set_connection = utf8;';
(sometimes I get ???? after updating content)

So I went back to
$TYPO3_CONF_VARS['SYS']['setDBinit'] = 'SET NAMES utf8;';
(and I do not see these errors on update)

I also checked the bugs at http://bugs.mysql.com/
(for "SET NAMES utf8"). I found something, but not exactly this.
So the problem is not an easy one.

#7 Updated by Jan Wulff over 12 years ago

Sorry, then I have misunderstood you. I haven't had any new problems so far, but I will keep an eye on the website.

Do you mean this mysql bug report?
http://bugs.mysql.com/bug.php?id=19637

First we should find out whether the problem occurs when the cache is written or when it is read, but I can only look into that if and when the problem occurs again.
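
One way to tell a write-time problem from a read-time problem is to look at the raw bytes stored for a cached page. The following is only a rough sketch in Python, not TYPO3 code; the function name and the three result labels are made up for illustration, and fetching the bytes from the cache table is left out.

```python
def diagnose_cached_html(raw: bytes) -> str:
    """Classify raw cached bytes (hypothetical helper, not a TYPO3 API)."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Stored bytes are not UTF-8, e.g. latin1 written by a misconfigured connection.
        return "not valid UTF-8"
    if any(ord(ch) > 127 for ch in text):
        # Umlauts survived storage intact, so damage would have to happen on read.
        return "valid UTF-8 with non-ASCII characters"
    # Only ASCII left: if the page should contain umlauts, they were lost on write.
    return "ASCII only"


if __name__ == "__main__":
    print(diagnose_cached_html("Grüße".encode("utf-8")))
    print(diagnose_cached_html("Grüße".encode("latin-1")))
    print(diagnose_cached_html(b"Gr??e"))
```

If a page that should contain umlauts comes back as "ASCII only" or "not valid UTF-8", the stored copy itself is already damaged, which would point to the moment the cache is written.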

#8 Updated by Alain Schäfer over 12 years ago

Check whether you have HTML tidy enabled in the Install Tool.
I had a similar problem and $TYPO3_CONF_VARS['FE']['tidy'] = '0'
solved it. Or you might check the tidy documentation for how to
run it correctly with utf8.

#9 Updated by Dmitry Dulepov over 12 years ago

[multiplyDBfieldSize] = 2 is NOT needed here. It is only needed if your database is latin1 but the content is utf-8.

#10 Updated by Ries van Twisk over 12 years ago

I had the same problem, but it seems the problem is a bit vague IN ANY CASE.
Here are my two cents.

Indeed I had the same problem... The DB is set to utf-8 collation and encoding, setDBinit and all that stuff... Page generation was OK, but cached pages showed the little ??

To solve it in my case, I changed the field HTML in the table cache_pages from blob to longtext with the correct collation (utf8_general_ci; better is to use utf8_unicode_ci, but this DB was in utf8_general_ci already).

For me this makes sense: ANY text field should really be in the correct collation and not in a binary (blob) format. What might happen in our case is that MySQL does a latin1->utf8 conversion because the blob is handled as binary, thus showing the ?? because it is encoding again.

When storing the data in the right encoding from the beginning, we are 100% sure that MySQL is not going to do an extra encoding.

thanks,
Ries
www.rvantwisk.nl
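
The kinds of damage discussed in this comment can be reproduced at the byte level. The sketch below uses Python codecs as a stand-in for MySQL's charset conversion, which is an assumption made purely for illustration; the function names are invented and nothing here is TYPO3 or MySQL code.

```python
def misread_as_latin1(text: str) -> str:
    """UTF-8 bytes wrongly treated as latin1 and converted to UTF-8 once more."""
    return text.encode("utf-8").decode("latin-1")


def convert_to_latin1(text: str) -> bytes:
    """Conversion to latin1; characters that latin1 cannot represent become '?'."""
    return text.encode("latin-1", errors="replace")


def read_as_utf8(raw: bytes) -> str:
    """Bytes passed through unconverted and then decoded as UTF-8 by the reader."""
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Double encoding: the umlaut turns into two characters.
    print(misread_as_latin1("ü"))
    # Russian text has no latin1 representation, so every letter is replaced.
    print(convert_to_latin1("Привет"))
    # A latin1 umlaut byte is not valid UTF-8; the reader shows a replacement character.
    print(read_as_utf8("ü".encode("latin-1")))
```

Which of these paths actually applies to a binary column depends on the connection settings, so this only shows what each kind of mismatch looks like, not which one occurred here.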

#11 Updated by Martin Kutschker over 12 years ago

Ries, please have a look at the current SVN code (trunk). AFAIR all BLOBs have been changed to TEXT fields.

#12 Updated by Ries van Twisk over 12 years ago

Hi Martin,

I hardly ever look at trunk....
But I am glad to hear that all blob fields are finally changed to appropriate fields.

thanks,
keep up the good work,
Ries

The same issue is posted here: http://bugs.typo3.org/view.php?id=0003303

#13 Updated by Patrick Broens almost 12 years ago

There are still BLOBs in system extensions:

./typo3/sysext/cms/ext_tables-index.sql
./typo3/sysext/cms/ext_tables.sql
./typo3/sysext/cms/ext_tables_static+adt.sql
./typo3/sysext/dbal/ext_tables.sql
./typo3/sysext/sys_action/ext_tables.sql
./typo3/sysext/indexed_search/ext_tables.sql
./typo3/sysext/impexp/ext_tables.sql
./typo3/sysext/tsconfig_help/ext_tables.sql
./typo3/sysext/tsconfig_help/ext_tables_static+adt.sql

I've already posted this to the core list to get it solved. It's a pity this didn't get into version 4.1.

I ran into the same problems described in this bug, and changing the BLOB to TEXT solved the issue.

#14 Updated by Kirill Klimov over 11 years ago

I can confirm the same problem ('?' appearing instead of text on some cached pages at random) on TYPO3 4.1, MySQL 4.1 with
$TYPO3_CONF_VARS["BE"]["forceCharset"] = 'utf-8';
$TYPO3_CONF_VARS['SYS']['setDBinit'] = 'SET NAMES utf8'.chr(10).'SET CHARACTER SET utf8';

setting
$TYPO3_CONF_VARS['SYS']['setDBinit'] = 'SET NAMES utf8;';
has fixed the problem (at least I haven't been able to reproduce it for quite some time).

Thanks to Valery Romanchev for the hint.

#15 Updated by Ries van Twisk over 11 years ago

@Kirill:

Try to set it like this: $TYPO3_CONF_VARS['SYS']['setDBinit'] = 'SET NAMES utf8;'.chr(10).'SET CHARACTER SET utf8';

As far as I know these are SQL statements and they need to be separated by a ;
I don't know much about the internals of setDBinit; maybe they get split by chr(10) and executed separately, or they are sent to MySQL as one SQL statement.

If they are sent as one statement, would MySQL not prevent this? I remember that PHP's mysql client is buggy and can only execute one statement at a time.

Probably the gurus can answer....
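
To make the question concrete: if the init string were split on line feeds and each piece sent as its own query, the two spellings seen in this issue would behave differently. The sketch below is Python, not TYPO3 code, and the splitting rule is an assumption that this thread does not confirm; split_init_statements is a hypothetical helper.

```python
from typing import List


def split_init_statements(init: str) -> List[str]:
    """Split an init string on line feeds (chr(10)) and drop empty pieces.

    Assumption for illustration only: each resulting piece would be sent
    to the server as a separate query.
    """
    return [piece.strip() for piece in init.split("\n") if piece.strip()]


if __name__ == "__main__":
    # Joined with chr(10): two separate statements.
    print(split_init_statements("SET NAMES utf8;" + chr(10) + "SET CHARACTER SET utf8"))
    # Written on one line without a line feed: a single, malformed statement.
    print(split_init_statements("SET NAMES utf8 SET CHARACTER SET utf8"))
```

Under that assumption, the line feed rather than the semicolon would be what separates the statements.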

#16 Updated by Marc Wöhlken about 11 years ago

We have different TYPO3 installations running on one dev-server.
The strange behavior of random pages being displayed with garbled characters appeared under very special circumstances:
- You have at least one installation using UTF-8
- You have at least one installation using another character encoding (e.g. iso-8859-1)
- You use the same(!) db-user and password to connect to MySQL

This setup resulted in random pages being garbled within the iso-8859-1 installation. UTF-8 was always fine. I suppose this was caused by a shared MySQL connection.

Solution: Use different db-users to separate the db connections.
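
The suspected mechanism can be illustrated with a toy model of persistent connections that are reused whenever host, user and password match. This is a Python sketch under stated assumptions, not real PHP or MySQL behaviour in detail: the class, the function and the credentials are hypothetical, and the iso-8859-1 installation is assumed to send no charset statement of its own.

```python
class FakePersistentPool:
    """Toy stand-in for persistent connections, keyed by credentials."""

    def __init__(self, server_default="latin1"):
        self.server_default = server_default
        self._connections = {}

    def pconnect(self, host, user, password):
        # Same credentials give back the same connection, session state included.
        key = (host, user, password)
        if key not in self._connections:
            self._connections[key] = {"names": self.server_default}
        return self._connections[key]


def serve_request(pool, host, user, password, init_charset=None):
    """Handle one page request and return the charset the session ends up with."""
    conn = pool.pconnect(host, user, password)
    if init_charset is not None:
        conn["names"] = init_charset  # models an init statement such as SET NAMES
    return conn["names"]


if __name__ == "__main__":
    pool = FakePersistentPool()
    print(serve_request(pool, "localhost", "shared", "pw"))          # latin1 site, fresh connection
    print(serve_request(pool, "localhost", "shared", "pw", "utf8"))  # UTF-8 site reuses it
    print(serve_request(pool, "localhost", "shared", "pw"))          # latin1 site inherits utf8
    print(serve_request(pool, "localhost", "other", "pw"))           # separate user, unaffected
```

In this model the UTF-8 site is always fine because it sets the charset on every request, while the other site depends on who used the connection before, which matches the random pattern described above.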

#17 Updated by Martin Kutschker about 11 years ago

Marc, another solution is to disable persistent connections. But of course this might come with a performance penalty.

#18 Updated by David Bruchmann almost 11 years ago

The expression
$TYPO3_CONF_VARS['SYS']['setDBinit'] = 'SET NAMES utf8;'.chr(10).'SET CHARACTER SET utf8';
causes a MySQL error because it seems to be wrong!!!

I'm not sure about the delimiter, but I think chr(10) is not required.
I don't know whether the rest of the SQL statement should be written with "," or with ";" as delimiter.
I haven't found the second statement in the MySQL manual at all; instead I found two statements with the same meaning:
1) CHARACTER SET utf8
2) CHARSET utf8
As I understand the manual, the right line has to be:
$TYPO3_CONF_VARS['SYS']['setDBinit'] = 'SET NAMES utf8; CHARACTER SET utf8';
or
$TYPO3_CONF_VARS['SYS']['setDBinit'] = 'SET NAMES utf8; CHARSET utf8';

I don't know who posted the wrong statement first, but it can be found in many forums and nobody has checked its syntax.

Nevertheless I have problems saving data in the backend and get the mentioned error message. Obviously it's hard to find anyone who can give a clear hint, but I'll check Martin's.

#19 Updated by David Bruchmann almost 11 years ago

After some experiments I found out that my syntax

'SET NAMES utf8; CHARACTER SET utf8';

from above has to be right, because it produces a different result than

'SET NAMES utf8; SET CHARACTER SET utf8';

The latter produces the same result as omitting the last statement. The result from the first one isn't correct, but it's accepted by MySQL, so the result from the second one is only right because the last statement is wrong and ignored by MySQL.

So Valery's hint to use only 'SET NAMES utf8;' seems to be the best one.

#20 Updated by Chris topher over 8 years ago

Hi guys,

is this still reproducible with current versions?
I just changed a site to utf-8 and as far as I can see caching works correctly. I also can't believe that no one else would have reported this if it were a general problem.

#21 Updated by Christian Wolff over 8 years ago

It's still reproducible; I currently have 2 sites with this problem.
They are on different servers, and on both some cached pages seem to change randomly.

I use TYPO3 4.3.3

#22 Updated by Ralle Büchnitz over 8 years ago

I can confirm this too.

I use TYPO3 4.3.3 and have updated the DB to utf-8 and set all params to utf-8 as well.
Then, at random, the pages are displayed correctly at one time and at another time there are problems with German umlauts. It seems to be a caching problem, because after clearing all caches and a "force reload all" from the browser everything is fine.

For me the solution with 'SET NAMES utf8;' helped.

SQL-Server-Details: MySQL 5.1.44 on external Server

READ: http://wiki.typo3.org/index.php/UTF-8_support
// If you read the whole article, you will also recognize that some options mentioned above are not necessary.

#23 Updated by Chris topher over 8 years ago

So we can summarize as follows:

First: Each(!) MySQL command must end with a ";". If you have two commands, both must end with a semicolon (@ David).

This problem might be influenced by [forceCharset] and [SYS][setDBinit].
forceCharset must be set to utf-8 and I think it does not influence the caching of pages.
So the problem is in [SYS][setDBinit].

People in this issue used different settings there:
- Only using SET NAMES utf8; always works.

- Setting SET NAMES utf8; and SET CHARACTER SET utf8; causes the problem described here. So the cause seems to be SET CHARACTER SET utf8;. Dmitry mentioned the same on the UTF-8 support page of the Wiki.
According to the MySQL docs
http://dev.mysql.com/doc/refman/5.1/en/charset-connection.html
the 2 commands set the same variables. But only SET NAMES utf8; really sets all to utf-8, while SET CHARACTER SET utf8; sets character_set_connection and collation_connection to the value of the database (which could be anything). If there is no bug in MySQL, this must cause our problem here.

- Only setting SET character_set_connection = utf8; can cause problems. This command is a part of SET NAMES utf8; - obviously it alone is not enough.

- These symptoms can also be caused by Tidy; solution: $TYPO3_CONF_VARS['FE']['tidy'] = '0';
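
The difference between the two statements, as described in the MySQL documentation linked above, can be written down as a small model. This is an illustration in Python, not MySQL or TYPO3 code; the function names and the dict standing in for a connection's session variables are made up for the example.

```python
def set_names(session: dict, charset: str) -> dict:
    """Model of SET NAMES: client, connection and results all get the charset."""
    return {
        **session,
        "character_set_client": charset,
        "character_set_connection": charset,
        "character_set_results": charset,
    }


def set_character_set(session: dict, charset: str, database_charset: str) -> dict:
    """Model of SET CHARACTER SET: the connection follows the database charset."""
    return {
        **session,
        "character_set_client": charset,
        "character_set_results": charset,
        "character_set_connection": database_charset,
    }


if __name__ == "__main__":
    fresh = {
        "character_set_client": "latin1",
        "character_set_connection": "latin1",
        "character_set_results": "latin1",
    }
    after_names = set_names(fresh, "utf8")
    print(after_names)
    # Running the second statement afterwards undoes part of the first one
    # when the default database is not utf8 (latin1 is assumed here).
    print(set_character_set(after_names, "utf8", database_charset="latin1"))
```

In this model, sending both statements leaves character_set_connection at whatever the database uses, which is the situation the summary above identifies as the likely cause.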

#24 Updated by Chris topher over 8 years ago

Guys, please confirm, that it works correctly, if you only set SET NAMES utf8; and nothing else in [SYS][setDBinit]!

#25 Updated by Christian Wolff over 8 years ago

I solved the problem by converting the field HTML in the table cache_pages to text instead of mediumblob. It seems that sometimes something goes wrong with the UTF-8 in the blob.
After changing the field type everything works fine.

@Christopher I'm pretty sure that I had tried SET NAMES utf8; and it didn't change anything.

#26 Updated by Ralle Büchnitz over 8 years ago

I tested it on my local server and solved nearly all problems with

SET NAMES utf8;

-----
After clearing all caches, deleting all temp_cached* files and logging out from the backend, the frontend looks fine! After logging out, the TYPO3 frontend should open a new connection with the new setDBinit parameter. After this you can log in again and the backend looks good, too. If you still have problems with special characters, simply overwrite them, or at least some of them for testing purposes. Always clear the cache after updating the wrongly displayed characters.
-----
After logging in to the backend again and reloading the frontend, some small problems remain when viewing tt_news content. But this could be a problem of tt_news.

#27 Updated by Chris topher over 8 years ago

@ Christian: That's strange:
I can reproduce this problem by modifying setDBinit. Changing it back to SET NAMES utf8; always solves it again. I tried that on different systems and it always is that way. With my configuration it is clearly SET CHARACTER SET utf8; causing the problem.

Maybe you can recheck that again?

#28 Updated by Steffen Gebert almost 7 years ago

  • Category deleted (Communication)
  • Status changed from New to Closed
  • Target version deleted (0)
