Bug #21569

Wrong character encoding in cache tables breaks frontend rendering

Added by Steffen Kamper over 3 years ago. Updated almost 3 years ago.

Status: Closed
Start date: 2009-11-16
Priority: Must have
Due date:
Assignee: Oliver Hader
% Done: 0%
Category: -
Target version: -
TYPO3 Version: 4.3
Complexity:
PHP Version: 5.3
Votes: 0

Description

There are several issues where cached data causes problems, e.g. if renderCharset != metaCharset. If the cached data then contains a special character such as an umlaut, the returned data is corrupted.

The reason is that the cache tables use TEXT fields for serialized arrays, so MySQL applies the column charset to the data.

Solution: use BLOB instead
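
To illustrate the reason given above: PHP's serialize() records the byte length of every string, so any charset conversion applied to a TEXT column can invalidate the stored value. The sketch below mimics the s:<length>:"<data>"; string format in Python purely for illustration. The helper names are hypothetical, the latin1-to-utf-8 conversion is an assumed scenario, and none of this is TYPO3 code.

```python
def php_style_serialize(payload: bytes) -> bytes:
    """Mimic PHP's string format s:<byte length>:"<data>"; (illustrative helper)."""
    return b's:%d:"%s";' % (len(payload), payload)


def declared_length_matches(serialized: bytes) -> bool:
    """Return True if the declared byte length equals the actual payload length."""
    head, rest = serialized.split(b':"', 1)
    declared = int(head[2:])   # skip the leading "s:"
    body = rest[:-2]           # drop the trailing '";'
    return declared == len(body)


# A latin1-encoded string containing two umlaut-type characters.
original = php_style_serialize("Grüße".encode("latin-1"))

# Assumed scenario: the text column re-encodes the latin1 bytes as utf-8.
converted = original.decode("latin-1").encode("utf-8")

print(declared_length_matches(original))
print(declared_length_matches(converted))
```

After the conversion each of the two special characters takes two bytes instead of one, so the declared length no longer matches the payload. A BLOB column stores the bytes unchanged and avoids this.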

(issue imported from #M12613)

12613.diff (2.3 kB) Administrator Admin, 2009-11-16 17:01

0012613_v2.patch (1.4 kB) Administrator Admin, 2009-11-25 12:16


Related issues

related to Core - Feature #21525: No typoscript template found - Addon Closed 2009-11-10
related to Core - Bug #17091: "No template found" after update from 4.0.4 to 4.1 Closed 2007-03-07
related to Core - Bug #17437: When accessing pages form cache "No Temlpate found!" appears Resolved 2007-07-21
related to Core - Bug #20092: Typo3 FE crashs with single-char umlauts in typoscript Closed 2009-02-25
related to Core - Bug #21421: slow t3lib_TSparser::parseSub Closed 2009-11-01

History

Updated by Martin Kutschker over 3 years ago

In cache_pages the field HTML stores the complete page. This field must also be a BLOB, since MySQL (and other DBs) handle it badly if the data sent is not in the charset of the column.

To be precise: if you have the DB in utf-8 the content will be truncated at the first byte that is invalid in utf8.
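
A minimal sketch of the truncation described here, written in Python for illustration only. It does not talk to MySQL; it simply keeps the longest prefix of the input that is valid utf-8, which is the behaviour this comment describes for a utf-8 column. The function name and the sample page are hypothetical.

```python
def keep_valid_utf8_prefix(raw: bytes) -> bytes:
    """Return raw unchanged if it is valid utf-8, else cut it at the first invalid byte."""
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that cannot be decoded.
        return raw[:err.start]


# A cached page rendered in latin1 (assumed renderCharset), sent to a utf-8 column.
page = "<p>Grüße aus München</p>".encode("latin-1")
stored = keep_valid_utf8_prefix(page)

print(len(page), len(stored))
print(stored)
```

Everything from the first umlaut onwards is lost, including the closing tag, which is why a truncated cache entry can break the rendered page.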

Updated by Bernhard Kraft over 3 years ago

What exactly do you mean by having the DB in utf-8? Today I tried setting the collation of a database to utf-8, and also the collation of the tables, but for some reason I could not reproduce this case. I know it happens, but I would like to know under which circumstances.

Which database settings do I need in order to make this case happen?

Updated by Bernhard Kraft over 3 years ago

Ok. I just tested the description of #17053, which seems to be the same problem.

Back then (2007), for some reason, the "content" field of cache_hash was changed from "mediumblob" to "mediumtext", which seems to have introduced this error.

I could reproduce the error using Michael's bug note 0013133 in bug #17053.

Changing the field "content" in table "cache_hash" back from mediumtext to mediumblob solved the problem for me.

Updated by Stefan Geith over 3 years ago

I applied your patch and it works!
But one note:

CREATE TABLE cache_pagesection (
page_id int(11) unsigned DEFAULT '0' NOT NULL,
mpvar_hash int(11) unsigned DEFAULT '0' NOT NULL,
- content text,
+ content mediumblob, <--------------- shouldn't this be blob, not mediumblob ?
tstamp int(11) unsigned DEFAULT '0' NOT NULL,
PRIMARY KEY (page_id,mpvar_hash)
) ENGINE=InnoDB;

Updated by Oliver Hader over 3 years ago

That was the situation when the caching framework was introduced:
http://forge.typo3.org/repositories/diff/typo3v4-core?rev=4336

The tables of the caching framework can stay as they are (with "TEXT") since the caching framework performs an additional serialize() before writing to the database.
The only tables that have to be changed are cache_hash, cache_pages and cache_pagesection.

Updated by Steffen Kamper over 3 years ago

committed in trunk, rev 6525

Updated by Martin Kutschker over 3 years ago

Serializing does not help if you write iso-8859-1/latin1 (or any other charset) into a utf-8 field. The data will be TRUNCATED (!) at the first character that is not valid in utf-8.

This is similar to iconv (maybe MySQL uses it): iconv stops any operation when it encounters invalid input.

Updated by Oliver Hader over 3 years ago

Masi, why did you reopen this issue again?

Updated by Oliver Hader over 3 years ago

Could not reproduce with the caching framework since the data gets serialized twice there...
I tested it with regular caching (I could reproduce the bad behaviour) and with the caching framework (I could not reproduce it). Now we are back at the situation we had before the caching tables were modified for the caching framework; when they were changed back, some database types were forgotten...

If there is still something to optimize, please open a new issue.

Updated by Martin Kutschker over 3 years ago

[Reopening just to add this comment]

I reopened it because your comment about serializing is - sorry - nonsense. I explained why data gets corrupted under certain conditions. Maybe they do not apply in all situations (I did not check), but your comment tells me that you (and sadly many more Core devs) simply do not grasp charset handling.
