Bug #21569
closed
Wrong character encoding in cache tables breaks frontend rendering
Added by Steffen Kamper about 15 years ago.
Updated over 14 years ago.
Description
There are several cases where cached data causes problems, e.g. if renderCharset != metaCharset. If the cached data contains a special character such as an umlaut, the returned data is corrupted.
The reason is that the cache tables use TEXT fields for the serialized arrays, so MySQL applies its charset handling to them.
Solution: use BLOB instead.
(issue imported from #M12613)
In cache_pages the HTML field stores the complete page. This must also become a BLOB, since MySQL (and other databases) reacts badly when the data sent is not in the charset of the column.
To be precise: if the database is set to UTF-8, the content will be truncated at the first byte that is not valid UTF-8.
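A minimal reproduction sketch of that truncation (hypothetical table and values; this assumes MySQL is not running in strict SQL mode, where the invalid bytes would be rejected with an error instead of silently truncated):
-- 0xFC is "ü" in latin1 but is not a valid UTF-8 byte sequence.
CREATE TABLE charset_demo (
  as_text text,  -- charset-aware: MySQL validates and converts the bytes
  as_blob blob   -- raw bytes: no charset interpretation at all
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO charset_demo (as_text, as_blob) VALUES (0x41FC42, 0x41FC42);
-- The TEXT value is cut off at the invalid byte (only "A" survives, with an
-- "Incorrect string value" warning); the BLOB still holds all three bytes.
SELECT HEX(as_text), HEX(as_blob) FROM charset_demo;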
What exactly do you mean by having the DB in UTF-8? Today I tried setting the collation of a database to UTF-8 and also the collation of the tables, but for some reason I could not reproduce this case. I know it happens, but I would like to know under which circumstances.
Which database settings are needed to make this case happen?
OK. I just tested the description in #17053, which seems to be the same problem.
Back then (2007), for some reason the "content" field of cache_hash was changed from "mediumblob" to "mediumtext", which seems to have introduced this error.
I could reproduce the error using Michael's bug note 0013133 in bug #17053.
Changing the field "content" in table "cache_hash" back from mediumtext to mediumblob solved the problem for me.
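For reference, the change described here corresponds to a one-line ALTER statement (a sketch; normally the definition is updated through the Install Tool's database compare rather than by hand):
ALTER TABLE cache_hash MODIFY content mediumblob;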
I applied your patch and it works!
But one note:
CREATE TABLE cache_pagesection (
page_id int(11) unsigned DEFAULT '0' NOT NULL,
mpvar_hash int(11) unsigned DEFAULT '0' NOT NULL,
- content text,
+ content mediumblob, <--------------- shouldn't this be blob, not mediumblob ?
tstamp int(11) unsigned DEFAULT '0' NOT NULL,
PRIMARY KEY (page_id,mpvar_hash)
) ENGINE=InnoDB;
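On the size question: text and blob both cap at 65,535 bytes, while mediumblob allows roughly 16 MB; both binary types bypass the charset conversion, so the two candidates differ only in capacity (sketch):
ALTER TABLE cache_pagesection MODIFY content blob;        -- same 64 KB limit as the old text column
-- or, as in the patch:
ALTER TABLE cache_pagesection MODIFY content mediumblob;  -- roughly 16 MB limit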
That was the situation when the caching framework was introduced:
http://forge.typo3.org/repositories/diff/typo3v4-core?rev=4336
The tables of the caching framework can stay as they are (with "TEXT") since the caching framework performs an additional serialize() before writing to the database.
The only tables that have to be changed are cache_hash, cache_pages and cache_pagesection.
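The cache_pages change follows the same pattern as the statements sketched above (the HTML column name is taken from the earlier comment; the exact type chosen in the commit may differ):
ALTER TABLE cache_pages MODIFY HTML mediumblob;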
committed in trunk, rev 6525
Serializing does not help if you write iso-8859-1/latin1 (or any other charset) into a utf-8 field. The data will be TRUNCATED (!) at the first character that is not valid in UTF-8.
This is similar to iconv (maybe MySQL uses it): iconv stops processing as soon as it encounters invalid input.
Masi, why did you reopen this issue again?
I could not reproduce it with the caching framework, since the data gets serialized twice there...
I tested it with regular caching (I could reproduce the bad behaviour) and with the caching framework (I could not reproduce it). Now we are back to the situation we had before the caching tables were modified for the caching framework and then changed back, except that some column types were forgotten along the way...
If there is still something to optimize, please open a new issue.
[Reopening just to add this comment]
I reopened it because your comment about serializing is - sorry - nonsense. I explained what happens and why the data gets corrupted under certain conditions. Maybe they do not apply in all situations (I did not check), but your comment tells me that you (and sadly many more Core devs) simply do not grasp charset handling.