Feature #18501
closed
Enable UTF-8 by default
Added by Michael Stucki over 16 years ago. Updated over 13 years ago.
0%
Description
UTF-8 needs to be enabled by default.
(issue imported from #M7942)
Files
utf8_by_default_v2.patch (15.9 KB) | Administrator Admin, 2010-11-10 10:24
charset_defaults_v2.diff (16.3 KB) | Administrator Admin, 2010-11-18 01:38
Updated by Benni Mack about 14 years ago
What needs to be done to make TYPO3 fully Unicode:
- TYPO3 needs to talk UTF-8 throughout the core
- The connection to the database needs to be UTF-8
Note: It doesn't matter whether the DB itself is UTF-8, because the database only needs to know in which format data is sent to and from TYPO3 (that is, the connection charset). However, we encourage people to make their DB UTF-8 by default.
1) We're just talking about the TYPO3 Backend for now, because that's where you usually put data into the database. When a backend user chooses a language for the backend, TYPO3 picks the character set defined for that language in t3lib_cs->charSetArray. So by default English or Danish uses "iso-8859-1" and Russian uses "windows-1251". So far so good. The whole backend is rendered that way, and TYPO3 also uses the chosen character set when saving to the database. This becomes a real mess if one backend user has "English" set and another has "Russian", because then the DB contains datasets in different character sets! The well-known [BE][forceCharset] option tells TYPO3 to always use "utf-8" (or something else) instead of looking at t3lib_cs->charSetArray. This means: forceCharset lets TYPO3 speak one charset regardless of which language a BE user has set.
2) Whether the connection is UTF-8 is determined by the database. In MySQL this can be set in the server configuration (character_set_connection), but it can also be overridden by sending "SET NAMES utf8" each time a connection is established.
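To make the problem in 1) concrete, here is a small Python sketch (an illustration only, not TYPO3 code). It assumes the two charsets named above, with windows-1251 spelled `cp1251` as in Python's codec registry, and shows what happens when a record saved by one backend user is read with another user's charset.

```python
# Illustration only: two backend users save records using the charset
# that belongs to their backend language (no forceCharset set).
russian_bytes = "Привет".encode("cp1251")          # windows-1251 user
danish_bytes = "Smørrebrød".encode("iso-8859-1")   # iso-8859-1 user

# The iso-8859-1 user opens the Russian record: same bytes, wrong charset.
misread = russian_bytes.decode("iso-8859-1")
print(misread)  # no longer readable Russian

# With a single charset for everybody, both texts round-trip unchanged.
roundtrip_ok = all(
    text.encode("utf-8").decode("utf-8") == text
    for text in ("Привет", "Smørrebrød")
)
print(roundtrip_ok)
```

The bytes in the database are never "wrong" by themselves; they only become unreadable because nothing records which charset each dataset was written in.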
Imagine some evil setups:
- No forceCharset is set, so multiple users with different languages (that have different charsets in t3lib_cs->charSetArray) read and write datasets, possibly even the same datasets. This is chaos.
- forceCharset is set, so TYPO3 always reads and writes data in UTF-8, which is good. However, if the DB connection charset is not set, or the DB server defaults the connection to "latin1", the DB assumes the UTF-8 data that TYPO3 sends is "latin1" and either re-converts it to UTF-8 (if the DB is UTF-8) or stores it as it is. This actually works without problems, AS LONG AS you don't switch the DB connection to UTF-8, which would produce mixed data within the DB once you read and write again. In that case you need a manual upgrade of your DB; some information can be found in BT issue #18686 (http://bugs.typo3.org/view.php?id=8227)
These are cases where the TYPO3 installation is messed up big time and fixing it requires a lot of work.
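The second setup can be simulated in a few lines of Python. This is only a sketch of the byte-level effect: Python's `latin-1` codec stands in for MySQL's "latin1" (which differs in a few code points), and the "server" is just a pair of decode/encode calls.

```python
text = "Grüße"
sent_by_typo3 = text.encode("utf-8")   # forceCharset = utf-8

# Server believes the connection is latin1 and converts into a UTF-8 column:
# every UTF-8 byte is treated as one latin1 character and encoded again.
stored = sent_by_typo3.decode("latin-1").encode("utf-8")

# Reading over the same latin1 connection reverses the conversion,
# so TYPO3 gets its original UTF-8 bytes back and nothing looks wrong.
read_via_latin1 = stored.decode("utf-8").encode("latin-1")
print(read_via_latin1 == sent_by_typo3)

# After switching the connection to UTF-8, the old row is delivered as stored,
# i.e. double-encoded, while newly written rows are correct: a mixed setup.
read_via_utf8 = stored.decode("utf-8")
print(read_via_utf8 == text)
```

This is why old rows have to be converted manually before the connection charset is changed.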
Advantages by having UTF-8 by default:
- If your FE speaks UTF-8 by default as well, no charset conversion is needed anymore, which will speed up the whole rendering process.
- Having everything in UTF-8 allows a better transition to v5 (don't know what this will look like, but we know UTF-8 is better than any mixed setup :))
So. The attached patch does this:
Deprecation of any character set other than UTF-8. For two versions an installation can still run with another setup, but in 4.7 the option "forceCharset" will go, because it should always be utf-8 anyway. Additionally, "multiplyDBfieldSize" should have been deprecated long ago.
A) config_default.php
First, the two important parameters "forceCharset" and "setDBinit" are set to "-1", because we need to find out whether a parameter was changed in localconf.php or whether the installation still uses the original default setting. So, if the options are still "-1" after localconf.php has been included, the installation uses the default setup and has not modified anything. It is then checked whether the site has already been upgraded to 4.5 (compat_version). If the site has been upgraded to 4.5 through the upgrade wizard, the user is on his own.
The whole code in config_default.php could be dropped again in 4.8 when the migration is done for all installations (dunno yet).
B) Helper function in t3lib_db.php to determine whether the current connection is UTF-8. This is useful because the connection charset can come from the server configuration or be overridden via setDBinit.
C) When installing TYPO3 through the 1-2-3 installer, create the new database with UTF-8 by default.
D) Small change in the update wizard code so that a wizard can display information without always having to show the "next" button. Helpful to let people know what their setup is.
E) Upgrade wizard that shows information about the current setup and a link to a tutorial that explains complex scenarios and how people can upgrade their Backend + DB to UTF-8. We discourage doing this conversion in an automated way.
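As a rough idea of what a check like the one in B) has to look at, here is a Python sketch. It is not the helper from the patch: the function name and the dict input are hypothetical, and it assumes the session variables were already fetched with `SHOW VARIABLES LIKE 'character_set_%'`. The three variables checked are the ones MySQL sets when it receives `SET NAMES`.

```python
# Hypothetical helper, for illustration only.
UTF8_NAMES = {"utf8", "utf8mb3", "utf8mb4"}

# Session variables that "SET NAMES <charset>" changes in MySQL.
CONNECTION_VARS = (
    "character_set_client",
    "character_set_connection",
    "character_set_results",
)

def connection_is_utf8(session_vars):
    """Return True if all connection-related charsets are a UTF-8 variant.

    session_vars: dict mapping variable name to value, e.g. built from
    the rows of SHOW VARIABLES LIKE 'character_set_%'.
    """
    return all(
        session_vars.get(name, "").lower() in UTF8_NAMES
        for name in CONNECTION_VARS
    )

server_default = {
    "character_set_client": "latin1",
    "character_set_connection": "latin1",
    "character_set_results": "latin1",
    "character_set_database": "utf8",   # DB charset alone does not count
}
after_set_names = dict(server_default, **{name: "utf8" for name in CONNECTION_VARS})

print(connection_is_utf8(server_default))
print(connection_is_utf8(after_set_names))
```

Asking the server for the effective values covers both cases mentioned in B), because it does not matter whether the charset came from the server configuration or from setDBinit.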
TYPO3 thinks the site has been completely upgraded if:
- forceCharset has been unset in your localconf.php
- AND compat_version is set to 4.5
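The "-1" trick from A) and the two conditions above can be sketched like this in Python. The flat dict layout, the function names and the `None` default for compat_version are assumptions made for the illustration; the real code lives in config_default.php and may look quite different.

```python
UNCHANGED = "-1"  # sentinel default as described in A)

def load_config(localconf_overrides):
    """Start from the sentinel defaults, then apply what localconf.php sets."""
    conf = {
        "forceCharset": UNCHANGED,
        "setDBinit": UNCHANGED,
        "compat_version": None,  # assumption: nothing set before localconf.php
    }
    conf.update(localconf_overrides)
    return conf

def uses_default_charset_setup(conf):
    """True if localconf.php touched neither of the two options."""
    return conf["forceCharset"] == UNCHANGED and conf["setDBinit"] == UNCHANGED

def considered_fully_upgraded(conf):
    """forceCharset not set in localconf.php AND compat_version is 4.5."""
    return conf["forceCharset"] == UNCHANGED and conf["compat_version"] == "4.5"

legacy = load_config({"forceCharset": "iso-8859-1", "compat_version": "4.4"})
upgraded = load_config({"compat_version": "4.5"})

print(uses_default_charset_setup(legacy), considered_fully_upgraded(legacy))
print(uses_default_charset_setup(upgraded), considered_fully_upgraded(upgraded))
```

A sentinel is needed here because an option that was explicitly set to the old default value must be told apart from an option that was never set at all.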
Thanks to Michael Stucki for getting this under way and explaining everything. Thanks to Tolleiv Nietsch for testing the patch.
Updated by Christian Kuhn about 14 years ago
Committed by Stucki to trunk rev. 9481 for 4.5 beta1