Feature #80398

Make default charset and collation for new tables configurable

Added by Marco von Arx over 2 years ago. Updated about 1 year ago.

Status:
Closed
Priority:
Should have
Assignee:
-
Category:
Database API (Doctrine DBAL)
Target version:
Start date:
2017-03-22
Due date:
% Done:

100%

PHP Version:
7.0
Tags:
charset utf8mb4
Complexity:
Sprint Focus:

Description

To be able to store 4-byte Unicode characters we need to set the database to utf8mb4. Since TYPO3 8 there is a configuration parameter for that, but it seems that it is not taken into account.

LocalConfiguration.php

    'DB' => [
        'Connections' => [
            'Default' => [
                'charset' => 'utf8mb4',
                'dbname' => '--dbname--',
                'driver' => 'mysqli',
                'host' => '127.0.0.1',
                'password' => '--mypassword--',
                'port' => 3306,
                'user' => '--myuser--',
            ],
        ],
    ],

CREATE TABLE statements do have a fallback, but they do not read the charset from the configuration:

    private function buildTableOptions(array $options)
    {
        if (isset($options['table_options'])) {
            return $options['table_options'];
        }

        $tableOptions = array();

        // Charset: hard-coded fallback, no configuration lookup
        if ( ! isset($options['charset'])) {
            $options['charset'] = 'utf8';
        }
        // ....
    }

The DatabaseConnection class does not read the charset configuration either; it takes utf8 as a default.

        $connection = \Doctrine\DBAL\DriverManager::getConnection([
            'driver' => 'mysqli',
            'wrapperClass' => Connection::class,
            'host' => $host,
            'port' => (int)$this->databasePort,
            'unix_socket' => $this->databaseSocket,
            'user' => $this->databaseUsername,
            'password' => $this->databaseUserPassword,
            'charset' => $this->connectionCharset,
        ]);

It was stated that this would be fixed in CMS 8:
https://forge.typo3.org/issues/71454

Is this on the roadmap? Before LTS?

typo3-utf8mb4-0.png - With utf8(mb3) saving fails (104 KB) Lienhart Woitok, 2018-09-12 13:11

typo3-utf8mb4-1.png - A few emojis (Chrome on ubuntu) (233 KB) Lienhart Woitok, 2018-09-12 13:11

typo3-utf8mb4-2.png - Same content, Chrome on Android (454 KB) Lienhart Woitok, 2018-09-12 13:11


Related issues

Related to TYPO3 Core - Feature #80659: Set Charset to utf8mb4 Closed 2017-04-03
Related to TYPO3 Core - Bug #82551: Upgrade Wizard Deadlock Closed 2017-09-25
Related to TYPO3 Core - Bug #82080: Indexes too large for some tables with utf8mb4 Closed 2017-08-11
Related to TYPO3 Core - Feature #71454: Allow setting Connection Charset Closed 2015-11-10
Related to TYPO3 Core - Bug #86793: Renamed columns are not correctly detected by database schema diff Closed 2018-10-30
Duplicated by TYPO3 Core - Bug #85524: Charset for DB Connections in LocalConfiguration.php ignored Closed 2018-07-09

Associated revisions

Revision ed806ef5 (diff)
Added by Lienhart Woitok about 1 year ago

[FEATURE] Use utf8mb4 on mysql for new instances

If installing a new TYPO3 instance on mysql, utf8mb4 is now used as
default charset for the database connection and as default collation.

Upgraders may change LocalConfiguration to use utf8mb4, too. They
however need to take care of changing their collations and setting
the corresponding table defaults on their own.

A reports status check verifies there is no mixed collation.

Resolves: #80398
Resolves: #82080
Resolves: #82551
Releases: master
Change-Id: I6bf464a22c6ed74631bf5aacff9c2cfe670077da
Reviewed-on: https://review.typo3.org/56440
Reviewed-by: Christian Kuhn <>
Tested-by: Christian Kuhn <>
Tested-by: TYPO3com <>
Reviewed-by: Lienhart Woitok <>
Tested-by: Lienhart Woitok <>
Reviewed-by: Georg Großberger <>
Reviewed-by: Jigal van Hemert <>
Tested-by: Jigal van Hemert <>

History

#1 Updated by Marco von Arx over 2 years ago

The issue is not the DatabaseConnection class; the charset is read properly from the configuration there.

It seems that
TYPO3\CMS\Core\Database\Schema\ConnectionMigrator
or TYPO3\CMS\Core\Database\Schema\SchemaMigrator
does not read that configuration parameter.

I was able to work around this by adding the following
in TYPO3\CMS\Core\Database\Schema\ConnectionMigrator, line 1211:

    $tableOptions = $table->getOptions();
    $connectionParams = $connection->getParams();
    if (isset($connectionParams['charset'])) {
        $tableOptions['charset'] = $connectionParams['charset'];
    }
    if (isset($connectionParams['collate'])) {
        $tableOptions['collate'] = $connectionParams['collate'];
    }

#2 Updated by Morton Jonuschat over 2 years ago

  • Status changed from New to Needs Feedback

Hi!

I think you are mixing two concepts here. Also, I think the buildTableOptions() code example is from Doctrine, which is a third-party library and has no idea about TYPO3 configuration.

1. Connection Charset

This defines the character set the client will use to send SQL statements to the server. It also specifies the character set that the server should use for sending results back to the client. (For example, it indicates what character set to use for column values if you use a SELECT statement.)

2. Storage character set

This defines in which way the Database stores data on disk/in memory. This is controlled by Server/Database/Table/Column options, not the Connection Charset.

If I understand your report correctly you are looking for a way to tell TYPO3 to override the UTF8 default character set (and collation?) for created tables?
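
To make the distinction concrete, here is a minimal sketch of the two kinds of statements involved. It is neither TYPO3 nor Doctrine code: the helper names are made up for this example, and only the MySQL syntax (SET NAMES, DEFAULT CHARACTER SET ... COLLATE ...) is standard.

```php
<?php
// Illustrative helpers only; these names are hypothetical, not TYPO3 or Doctrine API.

// Reject anything that is not a plain identifier before putting it into SQL.
function assertIdentifier(string $name): string
{
    if (!preg_match('/^[A-Za-z0-9_]+$/', $name)) {
        throw new InvalidArgumentException('Invalid identifier: ' . $name);
    }
    return $name;
}

// 1. Connection charset: how client and server exchange statements and results.
function connectionCharsetSql(string $charset): string
{
    return 'SET NAMES ' . assertIdentifier($charset);
}

// 2. Storage charset: the default a table uses to store its character columns.
function createTableSql(string $table, string $charset, string $collation): string
{
    return sprintf(
        'CREATE TABLE `%s` (uid INT) DEFAULT CHARACTER SET %s COLLATE %s',
        assertIdentifier($table),
        assertIdentifier($charset),
        assertIdentifier($collation)
    );
}
```

Setting only the first one changes how data travels over the connection; the table still stores its columns in whatever charset it was created with.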

#3 Updated by Marco von Arx over 2 years ago

Hi Morton

We need to store 4-byte Unicode characters such as emoji (http://apps.timwhitlock.info/emoji/tables/unicode).
The default utf8 only allows storing characters of up to 3 bytes, so most emoji cannot be stored in utf8.

That's why I need the connection to be utf8mb4 and the database to create tables with charset utf8mb4 and collation utf8mb4_unicode_ci.

The first part does indeed work: TYPO3 connects with charset utf8mb4 if I set it in LocalConfiguration.php.

But how can I ensure that tables are created with the correct charset and collation during setup?
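
For reference, a small sketch of which strings are affected. It assumes PHP 7.0 or later (for the \u{...} escape), and the function name is made up for this example. Characters above U+FFFF take 4 bytes in UTF-8 and therefore do not fit into MySQL's 3-byte utf8.

```php
<?php
// Hypothetical helper: returns true if $text contains at least one code point
// above U+FFFF, i.e. a character that is encoded with 4 bytes in UTF-8.
function needsUtf8mb4(string $text): bool
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $text) === 1;
}

$euro = "\u{20AC}";   // EURO SIGN, inside the Basic Multilingual Plane
$emoji = "\u{1F600}"; // GRINNING FACE, outside the Basic Multilingual Plane

var_dump(strlen($euro), needsUtf8mb4($euro));   // 3 bytes, no utf8mb4 needed
var_dump(strlen($emoji), needsUtf8mb4($emoji)); // 4 bytes, utf8mb4 needed
```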

#4 Updated by Morton Jonuschat over 2 years ago

  • Status changed from Needs Feedback to New
  • Priority changed from Must have to Should have
  • Target version set to Candidate for Major Version

This is a new feature which could be implemented for TYPO3 9.0. Doing it using the connectionParameters is not the preferred way as the connection and the tablespace are two different things.
Also this needs to be supported across multiple database connections and database engines.

#5 Updated by Morton Jonuschat over 2 years ago

  • Tracker changed from Bug to Feature
  • Subject changed from connection charset ignored to Make default charset and collation for new tables configurable

#6 Updated by Marco von Arx over 2 years ago

Symfony has a separate config parameter for table options: http://symfony.com/doc/current/doctrine.html

doctrine:
    dbal:
        charset: utf8mb4
        default_table_options:
            charset: utf8mb4
            collate: utf8mb4_unicode_ci

As a suggestion:

    'DB' => [
        'Connections' => [
            'Default' => [
                'charset' => 'utf8mb4',
                'dbname' => '--dbname--',
                'driver' => 'mysqli',
                'host' => '127.0.0.1',
                'password' => '--mypassword--',
                'port' => 3306,
                'user' => '--myuser--',
                'tableoptions' => [
                    'charset' => 'utf8mb4',
                    'collate' => 'utf8mb4_unicode_ci'
                ]
            ],
        ],
    ],
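
A rough sketch of how such a 'tableoptions' entry could be applied when a table is created. The function name is hypothetical, and this is an assumption about one possible approach, not how the Core or Doctrine does it: connection-level defaults are used unless the table defines its own options.

```php
<?php
// Hypothetical helper: combine connection-level table defaults with the
// options defined for a single table. Not part of TYPO3 or Doctrine.
function resolveTableOptions(array $connectionParams, array $tableOptions): array
{
    // 'tableoptions' is the key proposed in the suggestion above.
    $defaults = $connectionParams['tableoptions'] ?? [];

    // Options set explicitly on the table win over the connection defaults.
    return array_merge($defaults, $tableOptions);
}

$connectionParams = [
    'charset' => 'utf8mb4',
    'tableoptions' => [
        'charset' => 'utf8mb4',
        'collate' => 'utf8mb4_unicode_ci',
    ],
];

// A table without its own options picks up the configured defaults.
var_dump(resolveTableOptions($connectionParams, []));
```

Keeping the table defaults in their own key leaves the connection charset untouched, so the two settings can differ per connection.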

#7 Updated by Tymoteusz Motylewski over 1 year ago

Please keep in mind that utf8mb4 uses up to 4 bytes per character, while the "standard" utf8 charset uses up to 3 bytes per character, which means that indexes might exceed the maximum key length limit in MySQL.
E.g. by default the maximum key length is 767 bytes, which is enough for varchar(255) in utf8 (255*3 = 765) but is exceeded by varchar(255) in utf8mb4 (255*4 = 1020).
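
The arithmetic as a small sketch. The 767-byte limit is the default mentioned above; the function and constant names are made up for this example.

```php
<?php
// Default maximum index key length in bytes, as mentioned above.
const MAX_KEY_BYTES = 767;

// Worst-case size in bytes of an index over a column of $chars characters.
function indexKeyBytes(int $chars, int $maxBytesPerChar): int
{
    return $chars * $maxBytesPerChar;
}

// Longest column (or index prefix) in characters that still fits the limit.
function maxIndexableChars(int $maxBytesPerChar, int $limit = MAX_KEY_BYTES): int
{
    return intdiv($limit, $maxBytesPerChar);
}

var_dump(indexKeyBytes(255, 3) <= MAX_KEY_BYTES); // utf8: 765 bytes, fits
var_dump(indexKeyBytes(255, 4) <= MAX_KEY_BYTES); // utf8mb4: 1020 bytes, too large
var_dump(maxIndexableChars(4));                   // 191 characters fit with utf8mb4
```

So with utf8mb4 and this limit, an indexed varchar column either has to shrink to 191 characters or the index has to be limited to a 191-character prefix.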

#8 Updated by Tymoteusz Motylewski over 1 year ago

  • Related to Bug #82551: Upgrade Wizard Deadlock added

#9 Updated by Tymoteusz Motylewski over 1 year ago

  • Related to Bug #82080: Indexes too large for some tables with utf8mb4 added

#10 Updated by Tymoteusz Motylewski over 1 year ago

FYI, MySQL 8 will come with utf8mb4 as default charset

#11 Updated by Gerrit Code Review over 1 year ago

  • Status changed from New to Under Review

Patch set 1 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56440

#12 Updated by Lienhart Woitok over 1 year ago

I have pushed a change to gerrit which implements the config suggestion by Marco von Arx. I'm not entirely sure I found all relevant places to change, but in my tests this worked for the database analyzer in the install tool. Newly created tables are generated with utf8mb4.

#13 Updated by David Henninger over 1 year ago

  • Duplicated by Bug #85524: Charset for DB Connections in LocalConfiguration.php ignored added

#14 Updated by Riccardo De Contardi over 1 year ago

#15 Updated by Gerrit Code Review about 1 year ago

Patch set 2 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56440

#16 Updated by Gerrit Code Review about 1 year ago

Patch set 3 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56440

#17 Updated by Tymoteusz Motylewski about 1 year ago

  • Target version changed from Candidate for Major Version to 9 LTS

#18 Updated by Gerrit Code Review about 1 year ago

Patch set 4 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56440

#19 Updated by Gerrit Code Review about 1 year ago

Patch set 5 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56440

#20 Updated by Gerrit Code Review about 1 year ago

Patch set 6 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56440

#21 Updated by Gerrit Code Review about 1 year ago

Patch set 7 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56440

#22 Updated by Gerrit Code Review about 1 year ago

Patch set 8 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56440

#23 Updated by Gerrit Code Review about 1 year ago

Patch set 9 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56440

#24 Updated by Lienhart Woitok about 1 year ago

  • Status changed from Under Review to Resolved
  • % Done changed from 0 to 100

#25 Updated by Lienhart Woitok about 1 year ago

As requested by Tymoteusz Motylewski, here are some demonstration screenshots of utf8mb4 support in content (using the introduction package). For the first screenshot with normal utf8 (utf8mb3) I added the heart again to demonstrate the failed content; it wasn't there after saving as it couldn't be written to the database.

#26 Updated by Benni Mack about 1 year ago

  • Status changed from Resolved to Closed

#27 Updated by Helmut Hummel 12 months ago

  • Related to Bug #86793: Renamed columns are not correctly detected by database schema diff added
