Feature #80398

Make default charset and collation for new tables configurable

Added by Marco von Arx about 7 years ago. Updated over 5 years ago.

Status: Closed
Priority: Should have
Assignee: -
Category: Database API (Doctrine DBAL)
Target version:
Start date: 2017-03-22
Due date:
% Done: 100%
Estimated time:
PHP Version: 7.0
Tags: charset utf8mb4
Complexity:
Sprint Focus:

Description

To be able to store 4-byte Unicode characters, we need to set the database to utf8mb4. Since TYPO3 8 there is a configuration parameter for that, but it seems that it is not taken into account.

LocalConfiguration.php

    'DB' => [
        'Connections' => [
            'Default' => [
                'charset' => 'utf8mb4',
                'dbname' => '--dbname--',
                'driver' => 'mysqli',
                'host' => '127.0.0.1',
                'password' => '--mypassword--',
                'port' => 3306,
                'user' => '--myuser--',
            ],
        ],
    ],

The CREATE TABLE statements do have a fallback, but they do not read from the configuration:

    private function buildTableOptions(array $options)
    {
        if (isset($options['table_options'])) {
            return $options['table_options'];
        }

        $tableOptions = array();

        // Charset
        if ( ! isset($options['charset'])) {
            $options['charset'] = 'utf8';
        }
        ....
    }

The DatabaseConnection class does not read the charset configuration either; it takes utf8 as a default.

        $connection = \Doctrine\DBAL\DriverManager::getConnection([
            'driver' => 'mysqli',
            'wrapperClass' => Connection::class,
            'host' => $host,
            'port' => (int)$this->databasePort,
            'unix_socket' => $this->databaseSocket,
            'user' => $this->databaseUsername,
            'password' => $this->databaseUserPassword,
            'charset' => $this->connectionCharset,
        ]);

It was stated that this would be fixed in CMS 8:
https://forge.typo3.org/issues/71454

Is this on the roadmap? Before LTS?


Files

typo3-utf8mb4-0.png (104 KB): With utf8(mb3) saving fails (Lienhart Woitok, 2018-09-12 13:11)
typo3-utf8mb4-1.png (233 KB): A few emojis, Chrome on Ubuntu (Lienhart Woitok, 2018-09-12 13:11)
typo3-utf8mb4-2.png (454 KB): Same content, Chrome on Android (Lienhart Woitok, 2018-09-12 13:11)

Related issues 6 (0 open, 6 closed)

Related to TYPO3 Core - Feature #80659: Set Charset to utf8mb4 (Closed, 2017-04-03)
Related to TYPO3 Core - Bug #82551: Upgrade Wizard Deadlock (Closed, 2017-09-25)
Related to TYPO3 Core - Bug #82080: Indexes too large for some tables with utf8mb4 (Closed, 2017-08-11)
Related to TYPO3 Core - Feature #71454: Allow setting Connection Charset (Closed, 2015-11-10)
Related to TYPO3 Core - Bug #86793: Renamed columns are not correctly detected by database schema diff (Closed, 2018-10-30)
Related to TYPO3 Core - Bug #97961: Transform `tableoptions` early to valid `doctrine/dbal` option (Closed, 2022-07-16)

Actions #1

Updated by Marco von Arx about 7 years ago

The issue is not the DatabaseConnection class; the charset is read properly from the configuration there.

It seems that
TYPO3\CMS\Core\Database\Schema\ConnectionMigrator
or TYPO3\CMS\Core\Database\Schema\SchemaMigrator
does not read that configuration parameter.

I was able to work around this by adding the following
in TYPO3\CMS\Core\Database\Schema\ConnectionMigrator, line 1211:

    $tableOptions = $table->getOptions();
    $connectionParams = $connection->getParams();
    if (isset($connectionParams['charset'])) {
        $tableOptions['charset'] = $connectionParams['charset'];
    }
    if (isset($connectionParams['collate'])) {
        $tableOptions['collate'] = $connectionParams['collate'];
    }
Actions #2

Updated by Morton Jonuschat about 7 years ago

  • Status changed from New to Needs Feedback

Hi!

I think you are mixing two concepts here. Also, I think the buildTableOptions() code example is from Doctrine, which is a third-party library and has no idea about the TYPO3 configuration.

1. Connection Charset

This defines the character set the client will use to send SQL statements to the server. It also specifies the character set that the server should use for sending results back to the client. (For example, it indicates what character set to use for column values if you use a SELECT statement.)

2. Storage character set

This defines how the database stores data on disk and in memory. It is controlled by server/database/table/column options, not by the connection charset.

If I understand your report correctly you are looking for a way to tell TYPO3 to override the UTF8 default character set (and collation?) for created tables?
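
To make the distinction between the two concepts concrete, here is a minimal sketch in PHP. The function names and the demo table layout are made up for illustration and are not part of TYPO3 or Doctrine; only the two MySQL statements themselves are standard. The first statement affects the current connection only, the second decides how a new table stores its data.

```php
<?php
// 1. Connection charset: how client and server exchange statements and
//    results for the current session. It does not change stored data.
function connectionCharsetSql(string $charset): string
{
    return sprintf('SET NAMES %s', $charset);
}

// 2. Storage charset: how the table keeps its data. Set per table here;
//    it can also be set at server, database or column level.
//    Illustrative only: identifiers are not quoted or validated.
function createTableSql(string $table, string $charset, string $collation): string
{
    return sprintf(
        'CREATE TABLE %s (id INT PRIMARY KEY, title VARCHAR(255)) '
        . 'DEFAULT CHARACTER SET %s COLLATE %s',
        $table,
        $charset,
        $collation
    );
}

echo connectionCharsetSql('utf8mb4'), "\n";
echo createTableSql('demo', 'utf8mb4', 'utf8mb4_unicode_ci'), "\n";
```

A connection using utf8mb4 can still talk to a table stored as utf8; in that case 4-byte characters arrive at the server but cannot be stored in the column.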

Actions #3

Updated by Marco von Arx about 7 years ago

Hi Morton,

We need to store 4-byte Unicode characters such as emoji (http://apps.timwhitlock.info/emoji/tables/unicode).
The default utf8 only allows storing characters of up to 3 bytes, so most emoji characters cannot be stored in utf8.
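
The 3-byte limit can be checked quickly in PHP. This is only a sketch: the helper names `utf8ByteLength()` and `fitsInUtf8mb3()` are invented for this example. It relies on the fact that every code point above U+FFFF needs 4 bytes in UTF-8.

```php
<?php
// Number of bytes a string occupies in UTF-8 (strlen() counts bytes, not characters).
function utf8ByteLength(string $text): int
{
    return strlen($text);
}

// True if the text contains no code point above U+FFFF, i.e. every
// character fits into MySQL's 3-byte utf8 (utf8mb3) charset.
function fitsInUtf8mb3(string $text): bool
{
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $text) === 0;
}

$eAcute = "\u{00E9}";  // e with acute accent, 2 bytes
$euro   = "\u{20AC}";  // euro sign, 3 bytes
$emoji  = "\u{1F600}"; // grinning face emoji, 4 bytes

printf("%d %d %d\n", utf8ByteLength($eAcute), utf8ByteLength($euro), utf8ByteLength($emoji)); // 2 3 4
var_dump(fitsInUtf8mb3('caf' . $eAcute));   // bool(true)
var_dump(fitsInUtf8mb3('hello ' . $emoji)); // bool(false)
```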

That's why I need the connection to be utf8mb4 and the database to create tables with charset utf8mb4 and collation utf8mb4_unicode_ci.

The first part does indeed work: TYPO3 connects with charset utf8mb4 if I set it in LocalConfiguration.php.

But how can I ensure that tables are created with the correct charset and collation during setup?

Actions #4

Updated by Morton Jonuschat about 7 years ago

  • Status changed from Needs Feedback to New
  • Priority changed from Must have to Should have
  • Target version set to Candidate for Major Version

This is a new feature which could be implemented for TYPO3 9.0. Doing it via the connection parameters is not the preferred way, as the connection and the tablespace are two different things.
Also, this needs to be supported across multiple database connections and database engines.

Actions #5

Updated by Morton Jonuschat about 7 years ago

  • Tracker changed from Bug to Feature
  • Subject changed from connection charset ignored to Make default charset and collation for new tables configurable
Actions #6

Updated by Marco von Arx about 7 years ago

Symfony has a separate config parameter for table schemas: http://symfony.com/doc/current/doctrine.html

doctrine:
    dbal:
        charset: utf8mb4
        default_table_options:
            charset: utf8mb4
            collate: utf8mb4_unicode_ci

As a suggestion:

    'DB' => [
        'Connections' => [
            'Default' => [
                'charset' => 'utf8mb4',
                'dbname' => '--dbname--',
                'driver' => 'mysqli',
                'host' => '127.0.0.1',
                'password' => '--mypassword--',
                'port' => 3306,
                'user' => '--myuser--',
                'tableoptions' => [
                    'charset' => 'utf8mb4',
                    'collate' => 'utf8mb4_unicode_ci',
                ],
            ],
        ],
    ],
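
As a rough sketch of how a schema migrator could consume such a `tableoptions` key: the helper `applyDefaultTableOptions()` below is hypothetical and not TYPO3 core code. It also assumes that options set explicitly on a table should win over the configured defaults, which is a design choice and not something decided in this issue.

```php
<?php
/**
 * Hypothetical helper: merge the configured default table options into the
 * options of a table that is about to be created.
 */
function applyDefaultTableOptions(array $tableOptions, array $connectionParams): array
{
    // No 'tableoptions' key configured means no defaults are applied.
    $defaults = $connectionParams['tableoptions'] ?? [];

    // With string keys, later arrays override earlier ones in array_merge(),
    // so explicit table options take precedence over the defaults.
    return array_merge($defaults, $tableOptions);
}

$connectionParams = [
    'charset' => 'utf8mb4',
    'tableoptions' => [
        'charset' => 'utf8mb4',
        'collate' => 'utf8mb4_unicode_ci',
    ],
];

$options = applyDefaultTableOptions(['engine' => 'InnoDB'], $connectionParams);
print_r($options); // charset, collate and engine are all present
```

Keeping the table defaults under their own key leaves the connection `charset` untouched, so the two settings can differ if needed.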
Actions #7

Updated by Tymoteusz Motylewski over 6 years ago

Please keep in mind that utf8mb4 uses up to 4 bytes per character, while the "standard" utf8 charset uses up to 3 bytes per character, which means that the indexes created might exceed the maximum key length limit in MySQL.
E.g. by default the maximum key size is 767 bytes, which is enough for varchar(255) in utf8 (255*3 = 765) but is exceeded by varchar(255) in utf8mb4 (255*4 = 1020).
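
The arithmetic can be spelled out in a few lines of PHP. The constant and function names are made up for this sketch, and the 767-byte limit is taken from the comment above; the actual limit depends on the server and table settings.

```php
<?php
// Key length limit in bytes, as quoted in the comment above.
const MAX_KEY_BYTES = 767;

// Worst-case size in bytes of an index over the first $chars characters.
function indexKeyBytes(int $chars, int $maxBytesPerChar): int
{
    return $chars * $maxBytesPerChar;
}

// Longest character prefix that still fits into the key length limit.
function maxIndexableChars(int $maxBytesPerChar, int $limit = MAX_KEY_BYTES): int
{
    return intdiv($limit, $maxBytesPerChar);
}

echo indexKeyBytes(255, 3), "\n"; // 765, fits into 767 bytes
echo indexKeyBytes(255, 4), "\n"; // 1020, exceeds 767 bytes
echo maxIndexableChars(4), "\n";  // 191
```

So with utf8mb4 and this limit, an index can cover at most the first 191 characters of a column.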

Actions #8

Updated by Tymoteusz Motylewski over 6 years ago

  • Related to Bug #82551: Upgrade Wizard Deadlock added
Actions #9

Updated by Tymoteusz Motylewski over 6 years ago

  • Related to Bug #82080: Indexes too large for some tables with utf8mb4 added
Actions #10

Updated by Tymoteusz Motylewski over 6 years ago

FYI, MySQL 8 will come with utf8mb4 as the default charset.

Actions #11

Updated by Gerrit Code Review about 6 years ago

  • Status changed from New to Under Review

Patch set 1 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56440

Actions #12

Updated by Lienhart Woitok about 6 years ago

I have pushed a change to Gerrit which implements the config suggestion by Marco von Arx. I'm not entirely sure I found all relevant places to change, but in my tests this worked for the database analyzer in the install tool: newly created tables are generated with utf8mb4.

Actions #13

Updated by David Henninger almost 6 years ago

  • Has duplicate Bug #85524: Charset for DB Connections in LocalConfiguration.php ignored added
Actions #14

Updated by Riccardo De Contardi almost 6 years ago

Actions #15

Updated by Gerrit Code Review almost 6 years ago

Patch set 2 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56440

Actions #16

Updated by Gerrit Code Review almost 6 years ago

Patch set 3 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56440

Actions #17

Updated by Tymoteusz Motylewski almost 6 years ago

  • Target version changed from Candidate for Major Version to 9 LTS
Actions #18

Updated by Gerrit Code Review almost 6 years ago

Patch set 4 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56440

Actions #19

Updated by Gerrit Code Review almost 6 years ago

Patch set 5 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56440

Actions #20

Updated by Gerrit Code Review almost 6 years ago

Patch set 6 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56440

Actions #21

Updated by Gerrit Code Review almost 6 years ago

Patch set 7 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56440

Actions #22

Updated by Gerrit Code Review almost 6 years ago

Patch set 8 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56440

Actions #23

Updated by Gerrit Code Review almost 6 years ago

Patch set 9 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56440

Actions #24

Updated by Lienhart Woitok almost 6 years ago

  • Status changed from Under Review to Resolved
  • % Done changed from 0 to 100
Actions #25

Updated by Lienhart Woitok almost 6 years ago

As requested by Tymoteusz Motylewski, here are some demonstration screenshots of utf8mb4 support in content (using the introduction package). For the first screenshot, with normal utf8 (utf8mb3), I added the heart again to demonstrate the failed content; it wasn't there after saving, as it couldn't be written to the database.

Actions #26

Updated by Benni Mack over 5 years ago

  • Status changed from Resolved to Closed
Actions #27

Updated by Helmut Hummel over 5 years ago

  • Related to Bug #86793: Renamed columns are not correctly detected by database schema diff added
Actions #28

Updated by Jeff C about 3 years ago

  • Has duplicate deleted (Bug #85524: Charset for DB Connections in LocalConfiguration.php ignored)
Actions #29

Updated by Stefan Bürk almost 2 years ago

  • Related to Bug #97961: Transform `tableoptions` early to valid `doctrine/dbal` option added