Bug #78149

Change field definition for caption and copyright in sys_file_metadata

Added by Thomas Hohn over 4 years ago. Updated over 2 years ago.

Status:
Closed
Priority:
Should have
Assignee:
-
Category:
-
Target version:
Start date:
2016-10-05
Due date:
% Done:

100%

Estimated time:
TYPO3 Version:
7
PHP Version:
Tags:
Complexity:
no-brainer
Is Regression:
No
Sprint Focus:

Description

In the moderne publishing business and probably in other areas the
definition of caption (https://github.com/TYPO3/TYPO3.CMS/blob/master/typo3/sysext/filemetadata/ext_tables.sql#L8) and copyright (https://github.com/TYPO3/TYPO3.CMS/blob/master/typo3/sysext/filemetadata/ext_tables.sql#L14) as varchar(255) is to small.

In addition filemeta data from for instance iStock and other image providers
are often langer than 255 characters.

We propose to change the definition to a text instead since this extension
is also required when using Solr as search engine.

CREATE TABLE sys_file_metadata (
    ...
    caption text,
    ...
    copyright text,
    ...
);    

With kind regards

Thomas


Related issues

Related to TYPO3 Core - Bug #77970: Indexing external files with path longer than 255 charsClosed2016-09-16

Actions
Related to TYPO3 Core - Feature #52719: Advanced metadata for FileClosed2013-10-12

Actions
#1

Updated by Thomas Hohn over 4 years ago

  • Complexity set to no-brainer
#2

Updated by Thomas Hohn over 4 years ago

  • Description updated (diff)
#3

Updated by Stephan Großberndt over 4 years ago

TEXT is not stored inside the table itself making it slower if not needed. Most people won't need more than 255 chars for a title (TEXT is up to 65536)
I'd propose to make it a TEXT in an own extension if needed - which EXT:solr should do too if it needs that.

#4

Updated by Kasper Ligaard over 4 years ago

It is not Solr which needs it a bigger field, it is our users who have used more than 255; an example can be seen in our site on Copenhagen in late 1700, https://kbh.systime.dk/index.php?id=122 (several examples can be found on that website alone).

Note that TYPO3 8 is switching to doctrine DBAL, so assuming MySQL storage techniques for the database can not be an argument in itself; also note that storing UTF-8 characters - the default encoding in TYPO3 - might reduce to even fewer characters being stored since VARCHAR is byte oriented whereas UTF-8 contains multi-byte characters.

That said: If we talk MySQL specifically, then MySQL 5.5 and later allows VARCHAR up to 65535:

So I propose we change from VARCHAR(255) to VARCHAR(65535). This addresses Stephans performance concerns while also accomodating the needs we meet with real users and using TYPO3.

#5

Updated by Thomas Hohn over 4 years ago

  • Target version set to Candidate for patchlevel
#6

Updated by Kasper Ligaard over 4 years ago

TYPO3 aspires to be used outside Northern Europe and USA (there are TYPO3 user groups in India, Vietnam and Cambodia). In those languages VARCHAR(255) is constraining, since East Asian languages can reach three bytes per character

Even for Western languages the avg. bytes per character usage may reach 1.5. For Greek the bytes per character approaches 2. So Greeks are basically allowed little less than a Tweet and East Asians not even close to a Tweet...

But leaving out the above considerations for now: We have the problem even in a Western Language, simply because a caption might need to describe the image, picture or illustration and a copyright description might be more than 255 bytes long.

#7

Updated by Thomas Hohn over 4 years ago

I think a varchar(4000) could be sufficient like also used in the be_users table :-) instead of a TEXT

#8

Updated by Kasper Ligaard over 4 years ago

This issue is very similar to Issue #77970 (which has just been merged, https://review.typo3.org/#/c/50144/ by Tymoteusz Motylewski):
  • The default length of 255 is inadequate.
  • It is in an extension in TYPO3 Core that is optional to install
  • It is an enlargement of the field.
  • It makes TYPO3 better :-)

Setting Issue #77970 as related and adding Tymoteusz Motylewski as watcher in hope of getting his opinion.

Changing tracker status to Bug to match issue #77970.

#9

Updated by Kasper Ligaard over 4 years ago

  • Tracker changed from Feature to Bug
  • Target version changed from Candidate for patchlevel to next-patchlevel
  • TYPO3 Version set to 7
  • Is Regression set to No
#10

Updated by Kasper Ligaard over 4 years ago

For those who do not remember, the official TYPO3 reasoning for introducing EXT:filemetadata in TYPO 6.2, was to ensure interoperability of file metadata. From issue #52719:

The main benefit of this approach is to have unified metadata in TYPO3. As a result, third-party extensions can build upon a common base preventing everyone comes up with each one set of metadata.

So the point of having an optional extension like EXT:filemetadata is that people who need these fields can actually use it. Having the field sizes for caption and copyright confined to 255 chars defeats both the purpose and intention of this extension.

We have had our own metadata extension and only ran in to this problem because we wanted to use the core provided extension. Maybe others with a professionel need for captions and copyright would also start using EXT:filemetadata.

#11

Updated by Kasper Ligaard over 4 years ago

Adding Xavier Perseguers as watcher, since the Extractor extension uses a copyright field of just 255 chars in it's ext_tables.sql.

Ref.: Extractor, https://typo3.org/extensions/repository/view/extractor

#12

Updated by Claus Due over 4 years ago

I'd like to address the counter-arguments before pushing a merge request:

TEXT is not stored inside the table itself making it slower if not needed.

Actually the opposite is true for every SQL query that does not explicitly include the field that is TEXT type. The bytes are stored in a parallel table.

This in turn means that turning it into a TEXT type will actually increase performance when the field is not included.

In both bases, the impact to performance is marginal, so low in impact that it will be unlikely noticable.

Most importantly: there are no keys or indices present on either of these values. The related field `keywords` is much more of a candidate for querying - and it is already a TEXT type exactly to support longer keyword lists (note: if performance was a concern here, VARCHAR or larger would be much more appropriate for that field - but TEXT is still a better type for our caption and copyright fields due to their different nature).

So that should cover the performance concerns.

Most people won't need more than 255 chars for a title (TEXT is up to 65536)

This may be true but since the performance argument is not weighty enough (in my opinion), a broader support in the default configuration would be preferable. If nothing else, in Systime we have a client with hundreds of live use cases which would benefit from this change. Although the source is a single agency, the number of individual use cases is significant.

I'd propose to make it a TEXT in an own extension if needed - which EXT:solr should do too if it needs that.

To counter this argument: if we do so, and the admin uninstalls that extension at some point, file metadata gets truncated to the maximum of 255 chars without you noticing this (install tool does not warn about values that will be truncated, but it does warn that the field type will change - that is just not enough to make it "safe"). If we chose TEXT as our default type and some extension redeclares it as VARCHAR the opposite is true: the extension would truncate the fields (NB: Xavier, I know you're reading here and it seems you might be affected?) and uninstalling it would have no impact.

When third-party extensions declare this field as TEXT the schema update will be ignored and uninstalling the extension is no longer potentially destructive for these fields.

All in all that means it is the safer option to increase this field size in the core. The only risk this brings is the combination of:

  • Using a version of TYPO3 where this field size is increased
  • Not currently using the third party extension which would decrease the field size
  • Having content in those fields exceeding 255
  • With all the above in place, decides to install the extension that decreases the field size. Result: field value truncated.

Whereas any site which uses EXT:solr or any other extension that increases the field size (including internal ones) will no longer be susceptible to truncating field values when they are uninstalled.

All things being equal, it would be safer to affect the edge case above negatively (and only until the third party extension is migrated!) rather than keeping the inadequate field size and putting it on third-party extensions to increase it and so risk truncation if uninstalled.

Merge request is incoming!

#13

Updated by Gerrit Code Review over 4 years ago

  • Status changed from New to Under Review

Patch set 1 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/50248

#14

Updated by Xavier Perseguers over 4 years ago

For the record, I defined copyright with 255 chars to be large enough whereas http://www.controlledvocabulary.com/imagedatabases/iptc_naa.html seems to even shorten that to 128 bytes for IPTC.

#15

Updated by Gerrit Code Review over 4 years ago

Patch set 2 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/50248

#16

Updated by Gerrit Code Review about 4 years ago

Patch set 3 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/50248

#17

Updated by Gerrit Code Review about 4 years ago

Patch set 4 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/50248

#18

Updated by Gerrit Code Review about 4 years ago

Patch set 1 for branch TYPO3_7-6 of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/50331

#19

Updated by Anonymous about 4 years ago

  • Status changed from Under Review to Resolved
  • % Done changed from 0 to 100
#20

Updated by Benni Mack over 2 years ago

  • Status changed from Resolved to Closed

Also available in: Atom PDF