Task #54730

closed

Epic #55070: Workpackages

Epic #54260: WP: FAL Missing Issues / Features / API

Story #54266: As an User I want FAL to be performant

Task #51094: SQL-Optimize the FAL

sys_file_processedfile.checksum shorten DB field

Added by Ingo Schmitt over 10 years ago. Updated over 6 years ago.

Status:
Closed
Priority:
Should have
Assignee:
Category:
File Abstraction Layer (FAL)
Target version:
Start date:
2014-01-03
Due date:
% Done:
100%

Estimated time:
TYPO3 Version:
6.2
PHP Version:
Tags:
Complexity:
Sprint Focus:

Description

The contents of sys_file_processedfile.checksum are created by \TYPO3\CMS\Core\Resource\Processing\AbstractTask::getConfigurationChecksum(), which calls \TYPO3\CMS\Core\Utility\GeneralUtility::shortMD5(implode('|', $this->getChecksumData())).

Since shortMD5() always returns at most 10 characters with its default length, the database field could be reduced to 10 characters.
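A minimal Python sketch of how the stored value is built. It assumes that GeneralUtility::shortMD5() in 6.2 returns the first $len hex characters of md5($input), with $len defaulting to 10. The function names and the sample checksum data are hypothetical; the sketch only shows that the stored value is a fixed-length 10-character hex string.

```python
import hashlib


def short_md5(data: str, length: int = 10) -> str:
    """Hypothetical Python analogue of GeneralUtility::shortMD5():
    the first `length` hex characters of the MD5 digest (default 10)."""
    return hashlib.md5(data.encode("utf-8")).hexdigest()[:length]


def configuration_checksum(checksum_data: list) -> str:
    # Mirrors shortMD5(implode('|', $this->getChecksumData())).
    # The checksum data passed in below is made up for illustration.
    return short_md5("|".join(str(part) for part in checksum_data))


checksum = configuration_checksum(["hypothetical-file-uid", "Image.CropScaleMask", "width=200"])
print(len(checksum))  # 10
```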

Actions #1

Updated by Steffen Ritter over 10 years ago

  • Parent task set to #51094
Actions #2

Updated by Steffen Ritter over 10 years ago

Is there any benefit to using a shortened MD5 instead of a real one? I would rather opt to change that.

Actions #3

Updated by Ingo Schmitt over 10 years ago

A short MD5 uses a smaller field, so the index is smaller and faster.
While reading the code I had a similar thought: normally we can use the full MD5, since the calculation is fast, and for B-trees the byte size of a key is less of a problem than how much the keys differ.
With a full MD5 the size would be 32 characters, so the field should be set to varchar(32).
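A short Python sketch of the size difference, assuming the short form is simply a prefix of the hex MD5 digest. The names and sample input are hypothetical. The full digest is always 32 hex characters (128 bits), which fits varchar(32); the 10-character prefix keeps only 40 of those bits.

```python
import hashlib


def full_md5(data: str) -> str:
    # Full hex MD5 digest: always 32 characters, 4 bits per hex character.
    return hashlib.md5(data.encode("utf-8")).hexdigest()


def short_md5(data: str, length: int = 10) -> str:
    # Hypothetical short form: the first `length` hex characters of the full digest.
    return full_md5(data)[:length]


sample = "hypothetical|checksum|data"
full = full_md5(sample)
short = short_md5(sample)

# The short value is a prefix of the full one.
assert full.startswith(short)

print(len(full), len(short))          # 32 10
print(len(full) * 4, len(short) * 4)  # 128 40
```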

Actions #4

Updated by Christoph Dörfel over 10 years ago

Ingo Schmitt wrote:

A short MD5 uses a smaller field, so the index is smaller and faster.
While reading the code I had a similar thought: normally we can use the full MD5, since the calculation is fast, and for B-trees the byte size of a key is less of a problem than how much the keys differ.
With a full MD5 the size would be 32 characters, so the field should be set to varchar(32).

While looking through the code, yet another idea came to my mind. Why use sha1, md5 or shortMd5 when the only thing we are looking for is "change", not "uniqueness"? The generated hashes are not used as unique identifiers but to make sure that the data is still valid and doesn't have to be recalculated. So we could just use a simple checksum, in this case crc32. crc32 is widely used for exactly this, e.g. to check file transfers for errors.
The benefit of using crc32 is that it's a simple 32-bit integer. Comparisons and DB searches can't get faster than that :)
Your opinions?

Edit:
The same applies to the file checksums in http://forge.typo3.org/issues/54729, such as "originalfilesha1" and similar fields.
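A minimal Python sketch of the crc32 idea from comment #4. It assumes the same '|'-joined checksum data is hashed and the result is stored in an unsigned 32-bit integer column. Python's zlib.crc32 stands in for a crc32 implementation, and the function names and data are hypothetical. Because crc32 has only 32 bits, it is suited to detecting change, not to identifying files uniquely.

```python
import zlib


def config_crc32(checksum_data: list) -> int:
    # Hypothetical helper: crc32 over the '|'-joined configuration.
    # In Python 3, zlib.crc32 returns an unsigned value in [0, 2**32).
    payload = "|".join(str(part) for part in checksum_data).encode("utf-8")
    return zlib.crc32(payload)


def needs_reprocessing(stored_crc: int, checksum_data: list) -> bool:
    # Only "has it changed?" is asked here, not "is it unique?".
    return stored_crc != config_crc32(checksum_data)


stored = config_crc32(["width=200", "height=100"])
print(needs_reprocessing(stored, ["width=200", "height=100"]))  # False
# '2' -> '3' flips a single bit, and CRC32 detects every single-bit change.
print(needs_reprocessing(stored, ["width=300", "height=100"]))  # True
```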

Actions #5

Updated by Ingo Schmitt over 10 years ago

If this field is only used to detect "change", then we should name it accordingly. Looking at the field from the outside right now, it seems that a file hash is stored, so an extension developer might use this field to detect duplicates or similar.

@Steffen: What do you think about it?

Actions #6

Updated by Mathias Schreiber over 9 years ago

  • Status changed from New to Accepted
  • Target version changed from 6.2.0 to 7.1 (Cleanup)
  • Sprint Focus set to On Location Sprint
Actions #7

Updated by Mathias Schreiber over 9 years ago

  • Category changed from Performance to File Abstraction Layer (FAL)
Actions #8

Updated by Gerrit Code Review about 9 years ago

  • Status changed from Accepted to Under Review

Patch set 1 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/36388

Actions #9

Updated by Gerrit Code Review about 9 years ago

Patch set 1 for branch TYPO3_6-2 of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/36433

Actions #10

Updated by Mathias Schreiber about 9 years ago

  • Status changed from Under Review to Resolved
  • % Done changed from 0 to 100
Actions #11

Updated by Anja Leichsenring over 8 years ago

  • Sprint Focus deleted (On Location Sprint)
Actions #12

Updated by Riccardo De Contardi over 6 years ago

  • Status changed from Resolved to Closed