Task #54730
Epic #55070: Workpackages (closed)
Epic #54260: WP: FAL Missing Issues / Features / API
Story #54266: As an User I want FAL to be performant
Task #51094: SQL-Optimize the FAL
sys_file_processedfile.checksum shorten DB field
100%
Description
The contents of sys_file_processedfile.checksum are created by \TYPO3\CMS\Core\Resource\Processing\AbstractTask::getConfigurationChecksum(), which calls \TYPO3\CMS\Core\Utility\GeneralUtility::shortMD5(implode('|', $this->getChecksumData())).
Since shortMD5 always returns at most 10 characters, the size of the database field could be reduced to 10 characters.
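For illustration, a minimal Python sketch of how such a checksum is built. It assumes shortMD5 is simply the MD5 hex digest cut to its first 10 characters; the function names and the sample checksum data are hypothetical and not taken from the TYPO3 code base.

```python
import hashlib


def short_md5(value, length=10):
    """Return the first `length` hex characters of the MD5 digest of `value`.

    Assumption: this mirrors what GeneralUtility::shortMD5 does.
    """
    return hashlib.md5(value.encode("utf-8")).hexdigest()[:length]


def configuration_checksum(checksum_data):
    # Counterpart of shortMD5(implode('|', $this->getChecksumData())):
    # join the parts with '|' and hash the resulting string.
    return short_md5("|".join(checksum_data))


# Hypothetical checksum data, for illustration only.
example = configuration_checksum(["42", "Image.CropScaleMask", "width=200"])
print(len(example))  # 10
```

Under that assumption the stored value never exceeds 10 characters, which is the basis for shrinking the column.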
Updated by Steffen Ritter about 11 years ago
Is there any benefit to using a shortened MD5 instead of a real one? I would rather opt to change that.
Updated by Ingo Schmitt about 11 years ago
A short MD5 uses a smaller field size, so the index is smaller and faster.
While reading the code I had a similar thought: normally we could use the full MD5, since the calculation is fast and the byte size is not the problem for B-trees; what matters more is how much the keys differ.
When using a full MD5, the size would be 32 characters, so set the field to varchar(32).
Updated by Christoph Dörfel about 11 years ago
Ingo Schmitt wrote:
A short MD5 uses a smaller field size, so the index is smaller and faster.
While reading the code I had a similar thought: normally we could use the full MD5, since the calculation is fast and the byte size is not the problem for B-trees; what matters more is how much the keys differ.
When using a full MD5, the size would be 32 characters, so set the field to varchar(32).
While looking through the code yet another idea came to my mind. Why use sha1, md5 or shortMd5 when the only thing we are looking for is "change", not "uniqueness". The generated hashes are not used as unique identifiers but to make sure that data is valid and doesn't have to be recalculated. So we could just use a simple checksum, crc32 in this case. crc32 is used in a lot of cases, where e.g. file transfers have to be checked for errors.
The benefit of using crc32 is that it's a simple 32-bit integer. Comparisons and DB searches can't get faster than that :)
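A rough sketch of that idea in Python, assuming the same '|'-joined checksum data as in the description. The helper names are made up for illustration and this is not what was merged.

```python
import zlib


def crc32_checksum(checksum_data):
    """Return an unsigned 32-bit CRC over the '|'-joined checksum data."""
    joined = "|".join(checksum_data).encode("utf-8")
    # The mask keeps the result in the range 0..2**32-1 on every platform.
    return zlib.crc32(joined) & 0xFFFFFFFF


def needs_reprocessing(stored_checksum, checksum_data):
    # A differing value proves the configuration changed. An equal value
    # only makes a change unlikely, since different inputs can share a CRC.
    return stored_checksum != crc32_checksum(checksum_data)


stored = crc32_checksum(["42", "width=200"])
print(needs_reprocessing(stored, ["42", "width=200"]))  # False
print(needs_reprocessing(stored, ["42", "width=300"]))  # True
```

The value fits into an unsigned 32-bit integer column, so the comparison is a plain integer equality check instead of a string comparison.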
Your opinions?
Edit:
The same applies to http://forge.typo3.org/issues/54729 with file checksums, "originalfilesha1" and similar.
Updated by Ingo Schmitt about 11 years ago
If this field is only used to detect "change", then we should name it accordingly. Looking at the field from the outside right now, it seems that a file hash is stored. So an extension developer could be tempted to use this field for detecting duplicates or similar.
@Steffen: What do you think about it?
Updated by Mathias Schreiber about 10 years ago
- Status changed from New to Accepted
- Target version changed from 6.2.0 to 7.1 (Cleanup)
- Sprint Focus set to On Location Sprint
Updated by Mathias Schreiber about 10 years ago
- Category changed from Performance to File Abstraction Layer (FAL)
Updated by Gerrit Code Review about 10 years ago
- Status changed from Accepted to Under Review
Patch set 1 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/36388
Updated by Gerrit Code Review about 10 years ago
Patch set 1 for branch TYPO3_6-2 of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/36433
Updated by Mathias Schreiber about 10 years ago
- Status changed from Under Review to Resolved
- % Done changed from 0 to 100
Applied in changeset fd19c5221722c4ed56a4b67b149c7d049e2d1189.
Updated by Anja Leichsenring about 9 years ago
- Sprint Focus deleted (On Location Sprint)
Updated by Riccardo De Contardi over 7 years ago
- Status changed from Resolved to Closed