Task #54730
closed
Added by Ingo Schmitt almost 11 years ago.
Updated about 7 years ago.
Category: File Abstraction Layer (FAL)
Description
The contents of sys_file_processedfile.checksum are created by \TYPO3\CMS\Core\Resource\Processing\AbstractTask::getConfigurationChecksum(), which calls \TYPO3\CMS\Core\Utility\GeneralUtility::shortMD5(implode('|', $this->getChecksumData())).
Since shortMD5 always returns at most 10 characters, the size of the database field could be reduced to 10 characters.
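For illustration, here is a minimal Python sketch of that truncation. It assumes `shortMD5($input, $len = 10)` returns the first `$len` hexadecimal characters of `md5($input)`. The checksum data values are hypothetical placeholders, not what `getChecksumData()` actually returns.

```python
import hashlib


def short_md5(value, length=10):
    # Assumed shortMD5 behaviour: cut the 32-character hex MD5 digest down to `length` characters.
    return hashlib.md5(value.encode("utf-8")).hexdigest()[:length]


# Hypothetical checksum data; the real values come from getChecksumData().
checksum_data = ["42", "Image.CropScaleMask", "width=100,height=50"]
joined = "|".join(checksum_data)  # same separator as implode('|', ...)

checksum = short_md5(joined)
full = hashlib.md5(joined.encode("utf-8")).hexdigest()

print(len(checksum), len(full))  # 10 32
```

With the default length the value never exceeds 10 characters, whereas a full hex MD5 digest is always 32 characters long.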
- Parent task set to #51094
Is there any benefit to using a shortened MD5 instead of a full one? I would rather change that.
A short MD5 uses a smaller field, so the index is smaller and faster.
While reading the code, I had a similar thought: normally we could use the full MD5, since the calculation is fast and, for B-tree indexes, the byte size matters less than how distinct the keys are.
When using an MD5, the size would be 32 characters, so set the field to varchar(32).
Ingo Schmitt wrote:
> A short MD5 uses a smaller field, so the index is smaller and faster.
> While reading the code, I had a similar thought: normally we could use the full MD5, since the calculation is fast and, for B-tree indexes, the byte size matters less than how distinct the keys are.
> When using an MD5, the size would be 32 characters, so set the field to varchar(32).
While looking through the code, yet another idea came to mind. Why use sha1, md5 or shortMD5 when all we are looking for is "change", not "uniqueness"? The generated hashes are not used as unique identifiers but to make sure that the data is still valid and does not have to be recalculated. So we could just use a simple checksum, crc32 in this case. crc32 is widely used, e.g. to check file transfers for errors.
The benefit of using crc32 is that it is a simple 32-bit integer. Comparisons and DB searches can't get faster than that :)
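A minimal Python sketch of the crc32 idea, assuming the checksum data is still joined with '|' as in getConfigurationChecksum(). The configuration values are hypothetical, and this is not TYPO3 code. It only shows that the result fits an unsigned 32-bit integer and that a configuration change yields a different value.

```python
import zlib


def config_crc32(checksum_data):
    # Join like implode('|', ...) and compute CRC-32; Python 3 returns an unsigned int.
    return zlib.crc32("|".join(checksum_data).encode("utf-8"))


# Hypothetical processing configurations that differ in a single value.
stored = config_crc32(["42", "Image.CropScaleMask", "width=100,height=50"])
current = config_crc32(["42", "Image.CropScaleMask", "width=200,height=50"])

assert 0 <= stored < 2**32  # fits an unsigned 32-bit INT column
needs_reprocessing = stored != current
print(needs_reprocessing)  # True
```

The trade-off is collision resistance. With only 2^32 possible values, unrelated inputs collide far more often than with MD5. That matters mainly if the value were treated as a unique identifier rather than a change marker.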
Your opinions?
Edit:
The same applies to http://forge.typo3.org/issues/54729 with file checksums, "originalfilesha1" and similar.
If this field is used only to detect "change", then we should name it accordingly. Looking at the field from the outside right now, it seems to store a file hash, so an extension developer might use it to detect duplicates or similar.
@Steffen: What do you think about it?
- Status changed from New to Accepted
- Target version changed from 6.2.0 to 7.1 (Cleanup)
- Sprint Focus set to On Location Sprint
- Category changed from Performance to File Abstraction Layer (FAL)
- Status changed from Accepted to Under Review
Patch set 1 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/36388
Patch set 1 for branch TYPO3_6-2 of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/36433
- Status changed from Under Review to Resolved
- % Done changed from 0 to 100
- Sprint Focus deleted (On Location Sprint)
- Status changed from Resolved to Closed