Feature #43233

FAL Migration should consolidate duplicated files

Added by Marcel Burkhalter almost 9 years ago. Updated over 1 year ago.

Status: Closed
Priority: Should have
Assignee: -
Category: File Abstraction Layer (FAL)
Target version: -
Start date: 2012-11-22
Due date:
% Done: 0%
Estimated time:
PHP Version:
Tags:
Complexity:
Sprint Focus: Needs Decision

Description

TYPO3 versions prior to FAL made a copy in /uploads/ for each integration of a file. Since these copies are numbered and FAL calculates a SHA1 hash over the file contents, it should be possible to consolidate these files when they are copied to the /fileadmin/_migrated/ folder.
Otherwise an upgraded installation has new files with meaningful FAL reference counts and migrated files with lots of duplicates that all have a reference count of 1. With an "intelligent" FAL migration, upgraded installations could take full advantage (for all content/files) of nice FAL features such as central updates of files, space savings, etc.
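
To gauge how many duplicates an installation actually carries, a read-only query along the following lines could be used. It is a sketch only: it assumes a MySQL/MariaDB database and the sys_file columns sha1 and identifier, and the aliases copies and migrated_copies are made up for illustration.

```sql
# Preview only, nothing is changed:
# list content hashes that are shared by more than one sys_file record
SELECT
    sys_file.sha1,
    COUNT(*) AS copies,
    # a comparison yields 1 or 0 in MySQL, so SUM() counts the matching rows
    SUM(sys_file.identifier LIKE '/_migrated/%') AS migrated_copies
FROM
    sys_file
WHERE
    sys_file.sha1 <> ''
GROUP BY
    sys_file.sha1
HAVING
    COUNT(*) > 1
ORDER BY
    copies DESC;
```

Each row is one group of identical files; migrated_copies shows how many of them sit below "_migrated" and would be candidates for consolidation.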

#1

Updated by Alexander Opitz about 7 years ago

  • Project changed from 1401 to TYPO3 Core
  • Category set to File Abstraction Layer (FAL)
  • Target version deleted (6.1)
  • TYPO3 Version set to 6.0
#2

Updated by Susanne Moog about 6 years ago

  • Sprint Focus set to PRC
#3

Updated by Georg Ringer about 4 years ago

  • Status changed from New to Rejected

As the FAL migration was part of 6.2, which is EOL, nothing more will be done in that area. Therefore I am closing this issue even though it is still valid.

#4

Updated by Georg Ringer about 4 years ago

  • Status changed from Rejected to Closed
#5

Updated by Dirk Klimpel over 3 years ago

This is an old thread, but I had the same problem / request.
I have built a solution.

# table with the sha1 hashes of all files and
# how often each hash occurs
CREATE TEMPORARY TABLE IF NOT EXISTS 
    temp_table_sha1 ( index(sha1) ) 
ENGINE=MyISAM 
AS (
    SELECT
        sys_file.sha1,
        count( sys_file.sha1 ) AS anz
    FROM
        sys_file
    GROUP BY
        sys_file.sha1
);

# table of all files to migrate (source):
# files that exist more than once
# and are stored in the folder "_migrated"
CREATE TEMPORARY TABLE IF NOT EXISTS 
    temp_table_src ( index(uid), key(sha1) ) 
ENGINE=MyISAM 
AS (
    SELECT
        sys_file.uid,
        sys_file.sha1
    FROM
        sys_file
        INNER JOIN temp_table_sha1 ON sys_file.sha1 = temp_table_sha1.sha1
    WHERE
        temp_table_sha1.anz > 1
        AND sys_file.identifier LIKE '/_migrated/%'
    ORDER BY
        sys_file.uid
);

# table of all original files (destination):
# files that exist more than once, are not missing
# and are not stored in the folder "_migrated", "uploads" or "templates"
CREATE TEMPORARY TABLE IF NOT EXISTS 
    temp_table_dst ( INDEX(sha1) ) 
ENGINE=MyISAM 
AS (
    SELECT
        # MIN() picks one uid per hash and keeps the query valid under ONLY_FULL_GROUP_BY
        MIN(sys_file.uid) AS uid,
        sys_file.sha1
    FROM
        sys_file
        INNER JOIN temp_table_sha1 ON sys_file.sha1 = temp_table_sha1.sha1
    WHERE
        temp_table_sha1.anz > 1
        AND sys_file.identifier NOT RLIKE '/_migrated/.*|/uploads/.*|/templates/.*'
        AND sys_file.missing = 0
    GROUP BY
        sys_file.sha1
);

# create backup
CREATE TABLE sys_file_reference_bak LIKE sys_file_reference; 
INSERT INTO sys_file_reference_bak SELECT * FROM sys_file_reference;

# update reference table
# join path: sys_file_reference.uid_local -> temp_table_src.uid, then temp_table_src.sha1 -> temp_table_dst.sha1
# replace the uid of the old files (temp_table_src) with the uid of the matching files (temp_table_dst),
# matched by identical sha1 hash
UPDATE
    sys_file_reference
INNER JOIN
    temp_table_src
    ON temp_table_src.uid = sys_file_reference.uid_local
INNER JOIN
    temp_table_dst
    ON temp_table_dst.sha1 = temp_table_src.sha1
SET
    sys_file_reference.uid_local = temp_table_dst.uid
WHERE
    sys_file_reference.table_local = 'sys_file';

# show the changes with the help of the backup table
SELECT *
FROM sys_file_reference
INNER JOIN sys_file_reference_bak ON sys_file_reference_bak.uid = sys_file_reference.uid
WHERE sys_file_reference_bak.uid_local <> sys_file_reference.uid_local;

After that, you have to check / update the reference index.
Files in the folder "_migrated" that no longer have any reference can then be deleted with the FAL Explorer.
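
Before deleting anything, it may help to list the candidates first. The query below is a read-only sketch, not part of the script above: it assumes that sys_file_reference has a deleted flag next to the uid_local and table_local columns already used in the update, and it only looks at sys_file_reference.

```sql
# Preview only, nothing is changed:
# files below "_migrated" that no active sys_file_reference row points to
SELECT
    sys_file.uid,
    sys_file.identifier
FROM
    sys_file
    LEFT JOIN sys_file_reference
        ON sys_file_reference.uid_local = sys_file.uid
        AND sys_file_reference.table_local = 'sys_file'
        AND sys_file_reference.deleted = 0
WHERE
    sys_file.identifier LIKE '/_migrated/%'
    # no joined row means no active reference was found
    AND sys_file_reference.uid IS NULL
ORDER BY
    sys_file.identifier;
```

Treat the result as a list of candidates rather than a delete list: a file can still be used in places this query does not see, for example links inside rich-text fields, which is why the reference index should be up to date before files are removed.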

#6

Updated by Benni Mack over 1 year ago

  • Sprint Focus changed from PRC to Needs Decision
