Feature #97671

open

File versioning in query string

Added by Christian Toffolo almost 2 years ago. Updated over 1 year ago.

Status: New
Priority: Should have
Assignee: -
Category: Frontend
Start date: 2022-05-22
Due date:
% Done: 0%
Estimated time:
PHP Version:
Tags:
Complexity:
Sprint Focus:

Description

File versioning in query string

If configured, automatically add a query string parameter to file URIs in the HTML output. The value could be an MD5 hash generated from the file content.
Example: https://domain.ext/fileadmin/logo.svg?4857383
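
A minimal sketch of the idea, in Python for illustration only (this is not TYPO3 code, and the function names `version_token` and `add_version` are hypothetical). It assumes the version token is a truncated MD5 hex digest of the file content, appended as the query string:

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def version_token(content: bytes, length: int = 10) -> str:
    """Return a short hex token derived from the MD5 hash of the file content."""
    return hashlib.md5(content).hexdigest()[:length]


def add_version(uri: str, content: bytes) -> str:
    """Append the version token to the URI's query string.

    An existing query string is kept and the token is added after it.
    """
    parts = urlsplit(uri)
    token = version_token(content)
    query = f"{parts.query}&{token}" if parts.query else token
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


# The same content always yields the same URI; changed content yields a new one.
before = add_version("https://domain.ext/fileadmin/logo.svg", b"<svg/>")
after = add_version("https://domain.ext/fileadmin/logo.svg", b"<svg></svg>")
```

Because the token depends only on the content, the URI stays stable until the file actually changes, which is what makes far-future expiration headers safe to use.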

Purpose of this feature

TL;DR: improve SEO and browser cache control.

It's good practice to serve resources that don't change often with a far-future expiration date.
Examples:
  • ExpiresByType text/css "access plus 1 year"
  • ExpiresByType text/javascript "access plus 1 year"
  • ExpiresByType image/jpeg "access plus 1 month"
  • ExpiresByType image/svg+xml "access plus 1 month"
  • ExpiresByType image/webp "access plus 1 month"
  • ExpiresByType font/woff2 "access plus 1 month"

If these resources change and we want visitors' browsers to fetch the new version, we have to change the file name manually.

TYPO3 processed files automatically get a file name based on the file hash and the processing configuration.
This existing behaviour solves the browser cache problem, but it doesn't cover static files or files that are not processed, such as SVGs.

As for SEO: if a search engine indexes a processed image like logo_86ebcdde02.jpg and this image changes, the old image's URI returns 404. If instead the image is indexed with a URI like logo.jpg?123456, the old URI still returns 200.
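
The difference can be illustrated with a small Python sketch. It assumes a web server that maps only the path component of a URI to a file on disk and ignores the query string for static files (the usual default, but an assumption here); `served_path` is a hypothetical helper:

```python
from urllib.parse import urlsplit


def served_path(uri: str) -> str:
    """Return the path component, which a static file server maps to a file."""
    return urlsplit(uri).path


# Query-string versioning: old and new URIs point at the same file on disk,
# so the old URI keeps answering with 200 after the image changes.
old_query = served_path("https://domain.ext/fileadmin/logo.jpg?123456")
new_query = served_path("https://domain.ext/fileadmin/logo.jpg?654321")

# File-name versioning: old and new URIs point at different files,
# so the old URI answers with 404 once the old file is removed.
old_named = served_path("https://domain.ext/fileadmin/logo_86ebcdde02.jpg")
new_named = served_path("https://domain.ext/fileadmin/logo_abcdef12345.jpg")
```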

References

https://simonhearne.com/2022/caching-header-best-practices/
https://web.dev/http-cache/
https://developer.mozilla.org/en-US/docs/Learn/Server-side/Apache_Configuration_htaccess#cache_expiration

Actions #1

Updated by Christian Toffolo almost 2 years ago

  • Category set to Frontend
Actions #2

Updated by Chris Müller almost 2 years ago

I think you mix up the two topics "SEO" and "Caching".

Case 1 (browser caching with scaled images):

For browser caching, the hashed filename is enough (logo_86ebcdde02.jpg). If the file changes, the hash also changes (as I understand it) and the browser will pick up the new file. If the file with the old hash is no longer available, the browser receives a 404 - which is correct, as the image changed and the old one no longer exists. This should be no issue for the website, as the old images shouldn't be referenced anymore after clearing the frontend cache.

Case 2 (browser caching with original image embedded into site):

If the image is embedded into the website with its original dimensions (and hence the original filename is used without a cache buster), I agree that this could be an issue for browser caching when using long-lived cache headers, but not for SEO. For SEO this is ideal, as the content changed but the file name did not.

Case 3 (SEO crawling/indexing):

For search engines, two file names containing a hash (logo_86ebcdde02.jpg and logo_abcdef12345.jpg) are different URLs. But that's also true for the same file with different query strings, like logo.jpg?1234 and logo.jpg?9876. For a search engine these are two files under different URLs (though with the same content) - I assume they recognise the duplicate content and deduplicate it in their index. This gets worse when you change the file often. But I am not that deep into image SEO or how much of an issue duplicate content is there - and, as I assume, a canonical URL for the main URL is not available by default (it can be added to the HTTP header, but that would be another story).

The same applies to different sizes of the same image: they differ only in size, not in content (content meaning what is seen in the image).

Would it be an option to redirect (301/307) to the new "hashed" image files, as with HTML URLs, when someone cares a lot about image SEO? This would be a clean way to avoid cluttering the search index with the same image over and over again. I assume this may be possible with a custom listener for a PSR-14 event that generates the redirects automatically (some of the events at https://docs.typo3.org/m/typo3/reference-coreapi/11.5/en-us/Events/Events/Core/Resource/Index.html may be appropriate).
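
To make the redirect idea concrete, here is a rough sketch in Python of the bookkeeping such a listener would need. It is not TYPO3 API: the class `RedirectTable` and its methods are hypothetical, and a real implementation would be a PHP event listener writing redirect records. The sketch assumes a permanent (301) redirect and repoints older entries so that no redirect chains build up:

```python
from typing import Dict, Optional, Tuple


class RedirectTable:
    """Maps outdated hashed file paths to the current hashed file path."""

    def __init__(self) -> None:
        self._targets: Dict[str, str] = {}

    def record_replacement(self, old_path: str, new_path: str) -> None:
        """Register that old_path has been replaced by new_path."""
        # Repoint earlier redirects that ended at old_path, avoiding chains.
        for source, target in list(self._targets.items()):
            if target == old_path:
                self._targets[source] = new_path
        self._targets[old_path] = new_path

    def resolve(self, path: str) -> Optional[Tuple[int, str]]:
        """Return (status, location) for an outdated path, else None."""
        target = self._targets.get(path)
        return (301, target) if target is not None else None


table = RedirectTable()
table.record_replacement("/fileadmin/logo_86ebcdde02.jpg", "/fileadmin/logo_abcdef12345.jpg")
result = table.resolve("/fileadmin/logo_86ebcdde02.jpg")
```

With a table like this, every old hashed URL answers with a single redirect to the current file instead of a 404, so the search index is pointed at one URL per image.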

Actions #3

Updated by Benni Mack over 1 year ago

  • Sprint Focus set to On Location Sprint
Actions #4

Updated by Oliver Hader over 1 year ago

  • Sprint Focus deleted (On Location Sprint)