Cache duration wrongly calculated with associated records
In TYPO3 4.6, property config.cache was added to enable other records to influence the duration of a page's cache. The calculation happens in tslib_fe::getFirstTimeValueForRecord() but is broken.
The above-mentioned method searches for a minimum starttime and endtime (if both are defined for the given table) among records from a given table in a given pid. One condition exists to select records which have timestamps larger than the current timestamp, but it uses a OR on starttime and endtime. Example query:
SELECT MIN(starttime) AS starttime, MIN(endtime) AS endtime FROM tt_news WHERE pid=2070 AND (starttime>1333539180 OR endtime>1333539180) AND tt_news.deleted=0 AND tt_news.t3ver_state<=0 AND tt_news.pid<>-1 AND tt_news.hidden=0 AND (tt_news.fe_group='' OR tt_news.fe_group IS NULL OR tt_news.fe_group='0' OR FIND_IN_SET('0',tt_news.fe_group) OR FIND_IN_SET('-1',tt_news.fe_group))
This works fine as long as you have only records with a starttime, or only records with an endtime, or only records with both. But if you have records with a starttime but not endtime (or vice-versa), values of 0 will crop up in the results and the minimum will thus be 0.
A minimum of 0 actually kills this feature, because TYPO3 reverts to the page's default cache duration.
The solution is seek the minimum separately for each time field.
Updated by Francois Suter over 11 years ago
- select any page on your web site and make it depend on some other table, for example tt_news by setting the following configuration
config.cache.xx = tt_news:yy
where "xx" is the uid of the page whose cache duration must be influenced and "yy" is the pid where the tt_news items are stored.
- make sure to have at least tt_news item that has a starttime but not endtime and another item which has a endtime but no starttime
- clear the cache and view your page (while logged out of the backend)
- check the cache duration in table cf_cache_pages. It should be expiring in 24h (if that's your default duration)
- apply the patch, clear the cache again and view the page
- check the cache table again: the duration should correspond to the smallest start or end time in your tt_news set of items
Updated by Francois Suter over 11 years ago
Moving Jigal's comment from the Core mailing list and answering:
Let's discuss the approach for finding the cache lifetime here instead
of in gerrit reviews.
Meaning of values:
- starttime = 0 : no starttime or
infinitestarttime > 0 : timestamp
- endtime = 0 : no endtime or +infinite
- endtime > 0 : timestamp
For determining the lifetime we have to find CEs with:
endtime in the future OR no endtime
endtime > now OR endtime = 0
SELECT * FROM $table WHERE endtime > $now OR endtime = 0
(plus the enableFields we already have)
For all these records we
- ignore the endtime if it is equal to zero (+infinite)
- take the minimum of all the start times which are in the future and
all end times
I agree so far.
CASE WHEN base.starttime <= $now THEN NULL
SELECT starttime, endtime
WHERE endtime > $now OR endtime = 0
) AS base
(both NULLIF and CASE are ANSI SQL)
(for the subquery the enableFields which were already calculated in the
current code must be added of course)
I haven't checked the performance of this query, but this can be a start
to find the correct values.
Well, this is the sticking point, of course. The query works fine, but I really wonder if it makes us gain anything. I know my patch introduces a second query where there was only one before, but I don't think this can be avoided. Indeed your proposal also uses 2 queries, one being a sub-select. Furthermore your proposal makes use of several functions, whereas the current code (event with my patch) just calls on MIN. Instinctively I would tend to think that your proposal is less efficient.
I tried to test this locally but - with only a few hundred tt_news records at hand - all these queries amount to peanuts in processing time. So we could say that having a single query is more efficient if it takes about the same time as the current query, which would be doubled. However I wonder how functions like NULLIF and CASE scale with an increasing number of records. And are they supported by DBAL?
Updated by Jigal van Hemert over 11 years ago
SELECT SQL_NO_CACHE MIN(NULLIF(base.endtime,0)), MIN( CASE WHEN base.starttime <= UNIX_TIMESTAMP() THEN NULL ELSE base.starttime END ) FROM ( SELECT starttime, endtime FROM tt_content WHERE (endtime > UNIX_TIMESTAMP() OR endtime = 0) AND deleted=0 AND hidden=0 ) AS base
(cache disabled, deleted and hidden added for enableFields)
On four different databases (and different servers) with tt_content tables ranging from 7,000 - 20,000 records the queries took between 0.01 and 0.06 seconds to run.
Reading and testing François' patch shows that this is a correct solution. Taking into consideration that the query only has to look at the records in a single pid I don't think we'll notice the difference between one or two queries. The complexity of the single query and the chance that it might not be DBAL compatible causes me to favour François' solution.