Bug #24389

speed up typoLink function by caching domain records

Added by Simon Schaufelberger almost 9 years ago. Updated about 2 years ago.

Status:
Closed
Priority:
Should have
Category:
Caching
Target version:
Start date:
2010-12-22
Due date:
% Done:

100%

TYPO3 Version:
6.1
PHP Version:
Tags:
Complexity:
Is Regression:
Sprint Focus:

Description

there is a big TODO already in the comments, so if TYPO3 4.5 wants to be the fastest TYPO3 ever, this part of the typoLink function should be cached since it is called MANY times on a page...

$res = $GLOBALS['TYPO3_DB']->exec_SELECTquery(
'pid, domainName, forced',
'sys_domain',
'pid IN (' . implode(',', $targetPageRootlinePids) . ') ' . ' AND redirectTo=\'\' ' . $this->enableFields('sys_domain'),
'',
'sorting ASC'
);

// TODO maybe it makes sense to hold all sys_domain records in a cache to save additional DB querys on each typolink
while ($row = $GLOBALS['TYPO3_DB']->sql_fetch_assoc($res)) {
$foundDomains[] = preg_replace('/\/$/', '', $row['domainName']);
if (!isset($firstFoundDomains[$row['pid']])) {
$firstFoundDomains[$row['pid']] = preg_replace('/\/$/', '', $row['domainName']);
}
if ($row['forced'] && !isset($firstFoundForcedDomains[$row['pid']])) {
$firstFoundForcedDomains[$row['pid']] = preg_replace('/\/$/', '', $row['domainName']);
}
}

fileref: typo3\sysext\cms\tslib\class.tslib_content.php

(issue imported from #M16812)

typolink_test.diff View (3.28 KB) Administrator Admin, 2011-01-13 15:40

16812_02.diff View (7.47 KB) Administrator Admin, 2011-01-13 22:25

realurl-sysdomain-cache.diff View - Patch for RealURL (1013 Bytes) Steffen Gebert, 2011-07-18 23:45

T3X_sysdomain_cache-1_0_0-z-201111212202.t3x (11.9 KB) Steffen Gebert, 2012-02-10 00:43

T3X_cache_backend_transient-1_0_0-z-201111212204.t3x (4.48 KB) Steffen Gebert, 2012-02-10 00:43


Related issues

Related to TYPO3 Core - Bug #17421: tslib_cObj::typoLink() is slow / patch to add caching Closed 2007-06-27
Related to TYPO3 Core - Feature #24888: Add a cache for typoLink() Closed 2011-01-29
Related to TYPO3 Core - Bug #62556: problems with links and absRefPrefix when having multiple domains per site Closed 2014-10-30

Associated revisions

Revision 451669a0 (diff)
Added by Steffen Gebert almost 7 years ago

[FEATURE] Speed up typoLink function by caching domain records

Cache the domain records in a runtime cache for improved rendering
of links generated with typolink function.

This change takes only effect, when
config.typolinkCheckRootline = 1
is set, which otherwise costs immense performance in terms of huge
number of SQL queries.

Resolves: #24389
Releases: 6.1
Change-Id: I7c6bee1cd6ee1cb0901d926dd6ce9a22c00501ab
Reviewed-on: https://review.typo3.org/9023
Reviewed-by: Dmitry Dulepov
Tested-by: Dmitry Dulepov
Reviewed-by: Simon Schaufelberger
Tested-by: Simon Schaufelberger

Revision e91556cf (diff)
Added by Wouter Wolters almost 7 years ago

[BUGFIX] Follow-up clean up to #24389

Change-Id: I2b5438edd23650fd63b1771c9d6a825ae1aaa210
Related: #24389
Releases: 6.1
Reviewed-on: https://review.typo3.org/18100
Reviewed-by: Anja Leichsenring
Tested-by: Anja Leichsenring
Reviewed-by: Christian Kuhn
Tested-by: Christian Kuhn

History

#1 Updated by Frederic Gaus almost 9 years ago

What would be the right place to implement the cache? Extending TSFE with an array holding the (maybe conditioned) result of the query?

#2 Updated by Christian Kuhn almost 9 years ago

Just an idea, without digging deep into this issue:
Maybe we could use the caching framework with a TransientMemoryBackend for this?

#3 Updated by Christian Kuhn almost 9 years ago

I played around a bit. Here are my findings:

First, this sys_domain call is only ever done, if TS config.typolinkCheckRootline = 1 is set, which is not done by default. I doubt many people set this. So, these sys_domain query are probably not done at all for 95% of the TYPO3 installations.

Furthermore the sys_domain table is typically very small (often no record at all, but for sure not more than 100 or even 1000 records in 99,9% of installations. Next, even if this query is fired pretty often to the database, the query will be fully cached and does not take more time than maybe some nanoseconds. So this will only accumulate in time if the database server is on some other host so TCP overhead and latency come into the game.

Summary:
If we really implement caching for this single query it will only have a remarkable benefit, if:
  • config.typolinkCheckRootline = 1 is set
  • There are tons of records is sys_domain table
  • the database server is located on some other host.

At the moment I don't feel this is worth the effort.

What could be done / evaluated:
- Check if it makes sense to work with prepared statements for the sys_domain queries.
- It might make sense to wrap the whole typoLink method with a cache mechanism. This could speed up rendering if the same links are generated within one request. An implementation with the transientMemoryBackend would be pretty easy, we even won't need to do tagging if we assume that links do not change within one request.

#4 Updated by Frederic Gaus almost 9 years ago

Christian, you are right. Therefor, this bug is mainly a copy of #17421

#5 Updated by Ernesto Baschny almost 9 years ago

Thanks for the help, but as Christian analysed, it is probably not worth it. But anyways speeding up typolink would be great, as this is usually the biggest bottlenet when thousands of links need to be generated by an USER_INT for example (not uncommon).

#6 Updated by Steffen Gebert almost 9 years ago

I was just asked to have a look at a large TYPO3 installation (60k pages, 250 domains).

Generating a page, which has 60 links, issues 1400 SQL queries (solely network delay of ~4sec), about 150 of them are going to sys_domain during menu creation.

Will have a look and either create an extension or sth. for 4.6

#7 Updated by Christian Kuhn almost 9 years ago

Steffen, would you like to bench how much time is consumed by the domain queries, and how much for the whole typoLink method?

#8 Updated by Steffen Gebert almost 9 years ago

Problem is not the computing time, but the delay between the servers. Will revisit this topic in a few days.

I will try to read whole sys_domain once and cache it locally (which might not be a way to go for the core).

There are in fact 350 queries to sys_domain (out of the 1400 for rendering the start page), 270 of them are:
SELECT domainName FROM sys_domain WHERE pid=XXX AND redirectTo='' AND hidden=0 ORDER BY sorting;
the root page's domain record is retrieved 60 times (I assume once per link).

#9 Updated by Christian Kuhn almost 9 years ago

@Steffen: I'll try to come up with a simple patch to cache those queries today evening.
So don't spend to much time on this issue today, we'll see how much this patch drains your parsetime if the network latency due to this repeated query is reduced.

#10 Updated by Frederic Gaus almost 9 years ago

Maybe you guys can play around with this small patch. Should be compatible even if you dont use the caching framework. But it was only a short test :)

So for testing proposes / measurements it should be enough to just disable the cachingFramework

#11 Updated by Christian Kuhn almost 9 years ago

@Frederic: Your patch is a good idea. The cache instantiation and is not complete, I'll finalize that. Additionally, I do not like to fetch all records, I think it would be better to just fetch needed records and cache that.

For the solution itself: I don't think we should actually solve it this way (with a cache for that purpose): It would probably be much better (from an architectual point of view) if we use a preparedStatement at that point and implement a cache at PDO layer, maybe using the transientMemory backend of the caching framework. Actually I'll bet that Xavier has something like that already in mind, it's just not implemented in 4.5 yet (and will not be done anymore for 4.5).

Here is my goal: I'll try to come up with a usable patch which could be applied locally but not officially to 4.5. It's for people who need it and to test the performance gain for installations with lots of records and remote mysql. For 4.6 we should enable the caching framework by default, make this query a preparedStatement (that's my plan anyway), and implement caching on PDO level.

#12 Updated by Christian Kuhn almost 9 years ago

Patch attached.

The patch is straight forward (not core ready: no abstraction, code duplication, cache is not reusable for other sys_domain purposes) which adds a runtime cache and uses it. This patch will reduce latency for links to the same page on consecutive calls within one request, it does not fetch all records at once and stores it. I'm unsure if this is the way to go, but it could be pretty much the same of what can be reached with a caching patch on PDO level for this kind of query (I wanted to test explicitly this with this patch).

To test you must enable the caching framework: $TYPO3_CONF_VARS['SYS']['useCachingFramework'] = '1';

@Steffen, maybe you could run this patch and give some feedback if this direction is a good idea or if we instead should go a way similar to what is done with Frederics approach (get all records once, nail to an array and make lookups on them).

#13 Updated by Steffen Gebert almost 9 years ago

Thanks a ton, Christian! Will let them test and post results.

I expect the RuntimeCache to be used in several places later on.
What, if I would now like to use a memcache backend for it, to persist the results (and take care of flushing it in case of sys_domain changes)?

I can't say that only this particular use (for sys_domain) is cached in memcache, whilst other usages (let's say for some nested FE group membership caching) not, can't I?

#14 Updated by Christian Kuhn almost 9 years ago

@Steffen:
Speed<->scope of different backends:
- transientMemory: Only valid during one request, not many cache invalidation problems if data doesn't change during one request. Extremly quick (direct memory access). Advantage to class based static cache variables: Can be shared between classes :)
- apc / xcache backends: Shared between php processes on same machine, cache invalidation must be considered. Quicker than memcache / db, at least if they are remote. Can not always be used (apc and xcache require the php module). Ah, and xcache is not available yet, I'll likely hack something up for FLOW3 / 4.6 ;)
- memcache / redis: Quicker than db backend and scales better if db is on different host and cache is big. Cache invalidation must be considered.

So: The transienBackend as "runtime" cache is an option for data which is different between requests, but not within an request. For example if user added some domain record in our case (different requests) The backend could be used for certain preparedStatements, if there is no update query and a re-fetch during one request.

So, for a overall construct in the long run, I'd like to add a "runtime" cache for data which is requested multiple time during one request (eg. prepared statements, typolink requests, ..) to be shared quickly during the request, so that different parts can benefit from each other. (example: typolink creates a sys_domain requested -> cached, some other part requests same information -> get from cache). Signature: TransientMemory is very quick.

Furthermore, I'd like to establish a request shared cache for PHP files: Eg. the autoloader, the temp_CACHED* files and some other caches could benefit from this. I do have working code for this, but for a real core solution this would require a refactoring of TYPO3 boostrap and especially getting rid of $GLOBALS scope and using classes for this to get it right. This is more long term and needs much more re-thinking. I'll write some more about this topic for 4.6. Francois and some other people have already done work on this topic. It will become hard, that's the least thing I already know ;)

BTW: @see http://wiki.typo3.org/Caching_framework

#15 Updated by Steffen Gebert almost 9 years ago

Thanks for your explanations, Christian! And of course, I'm also happy with your solution..

I know the RuntimeCache with TransientBackend is only for one request - but what if I want to make it a transient cache - change the Backend to e.g. memcache and invalidate caches by an own extension, when sys_domain is changed.
This would also affect other RuntimeCache usages (which also have to be cleaned up).

However.. after some thinking: If I want to stick to my "I read whole sys_domains, but only once", I can just read it (in an extension) and create the cache entries. This should make me happy, shouldn't it? :)

#16 Updated by Christian Kuhn almost 9 years ago

@Steffen: Sure, it's possible to just switch to some other backend for the runtime cache (I'll probably use APC as it's still much quicker than memcache) if you do not care about cache invalidiaton (btw: enetcache takes care of invalidation with tcemain hooks if tagging is done right).
The patch will then have the effect that on first access most of your data will be cached anyway, and exactly no more computation must be done for additional accesses. So your advantage of reading eveything at once would practically reduce to zero.

#17 Updated by Steffen Gebert almost 9 years ago

Of course, either read everything at once with Transient or just cache to some Persistent cache..

#18 Updated by Christian Kuhn almost 9 years ago

Yeah, and invalide correctly if persisting. Together with a cache which is not remote (like memcache in production) to actually gain performance in comparison to remote mysql. So only option for persistence in this case is currently APC. This would actually work out pretty well in this case, I guess.

Ah, tagging might be pretty easy: Just add sys_domain_"uid" as tags to all entries and invalidate with a tce hook which drops every tag which is in rootline.

#19 Updated by Klaus Hinum over 8 years ago

  • Target version deleted (0)

I just ran into a speed / memory issue with the typolink function. When creating a XML sitemap of our site, we always run into the memory limit of 256MB. Somehow, each call of the typolink function takes up more memory. When commenting out the domain record search makes it at least faster (and using up a bit less memory, as we use config.typolinkCheckRootline).

#20 Updated by Steffen Gebert over 8 years ago

  • TYPO3 Version set to 4.6

I've just spent a few hours on this and also got in contact with the Administrator of the problematic site I mentioned.

However, Christians patch isn't as efficient as it could be (we didn't try it, but I analyzed it in-depth): It only caches the result per typoLink target page and this result is only used, when building a link to this page (again) or a subpage.
Instead, I would really go for caching the whole sys_domain once and then query the RuntimeCache on it. Currently I have something in between those two solutions, which still has some issues, but could be easily avoided by having fast access to the table contents.

Is anybody in for some sponsoring? Klaus? I can't promise to save huge amounts of memory, but several SQL queries would be avoidable.

#21 Updated by Steffen Gebert over 8 years ago

I've just pushed a draft of this change to the branch sandbox/stephenking/sys-domain-caching

I would be happy to get feedback, as the guys, for which I made this improvements are under heavy load, have no time to check, blabla..

Basically, it ready all sys_domain records once and then uses them, instead of accessing the DB. To really save many queries, a patch for RealURL is attached. This reduces the number of queries to sys_domain from >100 to 5.

So, please check this patch with your site having lots of domains and links across domains. Would be nice, to not only get feedback from the number of SQL queries, but also on whole (uncached) page generation time of real-world sites.

Thanks for your support
Steffen

#22 Updated by Steffen Gebert over 8 years ago

  • Status changed from New to Needs Feedback

#23 Updated by Simon Schaufelberger about 8 years ago

any progress here?

#24 Updated by Steffen Gebert about 8 years ago

Yes, there is. I've implemented and tested this, currently dkd is evaluating it in a project since last week. So stay tuned, it's not forgotten.

#25 Updated by Simon Schaufelberger almost 8 years ago

push!

#26 Updated by Simon Schaufelberger almost 8 years ago

push!
TYPO3 4.7 Feature freeze is coming close and i want to have this included PLEASE!!!

#27 Updated by Steffen Gebert almost 8 years ago

Unfortunately I didn't get any feedback. Would be nice, if you could give the patch (for core and for realurl) a try!

#28 Updated by Simon Schaufelberger almost 8 years ago

I will!

#29 Updated by Philipp Bergsmann almost 8 years ago

I just patched a 4.5.11 instance with the patches above (core and realurl) and it didn't really work:

I used the TransientMemoryBackend because I thought it would be the most fitting.

I'm using the "_DOMAINS" configuration of the realurl for language switching with domains. After patching the instance (with or without patching the realurl) on the german page (default language) all links now target to the english domain and the realurl-encoding doesn't work anymore. After reverting the patch everything works as expected again. Do I have to use a 4.6 or 4.7 TYPO3-core?

But the performance increased noticeable :)

I'll try to figure out what went wrong but there are a lot of possibilities I guess...

#30 Updated by Steffen Gebert almost 8 years ago

Hi Philipp!

You could try the attached extensions. One brings the TransientMemoryBackend to 4.5 and the other one XCLASSes for typolink and realurl.

Steffen

#31 Updated by Simon Schaufelberger almost 8 years ago

Hi together,
i am quite confused now which version to test since there are 2 diff files attached to this issue and then those 2 t3x files from Steffen.

i tested the sysdomain_cache from the t3x file from Steffen and I merged the diff into the current trunk (typo3 4.7) and it works perfectly.

i could not get the cache_backend_transient extension running. It throws an exception when using with it with typo3 4.7.

now when i want to make a gerrit patch request, which version shall i push? am i right that we cannot merge the sysdomain and the transient cache at the same time?

PS: i really want to have this in 4.7!!!

#32 Updated by Steffen Gebert almost 8 years ago

Ah, sorry.. these extensions were meant as backport to 4.5.

Please use the sys-domain-caching git branch. You can cherry-pick ef5555e486e0c141ca9d0d2f71d365bfce7182f3, which hopefully still applies.

Other than that you should apply the patch for realurl, which brings another big performance gain.

#33 Updated by Simon Schaufelberger almost 8 years ago

ok, i tested exactly this code in 4.7 since it is just removing the domein lookup logic into fe class and it worked very well.
i will test the realurl patch tonight.

#34 Updated by Simon Schaufelberger almost 8 years ago

OK, one more day to go, right? Steffen, can you provide a patch and push that to gerrit so that it can be merged before tomorrow? As far as i have understood features that are not merged until beta 1 have to wait for next release which would be waiting another half year...

#35 Updated by Steffen Gebert almost 8 years ago

How should it be merged, if we don't know anything about the real performance impact? The patch is there.. feel free to push, but I expect it to be fired off immediately, without any measurements in real environments.

#36 Updated by Simon Schaufelberger almost 8 years ago

we from punkt.de have some customer websites with big menus with far more than 40 links just in the main navigation. at the moment for EVERY of these links gets the domain record table hit... and this is just the main menu not included any submenus on the side and any links in the page content.

#37 Updated by Steffen Gebert almost 8 years ago

So test this patch and report your findings!

#38 Updated by Gerrit Code Review almost 8 years ago

  • Status changed from Needs Feedback to Under Review

Patch set 1 for branch master has been pushed to the review server.
It is available at http://review.typo3.org/9023

#39 Updated by Gerrit Code Review almost 8 years ago

Patch set 2 for branch master has been pushed to the review server.
It is available at http://review.typo3.org/9023

#40 Updated by Gerrit Code Review almost 8 years ago

Patch set 3 for branch master has been pushed to the review server.
It is available at http://review.typo3.org/9023

#41 Updated by Gerrit Code Review almost 8 years ago

Patch set 4 for branch master has been pushed to the review server.
It is available at http://review.typo3.org/9023

#42 Updated by Gerrit Code Review almost 8 years ago

Patch set 5 for branch master has been pushed to the review server.
It is available at http://review.typo3.org/9023

#43 Updated by Simon Schaufelberger almost 8 years ago

Beta is comming!!! Will this still be part of it???

#44 Updated by Gerrit Code Review about 7 years ago

Patch set 6 for branch master has been pushed to the review server.
It is available at http://review.typo3.org/9023

#45 Updated by Gerrit Code Review about 7 years ago

Patch set 7 for branch master has been pushed to the review server.
It is available at http://review.typo3.org/9023

#46 Updated by Gerrit Code Review about 7 years ago

Patch set 8 for branch master has been pushed to the review server.
It is available at http://review.typo3.org/9023

#47 Updated by Gerrit Code Review almost 7 years ago

Patch set 9 for branch master has been pushed to the review server.
It is available at https://review.typo3.org/9023

#48 Updated by Gerrit Code Review almost 7 years ago

Patch set 10 for branch master has been pushed to the review server.
It is available at https://review.typo3.org/9023

#49 Updated by Gerrit Code Review almost 7 years ago

Patch set 11 for branch master has been pushed to the review server.
It is available at https://review.typo3.org/9023

#50 Updated by Gerrit Code Review almost 7 years ago

Patch set 12 for branch master has been pushed to the review server.
It is available at https://review.typo3.org/9023

#51 Updated by Gerrit Code Review almost 7 years ago

Patch set 13 for branch master has been pushed to the review server.
It is available at https://review.typo3.org/9023

#52 Updated by Steffen Gebert almost 7 years ago

  • Status changed from Under Review to Resolved
  • % Done changed from 0 to 100

#53 Updated by Steffen Gebert almost 7 years ago

  • Target version set to 6.1.0
  • TYPO3 Version changed from 4.6 to 6.1

#54 Updated by Gerrit Code Review almost 7 years ago

  • Status changed from Resolved to Under Review

Patch set 1 for branch master_new has been pushed to the review server.
It is available at https://review.typo3.org/18623

#55 Updated by Christian Kuhn almost 7 years ago

  • Status changed from Under Review to Resolved

#56 Updated by Oliver Hader over 6 years ago

  • Target version changed from 6.1.0 to 6.1.0-alpha1

#57 Updated by Riccardo De Contardi about 2 years ago

  • Status changed from Resolved to Closed

Also available in: Atom PDF