Project

General

Profile

Actions

Bug #24389

closed

speed up typoLink function by caching domain records

Added by Simon Schaufelberger almost 14 years ago. Updated over 2 years ago.

Status:
Closed
Priority:
Should have
Category:
Caching
Target version:
Start date:
2010-12-22
Due date:
% Done:

100%

Estimated time:
TYPO3 Version:
6.1
PHP Version:
Tags:
Complexity:
Is Regression:
Sprint Focus:

Description

there is a big TODO already in the comments, so if TYPO3 4.5 wants to be the fastest TYPO3 ever, this part of the typoLink function should be cached since it is called MANY times on a page...

$res = $GLOBALS['TYPO3_DB']->exec_SELECTquery(
    'pid, domainName, forced',
    'sys_domain',
    'pid IN (' . implode(',', $targetPageRootlinePids) . ') ' . ' AND redirectTo=\'\' ' . $this->enableFields('sys_domain'),
    '',
    'sorting ASC'
);

// TODO maybe it makes sense to hold all sys_domain records in a cache to save additional DB querys on each typolink
while ($row = $GLOBALS['TYPO3_DB']->sql_fetch_assoc($res)) {
    $foundDomains[] = preg_replace('/\/$/', '', $row['domainName']);
    if (!isset($firstFoundDomains[$row['pid']])) {
        $firstFoundDomains[$row['pid']] = preg_replace('/\/$/', '', $row['domainName']);
    }
    if ($row['forced'] && !isset($firstFoundForcedDomains[$row['pid']])) {
        $firstFoundForcedDomains[$row['pid']] = preg_replace('/\/$/', '', $row['domainName']);
    }
}

fileref: typo3\sysext\cms\tslib\class.tslib_content.php

(issue imported from #M16812)


Files

typolink_test.diff (3.28 KB) typolink_test.diff Administrator Admin, 2011-01-13 15:40
16812_02.diff (7.47 KB) 16812_02.diff Administrator Admin, 2011-01-13 22:25
realurl-sysdomain-cache.diff (1013 Bytes) realurl-sysdomain-cache.diff Patch for RealURL Steffen Gebert, 2011-07-18 23:45
T3X_sysdomain_cache-1_0_0-z-201111212202.t3x (11.9 KB) T3X_sysdomain_cache-1_0_0-z-201111212202.t3x Steffen Gebert, 2012-02-10 00:43
T3X_cache_backend_transient-1_0_0-z-201111212204.t3x (4.48 KB) T3X_cache_backend_transient-1_0_0-z-201111212204.t3x Steffen Gebert, 2012-02-10 00:43

Related issues 3 (0 open3 closed)

Related to TYPO3 Core - Bug #17421: tslib_cObj::typoLink() is slow / patch to add cachingClosed2007-06-27

Actions
Related to TYPO3 Core - Feature #24888: Add a cache for typoLink()Closed2011-01-29

Actions
Related to TYPO3 Core - Bug #62556: problems with links and absRefPrefix when having multiple domains per siteClosed2014-10-30

Actions
Actions #1

Updated by Frederic Gaus almost 14 years ago

What would be the right place to implement the cache? Extending TSFE with an array holding the (maybe conditioned) result of the query?

Actions #2

Updated by Christian Kuhn almost 14 years ago

Just an idea, without digging deep into this issue:
Maybe we could use the caching framework with a TransientMemoryBackend for this?

Actions #3

Updated by Christian Kuhn almost 14 years ago

I played around a bit. Here are my findings:

First, this sys_domain call is only ever done, if TS config.typolinkCheckRootline = 1 is set, which is not done by default. I doubt many people set this. So, these sys_domain query are probably not done at all for 95% of the TYPO3 installations.

Furthermore the sys_domain table is typically very small (often no record at all, but for sure not more than 100 or even 1000 records in 99,9% of installations. Next, even if this query is fired pretty often to the database, the query will be fully cached and does not take more time than maybe some nanoseconds. So this will only accumulate in time if the database server is on some other host so TCP overhead and latency come into the game.

Summary:
If we really implement caching for this single query it will only have a remarkable benefit, if:
  • config.typolinkCheckRootline = 1 is set
  • There are tons of records is sys_domain table
  • the database server is located on some other host.

At the moment I don't feel this is worth the effort.

What could be done / evaluated:
- Check if it makes sense to work with prepared statements for the sys_domain queries.
- It might make sense to wrap the whole typoLink method with a cache mechanism. This could speed up rendering if the same links are generated within one request. An implementation with the transientMemoryBackend would be pretty easy, we even won't need to do tagging if we assume that links do not change within one request.

Actions #4

Updated by Frederic Gaus almost 14 years ago

Christian, you are right. Therefor, this bug is mainly a copy of #17421

Actions #5

Updated by Ernesto Baschny almost 14 years ago

Thanks for the help, but as Christian analysed, it is probably not worth it. But anyways speeding up typolink would be great, as this is usually the biggest bottlenet when thousands of links need to be generated by an USER_INT for example (not uncommon).

Actions #6

Updated by Steffen Gebert almost 14 years ago

I was just asked to have a look at a large TYPO3 installation (60k pages, 250 domains).

Generating a page, which has 60 links, issues 1400 SQL queries (solely network delay of ~4sec), about 150 of them are going to sys_domain during menu creation.

Will have a look and either create an extension or sth. for 4.6

Actions #7

Updated by Christian Kuhn almost 14 years ago

Steffen, would you like to bench how much time is consumed by the domain queries, and how much for the whole typoLink method?

Actions #8

Updated by Steffen Gebert almost 14 years ago

Problem is not the computing time, but the delay between the servers. Will revisit this topic in a few days.

I will try to read whole sys_domain once and cache it locally (which might not be a way to go for the core).

There are in fact 350 queries to sys_domain (out of the 1400 for rendering the start page), 270 of them are:
SELECT domainName FROM sys_domain WHERE pid=XXX AND redirectTo='' AND hidden=0 ORDER BY sorting;
the root page's domain record is retrieved 60 times (I assume once per link).

Actions #9

Updated by Christian Kuhn almost 14 years ago

@Steffen: I'll try to come up with a simple patch to cache those queries today evening.
So don't spend to much time on this issue today, we'll see how much this patch drains your parsetime if the network latency due to this repeated query is reduced.

Actions #10

Updated by Frederic Gaus almost 14 years ago

Maybe you guys can play around with this small patch. Should be compatible even if you dont use the caching framework. But it was only a short test :)

So for testing proposes / measurements it should be enough to just disable the cachingFramework

Actions #11

Updated by Christian Kuhn almost 14 years ago

@Frederic: Your patch is a good idea. The cache instantiation and is not complete, I'll finalize that. Additionally, I do not like to fetch all records, I think it would be better to just fetch needed records and cache that.

For the solution itself: I don't think we should actually solve it this way (with a cache for that purpose): It would probably be much better (from an architectual point of view) if we use a preparedStatement at that point and implement a cache at PDO layer, maybe using the transientMemory backend of the caching framework. Actually I'll bet that Xavier has something like that already in mind, it's just not implemented in 4.5 yet (and will not be done anymore for 4.5).

Here is my goal: I'll try to come up with a usable patch which could be applied locally but not officially to 4.5. It's for people who need it and to test the performance gain for installations with lots of records and remote mysql. For 4.6 we should enable the caching framework by default, make this query a preparedStatement (that's my plan anyway), and implement caching on PDO level.

Actions #12

Updated by Christian Kuhn almost 14 years ago

Patch attached.

The patch is straight forward (not core ready: no abstraction, code duplication, cache is not reusable for other sys_domain purposes) which adds a runtime cache and uses it. This patch will reduce latency for links to the same page on consecutive calls within one request, it does not fetch all records at once and stores it. I'm unsure if this is the way to go, but it could be pretty much the same of what can be reached with a caching patch on PDO level for this kind of query (I wanted to test explicitly this with this patch).

To test you must enable the caching framework: $TYPO3_CONF_VARS['SYS']['useCachingFramework'] = '1';

@Steffen, maybe you could run this patch and give some feedback if this direction is a good idea or if we instead should go a way similar to what is done with Frederics approach (get all records once, nail to an array and make lookups on them).

Actions #13

Updated by Steffen Gebert almost 14 years ago

Thanks a ton, Christian! Will let them test and post results.

I expect the RuntimeCache to be used in several places later on.
What, if I would now like to use a memcache backend for it, to persist the results (and take care of flushing it in case of sys_domain changes)?

I can't say that only this particular use (for sys_domain) is cached in memcache, whilst other usages (let's say for some nested FE group membership caching) not, can't I?

Actions #14

Updated by Christian Kuhn almost 14 years ago

@Steffen:
Speed<->scope of different backends:
- transientMemory: Only valid during one request, not many cache invalidation problems if data doesn't change during one request. Extremly quick (direct memory access). Advantage to class based static cache variables: Can be shared between classes :)
- apc / xcache backends: Shared between php processes on same machine, cache invalidation must be considered. Quicker than memcache / db, at least if they are remote. Can not always be used (apc and xcache require the php module). Ah, and xcache is not available yet, I'll likely hack something up for FLOW3 / 4.6 ;)
- memcache / redis: Quicker than db backend and scales better if db is on different host and cache is big. Cache invalidation must be considered.

So: The transienBackend as "runtime" cache is an option for data which is different between requests, but not within an request. For example if user added some domain record in our case (different requests) The backend could be used for certain preparedStatements, if there is no update query and a re-fetch during one request.

So, for a overall construct in the long run, I'd like to add a "runtime" cache for data which is requested multiple time during one request (eg. prepared statements, typolink requests, ..) to be shared quickly during the request, so that different parts can benefit from each other. (example: typolink creates a sys_domain requested -> cached, some other part requests same information -> get from cache). Signature: TransientMemory is very quick.

Furthermore, I'd like to establish a request shared cache for PHP files: Eg. the autoloader, the temp_CACHED* files and some other caches could benefit from this. I do have working code for this, but for a real core solution this would require a refactoring of TYPO3 boostrap and especially getting rid of $GLOBALS scope and using classes for this to get it right. This is more long term and needs much more re-thinking. I'll write some more about this topic for 4.6. Francois and some other people have already done work on this topic. It will become hard, that's the least thing I already know ;)

BTW: @see http://wiki.typo3.org/Caching_framework

Actions #15

Updated by Steffen Gebert almost 14 years ago

Thanks for your explanations, Christian! And of course, I'm also happy with your solution..

I know the RuntimeCache with TransientBackend is only for one request - but what if I want to make it a transient cache - change the Backend to e.g. memcache and invalidate caches by an own extension, when sys_domain is changed.
This would also affect other RuntimeCache usages (which also have to be cleaned up).

However.. after some thinking: If I want to stick to my "I read whole sys_domains, but only once", I can just read it (in an extension) and create the cache entries. This should make me happy, shouldn't it? :)

Actions #16

Updated by Christian Kuhn almost 14 years ago

@Steffen: Sure, it's possible to just switch to some other backend for the runtime cache (I'll probably use APC as it's still much quicker than memcache) if you do not care about cache invalidiaton (btw: enetcache takes care of invalidation with tcemain hooks if tagging is done right).
The patch will then have the effect that on first access most of your data will be cached anyway, and exactly no more computation must be done for additional accesses. So your advantage of reading eveything at once would practically reduce to zero.

Actions #17

Updated by Steffen Gebert almost 14 years ago

Of course, either read everything at once with Transient or just cache to some Persistent cache..

Actions #18

Updated by Christian Kuhn almost 14 years ago

Yeah, and invalide correctly if persisting. Together with a cache which is not remote (like memcache in production) to actually gain performance in comparison to remote mysql. So only option for persistence in this case is currently APC. This would actually work out pretty well in this case, I guess.

Ah, tagging might be pretty easy: Just add sys_domain_"uid" as tags to all entries and invalidate with a tce hook which drops every tag which is in rootline.

Actions #19

Updated by Klaus Hinum over 13 years ago

  • Target version deleted (0)

I just ran into a speed / memory issue with the typolink function. When creating a XML sitemap of our site, we always run into the memory limit of 256MB. Somehow, each call of the typolink function takes up more memory. When commenting out the domain record search makes it at least faster (and using up a bit less memory, as we use config.typolinkCheckRootline).

Actions #20

Updated by Steffen Gebert over 13 years ago

  • TYPO3 Version set to 4.6

I've just spent a few hours on this and also got in contact with the Administrator of the problematic site I mentioned.

However, Christians patch isn't as efficient as it could be (we didn't try it, but I analyzed it in-depth): It only caches the result per typoLink target page and this result is only used, when building a link to this page (again) or a subpage.
Instead, I would really go for caching the whole sys_domain once and then query the RuntimeCache on it. Currently I have something in between those two solutions, which still has some issues, but could be easily avoided by having fast access to the table contents.

Is anybody in for some sponsoring? Klaus? I can't promise to save huge amounts of memory, but several SQL queries would be avoidable.

Actions #21

Updated by Steffen Gebert over 13 years ago

I've just pushed a draft of this change to the branch sandbox/stephenking/sys-domain-caching

I would be happy to get feedback, as the guys, for which I made this improvements are under heavy load, have no time to check, blabla..

Basically, it ready all sys_domain records once and then uses them, instead of accessing the DB. To really save many queries, a patch for RealURL is attached. This reduces the number of queries to sys_domain from >100 to 5.

So, please check this patch with your site having lots of domains and links across domains. Would be nice, to not only get feedback from the number of SQL queries, but also on whole (uncached) page generation time of real-world sites.

Thanks for your support
Steffen

Actions #22

Updated by Steffen Gebert over 13 years ago

  • Status changed from New to Needs Feedback
Actions #23

Updated by Simon Schaufelberger almost 13 years ago

any progress here?

Actions #24

Updated by Steffen Gebert almost 13 years ago

Yes, there is. I've implemented and tested this, currently dkd is evaluating it in a project since last week. So stay tuned, it's not forgotten.

Actions #25

Updated by Simon Schaufelberger almost 13 years ago

push!

Actions #26

Updated by Simon Schaufelberger almost 13 years ago

push!
TYPO3 4.7 Feature freeze is coming close and i want to have this included PLEASE!!!

Actions #27

Updated by Steffen Gebert almost 13 years ago

Unfortunately I didn't get any feedback. Would be nice, if you could give the patch (for core and for realurl) a try!

Actions #28

Updated by Simon Schaufelberger almost 13 years ago

I will!

Actions #29

Updated by Philipp Bergsmann almost 13 years ago

I just patched a 4.5.11 instance with the patches above (core and realurl) and it didn't really work:

I used the TransientMemoryBackend because I thought it would be the most fitting.

I'm using the "_DOMAINS" configuration of the realurl for language switching with domains. After patching the instance (with or without patching the realurl) on the german page (default language) all links now target to the english domain and the realurl-encoding doesn't work anymore. After reverting the patch everything works as expected again. Do I have to use a 4.6 or 4.7 TYPO3-core?

But the performance increased noticeable :)

I'll try to figure out what went wrong but there are a lot of possibilities I guess...

Updated by Steffen Gebert almost 13 years ago

Hi Philipp!

You could try the attached extensions. One brings the TransientMemoryBackend to 4.5 and the other one XCLASSes for typolink and realurl.

Steffen

Actions #31

Updated by Simon Schaufelberger almost 13 years ago

Hi together,
i am quite confused now which version to test since there are 2 diff files attached to this issue and then those 2 t3x files from Steffen.

i tested the sysdomain_cache from the t3x file from Steffen and I merged the diff into the current trunk (typo3 4.7) and it works perfectly.

i could not get the cache_backend_transient extension running. It throws an exception when using with it with typo3 4.7.

now when i want to make a gerrit patch request, which version shall i push? am i right that we cannot merge the sysdomain and the transient cache at the same time?

PS: i really want to have this in 4.7!!!

Actions #32

Updated by Steffen Gebert almost 13 years ago

Ah, sorry.. these extensions were meant as backport to 4.5.

Please use the sys-domain-caching git branch. You can cherry-pick ef5555e486e0c141ca9d0d2f71d365bfce7182f3, which hopefully still applies.

Other than that you should apply the patch for realurl, which brings another big performance gain.

Actions #33

Updated by Simon Schaufelberger almost 13 years ago

ok, i tested exactly this code in 4.7 since it is just removing the domein lookup logic into fe class and it worked very well.
i will test the realurl patch tonight.

Actions #34

Updated by Simon Schaufelberger almost 13 years ago

OK, one more day to go, right? Steffen, can you provide a patch and push that to gerrit so that it can be merged before tomorrow? As far as i have understood features that are not merged until beta 1 have to wait for next release which would be waiting another half year...

Actions #35

Updated by Steffen Gebert almost 13 years ago

How should it be merged, if we don't know anything about the real performance impact? The patch is there.. feel free to push, but I expect it to be fired off immediately, without any measurements in real environments.

Actions #36

Updated by Simon Schaufelberger almost 13 years ago

we from punkt.de have some customer websites with big menus with far more than 40 links just in the main navigation. at the moment for EVERY of these links gets the domain record table hit... and this is just the main menu not included any submenus on the side and any links in the page content.

Actions #37

Updated by Steffen Gebert almost 13 years ago

So test this patch and report your findings!

Actions #38

Updated by Gerrit Code Review almost 13 years ago

  • Status changed from Needs Feedback to Under Review

Patch set 1 for branch master has been pushed to the review server.
It is available at http://review.typo3.org/9023

Actions #39

Updated by Gerrit Code Review almost 13 years ago

Patch set 2 for branch master has been pushed to the review server.
It is available at http://review.typo3.org/9023

Actions #40

Updated by Gerrit Code Review almost 13 years ago

Patch set 3 for branch master has been pushed to the review server.
It is available at http://review.typo3.org/9023

Actions #41

Updated by Gerrit Code Review over 12 years ago

Patch set 4 for branch master has been pushed to the review server.
It is available at http://review.typo3.org/9023

Actions #42

Updated by Gerrit Code Review over 12 years ago

Patch set 5 for branch master has been pushed to the review server.
It is available at http://review.typo3.org/9023

Actions #43

Updated by Simon Schaufelberger over 12 years ago

Beta is comming!!! Will this still be part of it???

Actions #44

Updated by Gerrit Code Review almost 12 years ago

Patch set 6 for branch master has been pushed to the review server.
It is available at http://review.typo3.org/9023

Actions #45

Updated by Gerrit Code Review almost 12 years ago

Patch set 7 for branch master has been pushed to the review server.
It is available at http://review.typo3.org/9023

Actions #46

Updated by Gerrit Code Review almost 12 years ago

Patch set 8 for branch master has been pushed to the review server.
It is available at http://review.typo3.org/9023

Actions #47

Updated by Gerrit Code Review almost 12 years ago

Patch set 9 for branch master has been pushed to the review server.
It is available at https://review.typo3.org/9023

Actions #48

Updated by Gerrit Code Review almost 12 years ago

Patch set 10 for branch master has been pushed to the review server.
It is available at https://review.typo3.org/9023

Actions #49

Updated by Gerrit Code Review almost 12 years ago

Patch set 11 for branch master has been pushed to the review server.
It is available at https://review.typo3.org/9023

Actions #50

Updated by Gerrit Code Review almost 12 years ago

Patch set 12 for branch master has been pushed to the review server.
It is available at https://review.typo3.org/9023

Actions #51

Updated by Gerrit Code Review almost 12 years ago

Patch set 13 for branch master has been pushed to the review server.
It is available at https://review.typo3.org/9023

Actions #52

Updated by Steffen Gebert almost 12 years ago

  • Status changed from Under Review to Resolved
  • % Done changed from 0 to 100
Actions #53

Updated by Steffen Gebert over 11 years ago

  • Target version set to 6.1.0
  • TYPO3 Version changed from 4.6 to 6.1
Actions #54

Updated by Gerrit Code Review over 11 years ago

  • Status changed from Resolved to Under Review

Patch set 1 for branch master_new has been pushed to the review server.
It is available at https://review.typo3.org/18623

Actions #55

Updated by Christian Kuhn over 11 years ago

  • Status changed from Under Review to Resolved
Actions #56

Updated by Oliver Hader over 11 years ago

  • Target version changed from 6.1.0 to 6.1.0-alpha1
Actions #57

Updated by Riccardo De Contardi about 7 years ago

  • Status changed from Resolved to Closed
Actions #58

Updated by Simon Schaufelberger over 2 years ago

  • Description updated (diff)
Actions

Also available in: Atom PDF