Bug #68134

sem_acquire while loop can lead to 100% CPU load

Added by Stefan Hekele over 4 years ago. Updated over 3 years ago.

Status:
Closed
Priority:
Should have
Assignee:
-
Category:
Performance
Target version:
-
Start date:
2015-07-14
Due date:
% Done:

0%

TYPO3 Version:
6.2
PHP Version:
5.5
Tags:
Complexity:
Is Regression:
No
Sprint Focus:

Description

Faulty function:
typo3/sysext/core/Classes/Locking/Locker.php->acquire

Environment:
RedHat 6.5
Apache 2.4.12
PHP FPM 5.5.6 / 5.5.21 (tested before and after update)
TYPO3 6.2.14 with semaphore locking

Encountered behavior:
After a cache clear via admin interface, a siege run with more than 3-4 users leads to at least one timed out response. The FPM process serving that requests uses 100% CPU till PHP timeout. Using the recommended timeout setting of 240, this quickly leads to an unresponsive server at just medium traffic. A lower timeout setting mitigates the problem.

Expected behavior:
Just like before 6.2, and like 7.3+, the lock acquiration should not be attempted in a loop. After a failed lock, an exception should be thrown, same as the other locking mechanisms.

Proposed change:
https://github.com/plan2net/TYPO3.CMS/commit/055b4f853f1fe44db023da3c2b60c7ef988ef817

History

#1 Updated by Georg Kühnberger over 4 years ago

I observed the same behaviour (CPU maxes out) in another installation.

The Locking method LOCKING_METHOD_SEMAPHORE simply is broken in 6.2.* and apparently also in the 6.2-compliancy code still living in 7.

Stefan's patch fixes the broken "while" perfect.

#2 Updated by Georg Kühnberger over 4 years ago

  • Assignee set to Georg Ringer

Ahoi GeorgR,

Can you please review Stefan's patch and bring it into the gerrit process?
Guess it would be a cool fix for a real stupid bug which is a Blocker, that is still lives in both, the 6.2 + future LTS versions.

Thanks G

#3 Updated by Markus Klein over 4 years ago

  • Status changed from New to Needs Feedback
  • Assignee changed from Georg Ringer to Markus Klein

Where are you using this?

Using acquireExclusiveLock() would be the right way.

#4 Updated by Stefan Hekele over 4 years ago

TYPO3\CMS\Frontend\Controller\TypoScriptFrontendController->acquirePageGenerationLock calls TYPO3\CMS\Core\Locking\Locker->acquire

#5 Updated by Markus Klein over 4 years ago

I dug into that again and can explain that now.

The reason for FE still using the old acquire() method is that it relies on the (crazy) implementation of that function.
It is not feasible to replace the implementation in 6.2 anymore. I invested a fortune of time in 7 to get that whole API out of the way and the FE implementation correct.
For 6.2 I highly recommend to NOT use semaphore locking at all.
Some other reason: The implementation is not safe an uses different identifiers for the semaphores which causes the system to be flooded with semaphores. You need a root on the server to actually get rid of them. Number of semaphores allocated might be O(number of pages)

#6 Updated by Oliver Gassner over 4 years ago

If it doesn't get fixed in 6.2 then you should at least disable the locking-mechanism semaphore in 6.2 with the next release if it's broken beyond repair.
It's an absurdity to speak of Long Term Support and then not fixing such inherently broken features.

#7 Updated by Georg Kühnberger over 4 years ago

I understand that:
- 6.2 LTS has a bug, which can bring down a webserver in seconds.
- The bug is well known, understood and can be fixed.
- The bug is also shipped with the 7.2 future LTS version (same code, same bug)

But "It is not feasible to replace the implementation in 6.2 anymore".
I fail to understand WHY.

Given Peter's Markus Klein's Warning:
"highly recommend to NOT use semaphore locking at all"
I suggest to:
- Change the LOCKING_METHOD_SEMAPHORE to use the filelocking Method (as is working)
- Remove the broken semaphore code.

Q: Shall we provide a patch implementing this strategy?

PS: Funny to read that the 6.2 Core uses a methode which is deprecated since 6.2, see:
TYPO3\CMS\Core\Locking\Locker->acquire
@deprecated since 6.2 - will be removed two versions later; use new API instead

Edit: "Peter" = MarkusK

#8 Updated by Markus Klein over 4 years ago

- The bug is also shipped with the 7.2 future LTS version (same code, same bug)

No, because the API has been changed and this behaviour is therefore fixed!

It is not feasible to replace the implementation in 6.2 anymore

Because that is a fucking dangerous area in the most central component of FE.
For details on how to fix this, see details in
https://review.typo3.org/#/c/37700/20/typo3/sysext/frontend/Classes/Controller/TypoScriptFrontendController.php
and
https://review.typo3.org/#/c/38840/13/typo3/sysext/frontend/Classes/Controller/TypoScriptFrontendController.php

The recommendation stems from me and not from "Peter".

PS: Funny to read that the 6.2 Core uses a methode which is depre....

See my explanation above, why this is the case!

A: If you get that fixed, I'm happy to review a patch. But I can of course only speak for myself.

#9 Updated by Georg Kühnberger over 4 years ago

Markus, (!=Peter).
- Sorry for the name mismatch. Is corrected.

Thanks for your answer to my question:

Q: Shall we provide a patch implementing this strategy?
A: If you get that fixed, I'm happy to review a patch. But I can of course only speak for myself.

We'll work on that one.

#10 Updated by Markus Klein over 4 years ago

  • Status changed from Needs Feedback to New
  • Assignee deleted (Markus Klein)

#11 Updated by Riccardo De Contardi about 4 years ago

  • Category set to Performance

#12 Updated by Markus Klein almost 4 years ago

  • Status changed from New to Needs Feedback

Still waiting for the patch by Georg.

#13 Updated by Markus Klein over 3 years ago

  • Status changed from Needs Feedback to Closed

Closed due to missing feedback.

Also available in: Atom PDF