Bug #66503
Closed
Core: Error handler (FE): PHP Warning: sem_get(): failed for key 0xbaa3533: No space left on device
100%
Description
Hi,
I got the following error (this information is from tab Log):
Core: Error handler (FE): PHP Warning: sem_get(): failed for key 0xbaa3533: No space left on device in /var/www/project/typo3_src/typo3/sysext/core/Classes/Locking/SemaphoreLockStrategy.php line 100
The problem occurs on a page showing a single news record. I also use fluidpages, fluidcontents, tq_seo. I have attached a screenshot with information about the environment.
Files
Updated by Markus Klein over 9 years ago
Please check your system status with
ipcs -s
sysctl -a |grep kernel\.sem
Updated by Markus Klein over 9 years ago
The default on Ubuntu 14.04 is:
kernel.sem = 32000 1024000000 500 32000
Updated by Mateusz Wojtuła over 9 years ago
I get this:
root@matw:~# ipcs -s

------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0x0baa3533 0          www-data   666        3
0x7c9e0990 32769      www-data   666        3
0xa8e44899 65538      www-data   666        3
0x1e536c5e 98307      www-data   666        3
0xb4375310 131076     www-data   666        3
0xea7be8dc 163845     www-data   666        3
0x456c2c52 196614     www-data   666        3
0x02867477 229383     www-data   666        3
0xf0e544b5 262152     www-data   666        3
0x339a6321 294921     www-data   666        3
0x3ca1cf4d 327690     www-data   666        3
0x4d9d2850 360459     www-data   666        3
0xb532d7b8 393228     www-data   666        3
0x5d85f034 425997     www-data   666        3
0xd2358e97 458766     www-data   666        3
0xaf36479e 491535     www-data   666        3
0xad937213 524304     www-data   666        3
0x2347b1c0 557073     www-data   666        3
0x14362d4d 589842     www-data   666        3
0xb1063a0a 622611     www-data   666        3
0xa0784163 655380     www-data   666        3
0xa1cc990c 688149     www-data   666        3
0x87e611ee 720918     www-data   666        3
0xc2b3fab1 753687     www-data   666        3
0x33e81829 786456     www-data   666        3
0xd8c99d51 819225     www-data   666        3
0x8783b42a 851994     www-data   666        3
0x60e5873a 884763     www-data   666        3
0xc314ead5 917532     www-data   666        3
0xa9442db3 950301     www-data   666        3
0xab52b406 983070     www-data   666        3
0x26551e77 1015839    www-data   666        3
0xd3fc66ce 1048608    www-data   666        3
0xf16f6100 1081377    www-data   666        3
0x25b03396 1114146    www-data   666        3
0x2ba70226 1146915    www-data   666        3
0xf10f5538 1179684    www-data   666        3
0x365b4f31 1212453    www-data   666        3
0x6a8d5d1a 1245222    www-data   666        3
0x6e2a442c 1277991    www-data   666        3
0x8264f29c 1310760    www-data   666        3
0x0973ce0a 1343529    www-data   666        3
0x043b027a 1376298    www-data   666        3
0xa250f4df 1409067    www-data   666        3
0x7c984307 1441836    www-data   666        3
0xebf7e556 1474605    www-data   666        3
0xf02169f2 1507374    www-data   666        3
0xc27603cd 1540143    www-data   666        3
0xd3c3c612 1572912    www-data   666        3
0x9cc25ec5 1605681    www-data   666        3
0x04629384 1638450    www-data   666        3
0xd5e85eaf 1671219    www-data   666        3
0x22687898 1703988    www-data   666        3
0xb40f30b2 1736757    www-data   666        3
0xe31f7de7 1769526    www-data   666        3
0x07aecb8a 1802295    www-data   666        3
0xcf62bda7 1835064    www-data   666        3
0x2c784309 1867833    www-data   666        3
0x0e8d31f2 1900602    www-data   666        3
0x4fd013e4 1933371    www-data   666        3
0xb44b91b2 1966140    www-data   666        3
0x9b85d8b5 1998909    www-data   666        3
0xe8994ff1 2031678    www-data   666        3
0xdeb8161d 2064447    www-data   666        3
0xa3c4fb43 2097216    www-data   666        3
0x2ba5a06e 2129985    www-data   666        3
0xd33d736a 2162754    www-data   666        3
0x113edfef 2195523    www-data   666        3
0x36a1dc1a 2228292    www-data   666        3
0xf12befbb 2261061    www-data   666        3
0x69911364 2293830    www-data   666        3
0xa9140a5f 2326599    www-data   666        3
0x03c74337 2359368    www-data   666        3
0x6a014e24 2392137    www-data   666        3
0xbc24ca33 2424906    www-data   666        3
0xf114402a 2457675    www-data   666        3
0x2a36aca6 2490444    www-data   666        3
0x2e918a24 2523213    www-data   666        3
0xe4a3c031 2555982    www-data   666        3
0x0c841d24 2588751    www-data   666        3
0xb380209d 2621520    www-data   666        3
0x6149871e 2654289    www-data   666        3

root@matw:~# sysctl -a |grep kernel\.sem
kernel.sem = 250 32000 32 128
kernel.sem_next_id = -1
So this is a problem with my system, not with TYPO3?
Updated by Markus Klein over 9 years ago
What server do you use?
Is this a shared hosting or so?
You have a 250 limit for the number of semaphores.
TYPO3 currently does not remove them, because we have no "controlled" environment - in the sense that a request does not know about other requests - and we don't want to have race conditions because of removing a semaphore too early.
Thanks for your report, btw. This is very valuable feedback, as the Locking API was changed not long ago and we really need field experience.
Currently the semaphore locking is the preferred method, if available. But if it turns out that it causes too much trouble, we might change that again.
Updated by Markus Klein over 9 years ago
- Category set to Frontend
- Status changed from New to Accepted
- Assignee set to Markus Klein
- Priority changed from Should have to Must have
- Target version set to 7.2 (Frontend)
- Complexity set to medium
- Sprint Focus set to Stabilization Sprint
After checking the code, I see that we potentially use quite a lot of semaphores.
I'll try to come up with a patch to limit the number.
Updated by Mateusz Wojtuła over 9 years ago
Thanks for this information. So what can I do when this error appears again?
This site is on DigitalOcean with the smallest virtual server (512MB ram, 1 core, 20 GB SSD).
If you want to check it you can use this link to register and get 10$ for free https://www.digitalocean.com/?refcode=96665686914b
Updated by Markus Klein over 9 years ago
More info about the numbers.
kernel.sem = 32000 1024000000 500 32000
kernel.sem = 250 32000 32 128
The order is: SEMMSL, SEMMNS, SEMOPM, and SEMMNI
SEMMSL: maximum number of semaphores per semaphore set
SEMMNS: total number of semaphores (not semaphore sets) for the entire Linux system
SEMOPM: maximum number of semaphore operations that can be performed per semop(2) system call
SEMMNI: maximum number of semaphore sets for the entire Linux system
For a description of the settings see e.g. http://www.puschitz.com/TuningLinuxForOracle.shtml#SettingSemaphores
It can clearly be seen that the second set of numbers (SEMMNI) limits the number of semaphore sets on the whole system to 128(!)
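To make the field order explicit, here is a minimal Python sketch that splits a kernel.sem value into its four named limits. The class and function names are made up for illustration; the two input strings are the values quoted above.

```python
from typing import NamedTuple


class SemLimits(NamedTuple):
    semmsl: int  # max semaphores per semaphore set
    semmns: int  # max semaphores system-wide
    semopm: int  # max operations per semop(2) call
    semmni: int  # max semaphore sets system-wide


def parse_kernel_sem(value: str) -> SemLimits:
    """Parse the four whitespace-separated numbers of a kernel.sem value."""
    fields = value.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}")
    return SemLimits(*(int(field) for field in fields))


ubuntu_default = parse_kernel_sem("32000 1024000000 500 32000")
reporter = parse_kernel_sem("250 32000 32 128")

# The reporter's droplet allows far fewer semaphore sets than the Ubuntu default.
print(ubuntu_default.semmni, reporter.semmni)
```

On Linux the same four numbers can also be read from /proc/sys/kernel/sem and passed to the function unchanged.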
Updated by Markus Klein over 9 years ago
"So what can I do when this error appears again?"
Search the web for how to increase the maximum number of semaphores, or for how to release existing ones.
Updated by Markus Klein over 9 years ago
After digging around in the code I realize:
We use a lot of information to generate a unique key for the lock, so we actually have one key per variation of a page (md5 of serialize of id, type, gr_list, MP, cHash, startPage).
\TYPO3\CMS\Frontend\Controller\TypoScriptFrontendController::createHashBase(TRUE)
This means we spam the system with a semaphore set per variation, but the OS might limit us to 128 semaphore sets!
This approach is brutally wrong, even if we used file locks instead of semaphores, since those would then need a file per variation.
We need to define a better strategy.
Idea: Use a single lock to access some shared resource, which keeps track of the process currently rendering a page.
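A rough Python sketch of why a key per variation adds up quickly. This is not the TYPO3 code: json.dumps stands in for PHP's serialize(), and the page ids, types and group lists are invented sample values.

```python
import hashlib
import json
from itertools import product


def lock_key(params: dict) -> str:
    # json.dumps is only a stand-in for PHP's serialize(); all that matters
    # here is that equal parameters give equal keys.
    serialized = json.dumps(params, sort_keys=True)
    return hashlib.md5(serialized.encode()).hexdigest()


def count_lock_keys(page_ids, types, group_lists) -> int:
    """Count the distinct lock keys produced by all parameter combinations."""
    keys = set()
    for page_id, page_type, gr_list in product(page_ids, types, group_lists):
        keys.add(lock_key({
            "id": page_id,
            "type": page_type,
            "gr_list": gr_list,
            "MP": "",
            "cHash": "",
            "startPage": 1,
        }))
    return len(keys)


# Hypothetical site: 50 pages, 2 page types, 2 frontend group combinations.
n = count_lock_keys(range(1, 51), [0, 1], ["0,-1", "0,-2,1"])
print(n)
```

Even this small made-up site yields more distinct keys than the 128 semaphore sets allowed on the reporter's system, and cHash variations would multiply the number further.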
Updated by Markus Klein over 9 years ago
Current procedure:
- check cache, if empty proceed with
- generate hash
- get lock for this hash
- generate page
- write page to cache
- release lock
This creates one lock per hash, and there can be a great many hashes.
I propose the following procedure:
- check cache, if empty proceed with
- generate hash
- get lock for cache access (1 unique lock per instance)
- write to cache that we're working on the content (set lock for this hash)
- release the lock
- generate page
- get lock again
- remove lock for the hash
- write page to cache
- release lock
This is some sort of simple reader/writer pattern, where we allow inconsistent reads, while generating the page.
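The proposed procedure could be sketched in Python roughly as follows. This is an illustration under assumptions, not the patch itself: an in-memory dict plays the cache, threading.Lock plays the single per-instance lock, and returning None when another request is already generating the page is my own placeholder, since the procedure above does not say what the waiting request should do.

```python
import threading


class PageCache:
    def __init__(self):
        self._lock = threading.Lock()  # the one lock per instance
        self._pages = {}               # hash -> rendered page
        self._in_progress = set()      # hashes currently being generated

    def get_or_generate(self, page_hash, generate):
        # Check the cache first; this read is deliberately not locked.
        page = self._pages.get(page_hash)
        if page is not None:
            return page

        # Short critical section: mark the hash as "being worked on".
        with self._lock:
            if page_hash in self._pages:
                return self._pages[page_hash]
            if page_hash in self._in_progress:
                return None  # assumption: caller shows a "please wait" page
            self._in_progress.add(page_hash)

        # Generate the page without holding the lock.
        try:
            page = generate()
        except BaseException:
            with self._lock:
                self._in_progress.discard(page_hash)
            raise

        # Second short critical section: clear the marker, store the page.
        with self._lock:
            self._in_progress.discard(page_hash)
            self._pages[page_hash] = page
        return page
```

The lock is only held while the bookkeeping is updated, never during page generation, so a single lock is enough for any number of page variations.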
Regarding the semaphore locking in general:
We need to change the semaphore keys. It is important that the keys are unique per instance, otherwise two T3 instances on the same server might block each other (even if only for a very short time).
The best way to generate a key (that also does not collide with OS keys) is to use ftok(filename, projId), which needs a filename.
We could use a fixed filename, but that would limit the possible range of projId to 256 (8bit), which is a no-go, since we can't map arbitrary $subject to 256 projIds in a sane way.
Therefore we propose to create a file in typo3temp/locks/sem_<md5 of $subject> used for ftok.
This way one can also see how many semaphores the instance has created in the system, without resorting to command line tools.
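A small Python sketch of the file-per-subject idea. Python's standard library has no ftok(), so the key computation below imitates the formula that, to my understanding, glibc uses (low bits of inode, device and project id); the function name and the project id "A" are assumptions made for the example.

```python
import hashlib
import os
import tempfile


def semaphore_key(lock_dir: str, subject: str, proj_id: int = ord("A")) -> int:
    """Create sem_<md5 of subject> in lock_dir and derive an ftok-style key."""
    os.makedirs(lock_dir, exist_ok=True)
    name = "sem_" + hashlib.md5(subject.encode()).hexdigest()
    path = os.path.join(lock_dir, name)
    # Create the marker file if it does not exist yet; keep it otherwise.
    with open(path, "a"):
        pass
    st = os.stat(path)
    # Imitation of ftok(): 16 bits of inode, 8 bits of device, 8 bits of proj_id.
    return (st.st_ino & 0xFFFF) | ((st.st_dev & 0xFF) << 16) | ((proj_id & 0xFF) << 24)


with tempfile.TemporaryDirectory() as lock_dir:
    first = semaphore_key(lock_dir, "pages")
    again = semaphore_key(lock_dir, "pages")
    semaphore_key(lock_dir, "other subject")
    marker_files = sorted(os.listdir(lock_dir))

print(first == again, len(marker_files))
```

Listing the directory then shows one marker file per subject. Note that ftok-style keys are not guaranteed to be unique, because only part of the inode number goes into the key.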
Updated by Gerrit Code Review over 9 years ago
- Status changed from Accepted to Under Review
Patch set 1 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/38840
Updated by Gerrit Code Review over 9 years ago
Patch set 2 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/38840
Updated by Gerrit Code Review over 9 years ago
Patch set 3 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/38840
Updated by Gerrit Code Review over 9 years ago
Patch set 4 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/38840
Updated by Gerrit Code Review over 9 years ago
Patch set 5 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/38840
Updated by Gerrit Code Review over 9 years ago
Patch set 6 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/38840
Updated by Gerrit Code Review over 9 years ago
Patch set 7 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/38840
Updated by Markus Klein over 9 years ago
Testing instructions:
Insert a sleep(10) in the tsfe::generate_pre...() function, right after the release of the lock. This allows you to see the "Page is generated" message if you visit the page with two browsers at the same time. Clear the cache first and log out from the BE.
At the same time you can watch the lock files come and go in typo3temp/locks.
Updated by Gerrit Code Review over 9 years ago
Patch set 8 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/38840
Updated by Gerrit Code Review over 9 years ago
Patch set 9 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/38840
Updated by Gerrit Code Review over 9 years ago
Patch set 10 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/38840
Updated by Andreas Allacher over 9 years ago
Test script to produce some locking states. It would result in a deadlock with patch set 8, which lacks the no-block option added in patch set 9.
#!/bin/sh
./typo3cms cache:flushgroups --groups=pages
time wget --output-document=index_temp.1.html --content-on-error "http://localhost/" &
time wget --output-document=index_temp.2.html --content-on-error "http://localhost/" &
time wget --output-document=index_temp.3.html --content-on-error "http://localhost/" &
time wget --output-document=index_temp.4.html --content-on-error "http://localhost/" &
time wget --output-document=index_temp.5.html --content-on-error "http://localhost/" &
time wget --output-document=index_temp.6.html --content-on-error "http://localhost/" &
time wget --output-document=index_temp.7.html --content-on-error "http://localhost/" &
time wget --output-document=index_temp.8.html --content-on-error "http://localhost/" &
time wget --output-document=index_temp.9.html --content-on-error "http://localhost/" &
time wget --output-document=index_temp.10.html --content-on-error "http://localhost/" &
time wget --output-document=index_temp.11.html --content-on-error "http://localhost/" &
time wget --output-document=index_temp.12.html --content-on-error "http://localhost/" &
Regarding the usleep in the locking part: judging by the timings, it seems better to increase it (at least for me). But I think it also depends on how much memory and other resources are available.
With 5us I am about 1 second slower per request than with e.g. 5000, but I guess that really depends on the system load.
Still, I think 5us might really be too little; 5000 would at least be 5 milliseconds.
Even better (though usually not by much compared to 5000) seem to be 50000 or 100000, but that might also be system-related.
Maybe others can do some tests too?
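For anyone who wants to experiment with the interval, here is a minimal Python sketch of a non-blocking acquire that polls with a configurable sleep. It only illustrates the trade-off and is not the code from the patch; the function name, the timeout, and the use of threading.Lock in place of the real lock are assumptions.

```python
import threading
import time


def acquire_with_polling(lock, interval_us: int, timeout_s: float = 5.0) -> int:
    """Poll a lock without blocking; return the number of failed attempts."""
    deadline = time.monotonic() + timeout_s
    failed_attempts = 0
    while not lock.acquire(blocking=False):
        failed_attempts += 1
        if time.monotonic() >= deadline:
            raise TimeoutError("lock not acquired in time")
        # usleep() takes microseconds, time.sleep() takes seconds.
        time.sleep(interval_us / 1_000_000)
    return failed_attempts


lock = threading.Lock()
lock.acquire()
# Another thread releases the lock after 50 ms, like a request finishing.
threading.Timer(0.05, lock.release).start()
retries = acquire_with_polling(lock, interval_us=5000)
lock.release()
print(retries)
```

A very short interval means many more wake-ups while waiting, a long one means the waiter may notice the release late; which effect dominates will depend on the system.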
Updated by Andreas Allacher over 9 years ago
- File lock_test.sh added
Updated by Gerrit Code Review over 9 years ago
Patch set 11 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/38840
Updated by Gerrit Code Review over 9 years ago
Patch set 12 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/38840
Updated by Markus Klein over 9 years ago
- Status changed from Under Review to Resolved
- % Done changed from 0 to 100
Applied in changeset a1ed7cefc902cb9bd0e0451c550fe92ea3302033.
Updated by Riccardo De Contardi about 7 years ago
- Status changed from Resolved to Closed