Bug #15244

closed

"Page is being generated" is bad for search engines

Added by Dmitry Dulepov almost 19 years ago. Updated over 18 years ago.

Status: Closed
Priority: Should have
Category: Frontend
Target version: -
Start date: 2005-11-29
Due date:
% Done: 0%
Estimated time:
TYPO3 Version: 3.8.0
PHP Version: 4
Tags:
Complexity:
Is Regression:
Sprint Focus:

Description

Nothing in this response tells search engines that there is a problem on the server, so a search engine indexes the text as regular page content.

The following is proposed to "optimize" this message for search engines:
- return the "503 Service Unavailable" HTTP status instead of "200 OK"
- set the "Retry-After" HTTP header to the current time + 1 hour to prompt the search engine to revisit the page once the load has gone away
- set other appropriate HTTP headers (like "Pragma") to prevent caching of this page
- make sure that this page is longer than 512 bytes so that MSIE displays it properly (otherwise MSIE will show its own error page)
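
A minimal sketch of what such a response could look like, written in Python for illustration only and not taken from the TYPO3 code base. The function name `build_busy_response` and both constants are hypothetical. The sketch uses status 503, which is the code HTTP defines for "Service Unavailable", and pads the body with an HTML comment so that it exceeds 512 bytes.

```python
import time
from email.utils import formatdate

MIN_BODY_BYTES = 512         # MSIE replaces shorter error bodies with its own page
RETRY_DELAY_SECONDS = 3600   # one hour, as proposed in this report


def build_busy_response(message, now=None):
    """Return (status, headers, body) for the temporary message (hypothetical helper)."""
    now = time.time() if now is None else now
    body = "<html><body><p>%s</p></body></html>" % message
    # Pad with an HTML comment until the body is longer than 512 bytes.
    shortfall = MIN_BODY_BYTES + 1 - len(body.encode("utf-8"))
    if shortfall > 0:
        body += "<!-- " + "." * shortfall + " -->"
    headers = {
        # Retry-After accepts an HTTP date; here it is current time + 1 hour.
        "Retry-After": formatdate(now + RETRY_DELAY_SECONDS, usegmt=True),
        "Pragma": "no-cache",
        "Cache-Control": "no-cache",
        "Content-Type": "text/html; charset=utf-8",
    }
    return "503 Service Unavailable", headers, body.encode("utf-8")


status, headers, body = build_busy_response("Page is being generated")
```

Retry-After may also carry a plain number of seconds (for example `3600`) instead of a date, which avoids any dependency on the server clock.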

You can assign this bug to me; I will do the final research from an SEO point of view and handle it (I have a CVS account for TYPO3).

(issue imported from #M1947)

Actions #1

Updated by Martin Kutschker almost 19 years ago

Pragma: no-cache
Cache-control: no-cache

Actions #2

Updated by Andreas Balzer almost 19 years ago

"set "Retry-after" HTTP header to current time + 1 hour to prompt search engine to revisit this page again when load goes away"
It would be better if it were less than 1 hour, because on large news sites the content could have changed by then.
Maybe this could be made configurable somewhere.

Actions #3

Updated by Dmitry Dulepov almost 19 years ago

I do not think that configuration is needed for this feature because it happens quite rarely.

This message usually appears when the system is overloaded. I "selected" one hour because it should be enough for the system to "cool down". I do not think that news pages need a smaller interval, simply because their news becomes obsolete too fast anyway. It does not really matter which news items are indexed; the main point is that the site is indexed. If the interval is too small and the site is still overloaded, a search engine may decide that the problem is permanent and simply put the site on a stop list for a month. Google does that...

Actions #4

Updated by Sebastian Kurfuerst almost 19 years ago

+1 for no config option.

Actions #5

Updated by Andreas Balzer almost 19 years ago

"I do not thini that configuration is needed for this feature because it happens quite rarely.

This message usually appears when system is overloaded."
Well, this depends on the capacity of the server. On the server where my school runs TYPO3, this message appears about 10 times an hour. So it would also be nice if the administrator could configure what TYPO3 sends out. Maybe there is a mirror server that the user could be redirected to. Just an idea.

Actions #6

Updated by Dmitry Dulepov almost 19 years ago

Well, this is a different story. I think you can make a feature request for it. This bug is only about optimizing "Page is being generated" for search engines.

Btw, you can try eAccelerator or Turck MMCache at your school for better TYPO3 performance.

Actions #7

Updated by Michael Stucki almost 19 years ago

This is a feature that can't be changed easily. I'll try to explain how page caching works in a few words:

Condition: the page is not in the cache yet.

1. User A requests page 123 and the rendering starts.
2. First of all, TYPO3 stores the "page is being generated" message in the cache_pages table so that nobody else who requests the page triggers the whole rendering again.
3. The page is still being generated, but now user B requests page 123 too.
4. User B gets the "page is being generated" message.
5. The rendering for user A (who is still waiting) finishes, the result overwrites the temporary data in cache_pages, and user A gets the page as requested.
6. After 3 seconds, user B's page reloads and he sees the correct page as well.
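
The six steps above can be sketched as a small simulation. This is written in Python for illustration and is not the TYPO3 implementation: `PageCache` is a hypothetical in-memory stand-in for the cache_pages table, and `begin_request` and `finish_render` are made-up names.

```python
PLACEHOLDER = "Page is being generated"


class PageCache:
    """Hypothetical in-memory stand-in for the cache_pages table."""

    def __init__(self):
        self.rows = {}

    def fetch(self, page_id):
        return self.rows.get(page_id)

    def store(self, page_id, content, temporary=False):
        self.rows[page_id] = {"content": content, "temporary": temporary}


def begin_request(cache, page_id):
    """Return cached content, or None if this request has to render the page."""
    row = cache.fetch(page_id)
    if row is not None:
        return row["content"]  # either the real page or the placeholder
    # Step 2: store the temporary message before rendering starts.
    cache.store(page_id, PLACEHOLDER, temporary=True)
    return None


def finish_render(cache, page_id, html):
    """Step 5: the rendered page overwrites the temporary row."""
    cache.store(page_id, html)
    return html


cache = PageCache()
first_for_a = begin_request(cache, 123)    # step 1: cache miss, user A renders
first_for_b = begin_request(cache, 123)    # steps 3-4: user B gets the placeholder
page_for_a = finish_render(cache, 123, "<html>page 123</html>")  # step 5
reload_for_b = begin_request(cache, 123)   # step 6: user B reloads
```

In this sketch the placeholder is served exactly like a cached page, which is why the request from user B has no way to signal a different status or headers.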

This is the usual case, the way it should work. If you say that the error occurs very often, then I suspect that you have some problem with your page caching, because the error also means that the requested page was not in the cache yet.

Do you really have so many pages on this website, or could it be there is a caching problem...?

After all, it's not a TYPO3 bug.

Actions #8

Updated by Andreas Balzer almost 19 years ago

Hi Michael Stucki!
Thanks for the answer. :)
It explains a lot. We have about 350 sites and we had to clear the cache often in the past. But we could also have a cache problem. I don't know where the cache is stored, but our temp files in TYPO3 went up to 15 GB last summer. Quite strange, but I didn't want to report it as a bug. Now everything should be fine.

Well, but to come back to the topic: it would be nice if the "page is..." message could be displayed in the FE user language. So I think someone could file this as a feature wish?

Actions #9

Updated by Michael Stucki almost 19 years ago

"Well, but to come back to topic: If the page is... message could be displayed in the FE user language, it would be nice.. so i think someone could publish this as a feature wish?"

This is already possible, see bug #14333.

Actions #10

Updated by Dmitry Dulepov almost 19 years ago

Michael, thanks for explaining. I researched this yesterday and I do not see how we can easily send the correct status/headers unless we modify the cache_pages table to include a column for additional headers. Though it sounds logical (headers are part of the page anyway!), it is a big change.

One more thing just came to my mind. If a USER-type extension adds headers to the page while it generates content, they will not be sent with a cached version of the page. This is probably not good. Even if an extension adds them to config.additionalHeaders, they will not be there when the cached version of the page is sent.
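
To illustrate the idea of keeping headers with the cached page, here is a rough sketch in Python. It does not reflect the actual cache_pages schema: the `CachedPage` record, its `status` and `headers` fields, the `send_from_cache` function, and the `X-Extension-Header` name are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class CachedPage:
    """Hypothetical cache row that stores status and headers next to the content."""
    content: str
    status: str = "200 OK"
    headers: dict = field(default_factory=dict)


def send_from_cache(entry):
    """Return the (status, headers, content) that a cache hit would replay."""
    return entry.status, dict(entry.headers), entry.content


# The temporary row can carry its own status and headers.
placeholder = CachedPage(
    content="Page is being generated",
    status="503 Service Unavailable",
    headers={"Retry-After": "3600", "Pragma": "no-cache"},
)

# A rendered page keeps the headers an extension added while generating it.
rendered = CachedPage(
    content="<html>page 123</html>",
    headers={"X-Extension-Header": "example"},
)
```

With headers stored per row, both cases discussed here (the temporary message and extension-supplied headers) would be replayed on a cache hit, at the cost of the schema change mentioned above.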
