Bug #18754

improved 404 pagenotfound_handling not working for certain requested URLs/resources

Added by Matthew Kennewell about 5 years ago. Updated 14 days ago.

Status: New
Start date: 2008-05-06
Priority: Should have
Due date:
Assignee: -
% Done: 0%
Category: Communication
Target version: 4.5.12
TYPO3 Version:
Complexity:
PHP Version: 4.3
Votes: 1

Description

With the default TYPO3 .htaccess file, and with or without 'simulate static documents' enabled, 404 pageNotFound_handling appears to work only for requested files that:

- have .html as the file suffix
- and are requested from the root of a TYPO3 site, i.e. www.domain.com.au/file.html

If 'file.html' exists, the page is shown.

If 'wrongfile.html' does not exist, 404 handling takes place correctly, due to function '$this->checkAndSetAlias()'.

The '404 pageNotFound_handling' feature fails to send 404 headers for the following requested resources:

- www.domain.com.au/file.htm
- www.domain.com.au/file.pdf

- www.domain.com.au/folder/
- www.domain.com.au/folder/file.html
- www.domain.com.au/folder/file.pdf

When the above requests fail, the browser is given a 200 OK HTTP header and is shown the home page of the website, with the requested resource's URL remaining in the browser's address bar. This could be because $this->id is 'false' and is then set to '0' in function 'setIDfromArgV()'.

My suggested patch to class.tslib_fe.php works on the premise that the value of $this->TYPO3_CONF_VARS['FE']['pageNotFound_handling'] is TRUE. Perhaps TYPO3's 404 handling should also work when $this->TYPO3_CONF_VARS['FE']['pageNotFound_handling'] is FALSE, still sending 404 headers and redirecting to the home page / root page of the website.

Please see the attached files class.tslib_fe.modified.php.txt and class.tslib_fe.orig.php.txt to review the suggested code as a basis for a possible patch of /typo3/sysext/cms/tslib/class.tslib_fe.php

(issue imported from #M8343)

class.tslib_fe.modified.php.txt (162.3 kB) Administrator Admin, 2008-05-06 17:00

class.tslib_fe.orig.php.txt (161.5 kB) Administrator Admin, 2008-05-06 17:01

class.tslib_fe.modified.php__updated.txt (162.4 kB) Administrator Admin, 2008-05-22 04:04

effects_on_this-id_using_default_class.tslib_fe.php.txt (2.9 kB) Administrator Admin, 2008-05-29 14:36

effects_on_this-id_using_modified_class.tslib_fe.php.txt (2.2 kB) Administrator Admin, 2008-05-29 14:36

History

Updated by Matthew Kennewell about 5 years ago

Additional information can be found in a post in the TYPO3 mailing list typo3.dev

Subject "An idea to further process ' page not found ' 404handling"

Originally posted "Tuesday, 29 April 2008"

Updated by Olivier Dobberkau about 5 years ago

We have experienced massive load due to this handling behaviour.

Updated by Matthew Kennewell almost 6 years ago

The attached file, class.tslib_fe.modified.php__updated.txt, has a line added to the following array that was not in file class.tslib_fe.modified.php.txt.

This added line, starting with 5 =>, is required for outputting an error message when the suggested new function checkAndSetPageNotFound() sets $this->pageNotFound = 5.

The following code block is from file /typo3/sysext/cms/tslib/class.tslib_fe.php, around lines 884-891:
----------------------------------------

if ($this->pageNotFound && $this->TYPO3_CONF_VARS['FE']['pageNotFound_handling'])    {
    $pNotFoundMsg = array(
        1 => 'ID was not an accessible page',
        2 => 'Subsection was found and not accessible',
        3 => 'ID was outside the domain',
        4 => 'The requested page alias does not exist',
        5 => 'The requested page or file resource does not exist'
    );
----------------------------------------

Updated by Matthew Kennewell almost 5 years ago

I have done some further research into SSD and 404 handling, and it seems that the code suggestions I have made to date may not work as expected (i.e. may not produce correct 404 headers), so I have come up with the following as another suggested fix for this bug. This suggestion may not consider everything required; it is just a step towards fixing the bug.

In TYPO3 v4.1.6, file /typo3/sysext/cms/tslib/class.tslib_fe.php , in function fetch_the_id(), approx line 861:
replace:
if (!$this->id) {
with:
if (!$this->id && !$this->TYPO3_CONF_VARS['FE']['pageNotFound_handling']) {

And the same potentially goes for:

In TYPO3 v4.2.0, file /typo3/sysext/cms/tslib/class.tslib_fe.php , in function fetch_the_id(), approx line 929:
replace:
if (!$this->id) {
with:
if (!$this->id && !$this->TYPO3_CONF_VARS['FE']['pageNotFound_handling']) {

Summary:
If 'pageNotFound_handling' is set, $this->id keeps its value of zero. When $this->id is then processed in function getPageAndRootline() and still no page exists, $this->pageNotFoundAndExit() is called.

If 'pageNotFound_handling' is not set, the existing fallback applies ("the id was not previously set, set it to the id of the domain" or "the first 'visible' page in that domain"), so $this->id is set to what is typically the home page.

Note:
$this->id was set to zero by the call to $this->setIDfromArgV() in function determineId().

Updated by Matthew Kennewell almost 5 years ago

Whoops: I just found out that if you request only the domain of your SSD website, this change loads the page set for pageNotFound handling.

Sorry, guys and gals, for the error on my part...

Therefore the suggested line change above needs to consider whether NO SITE_SCRIPT exists. Here is a new suggested line of code to fix the SSD bug:

if (!$this->id && !($this->TYPO3_CONF_VARS['FE']['pageNotFound_handling'] && t3lib_div::getIndpEnv('TYPO3_SITE_SCRIPT'))) {

Updated by Matthew Kennewell almost 5 years ago

My previously submitted note and code suggestion did not consider the case where the requested URL is www.domain.com.au/index.php

So here is an updated line of code that supersedes all of my previous code suggestions for fixing 404 handling when using SSD:

if (!$this->id && (t3lib_div::getIndpEnv('TYPO3_SITE_SCRIPT')=='index.php' || !(t3lib_div::getIndpEnv('TYPO3_SITE_SCRIPT') && $this->TYPO3_CONF_VARS['FE']['pageNotFound_handling']))) {

I placed this line of code in my local copy of file /typo3/sysext/cms/tslib/class.tslib_fe.php around lines 861-873 in TYPO3v4.1.6 AND around lines 929-948 in TYPO3 v4.2.0

It worked for the following requested resources with SSD enabled (config.simulateStaticDocuments = 1):

- www.domain.com/ - showed root page
- www.domain.com/index.php - showed root page
- www.domain.com/contact_us.html - showed the contact us page

The following resources returned 404 headers and showed the page content set for 404 handling:
- www.domain.com/wrong_alias.html
- www.domain.com/wrong_file.pdf
- www.domain.com/wrong_folder/
- www.domain.com/wrong_folder/wrong_file.pdf
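To make the results above easier to follow, here is a minimal stand-alone sketch of the suggested condition. It is not TYPO3 core code: the function name fallsBackToRootPage() and its three parameters are hypothetical stand-ins for $this->id, t3lib_div::getIndpEnv('TYPO3_SITE_SCRIPT') and $this->TYPO3_CONF_VARS['FE']['pageNotFound_handling'], and the site script values are assumed to be the part of the URL after the domain.

```php
<?php
// Hypothetical model of the suggested condition; not TYPO3 core code.
// $id                   stand-in for $this->id (0 when nothing was resolved)
// $siteScript           stand-in for t3lib_div::getIndpEnv('TYPO3_SITE_SCRIPT')
// $pageNotFoundHandling stand-in for $TYPO3_CONF_VARS['FE']['pageNotFound_handling']
function fallsBackToRootPage($id, $siteScript, $pageNotFoundHandling)
{
    // TRUE means the existing "use the root page of the domain" fallback runs.
    // FALSE means the id is left unresolved, so later 404 handling can take over.
    return !$id
        && ($siteScript == 'index.php' || !($siteScript && $pageNotFoundHandling));
}

// Assumed site script values for the requests listed above.
$requests = array('', 'index.php', 'wrong_alias.html', 'wrong_folder/wrong_file.pdf');
foreach ($requests as $siteScript) {
    $fallback = fallsBackToRootPage(0, $siteScript, true);
    echo ($siteScript === '' ? '(domain only)' : $siteScript), ': ',
        $fallback ? 'root page' : 'left for 404 handling', "\n";
}
```

In this model the domain-only and index.php requests fall back to the root page, while the non-existent resources are left for 404 handling. With pageNotFound_handling switched off, every unresolved request falls back to the root page, as it does with the original line.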

Please note: The line of code suggested above has not been tested with the extensions RealURL and CoolURI and is likely to break them. Perhaps a way to support SSD as well as RealURL, CoolURI etc. would be to wrap the suggested line of code in an 'if statement' that checks whether config.simulateStaticDocuments is set in the main TS template, along the lines of:

if config.simulateStaticDocuments = 0, use the original line of code " if (!$this->id) { "

if config.simulateStaticDocuments = 1, use the suggested line of code above

BUT:
I tried to write an if statement that checks simulateStaticDocuments myself, but I couldn't access the website's main TS template 'setup' values, specifically '$this->config['config']['simulateStaticDocuments']', from within function fetch_the_id() in class.tslib_fe.php.

After some research, I now believe that there is no access to TypoScript values until after the ID of the requested page is known (which makes sense, since TS is unique to a 'domain' and its page tree). I guess you could have two different websites set up in one TYPO3 install on different domains, with one configured for SSD and the other for RealURL.

From my understanding, TypoScript cannot be read into a config array until the requested page ID is known.

In file /typo3/sysext/cms/tslib/index_ts.php, I think the call to $TSFE->determineId() takes care of working out the ID, and inside this function is where the suggested line of code would replace the existing line (line numbers above). After $TSFE->determineId() has run and an ID is known, index_ts.php continues to execute, and I think the TS for the resolved domain and page is read into a config array by the call to $TSFE->getConfigArray().

Additional info: I created the following two files (see attached). Each contains a basic list of the function flow and its effect on the value of $this->id, from the resource request to index.php through to the point where processing either exits because the requested resource does not exist, initiating pageNotFound (404), or continues resolving the page ID because the requested resource exists.

- effects_on_this-id_using_default_class.tslib_fe.php.txt

- effects_on_this-id_using_modified_class.tslib_fe.php.txt

From here, I don't know enough about the TYPO3 API, SSD, RealURL and CoolURI to suggest how to use my code to fix 404 handling with SSD enabled without breaking other SEF extensions.

Hope this information will be useful.

Cheers, Matt

Updated by Andi Phringer over 4 years ago

First of all thank you so much, Matt. I was really stuck with this problem for a long time and your approach worked for me as well. I just want to add a small remark as others might be facing this problem as well.

If you want to get the same working with Realurl you should comment out the following line in your realurl configuration:
// 'postVarSet_failureMode'=>'redirect_goodUpperDir',

Else the wrong page names on the root level of the website will still point to the Homepage.
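For orientation, a minimal sketch of where that line typically sits, assuming the usual RealURL layout with the option under the 'init' section of the '_DEFAULT' configuration. The neighbouring 'enableCHashCache' entry is only a placeholder; your own configuration will differ.

```php
<?php
// Hypothetical excerpt of a RealURL configuration; adapt to your own setup.
$TYPO3_CONF_VARS['EXTCONF']['realurl']['_DEFAULT'] = array(
    'init' => array(
        'enableCHashCache' => 1, // placeholder neighbouring option
        // Commented out so that unresolvable path segments are no longer
        // redirected to the closest valid upper directory:
        // 'postVarSet_failureMode' => 'redirect_goodUpperDir',
    ),
);
```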

Kind Regards
Andi

Updated by Thomas Deinhamer over 3 years ago

Will this be included into 4.3? I'd really appreciate it, as it makes serving error pages a LOT easier, and also SEO will get a true boost! Thanks so much!
PS: Does this also work with other page types and language IDs? As far as I can remember, we had a lot of trouble with L other than 0 and type other than 0. Can someone confirm this?

Updated by Matthew Kennewell over 2 years ago

Hi,

Will this be included in TYPO3 version 4.5 with Long Term Support or any other upcoming TYPO3 release?

Cheers

Updated by Björn Paulsen over 1 year ago

  • Target version changed from -1 to 4.5.9

This bug is also present in TYPO3 4.5 LTS.

I solved it very easily:

Function "fetch_the_id()"
typo3/sysext/cms/tslib/class.tslib_fe.php:914

Insert this code:

if( ($this->id == 0) && ($this->siteScript <> false))
    $this->pageNotFound = 1;

after the following lines (at the marked position):

$this->idParts = explode(',',$this->id);

// Splitting by a '+' sign - used for base64/md5 methods of parameter encryption for simulate static documents.
list($pgID,$SSD_p)=explode('+',$this->idParts[0],2);
if ($SSD_p) { $this->idPartsAnalyze($SSD_p); }
$this->id = $pgID; // Set id
// If $this->id is a string, it's an alias
$this->checkAndSetAlias();
// *** insert here ***

I found no problems with this; all 404 pages come up.
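As a rough illustration of what this check does, here is a stand-alone sketch. It is not TYPO3 core code: flagsPageNotFound() is a hypothetical helper, and its parameters stand in for $this->id (assumed to be an integer at this point) and $this->siteScript.

```php
<?php
// Hypothetical model of the inserted check; not TYPO3 core code.
// Mirrors: if (($this->id == 0) && ($this->siteScript <> false)) $this->pageNotFound = 1;
function flagsPageNotFound($id, $siteScript)
{
    // "<>" in the original is the same operator as "!=" (loose comparison),
    // so an empty site script string counts as false here.
    return ($id == 0) && ($siteScript != false);
}

// Assumed inputs: unresolved id (0) with different site script values.
var_dump(flagsPageNotFound(0, ''));                // domain only
var_dump(flagsPageNotFound(0, 'wrong_file.pdf'));  // non-existent resource
var_dump(flagsPageNotFound(0, 'index.php'));       // plain index.php request
var_dump(flagsPageNotFound(12, 'contact_us.html')); // resolved page
```

In this model a domain-only request is not flagged and a non-existent resource is. A plain index.php request with no resolved id is flagged as well, so that case may be worth testing before relying on the change.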

Updated by Ernesto Baschny over 1 year ago

  • Target version changed from 4.5.9 to 4.5.12

Updated by Michael Cannon about 1 year ago

I confirm the problem and the suggested fix in TYPO3 4.5.4.

Updated by Matthew Kennewell 9 months ago

Hi,

I recently updated to TYPO3 source v4.5.19, and the 404 header and page handling error described in this bug report still occurs when using SSD.

Is there any chance that this could be reviewed and the core file class.tslib_fe.php patched so that 404 error handling works correctly?

Thanks in advance.

Matthew
