Project

General

Profile

Actions

Bug #17636

closed

config.prefixLocalAnchors = all destroys links in fe

Added by Alexander Bohndorf over 16 years ago. Updated over 15 years ago.

Status:
Closed
Priority:
Should have
Category:
Communication
Target version:
-
Start date:
2007-09-28
Due date:
% Done:

0%

Estimated time:
TYPO3 Version:
4.1
PHP Version:
4.3
Tags:
Complexity:
Is Regression:
Sprint Focus:

Description

This TS Setup config.prefixLocalAnchors = all should replace all links like <a href="#" into <a href="[path-to-url]#".

Because of an error in a regular expression this function can result in destroying all this links in fe.

The error is in /typo3/sysext/cms/tslib/class.tslib_fe.php in line 3667, function prefixLocalAnchorsWithScript()

Patch:
Replace the line with the following:
$this->content = preg_replace('/(<(a|area).*\?href=")(#[^"]*")/i','$1' . htmlspecialchars($scriptPath) . '$3',$this->content);

In the regular expression the must an backslash be added before the question mark.

TYPO3 4.1.2
(issue imported from #M6415)


Files

Actions #1

Updated by Jan-Erik Revsbech over 16 years ago

This bus is actually more serious. There is an error in the Regular expression, In my oppinion, the ? before href should not be there instead the expression should be .+ and not .*? (Correct me if I'm wrong).

What is even worse, is that in php 5.2 preg_replace will return empty string if it encounters an error (Like backtrack_limit reached). Since the function prefixLocalAnchorsWithScript runs preg_replace directly on $this->content without any error checking, this will result in a blank page.

The attached patch (Which is php 5.2 only), corrects to regular expression syntax, and introduces some errorchecking.

Actions #2

Updated by Ingo Renner over 16 years ago

I can confirm this error. I once had it on a larger site when rendering the sitemap. Blank page out of nowhere... will give it a shot!

Actions #3

Updated by Dmitry Dulepov over 16 years ago

Proposed solution fails with this html:

<a href="#x1"> href="#x2">

It will replace second href value instead of the first.

Actions #4

Updated by Dmitry Dulepov over 16 years ago

By the way, patch to core list has other regular expression:

/(<(a|area).+href=")(#[^"]*")/i

My earlier note to the bug uses that regexp for testing

Actions #5

Updated by Franz Koch about 16 years ago

Hi,
in my case, neither is working. But the attached patch will at least show content. When preg_replace is making problems, why not simply use 'str_replace'?

str_replace(' href="#',' href="'.htmlspecialchars($scriptPath).'#',$this->content);

I know it's not only replacing the anchor inside <a>-tags, but currently I can't think of a case, where the href part must not be changed or is not inside a <a>-tag. And because the quote sign is normally written &quot; in regular text output, that shouldn't be a problem either.

Actions #6

Updated by Franz Koch about 16 years ago

Just added a modified patch, that uses my mentioned str_replace method as fallback, when the preg_replace failed. It'll be really nice when this bug could be fixed - it again cost me 2 hour to find out that my problem was related to this bug :(

Actions #7

Updated by Christian Karsten almost 16 years ago

The new regexp has two problems:

1) .+ is greedy, so the <a is matched to the first <a in the content and the href="# to the last href="# - a lot of local anchors can hide in the .+ and will not be prefixed. And the last href="#, which will be replaced, could even be #PCDATA. The nongreedy version from the original code would be better, or checking for the closing > with [^>]*

2) <(a|area).+ matches anything starting with <a (e.g. <abbr>,<applet>), there should be a check for whitespace after the name of the element.

Together with the fix for issue 0008240 we get

$this->content = preg_replace('/(<(a|area)\s[^>]*href=")(#[^"]*")/i','${1}' . htmlspecialchars($scriptPath) . '$3',$this->content);

Actions #8

Updated by Johannes Beranek almost 16 years ago

$pattern = '/(<(a|area)\s+(([^h>])|(h[^r>])|(hr[^e>])|(hre[^f>])|(href[^=>]))*href=")(#[^"]*")/i';
$this->content = preg_replace($pattern, '$1'.htmlspecialchars($scriptPath).'$3', $this->content);

how about using this one? [^>]*href won't work, logically - because [^>] will consume href as well, if you don't specify ungreedy mode...

Actions #9

Updated by Christian Karsten almost 16 years ago

No, [^>]* works. Greediness does not mean that the rest of the pattern will be ignored. When href="#[^"]*" can be found before the closing bracket > it will be matched.

Actions #10

Updated by Johannes Beranek almost 16 years ago

Sorry, didn't know that would work with preg_replace - maybe i'll report that to php bug list, as greediness should, in this case mean, 'take as many characters as possible not matching ">", at least 0 and at most infinite, and then "href"' - which in theory should consume "href" as well, as it is made up by 4 characters not matching ">" - if that wouldn't be true, where was the meaning of the ungreedy mode? In any case, as your part of the expression works, i think its shorter and thus more efficient than my suggestion.

Actions #11

Updated by Matthew Kennewell almost 16 years ago

I was working on a TYPO3 website today and came across this error before knowing this bug report existed; pattern in function prefixLocalAnchorsWithScript() fails when using SSD & no alias set for page.

Perhaps consider keeping the original pattern in function prefixLocalAnchorsWithScript() with a minor change shown here:

$this->content = preg_replace('/(<(a|area).*?href=")(#[^"]*")/i','$1' . htmlspecialchars($scriptPath) . '$3',$this->content);

to:

$this->content = preg_replace('/(<(a|area).*?href=")(#[^"]*")/i','${1}' . htmlspecialchars($scriptPath) . '$3',$this->content);

The only change is related to the first backreference, $1 should be ${1}

Reason from php manual; When working with a replacement pattern where a 'backreference' is immediately followed by another number (i.e.: placing a literal number immediately after a matched pattern), you cannot use the familiar \\1 notation for your backreference. \\11, for example, would confuse preg_replace() since it does not know whether you want the \\1 backreference followed by a literal 1, or the \\11 backreference followed by nothing. In this case the solution is to use \${1}1. This creates an isolated $1 backreference, leaving the 1 as a literal.

Example:
$this->content = '<div id="footer"><a href="#top">to top ^ </a></div>';
$scriptPath = '23.html';

output using $1 in pattern = <div id="footer">3.html#top">to top ^ </a></div>

output with ${1} in pattern = <div id="footer"><b href="23.html#top">to top ^ </a></div>

(in the last line the <b should be <a had to write it his way so i could display all html as the 'a' tag gets stripped if in correct form)

Reasoning: When combining 1st & 2nd pieces of replacement string, i.e. joining backreference $1 with 23.html php thinks that the backreference is actually $12 (2 being the first literal number of $scriptPath that immediately follows $1) and therefore fails the first part of the building of the replacement string, therefore leaving only the remaining part of $scriptPath '3.html' to be combined with $3

Actions #12

Updated by Matthew Kennewell almost 16 years ago

looks this bug has been fixed in TYPO3 v4.2.0

file /typo3/sysext/cms/tslib/class.tslib_fe.php
function prefixLocalAnchorsWithScript()

curly braces added around back references ${1} and ${3}

function prefixLocalAnchorsWithScript()    {
$scriptPath = substr(t3lib_div::getIndpEnv('TYPO3_REQUEST_URL'),strlen(t3lib_div::getIndpEnv('TYPO3_SITE_URL')));
$this->content = preg_replace('/(<(a|area).*?href=")(#[^"]*")/i','${1}' . htmlspecialchars($scriptPath) . '${3}',$this->content);
}
Actions #13

Updated by Jan-Henrik Hempel almost 16 years ago

We at form4 had the problem, that a sitemap with a huge amount of pages made the here described error. We also had a problem with the solution of Matt K and found another one.

TYPO3 4.1.2 and maybe 4.1.6. too.

function prefixLocalAnchorsWithScript()    {
$scriptPath = substr(t3lib_div::getIndpEnv('TYPO3_REQUEST_URL'),strlen(t3lib_div::getIndpEnv('TYPO3_SITE_URL')));
// form4 Fix
$this->content = preg_replace('/(<(a|area)[^>]*href=")(#[^"]*")/i','$1' . htmlspecialchars($scriptPath) . '$3',$this->content);
// original version in t4.1.2
#$this->content = preg_replace('/(<(a|area).*?href=")(#[^"]*")/i','$1' . htmlspecialchars($scriptPath) . '$3',$this->content);
}
Actions #14

Updated by Benni Mack almost 16 years ago

Hey Jan-Henrik,

do you want to send this fix to the core list in form of a patch?

Would be great!

Actions #15

Updated by Olivier Schopfer over 15 years ago

I can confirm the fact that pages crash, when there are too many links in them (it took me days to identify where the problem was).

Indeed, Jan-Henrik Hempel's fix is working. As this is in fact a quite important bug, how can me make sure that the correction is submitted to the core team?

Actions #16

Updated by Daniel Hahler over 15 years ago

I'll add a patch against TYPO3_4-2, which incorporates all things mentioned in this thread, plus:
- use REQUEST_URI instead of building $scriptPath.. the URL needs to be the same, and that's what REQUEST_URI is meant for.
- also handle href='#foo' (using single quotes)

Diff:
- $this->content = preg_replace('/(<(a|area).*?href=")(#[^"]*")/i','${1}' . htmlspecialchars($scriptPath) . '${3}',$this->content);
+ $this->content = preg_replace('/(<(?:a|area)\s[^>]*?\bhref\s*=\s*("|\'))(#[^\2]*\2)/is','${1}' . htmlspecialchars(t3lib_div::getIndpEnv('REQUEST_URI')) . '$3',$this->content);

(I'll attach it as tslib_prefixLocalAnchorsWithScript_requesturi_and_single_quotes_r4382.diff, too)

Actions #17

Updated by Daniel Hahler over 15 years ago

Additionally, "$(\d)" should get replace with "\$($1)" in t3lib_div::getIndpEnv('REQUEST_URI'), in case there's e.g. $2 in the request uri (e.g. /?foo=$2).

I'm updating the patch.

Actions #18

Updated by Daniel Hahler over 15 years ago

Well, using REQUEST_URI appears to be the right way, but it seems like the page (or relevant part) gets cached only based on "scriptPath", without GET params and therefore you end up with a wrong URL path in the links.
This is with config.prefixLocalAnchors=all, which I've now changed to "output" for the patch to work correctly.
Of course, "all" should still work, but then the whole REQUEST_URI cannot be used.
On the other hand, a link on "/foo?bar=1" to "/foo#anchor" will cause a page reload and in most cases also a different page output.
Does caching depend here on cHash?

Actions #19

Updated by Dmitry Dulepov over 15 years ago

--removed--, need more research on this...

Actions

Also available in: Atom PDF