Bug #10907
Invalid external URL not detected
Status: Closed
% Done: 100%
Description
Completely invalid URLs are detected, e.g. crap.google.com.
However, 404 and 403 responses are not caught, e.g.:
http://error.philippgampe.info
http://philippgampe.info/fileadmin/
http://google.com/test
Those links are broken too and should be detected.
Applying the patch from #10589 does not help.
Updated by Pierre Boivin almost 14 years ago
- Status changed from New to Needs Feedback
- % Done changed from 0 to 100
Corrected by adding lines 62 to 66 of "class.tx_linkvalidator_linktypes_external.php"
Updated by Pierre Boivin almost 14 years ago
- Status changed from Needs Feedback to Resolved
Updated by Philipp Gampe almost 14 years ago
Sorry for reopening this, but some are still not caught.
http://philippgampe.info/fileadmin/ (403) is detected.
http://crap.google.com/ (network error) is detected.
But http://error.philippgampe.info (404) is not detected.
This one is not marked as broken either:
http://typo3.org/404
Nor
http://google.com/test/
Best regards
Phil
Updated by Patrick Gaumond almost 14 years ago
- Status changed from Resolved to Accepted
Updated by Mehdi Guermazi almost 14 years ago
I tested on trunk (4.5 beta2) with the latest linkvalidator trunk and everything works fine.
I tried:
http://error.philippgampe.info/ External Link returned HTTP error code (404)
http://www.google.com/test External Link returned HTTP error code (404)
http://www.google.com/test External Link returned HTTP error code (404)
The three links were detected as 404...
Updated by Simon Ouellet almost 14 years ago
worksforme :)
tested with 4.4.5 trunk and linkvalidator trunk...
Updated by Philipp Gampe almost 14 years ago
OK, I tracked it down to redirects:
http://google.com/test is not detected, but
http://www.google.com/test is.
This might be a core thing :(
Environment:
Ubuntu 10.04 with LAMPP (XAMPP), TYPO3 4.5 trunk + linkvalidator trunk
Updated by Simon Ouellet almost 14 years ago
ok...
A possible workaround could be to call t3lib_div::getURL repeatedly as long as the returned HTTP status code is 301, in:
class.tx_linkvalidator_linktypes_external.php (line 55)
$content = t3lib_div::getURL($url, 1, FALSE, $report);
while ($report['http_code'] == '301') {
    if (preg_match('/Location: (.*)/', $content, $location)) {
        // trim() drops the trailing "\r" of the CRLF-terminated header line
        $content = t3lib_div::getURL(trim($location[1]), 1, FALSE, $report);
    } else {
        // 301 without a Location header: stop instead of looping forever
        break;
    }
}
$ret = 1;
Or just follow the redirect once:
$content = t3lib_div::getURL($url, 1, FALSE, $report);
// Follow a single redirect if the response carries a Location header
if (preg_match('/Location: (.*)/', $content, $location)) {
    $content = t3lib_div::getURL(trim($location[1]), 1, FALSE, $report);
}
$ret = 1;
Do you want a patch?
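Both variants extract the target with `preg_match('/Location: (.*)/', ...)`. Because `.` in PCRE also matches `\r`, the captured value of a CRLF-terminated header line ends in `\r`. The pattern is case-sensitive even though HTTP header names are not, and it would also match inside a `Content-Location:` header. Below is a minimal Python sketch of a more tolerant extraction. It assumes the raw header block is available as a string, and the name `extract_location` is hypothetical:

```python
import re
from typing import Optional

# ^ and $ anchor at each header line (MULTILINE); the optional \r handles CRLF endings.
# IGNORECASE because HTTP header field names are case-insensitive.
_LOCATION_RE = re.compile(r"^Location:[ \t]*(.*?)\r?$", re.IGNORECASE | re.MULTILINE)


def extract_location(raw_headers: str) -> Optional[str]:
    """Return the value of the first Location header, or None if there is none."""
    match = _LOCATION_RE.search(raw_headers)
    if match is None:
        return None
    value = match.group(1).strip()
    return value or None


headers = "HTTP/1.1 301 Moved Permanently\r\nlocation: http://www.google.com/test\r\n\r\n"
print(extract_location(headers))  # http://www.google.com/test
```

Anchoring at the start of the line is what keeps a `Content-Location:` header from being mistaken for a redirect target.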
Updated by Philipp Gampe almost 14 years ago
It partly works ... but there are redirect codes other than 301:
http://en.wikipedia.org/wiki/List_of_HTTP_status_codes#3xx_Redirection
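Only some of the 3xx codes in that list mean "repeat the request at the URL given in the Location header". The minimal Python sketch below makes that distinction, and the helper name `should_follow` is hypothetical. Note that 308 Permanent Redirect was standardized (RFC 7538) after this discussion took place:

```python
from http import HTTPStatus

# 3xx codes that tell the client to repeat the request at the Location URL.
FOLLOWABLE = {
    HTTPStatus.MOVED_PERMANENTLY,   # 301
    HTTPStatus.FOUND,               # 302
    HTTPStatus.SEE_OTHER,           # 303
    HTTPStatus.TEMPORARY_REDIRECT,  # 307
    HTTPStatus.PERMANENT_REDIRECT,  # 308
}


def should_follow(status: int) -> bool:
    """True for redirects that point at a new location via the Location header."""
    # HTTPStatus is an IntEnum, so plain ints compare and hash equal to its members.
    return status in FOLLOWABLE
```

300 Multiple Choices and 304 Not Modified (the answer to a conditional request) are also 3xx codes, but they are not redirects of this kind. A checker should not blindly follow them.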
I use the following links now for testing:
http://google.com/test
http://www.google.com/test/
http://error.philippgampe.info/
http://typo3.org/404/
http://www.philippgampe.info/
http://typo3.org/
http://loop.philippgampe.info/
For the RTE:
<p><a href="http://google.com/test" class="external-link-new-window" title="Opens external link in new window" external="1">http://google.com/test</a></p>
<p><a href="http://www.google.com/test/" class="external-link-new-window" title="Opens external link in new window" external="1">http://www.google.com/test/</a></p>
<p><a href="http://error.philippgampe.info" external="1">http://error.philippgampe.info/</a></p>
<p><a href="http://typo3.org/404/" class="external-link-new-window" title="Opens external link in new window" external="1">http://typo3.org/404/</a></p>
<p><a href="http://www.philippgampe.info/" class="external-link-new-window" title="Opens external link in new window" external="1">http://www.philippgampe.info/</a></p>
<p><a href="http://typo3.org/" class="external-link-new-window" title="Opens external link in new window" external="1">http://typo3.org/</a></p>
<p><a href="http://loop.philippgampe.info" external="1">http://loop.philippgampe.info/</a></p>
I get 3 broken links:
http://google.com/test --> 301 to http://www.google.com/test --> 404
http://www.google.com/test --> 404
http://typo3.org/404/ --> 404
what still does not work:
http://loop.philippgampe.info/ --> 302 http://loop.philippgampe.info/
http://error.philippgampe.info --> 404 ?!?
Please see my patch which does at least work for the google 301. (Warning, it may have whitespace issues!)
Best regards
Phil
Updated by Chris topher almost 14 years ago
- % Done changed from 100 to 70
Hi Philipp,
thanks for the patch and for your explanation!
Philipp Gampe wrote:
what still does not work:
http://loop.philippgampe.info/ --> 302 http://loop.philippgampe.info/
This is a loop, which redirects to the same location again.
If I read your patch correctly, you should be able to recognize this by checking, inside the
if (isset($location[1])) { ... }
block, whether the new Location is the same as the old one. If so, we have such a loop.
http://error.philippgampe.info --> 404 ?!?
According to Firebug, this page is delivered with status code 404.
It seems this 404 is not recognized correctly by t3lib_div::getURL. (Or are we using $report incorrectly?)
Updated by Philipp Gampe almost 14 years ago
Christopher wrote:
This is a loop, which redirects to the same location again.
If I read your patch correctly, you should be able to recognize this, when you check inside of the
[...]
if the new Location is the same as the old one. If so, we have such a loop.
Sure, it was set up to test exactly this :) BUT there might be trickier loops that are not caught by comparing against the previous location. Those must be caught later.
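Comparing only with the previous Location catches a self-redirect like loop.philippgampe.info. It does not catch a cycle such as A -> B -> A. The minimal Python sketch below shows a more general check that remembers every URL already visited. `next_hop` is a hypothetical stand-in for "request the URL and return its Location, or None if the response is not a redirect"; here it is simulated with a dict, and the a.example/b.example hosts are made up:

```python
from typing import Callable, Dict, List, Optional


def find_redirect_loop(
    start: str,
    next_hop: Callable[[str], Optional[str]],
    max_hops: int = 10,
) -> Optional[List[str]]:
    """Follow redirects from `start`; return the chain if a URL repeats, else None."""
    chain = [start]
    seen = {start}
    url = start
    for _ in range(max_hops):
        target = next_hop(url)
        if target is None:      # not a redirect: the chain ends here
            return None
        chain.append(target)
        if target in seen:      # revisiting any earlier URL means a cycle
            return chain
        seen.add(target)
        url = target
    return None                 # gave up after max_hops without seeing a repeat


# Simulated redirects: the self-loop from this thread and a hypothetical two-step cycle.
redirects: Dict[str, str] = {
    "http://loop.philippgampe.info/": "http://loop.philippgampe.info/",
    "http://a.example/": "http://b.example/",
    "http://b.example/": "http://a.example/",
}
print(find_redirect_loop("http://a.example/", redirects.get))
```

The hop limit is still needed, because a server can redirect to an endless series of distinct URLs that never repeats.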
http://error.philippgampe.info --> 404 ?!?
According to Firebug, this page is delivered with status code 404.
It seems this 404 is not recognized correctly by t3lib_div::getURL. (Or are we using $report incorrectly?)
Not that I am aware of... I will play around a bit more and try to debug the whole thing, including the responses. I am online in IRC #typo3 until late tonight.
BTW ... shouldn't the last check be for >300 instead of >400? IMHO only links with a 2xx status should be considered not broken.
Another side note: has anyone tested this with the curl option enabled?
Best regards
Phil
Updated by Philipp Gampe almost 14 years ago
- reordered the status code parsing
- changed everything >300 to error
- changed second parameter of getUrl to 2 (headers only)
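When only headers are requested, the status code has to be read from the status line (e.g. `HTTP/1.1 404 Not Found`). Below is a minimal Python sketch, assuming the raw header block is available as a string; `final_status_code` is a hypothetical name. It converts the code to an integer, so later comparisons such as `>= 300` do not depend on whether the code was stored as a string or a number. If several header blocks are concatenated (as happens when an HTTP client follows redirects itself and keeps all headers), the last status line belongs to the final response, so the sketch returns that one:

```python
import re
from typing import Optional

# Matches status lines such as "HTTP/1.1 404 Not Found", "HTTP/1.0 200 OK" or "HTTP/2 200".
_STATUS_RE = re.compile(r"^HTTP/\d(?:\.\d)?\s+(\d{3})\b", re.MULTILINE)


def final_status_code(raw_headers: str) -> Optional[int]:
    """Return the status code of the last status line in a raw header block."""
    codes = _STATUS_RE.findall(raw_headers)  # list of the captured 3-digit strings
    return int(codes[-1]) if codes else None


raw = (
    "HTTP/1.1 301 Moved Permanently\r\nLocation: http://www.google.com/test\r\n\r\n"
    "HTTP/1.1 404 Not Found\r\nContent-Type: text/html\r\n\r\n"
)
print(final_status_code(raw))  # 404
```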
Updated by Philipp Gampe almost 14 years ago
- File test-links.t3d added
Here is my test content element.
Updated by Philipp Gampe almost 14 years ago
- fixed one typo: succes --> success
- adds a new label for redirect loop (will show HTTP code and redirect Location:)
- will loop for HTTP codes 301, 302, 303 and 307
- loop will run max 4 times (that means 5 tries in total)
- every status code >= 300 will be detected as a broken link
- code was restructured so that unspecific checks are done first and more specific checks after them. This will make it easier to check for other "special" codes later, too.
- it will use t3lib_div::getURL with 2 as second parameter (2 = headers only, no content) as we do not need the content
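The behaviour described in this list can be sketched in a few lines of Python. This is an illustration under assumptions, not the committed PHP code. `fetch` is a hypothetical function that performs a headers-only request and returns `(status_code, location)`, and the loop message format is made up. The dict that simulates `fetch` mirrors two chains reported in this thread (google.com/test and loop.philippgampe.info); the typo3.org entry is an assumed working link:

```python
from typing import Callable, Dict, Optional, Tuple

FOLLOWED_REDIRECTS = {301, 302, 303, 307}  # codes the patch follows
MAX_REDIRECTS = 4                           # 4 redirects = at most 5 requests in total

Response = Tuple[int, Optional[str]]        # (status code, Location header or None)


def check_external_url(url: str, fetch: Callable[[str], Response]) -> Tuple[bool, str]:
    """Return (is_broken, message) for `url`, following a limited number of redirects."""
    status, location = fetch(url)
    redirects = 0
    while status in FOLLOWED_REDIRECTS and location:
        if redirects == MAX_REDIRECTS:
            # Still redirecting after the last allowed request: report a redirect loop.
            return True, f"redirect loop: HTTP {status}, Location: {location}"
        status, location = fetch(location)
        redirects += 1
    if status >= 300:
        # Anything from 300 upward (including unfollowed 3xx) counts as broken.
        return True, f"HTTP error code {status}"
    return False, "ok"


simulated: Dict[str, Response] = {
    "http://google.com/test": (301, "http://www.google.com/test"),
    "http://www.google.com/test": (404, None),
    "http://loop.philippgampe.info/": (302, "http://loop.philippgampe.info/"),
    "http://typo3.org/": (200, None),
}
fetch = lambda url: simulated[url]

for url in simulated:
    print(url, check_external_url(url, fetch))
```

Checking the general condition (`>= 300`) last, after the redirect handling, mirrors the restructuring described above: the redirect case is dealt with first, and every status that remains outside 2xx falls through to the same "broken" branch.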
Additionally, I changed the strict comparison (===) into the less strict variant (==), as I had a lot of trouble with it. This is not user input, so it should be rather safe.
Updated by Chris topher almost 14 years ago
- Status changed from Accepted to Resolved
- Assignee set to Chris topher
- % Done changed from 70 to 100
linkvalidator uses t3lib_div::getURL to get information about external URLs. This function sometimes returns wrong status codes. These are not errors that linkvalidator has to fix, though; they must be fixed in the Core.
See #24464 and #24483 for details.
Philipp, thanks for your continuous work on this problem!
Followup committed in r41763.
Updated by Chris topher over 12 years ago
- Status changed from Resolved to Closed
Updated by Michael Stucki almost 11 years ago
- Project changed from 1510 to TYPO3 Core
- Category changed from Linkvalidator to Linkvalidator