Project

General

Profile

Actions

Bug #10907

closed

invalid external url not detected

Added by Philipp Gampe about 14 years ago. Updated almost 11 years ago.

Status:
Closed
Priority:
Must have
Assignee:
Category:
Linkvalidator
Target version:
-
Start date:
2010-11-17
Due date:
% Done:

100%

Estimated time:
TYPO3 Version:
PHP Version:
Tags:
Complexity:
Is Regression:
Sprint Focus:

Description

First, completely invalid urls are detected, e.g. crap.google.com

However 404 and 403 are not caught, e.g.:

http://error.philippgampe.info
http://philippgampe.info/fileadmin/
http://google.com/test

Those links are broken too and should be detected.

Applying the patch from #10589 does not help.


Files

linkvalidator-redirect.diff (2.27 KB) linkvalidator-redirect.diff Philipp Gampe, 2011-01-02 01:43
linkval-non200_debug.diff (2.53 KB) linkval-non200_debug.diff Philipp Gampe, 2011-01-04 00:13
test-links.t3d (2.42 KB) test-links.t3d content element with links Philipp Gampe, 2011-01-04 00:58
redirect-detection-working.diff (3.51 KB) redirect-detection-working.diff working patch, might have whitespace issue ... please double check Philipp Gampe, 2011-01-04 01:51
Actions #1

Updated by Steffen Kamper almost 14 years ago

please answer

Actions #2

Updated by Pierre Boivin almost 14 years ago

  • Status changed from New to Needs Feedback
  • % Done changed from 0 to 100

Corrected by adding lines 62 to 66 of "class.tx_linkvalidator_linktypes_external.php"

Actions #3

Updated by Pierre Boivin almost 14 years ago

  • Status changed from Needs Feedback to Resolved
Actions #4

Updated by Philipp Gampe almost 14 years ago

Sorry for warming this up, but some are still not catched.

http://philippgampe.info/fileadmin/ (403) is detected.

http://crap.google.com/ (network error) is detected.

But http://error.philippgampe.info (404) is not detected.
This one is not market as broken either:
http://typo3.org/404
Nor
http://google.com/test/

Best regards
Phil

Actions #5

Updated by Patrick Gaumond almost 14 years ago

  • Status changed from Resolved to Accepted
Actions #6

Updated by Mehdi Guermazi almost 14 years ago

I tested on a trunk 4.5 beta2 with the latest trunk and everything working fine.

I tried:
http://error.philippgampe.info/ External Link returned HTTP error code (404)
http://www.google.com/test External Link returned HTTP error code (404)
http://www.google.com/test External Link returned HTTP error code (404)

the three links where detected as 404...

Actions #7

Updated by Simon Ouellet almost 14 years ago

worksforme :)

tested with 4.4.5 trunk and linkvalidator trunk...

Actions #8

Updated by Philipp Gampe almost 14 years ago

ok, tracked it down to redirecting...
http://google.com/test is not detected, but
http://www.google.com/test is.

This might be a core thing :(

Env:
Ubuntu 10.04 with lampp (XAMPP) TYPO3 4.5trunk + linkvalidator trunk

Actions #9

Updated by Simon Ouellet almost 14 years ago

ok...

a possible workaround could be to call t3lib_div::getUrl while the returned header = 301

class.tx_linkvalidator_linktypes_external.php (line 55)

        $content = t3lib_div::getURL($url, 1, FALSE, $report);

        while ($report['http_code'] == '301') {
            $isCodeRedirect = preg_match('/Location: (.*)/', $content, $location);
            if (isset($location[1])) {
                $content = t3lib_div::getURL($location[1], 1, FALSE, $report);
            }
        }

        $ret = 1;

or just test it one time :

        $content = t3lib_div::getURL($url, 1, FALSE, $report);

        $isCodeRedirect = preg_match('/Location: (.*)/', $content, $location);
        if (isset($location[1])) {
            $content = t3lib_div::getURL($location[1], 1, FALSE, $report);
        }

        $ret = 1;

do you want a patch ?

Actions #10

Updated by Philipp Gampe almost 14 years ago

it works partly ... but there can be more than a 301.
http://en.wikipedia.org/wiki/List_of_HTTP_status_codes#3xx_Redirection

I use the following links now for testing:
http://google.com/test
http://www.google.com/test/
http://error.philippgampe.info/
http://typo3.org/404/
http://www.philippgampe.info/
http://typo3.org/
http://loop.philippgampe.info/

for RTE:

<p><a href="http://google.com/test" class="external-link-new-window" title="Opens external link in new window" external="1">http://google.com/test</a></p>
<p><a href="http://www.google.com/test/" class="external-link-new-window" title="Opens external link in new window" external="1">http://www.google.com/test/</a></p>
<p><a href="http://error.philippgampe.info" external="1">http://error.philippgampe.info/</a></p>
<p><a href="http://typo3.org/404/" class="external-link-new-window" title="Opens external link in new window" external="1">http://typo3.org/404/</a></p>
<p><a href="http://www.philippgampe.info/" class="external-link-new-window" title="Opens external link in new window" external="1">http://www.philippgampe.info/</a></p>
<p><a href="http://typo3.org/" class="external-link-new-window" title="Opens external link in new window" external="1">http://typo3.org/</a></p>
<p><a href="http://loop.philippgampe.info" external="1">http://loop.philippgampe.info/</a></p>

I get 3 broken links:
http://google.com/test --> 301 zu http://www.google.com/test --> 404
http://www.google.com/test --> 404
http://typo3.org/404/ --> 404

what still does not work:
http://loop.philippgampe.info/ --> 302 http://loop.philippgampe.info/
http://error.philippgampe.info --> 404 ?!?

Please see my patch which does at least work for the google 301. (Warning, it may have whitespace issues!)

Best regards
Phil

Actions #11

Updated by Chris topher almost 14 years ago

  • % Done changed from 100 to 70

Hi Philipp,

thanks for the patch and for your explanation!

Philipp Gampe wrote:

what still does not work:
http://loop.philippgampe.info/ --> 302 http://loop.philippgampe.info/

This is a loop, which redirects to the same location again.
If I read your patch correctly, you should be able to recognize this, when you check inside of the

if (isset($location[1])) { ...}

if the new Location is the same as the old one. If so, we have such a loop.

http://error.philippgampe.info --> 404 ?!?

According to Firebug this page is delivered with statuscode 404.
Seems like this 404 is not recognized correctly by t3lib_div::getUrl. (Or are we using $reports incorrectly?)

Actions #12

Updated by Philipp Gampe almost 14 years ago

Christopher wrote:

This is a loop, which redirects to the same location again.
If I read your patch correctly, you should be able to recognize this, when you check inside of the
[...]
if the new Location is the same as the old one. If so, we have such a loop.

sure it was set up to test this :) BUT there might be more trickier loops which might not be catched by checking the location again. Those must be catched later.

http://error.philippgampe.info --> 404 ?!?

According to Firebug this page is delivered with statuscode 404.
Seems like this 404 is not recognized correctly by t3lib_div::getUrl. (Or are we using $reports incorrectly?)

Not that I would be aware of... I will play around a bit more and try to debug the whole think. Including the answers - I am online in IRC #typo3 until late tonight.

BTW ... shouldn't be the last check check for >300 instead of >400? IMHO only links with 2xx should be considered not broken.

Another side note ... did anyone test this with curl option on?

Best regards
Phil

Actions #13

Updated by Philipp Gampe almost 14 years ago

Attached a diff with my changes + some debugging code
  • reordered the status code parsing
  • changed everything >300 to error
  • changed second parameter of getUrl to 2 (headers only)
Actions #14

Updated by Philipp Gampe almost 14 years ago

here is my test content element

Actions #15

Updated by Philipp Gampe almost 14 years ago

this hopefully final patch does the following:
  • change one typo succes --> success
  • adds a new label for redirect loop (will show HTTP code and redirect Location:)
  • will loop for HTTP codes 301, 302, 303 and 307
    • loop will run max 4 times (that means 5 tries in total)
  • every status code >= 300 will be detected as brocken link
  • code was restructured such that unspecific is checked first and more specific checks are done after this. This will make it easier to check for other "special" codes later too.
  • it will use t3lib_div::getURL with 2 as second parameter (2 = headers only, no content) as we do not need the content

Additionally I changed the strict comparison (===) into the less strict variant (==) as I had a lot of troubles. This is no user input, so it should be rather save.

Actions #16

Updated by Chris topher almost 14 years ago

  • Status changed from Accepted to Resolved
  • Assignee set to Chris topher
  • % Done changed from 70 to 100

linkvalidator uses t3lib_div::getURL to get information about external URLs. This function sometimes returns wrong status codes. But these are no errors, which linkvalidator has to fix. They must be fixed in the Core.
See #24464 and #24483 for details.

Philipp, thanks for your continuous work on this problem!

Followup committed in r41763.

Actions #17

Updated by Chris topher over 12 years ago

  • Status changed from Resolved to Closed
Actions #18

Updated by Michael Stucki almost 11 years ago

  • Category set to Linkvalidator
Actions #19

Updated by Michael Stucki almost 11 years ago

  • Project changed from 1510 to TYPO3 Core
  • Category changed from Linkvalidator to Linkvalidator
Actions

Also available in: Atom PDF