Bug #86918

closed

Epic #85006: Reduce falsely reported broken links

Linkvalidator stops working on specific links (external URLs)

Added by Stefan Berger over 5 years ago. Updated over 4 years ago.

Status: Closed
Priority: Should have
Assignee: -
Category: Linkvalidator
Target version: -
Start date: 2018-11-13
Due date:
% Done: 100%
Estimated time:
TYPO3 Version: 8
PHP Version:
Tags:
Complexity:
Is Regression:
Sprint Focus:

Description

  • Some sites require specific HTTP headers that browsers normally send.
    For example, external link validation of the URL "https://www.dpdhl.com/en.html" never finishes and eventually breaks the scheduler task.
    Debugging led to the following default header set in \TYPO3\CMS\Linkvalidator\Linktype\ExternalLinktype::checkLink():
    $options = [
        'cookies' => GeneralUtility::makeInstance(CookieJar::class),
        'allow_redirects' => ['strict' => true],
        'headers' => [
            'User-Agent'        => 'TYPO3 linkvalidator',
            'Accept'            => '*/*',
            'Accept-Language'   => '*',
            'Accept-Encoding'   => '*',
            'Connection'        => 'keep-alive',
        ],
    ];
    
  • Also, some sites don't allow HEAD requests, and in those cases the fallback GET request defined in the method mentioned above is never used.
    So it would be great if you could decide, for example via configuration, to use just a simple GET request.
  • Another point is that the HTTP setting "Range: bytes = 0 – 4048" leads to strange responses for some links. It would be better if this header could be configured so that it is not always sent.
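
The three requests above could be combined in one place. The following is only a minimal sketch of how that might look, not the linkvalidator implementation: the function name buildRequestAttempts() and the configuration keys 'method', 'useRangeHeader' and 'headers' are hypothetical. The cookie jar from the snippet above is left out so the example runs without Guzzle, and the Range value is written in the standard "bytes=0-4048" syntax.

```php
<?php

/**
 * Hypothetical helper: turns a configuration array into the ordered list
 * of HTTP attempts to try for one external URL.
 * The keys 'method', 'useRangeHeader' and 'headers' are assumptions made
 * for this sketch; they are not existing linkvalidator settings.
 */
function buildRequestAttempts(array $config = []): array
{
    // Fill in defaults for every key the caller did not set.
    $config += [
        'method' => 'HEAD',        // 'HEAD' = HEAD first, then GET; 'GET' = GET only
        'useRangeHeader' => true,  // send a Range header on GET requests
        'headers' => [],           // extra headers, e.g. browser-like defaults
    ];

    $baseOptions = [
        'allow_redirects' => ['strict' => true],
        'headers' => $config['headers'],
    ];

    // The Range header is only ever added to the GET attempt.
    $getOptions = $baseOptions;
    if ($config['useRangeHeader']) {
        $getOptions['headers']['Range'] = 'bytes=0-4048';
    }

    $attempts = [];
    if (strtoupper($config['method']) === 'HEAD') {
        $attempts[] = ['method' => 'HEAD', 'options' => $baseOptions];
    }
    $attempts[] = ['method' => 'GET', 'options' => $getOptions];

    return $attempts;
}

// Example: GET only, no Range header, browser-like headers.
$attempts = buildRequestAttempts([
    'method' => 'GET',
    'useRangeHeader' => false,
    'headers' => ['User-Agent' => 'TYPO3 linkvalidator', 'Accept' => '*/*'],
]);
```

A caller would send the attempts in order with its HTTP client and stop at the first response with a status code below 400. Treating an exception or timeout on the HEAD attempt as a failure too would let the GET fallback still run.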

Related issues 1 (1 open, 0 closed)

Related to TYPO3 Core - Feature #85127: linkvalidator: Add possibility to exclude specific external URLs / domains or patterns (New, 2018-05-31)
