Bug #89488
closed Epic #85006: Reduce falsely reported broken links
HTML special characters fool linkvalidator
100%
Description
I detected a problem with &amp;amp; within a link when using the linkvalidator.
How to reproduce the problem:
- create a link to the following page
https://standards.cen.eu/dyn/www/f?p=204:6:0::::FSP_ORG_ID,FSP_LANG_ID:,22&cs=1A3FFBC44FAB6B2A181C9525249C3A829
(this does not contain any &amp;amp;)
- check the link and it will work
- run linkvalidator, this will report an invalid link for:
https://standards.cen.eu/dyn/www/f?p=204:6:0::::FSP_ORG_ID,FSP_LANG_ID:,22&amp;cs=1A3FFBC44FAB6B2A181C9525249C3A829
(the result is correct, as the web server reports 404 when &amp;amp; is used instead of &)
I could get this working by extending linkvalidator/Classes/Linktype/ExternalLinktype.php with htmlspecialchars_decode():
--- ExternalLinktype.php.orig 2019-10-23 17:03:45.000000000 +0200
+++ ExternalLinktype.php 2019-10-23 17:02:57.000000000 +0200
@@ -82,7 +82,7 @@
];
$url = $this->preprocessUrl($origUrl);
if (!empty($url)) {
- $isValidUrl = $this->requestUrl($url, 'HEAD', $options);
+ $isValidUrl = $this->requestUrl(htmlspecialchars_decode($url), 'HEAD', $options);
if (!$isValidUrl) {
// HEAD was not allowed or threw an error, now trying GET
$options['headers']['Range'] = 'bytes=0-4048';
Maybe there's a better solution.
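The effect of the patch can be checked in isolation. The following is only a minimal sketch, assuming the URL reaches the validator with the ampersand entity-encoded as described above; the variable names are illustrative and not part of linkvalidator:

```php
<?php
// URL as it appears to arrive at the link check (assumption based on the
// report): the ampersand before "cs" is entity-encoded.
$storedUrl = 'https://standards.cen.eu/dyn/www/f?p=204:6:0::::FSP_ORG_ID,FSP_LANG_ID:,22&amp;cs=1A3FFBC44FAB6B2A181C9525249C3A829';

// htmlspecialchars_decode() turns &amp; back into a plain &
// (it also decodes &lt;, &gt; and &quot;).
$requestUrl = htmlspecialchars_decode($storedUrl);

// Print the URL that would actually be requested after the patch.
echo $requestUrl, PHP_EOL;
```

A URL that contains no entities passes through htmlspecialchars_decode() unchanged, so links that already work should not be affected by the extra call.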