Epic #85006: Reduce falsely reported broken links
linkvalidator: Add possibility to exclude specific external URLs / domains or patterns
Some external sites cause problems that linkvalidator / Guzzle does not handle correctly, or that cannot be handled correctly at all.
To keep linkvalidator usable in these cases, it should be possible to exclude certain domains / URLs using regular expressions. The exclusions should be stored in a way that lets an admin user edit them.
Prerequisite: Provide a general configuration for the extension. This does not need to be configurable per page tree.
URLs that work in the browser but fail the check:
#1 Updated by Sybille Peters about 1 year ago
- Subject changed from Add configuration option: file with regex patterns to exclude for external link checking to Add possibility to exclude specific external URLs / domains or patterns
In some rare cases, linkvalidator reports external URLs as broken even though they are not. See also #86918. This leads to false positives (URLs reported as broken which are not).
For editors working through the list of broken links, one of the most annoying things is false positives that keep coming up: they clutter the list, cannot be removed, and make it really tedious to work through the list and fix the links that are actually broken.
Ideally, the crawl process would be optimized so that it always reports broken links correctly, but this may not be entirely possible.
As an alternative, we could add a mechanism to exclude specific URLs or URL patterns, e.g.
- exact URL
- URL starting with ... (or domain)
- regular expression
This could be done by adding optional files, or in the GUI with an "ignore" button.
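To make the three rule types concrete, here is a minimal sketch in Python (not the linkvalidator implementation, which is PHP). All names (`EXACT_URLS`, `URL_PREFIXES`, `URL_PATTERNS`, `is_excluded`, `urls_to_check`) and the example URLs are hypothetical and only illustrate how such an exclusion list might be evaluated before a URL is handed to the checker.

```python
import re

# Hypothetical exclusion rules; in practice these would be loaded from
# an admin-editable configuration (file or GUI), not hard-coded.
EXACT_URLS = {"https://example.org/known-flaky-page"}
URL_PREFIXES = ("https://intranet.example.com/",)
URL_PATTERNS = (re.compile(r"^https?://([^/]+\.)?example\.net/"),)


def is_excluded(url: str) -> bool:
    """Return True if the URL matches any exclusion rule."""
    # Rule type 1: exact URL
    if url in EXACT_URLS:
        return True
    # Rule type 2: URL starting with a given prefix (or domain)
    if url.startswith(URL_PREFIXES):
        return True
    # Rule type 3: regular expression
    return any(pattern.search(url) for pattern in URL_PATTERNS)


def urls_to_check(urls):
    """Filter out excluded URLs before they are passed to the link check."""
    return [url for url in urls if not is_excluded(url)]
```

Checking the cheap rules (exact match, prefix) before the regular expressions keeps the common case fast; whether excluded URLs should be skipped entirely or only hidden from the report is a separate design decision.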