Reduce falsely reported broken links
Falsely reported broken links are currently a major reason why fixing links with Linkvalidator is tedious and annoying: there is no way to remove them from the list of broken links. When searching for links to fix, you have to check several entries that are not really errors. Furthermore, these entries stay in the list while the real broken links disappear as they are fixed, so the ratio of falsely reported broken links to real broken links worsens over time.
By "falsely reported broken links" we mean links that Linkvalidator shows as broken but that are not actually broken, cannot be edited by the editor, or are irrelevant or unfixable for some other reason.
We already have several issues and open patches addressing these problems. This EPIC serves to give an overview.
Main reasons for false broken links
- external link checking may fail. This means we get false negatives (links that actually work but are evaluated as "broken" by linkvalidator). We have already improved this, but it may still happen. (see #89488, #86918, #85127)
- Some links are not broken, but do not return HTTP status code 200. These are, for example, pages that require a login (403, 401).
- broken links are in fields that are no longer relevant, e.g. in tt_content.bodytext for content elements that do not use bodytext. This can happen if tt_content.ctype is changed (e.g. to plugin), which is common on older sites. (see #89182)
- FIXED: the editor has no permission to edit the field or the record (#84214)
- editing the field has been disabled by configuration
- the broken link information is "stale": the broken link has already been fixed but linkvalidator has not rechecked the field, or the record has been deleted (see #89426, #83847)
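To illustrate the point about login-protected pages, here is a minimal sketch of how a checker could separate "restricted" responses from genuinely broken ones. This is not how Linkvalidator is implemented; `LinkState`, `RESTRICTED_CODES` and `classify_status` are hypothetical names, and which status codes count as "restricted" is an assumption taken from the 401/403 example above.

```python
from enum import Enum


class LinkState(Enum):
    OK = "ok"
    RESTRICTED = "restricted"  # reachable, but requires a login
    BROKEN = "broken"


# Assumption: 401 and 403 indicate a page that exists but needs authentication.
RESTRICTED_CODES = {401, 403}


def classify_status(status_code: int) -> LinkState:
    """Map an HTTP status code to a link state (hypothetical policy)."""
    if 200 <= status_code < 400:
        # Success and redirect responses are treated as working links.
        return LinkState.OK
    if status_code in RESTRICTED_CODES:
        # Do not report these as broken; they could be listed separately.
        return LinkState.RESTRICTED
    return LinkState.BROKEN
```

With a distinction like this, restricted links could be hidden from the list of broken links or shown in a separate category, rather than mixed in with real errors.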
Ideas for reducing false broken links
do not check some external links
- Make it possible to exclude URLs from link checking in the configuration (TSconfig), e.g. URLs starting with http://intranet.mysite.com/
- Make it possible to exclude a specific link from link checking (in the RTE)
- For more ease of use: in the list of broken links, add an action button that adds the URL to the list of ignored URLs
- ideally, the broken link information should be updated as soon as a record is changed (e.g. entries removed from the list of broken links as soon as the record is deleted), e.g. by using NEW / UPDATE / DELETE events
- alternatively, link checking could be done incrementally and more often, only checking the records that changed (see #92220)
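The idea of excluding URLs by prefix, as in the `http://intranet.mysite.com/` example in the list above, could look roughly like the following sketch. The function names and the shape of the configuration (a plain list of prefixes) are assumptions for illustration only; no such TSconfig option is described in this issue. Matching is a simple case-sensitive prefix comparison.

```python
from typing import Iterable, List


def is_excluded(url: str, excluded_prefixes: Iterable[str]) -> bool:
    """Return True if the URL starts with any configured prefix.

    Hypothetical helper: the prefix list stands in for whatever
    configuration would eventually hold the excluded URLs.
    """
    # str.startswith accepts a tuple and returns False for an empty tuple.
    return url.strip().startswith(tuple(excluded_prefixes))


def filter_links_to_check(urls: Iterable[str],
                          excluded_prefixes: Iterable[str]) -> List[str]:
    """Keep only the URLs that should still be sent to the link checker."""
    prefixes = list(excluded_prefixes)
    return [url for url in urls if not is_excluded(url, prefixes)]


excluded = ["http://intranet.mysite.com/"]
links = [
    "http://intranet.mysite.com/wiki/start",
    "https://example.org/page",
]
remaining = filter_links_to_check(links, excluded)
```

An "ignore this URL" action button in the list of broken links would then only need to append the URL (or a prefix of it) to the same list.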
do not check some fields
- only check fields that will be rendered, e.g. not tt_content.bodytext for ctype='plugin', etc. (see #89182)
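A rough sketch of the "only check rendered fields" idea follows. The mapping from content type to rendered fields is entirely hypothetical (including the `header` field and the `text` type); in practice it would have to be derived from the actual configuration. Records are modelled as plain dictionaries, and unknown content types fall back to checking every candidate field.

```python
from typing import Dict, List, Set

# Hypothetical mapping: which fields each content type actually renders.
RENDERED_FIELDS: Dict[str, Set[str]] = {
    "text": {"header", "bodytext"},
    "plugin": {"header"},  # assumption: plugins do not render bodytext
}


def fields_to_check(record: Dict[str, str],
                    candidate_fields: List[str]) -> List[str]:
    """Return the candidate fields that are rendered for this record's type."""
    ctype = record.get("ctype", "")
    # Unknown types: be conservative and check all candidate fields.
    rendered = RENDERED_FIELDS.get(ctype, set(candidate_fields))
    return [field for field in candidate_fields if field in rendered]
```

For a record with `ctype` set to `plugin`, a leftover link in `bodytext` would then no longer be reported, because that field is skipped.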