Feature #85127

open

Epic #85006: Reduce falsely reported broken links

linkvalidator: Add possibility to exclude specific external URLs / domains or patterns

Added by Sybille Peters over 6 years ago. Updated over 1 year ago.

Status: New
Priority: Should have
Assignee: -
Category: Linkvalidator
Target version: -
Start date: 2018-05-31
Due date:
% Done: 0%
Estimated time:
PHP Version:
Tags:
Complexity:
Sprint Focus:

Description

Sometimes there may be problems with external sites that linkvalidator / Guzzle doesn't handle correctly or which can't be handled correctly at all.

To keep linkvalidator usable in such cases, it would be good to be able to exclude certain domains / URLs using regular expressions. The configuration should be provided in a way that lets an admin user edit it.

Prerequisite: Provide general configuration for extension. This does not need to be made configurable per page tree.

Examples

URLs that work in browser, but check will fail:


Related issues 3 (1 open, 2 closed)

Related to TYPO3 Core - Bug #86918: Linkvalidator stops working on specific links (external URLs) (Closed, 2018-11-13)

Related to TYPO3 Core - Feature #89457: Add possibility to mark as error specific external URLs / domains or patterns (Closed, 2019-10-18)

Related to TYPO3 Core - Bug #99909: False positive broken links by parsing URLs not inside <a> tags (Needs Feedback, 2023-02-09)

Actions #1

Updated by Sybille Peters almost 5 years ago

  • Subject changed from Add configuration option: file with regex patterns to exclude for external link checking to Add possibility to exclude specific external URLs / domains or patterns

Problem

In some rare cases, checking external URLs with linkvalidator fails even though the URLs are not broken. See also #86918. This leads to false positives (URLs reported as broken which are not).

One of the most annoying things for editors working through the list of broken links is false positives that keep coming up, clutter the list, cannot be removed, and make it really tedious to work through the list and fix the links that are actually broken.

Proposed Solution

It would be ideal to optimize the crawl process so it always correctly reports broken links but this may not be entirely possible.

As an alternative, we could add a mechanism to exclude specific URLs or URL patterns, e.g.

  • exact URL
  • URL starting with ... (or domain)
  • regular expression

This could be done by just adding optional files or could be done in the GUI with an "ignore" button.
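
The three kinds of exclusion listed above could be matched roughly as follows. This is only a language-neutral sketch in Python, not linkvalidator code: the `ExcludeRule` class, the rule kinds ("exact", "prefix", "domain", "regex") and the `is_excluded` function are hypothetical names chosen for illustration, and the example URLs are placeholders.

```python
import re
from dataclasses import dataclass
from typing import Iterable
from urllib.parse import urlsplit


@dataclass(frozen=True)
class ExcludeRule:
    """One entry of a hypothetical exclude list."""
    kind: str   # assumed rule kinds: "exact", "prefix", "domain" or "regex"
    value: str


def is_excluded(url: str, rules: Iterable[ExcludeRule]) -> bool:
    """Return True if any rule matches, i.e. the URL should not be checked."""
    host = (urlsplit(url).hostname or "").lower()
    for rule in rules:
        if rule.kind == "exact" and url == rule.value:
            return True
        if rule.kind == "prefix" and url.startswith(rule.value):
            return True
        if rule.kind == "domain":
            domain = rule.value.lower()
            # Match the domain itself and its subdomains,
            # but not unrelated hosts that merely end with the same letters.
            if host == domain or host.endswith("." + domain):
                return True
        if rule.kind == "regex" and re.search(rule.value, url):
            return True
    return False


# Placeholder rules; a real list would come from a file or a backend module.
rules = [
    ExcludeRule("exact", "https://example.org/known-false-positive"),
    ExcludeRule("prefix", "https://dms.example.org/"),
    ExcludeRule("domain", "example.net"),
    ExcludeRule("regex", r"^https?://[^/]+\.example\.com/"),
]
```

Keeping the rule kinds separate means that editors using an "ignore" button would only ever create "exact" or "domain" entries, while regular expressions stay an admin-only option.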

Actions #2

Updated by Sybille Peters almost 5 years ago

  • Related to Bug #86918: Linkvalidator stops working on specific links (external URLs) added
Actions #3

Updated by Lina Wolf almost 5 years ago

  • Related to Feature #89457: Add possibility to mark as error specific external URLs / domains or patterns added
Actions #4

Updated by Sybille Peters over 4 years ago

  • Subject changed from Add possibility to exclude specific external URLs / domains or patterns to linkvalidator: Add possibility to exclude specific external URLs / domains or patterns
Actions #5

Updated by Sybille Peters about 4 years ago

  • Description updated (diff)
Actions #6

Updated by Sybille Peters over 3 years ago

  • Assignee deleted (Sybille Peters)
Actions #7

Updated by Christopher Schnell over 1 year ago

Since this feature request is still open after almost five years, it seems that there is not much interest in implementing this feature.

However, another common use case where the described feature could be very useful is internal environments. Imagine an intranet combined with a separate DMS application, a bug tracker or something similar, where a lot of links from the TYPO3 intranet point to the other systems. Linkvalidator reports these links as broken (false positives) because it cannot access the other systems, for example because of SSO.
Instead of getting a lot of false positives, the domains of these external systems could be excluded from the link checking.

Actions #8

Updated by Sybille Peters over 1 year ago

@Christopher There is currently no simple way to do this with Linkvalidator out of the box.

However, Link Validator is highly customizable. You can create your own custom link type that extends the external link type. Please see the Link Validator documentation and look at the source code of ExternalLinktype.php.

In fact, I would personally recommend doing that and not using the ExternalLinktype without customization anyway, for the following reasons:

  1. Problem of false positives: links are reported as broken even though they are not. This seems to be a problem which is getting worse. There are a number of reasons for this, but the most common seem to be (1) an incomplete certificate chain on the server (browsers resolve this by fetching the missing intermediate certificates, but curl, which Guzzle uses by default and which TYPO3 uses via Guzzle, does not) and (2) Cloudflare-protected sites. If you customize the class, you can define what is checked and what is not, and whether specific error types or domains should be excluded.
  2. If there are lots of external links, it might make sense to implement a crawl delay and make sure external sites are not checked too often.
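
The crawl delay from point 2 could look roughly like the following. It is a minimal, framework-independent sketch in Python, not code from linkvalidator or any extension: the `CrawlDelay` class and its parameters are hypothetical, and the clock and sleep functions are injectable only so that the behaviour can be tried out without real waiting.

```python
import time
from urllib.parse import urlsplit


class CrawlDelay:
    """Enforce a minimum pause between two requests to the same host."""

    def __init__(self, delay_seconds, clock=time.monotonic, sleep=time.sleep):
        self.delay_seconds = delay_seconds
        self._clock = clock
        self._sleep = sleep
        self._last_request = {}  # host -> time of the most recent request

    def wait(self, url):
        """Block until the host of `url` may be contacted again.

        Returns the number of seconds that were waited.
        """
        host = (urlsplit(url).hostname or "").lower()
        now = self._clock()
        last = self._last_request.get(host)
        waited = 0.0
        if last is not None:
            remaining = self.delay_seconds - (now - last)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        # Remember when the request to this host is actually sent.
        self._last_request[host] = now + waited
        return waited
```

A checker would call `wait(url)` right before each HTTP request. Because the delay is tracked per host, links to different sites are not slowed down by each other.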

Alternatively, you can look at my extension Broken Link Fixer (brofix), which is based on linkvalidator but has implemented some changes, such as:

- a link target cache and a crawl delay when fetching, to reduce requests to external sites
- a button in the link list for "ignoring" a target URL by domain or URL (exclude list); the exclude list can be maintained in the BE module
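
To illustrate the idea of a link target cache in general terms: the result of checking a URL is stored with a timestamp and reused until it expires, so the same external target is not fetched on every run. The sketch below is Python and is not taken from brofix; `LinkTargetCache`, `check_with_cache` and the `fetch` callback are hypothetical names, and a real implementation would persist the entries, for example in a database table.

```python
import time


class LinkTargetCache:
    """Remember check results per URL for a limited time (in memory only)."""

    def __init__(self, ttl_seconds, clock=time.time):
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._entries = {}  # url -> (checked_at, is_ok)

    def get(self, url):
        """Return the cached result, or None if missing or expired."""
        entry = self._entries.get(url)
        if entry is None:
            return None
        checked_at, is_ok = entry
        if self._clock() - checked_at > self.ttl_seconds:
            del self._entries[url]
            return None
        return is_ok

    def set(self, url, is_ok):
        self._entries[url] = (self._clock(), is_ok)


def check_with_cache(url, cache, fetch):
    """Check `url` via `fetch` (a hypothetical callback) unless a fresh result exists."""
    cached = cache.get(url)
    if cached is not None:
        return cached
    result = fetch(url)
    cache.set(url, result)
    return result
```

With this in place, a URL that is linked from many pages is requested at most once per expiry period, no matter how often it appears in the content.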

You can look at the implementation in "Broken Link Fixer" (brofix) as a start to create your own or you can try out the extension.

Even with the "ignore" button, I have now deactivated external link checking for our university TYPO3 installation, except for a few known internal domains of the university, because the "false positives" were too much of a problem.

Actions #9

Updated by Christopher Schnell over 1 year ago

@Sybille Peters wow, I just read your brofix extension documentation and this is exactly what I had in mind (including an ignore button in the list). Thank you very much for the hint and your work on that extension.

Can't this get into the core, replacing the Linkvalidator?

Actions #10

Updated by Sybille Peters over 1 year ago

@Christopher I can't make that decision.

I am just a contributor; I am not on the core team.

You can contact me on Slack if you have further questions. Anyway, the "ignore button" is just a workaround. It is not a silver bullet, and the problem of the "false positives" is quite tricky to solve.

If you want to talk about linkvalidator, you can also use the core channel, see the contribution guide: https://docs.typo3.org/m/typo3/guide-contributionworkflow/main/en-us/Community/Index.html

Actions #11

Updated by Sybille Peters about 1 year ago

  • Related to Bug #99909: False positive broken links by parsing URLs not inside <a> tags added