Bug #97937
Linkvalidator: Links and in tt_content.bodytext cause problems in UrlSoftReferenceParser
Description
In our test case, we have a content element with CType = textmedia and the following data in the bodytext field:
<p>lorem ipsum https://weber.digital dolor sit amet</p>
<p><strong>Used Fonts and Iconfonts</strong><br /> Museo Sans Rounded Family OTF (<a class="external" href="https://www.myfonts.com">https://www.myfonts.com</a>)<br /> Droid Serif (<a class="external" href="https://www.fontsquirrel.com/fonts/droid-serif">https://www.fontsquirrel.com/fonts/droid-serif</a>)<br /> Icons Mind (<a class="external" href="https://iconsmind.com">https://iconsmind.com</a>)<br /> Linear Icons Free (<a class="external" href="https://linearicons.com">https://linearicons.com</a>) (CC BY-SA 4.0) by Perxis (<a class="external" href="https://perxis.com">https://perxis.com</a>)</p>
When the linkvalidator executes the UrlSoftReferenceParser, it finds two broken links:
- https://weber.digital
- </a>)<br /> Linear Icons Free (<a class="external" href="https://linearicons.com"
So according to this test, the UrlSoftReferenceParser does not handle this markup correctly, and there is also a bug in the large regular expression that does all the parsing.
Updated by Gerrit Code Review about 2 years ago
- Status changed from New to Under Review
Patch set 1 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/75200
Updated by Gerrit Code Review about 2 years ago
Patch set 2 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/75200
Updated by Gerrit Code Review about 2 years ago
Patch set 3 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/75200
Updated by Gerrit Code Review about 2 years ago
Patch set 4 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/75200
Updated by Gerrit Code Review about 2 years ago
Patch set 5 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/75200
Updated by Gerrit Code Review about 2 years ago
Patch set 6 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/75200
Updated by Gerrit Code Review about 2 years ago
Patch set 7 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/75200
Updated by Gerrit Code Review about 2 years ago
Patch set 8 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/75200
Updated by Nikita Hovratov about 2 years ago
- Related to Bug #98120: Link parsing problem in linkvalidator added
Updated by Gerrit Code Review about 2 years ago
Patch set 9 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/75200
Updated by Sybille Peters almost 2 years ago
- Related to Bug #98328: Exception "Data too long for column url" when checking links added
Updated by Sybille Peters almost 2 years ago
Has anyone analyzed further what actually causes the problem, what is affected and what is not, and what would work as a workaround?
I have tested the following and it works, meaning the linkvalidator BE module now shows the broken links. Some of my workarounds are a bit brutal, though, so make sure they are what you want.
Analysis
The affected texts mostly contained a URL as anchor text. When this was changed, the problem no longer occurred.
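To find content that may be affected, something like the following rough sketch could be used. It is only a regex-based diagnostic, not what linkvalidator does internally, and the function name findUrlAnchorTexts() is made up for this example. It lists every <a> element whose anchor text is itself a URL:

```php
<?php
/**
 * Returns the anchor texts of all <a> elements whose visible text is a URL.
 * Hypothetical helper for illustration only; not part of TYPO3.
 *
 * @return string[]
 */
function findUrlAnchorTexts(string $html): array
{
    // <a ...>, optional whitespace, a URL with scheme, optional whitespace, </a>
    $pattern = '#<a\b[^>]*>\s*((?:https?|ftp)://[^\s<]+)\s*</a>#i';
    preg_match_all($pattern, $html, $matches);
    return $matches[1];
}

// Shortened sample based on the bodytext from the description.
$bodytext = '<p>lorem ipsum https://weber.digital dolor sit amet</p>'
    . '<p>Icons Mind (<a class="external" href="https://iconsmind.com">https://iconsmind.com</a>)</p>';

// Prints only the URL used as anchor text, not the standalone URL.
echo implode(PHP_EOL, findUrlAnchorTexts($bodytext)), PHP_EOL;
```

The standalone URL in the first paragraph is not reported, because it is not inside an <a> tag. Anchors with nested markup inside the link text are not covered by this sketch.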
Possible workarounds
Workaround 3 is the least invasive: it only affects link checking. The other two also change the content or the behaviour of TYPO3 in general.
1. If possible, do not use URLs as anchor texts. Using URLs as anchor text has some general disadvantages, so avoiding it may be a good idea anyway:
- Accessibility / screen readers
- Usability, SEO: Is the URL really what you want to show the users? In some cases it may be, but it is often better to embed the link organically in a sentence or in individual words, so that the anchor text is text and not a URL.
As a makeshift solution, the URL in the anchor text can be converted to a URL without a scheme, e.g.
https://example.org => example.org
2. Do not parse the rich texts with softref "url".
This can be disabled entirely (!!! which will also result in URLs that are displayed as links no longer being displayed as URLs, so make sure this is what you want !!!).
e.g.
Configuration/TCA/Overrides/tt_content.php
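// Remove only the "url" key. A plain preg_replace('#(,|^)url(,|$)#', '', ...) would also drop both commas if "url" were in the middle of the list.
$GLOBALS['TCA']['tt_content']['columns']['bodytext']['config']['softref'] = implode(',', array_diff(
    explode(',', $GLOBALS['TCA']['tt_content']['columns']['bodytext']['config']['softref']),
    ['url']
));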
which results in:
- softref = rtehtmlarea_images,typolink_tag,email[subst],url
+ softref = rtehtmlarea_images,typolink_tag,email[subst]
3. Alternatively, the "url" softref can be disabled only while parsing with linkvalidator (leaving the rest intact) by patching linkvalidator/Classes/LinkAnalyzer::analyzeRecord(). Note that this way only links are checked, not standalone URLs.
if (!($conf['softref'] ?? false) || (string)$valueField === '') {
continue;
}
+ $conf['softref'] = implode(',', array_diff(explode(',', $conf['softref']), ['url']));
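For workarounds 2 and 3, removing the key from the comma-separated softref list can be sketched in plain PHP as below. The helper name removeSoftRefKey() is an assumption made for this example and does not exist in TYPO3. It only removes exact matches, so a key written with parameters (such as email[subst]) would have to be passed exactly as it appears in the list.

```php
<?php
/**
 * Removes one parser key from a comma-separated softref list.
 * Hypothetical helper for illustration only; not part of TYPO3.
 */
function removeSoftRefKey(string $softrefList, string $key): string
{
    $keys = array_map('trim', explode(',', $softrefList));
    // Keep every non-empty key except the one to remove.
    $kept = array_filter(
        $keys,
        static fn (string $k): bool => $k !== '' && $k !== $key
    );
    return implode(',', $kept);
}

// "url" at the end of the list, as in the default shown above.
echo removeSoftRefKey('rtehtmlarea_images,typolink_tag,email[subst],url', 'url'), PHP_EOL;
// prints: rtehtmlarea_images,typolink_tag,email[subst]

// For comparison: the regex variant swallows both commas when "url" is mid-list.
echo preg_replace('#(,|^)url(,|$)#', '', 'typolink_tag,url,email[subst]'), PHP_EOL;
// prints: typolink_tagemail[subst]
```

Splitting the list and joining it again keeps the separators intact regardless of where "url" appears, which is why it is used here instead of a regular expression.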
Updated by Gerrit Code Review over 1 year ago
Patch set 10 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/75200
Updated by Gerrit Code Review over 1 year ago
Patch set 11 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/75200
Updated by Gerrit Code Review about 1 year ago
Patch set 12 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/75200
Updated by Gerrit Code Review about 1 year ago
Patch set 13 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/75200
Updated by Sybille Peters about 1 year ago
- Related to Bug #99909: False positive broken links by parsing URLs not inside <a> tags added
Updated by Gerrit Code Review 10 months ago
Patch set 14 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/75200
Updated by Gerrit Code Review 10 months ago
Patch set 15 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/75200
Updated by Gerrit Code Review 8 months ago
Patch set 16 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/75200
Updated by Gerrit Code Review 8 months ago
Patch set 17 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/75200
Updated by Ralf Hettinger 6 months ago
- Related to Epic #85006: Reduce falsely reported broken links added