Bug #97937

open

Linkvalidator: Links and &nbsp; in tt_content.bodytext cause problems in UrlSoftReferenceParser

Added by Kai Strecker almost 2 years ago. Updated 3 months ago.

Status: Under Review
Priority: Should have
Assignee: -
Category: Linkvalidator
Target version: -
Start date: 2022-07-14
Due date:
% Done: 0%
Estimated time:
TYPO3 Version: 11
PHP Version: 8.1
Tags:
Complexity:
Is Regression:
Sprint Focus:

Description

In our test case, we have a content element with CType = textmedia and the following data in the bodytext field:

<p>lorem ipsum https://weber.digital&nbsp; &nbsp; dolor sit amet</p>
<p><strong>Used Fonts and Iconfonts</strong><br /> Museo Sans Rounded Family OTF (<a class="external" href="https://www.myfonts.com">https://www.myfonts.com</a>)<br /> Droid Serif (<a class="external" href="https://www.fontsquirrel.com/fonts/droid-serif">https://www.fontsquirrel.com/fonts/droid-serif</a>)<br /> Icons Mind (<a class="external" href="https://iconsmind.com">https://iconsmind.com</a>)<br /> Linear Icons Free (<a class="external" href="https://linearicons.com">https://linearicons.com</a>) (CC BY-SA 4.0) by Perxis (<a class="external" href="https://perxis.com">https://perxis.com</a>)</p>
When the linkvalidator executes the UrlSoftReferenceParser, it finds two broken links:

So according to this test, the UrlSoftReferenceParser does not handle &nbsp; correctly and also has a bug in the big regex, which does all the parsing.
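
The &nbsp; part of the problem can be reproduced in isolation with a small sketch. The pattern below is a hypothetical, simplified URL expression, not the one used by UrlSoftReferenceParser, and both function names are made up for illustration. It shows how a pattern that only stops at whitespace, quotes and angle brackets keeps the entity attached to the URL, and how decoding entities first avoids that.

```php
<?php
// Hypothetical, simplified URL pattern. It is NOT the expression used by
// UrlSoftReferenceParser; it only stops at whitespace, quotes and angle brackets.
function findUrlsNaively(string $html): array
{
    preg_match_all('#https?://[^\s"\'<>]+#i', $html, $matches);
    return $matches[0];
}

// Variant that decodes HTML entities first, so that &nbsp; becomes a real
// no-break space (U+00A0), and then treats that character as a delimiter too.
function findUrlsAfterDecoding(string $html): array
{
    $decoded = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    preg_match_all('#https?://[^\s"\'<>\x{00A0}]+#iu', $decoded, $matches);
    return $matches[0];
}

// First paragraph of the bodytext from the test case above.
$bodytext = '<p>lorem ipsum https://weber.digital&nbsp; &nbsp; dolor sit amet</p>';

$naive = findUrlsNaively($bodytext);         // entity stays glued to the URL
$decoded = findUrlsAfterDecoding($bodytext); // URL ends before the no-break space
```

With the naive pattern the extracted URL ends in "&nbsp;", which a link checker would then request and report as broken.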


Related issues 4 (3 open, 1 closed)

Related to TYPO3 Core - Bug #98120: Link parsing problem in linkvalidator (Closed, 2022-08-10)
Related to TYPO3 Core - Bug #98328: Exception "Data too long for column url" when checking links (New, 2022-09-11)
Related to TYPO3 Core - Bug #99909: False positive broken links by parsing URLs not inside <a> tags (Needs Feedback, 2023-02-09)
Related to TYPO3 Core - Epic #85006: Reduce falsely reported broken links (New, 2018-02-11)

Actions #1

Updated by Gerrit Code Review almost 2 years ago

  • Status changed from New to Under Review

Patch set 1 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/75200

Actions #2

Updated by Gerrit Code Review almost 2 years ago

Patch set 2 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/75200

Actions #3

Updated by Gerrit Code Review almost 2 years ago

Patch set 3 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/75200

Actions #4

Updated by Gerrit Code Review almost 2 years ago

Patch set 4 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/75200

Actions #5

Updated by Gerrit Code Review almost 2 years ago

Patch set 5 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/75200

Actions #6

Updated by Gerrit Code Review almost 2 years ago

Patch set 6 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/75200

Actions #7

Updated by Gerrit Code Review over 1 year ago

Patch set 7 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/75200

Actions #8

Updated by Gerrit Code Review over 1 year ago

Patch set 8 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/75200

Actions #9

Updated by Nikita Hovratov over 1 year ago

  • Related to Bug #98120: Link parsing problem in linkvalidator added
Actions #10

Updated by Gerrit Code Review over 1 year ago

Patch set 9 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/75200

Actions #11

Updated by Sybille Peters over 1 year ago

  • Related to Bug #98328: Exception "Data too long for column url" when checking links added
Actions #12

Updated by Sybille Peters over 1 year ago

Has anyone analyzed further what actually causes the problem, what is and is not affected, and what would work as a workaround?

I have tested the following and it works, meaning the linkvalidator BE module now shows the broken links. Some of my workarounds are a bit brutal, though, so make sure they are what you want.

Analysis

The affected texts mostly contained a URL as anchor text. When this was changed, the problem no longer occurred.

Possible workarounds

Solution 3 is the least invasive: it only affects link checking. The other two also change the content or behaviour of TYPO3 in general.

1. If possible, do not use URLs as anchor texts. URLs as anchor texts have some general disadvantages, so avoiding them may be a good idea anyway:

  • Accessibility / screen readers
  • Usability, SEO: Is the URL really what you want to show the users? In some cases it may be, but often it is better to embed the link organically in a sentence or word(s), so that the anchor text is text rather than a URL.

As a makeshift solution, the URL in the anchor text can be converted to a URL without scheme, e.g.

https://example.org => example.org
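
The conversion above can be sketched as a small helper. The function name stripUrlScheme is hypothetical and not part of TYPO3. It only removes a leading "scheme://" prefix from a string and is meant for the visible anchor text, not for the href attribute.

```php
<?php
// Hypothetical helper (not part of TYPO3): removes a leading "scheme://"
// prefix so that the remaining anchor text no longer looks like a full URL.
function stripUrlScheme(string $anchorText): string
{
    // A scheme starts with a letter, followed by letters, digits, "+", "." or "-".
    return preg_replace('#^[a-z][a-z0-9+.\-]*://#i', '', $anchorText);
}

$withoutScheme = stripUrlScheme('https://example.org');
$unchanged = stripUrlScheme('example.org');
```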

2. Do not parse the rich texts with softref "url".

This can be disabled entirely (!!! this will also result in URLs that are displayed as links no longer being displayed as URLs, so make sure this is what you want !!!).

e.g.

Configuration/TCA/Overrides/tt_content.php

$GLOBALS['TCA']['tt_content']['columns']['bodytext']['config']['softref'] = implode(',', array_diff(
    explode(',', $GLOBALS['TCA']['tt_content']['columns']['bodytext']['config']['softref']),
    ['url'] // drop only the "url" parser; the other entries stay comma-separated
));

which results in:

- softref = rtehtmlarea_images,typolink_tag,email[subst],url
+ softref = rtehtmlarea_images,typolink_tag,email[subst]
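
Dropping an entry from a comma-separated softref list must not fuse its neighbours together or leave a stray comma behind when the entry sits in the middle of the list. The following is a minimal sketch of one way to do that. The helper name removeSoftRefParser is hypothetical and not part of TYPO3, and it assumes that parameters are written in square brackets after the key, as in email[subst].

```php
<?php
// Hypothetical helper (not part of TYPO3): removes one parser key from a
// comma-separated softref list, wherever in the list the key appears.
function removeSoftRefParser(string $softref, string $parserKey): string
{
    $entries = array_map('trim', explode(',', $softref));
    $kept = array_filter(
        $entries,
        // Compare only the key name, so that an entry with parameters such as
        // "email[subst]" is matched by the key "email". Empty entries are dropped.
        static fn (string $entry): bool =>
            $entry !== '' && explode('[', $entry, 2)[0] !== $parserKey
    );
    return implode(',', $kept);
}

$atEnd = removeSoftRefParser('rtehtmlarea_images,typolink_tag,email[subst],url', 'url');
$inMiddle = removeSoftRefParser('typolink_tag,url,email[subst]', 'url');
```

Splitting and re-joining the list keeps the separators intact regardless of the position of the removed key.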

3. Alternatively, the "url" softref can be disabled only when parsing with linkvalidator (leaving the rest intact) by patching linkvalidator/Classes/LinkAnalyzer::analyzeRecord(). This way, linkvalidator only checks links, not standalone URLs.

if (!($conf['softref'] ?? false) || (string)$valueField === '') {
  continue;
}
+ $conf['softref'] = implode(',', array_diff(explode(',', $conf['softref']), ['url']));
Actions #13

Updated by Gerrit Code Review 12 months ago

Patch set 10 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/75200

Actions #14

Updated by Gerrit Code Review 12 months ago

Patch set 11 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/75200

Actions #15

Updated by Gerrit Code Review 10 months ago

Patch set 12 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/75200

Actions #16

Updated by Gerrit Code Review 9 months ago

Patch set 13 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/75200

Actions #17

Updated by Sybille Peters 9 months ago

  • Related to Bug #99909: False positive broken links by parsing URLs not inside <a> tags added
Actions #18

Updated by Gerrit Code Review 5 months ago

Patch set 14 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/75200

Actions #19

Updated by Gerrit Code Review 5 months ago

Patch set 15 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/75200

Actions #20

Updated by Gerrit Code Review 4 months ago

Patch set 16 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/75200

Actions #21

Updated by Gerrit Code Review 3 months ago

Patch set 17 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/75200

Actions #22

Updated by Ralf Hettinger about 1 month ago

  • Related to Epic #85006: Reduce falsely reported broken links added