Project

General

Profile

Actions

Bug #104387

open

Linkvalidator shows wrong url for linked content

Added by Ulrich Mathes 3 days ago. Updated about 7 hours ago.

Status:
New
Priority:
Should have
Assignee:
-
Category:
Linkvalidator
Target version:
-
Start date:
2024-07-12
Due date:
% Done:

0%

Estimated time:
TYPO3 Version:
13
PHP Version:
Tags:
Complexity:
Is Regression:
Sprint Focus:

Description

When Linkvalidator finds a broken link to an content element the url has to be stored as [pageUid]#c[contentElementUid]. In case of bodytext this is the case and tested.

In case of header_link for example only the contentElementUid gets stored as url. The report then shows a link to a page with the uid of the content element which is wrong.


First row shows the incorrect link, content element linked in header_link.
Second row shows the correct link, content element linked in bodytext.

As far as i can see, only fields with softref=typolink_tag handling links to content elements right.

TYPO3 12 and 13 are affected.


Files

clipboard-202407121743-zpghz.png (34.2 KB) clipboard-202407121743-zpghz.png Ulrich Mathes, 2024-07-12 15:43
Actions #1

Updated by Ulrich Mathes 3 days ago

  • Description updated (diff)
Actions #2

Updated by Ulrich Mathes 3 days ago

  • Description updated (diff)
Actions #3

Updated by Garvin Hicking 3 days ago

Reproduced this via:

1. Create a page "Invisible"
2. Create a content element of type "Header Only". Can be empty.
3. Hide that content element.
4. Create a page "Linked".
5. Create a content element of type "Header Only". For the field "Link [header_link]" open the link browser, go to page "Invisible", select hidden content element there.
6. Go to linkvalidator, validate the page "Linked"
7. The resulting output will show the error from the screenshot, the broken link will lead to having the page UID set as the content element UID

I've tried to debug this for more than an hour, and the code is very hard for me to read.

https://github.com/TYPO3/typo3/blob/main/typo3/sysext/linkvalidator/Classes/LinkAnalyzer.php#L370 seems to be missing an array key like

            $results[$type][$table . ':' . $field . ':' . $idRecord . ':' . $reference['tokenID']]['pageAndAnchor'] = $record['pid'];

but I can't see how to construct this to only do that in case of a "t3://page?uid=4711#0815" link.

In https://github.com/TYPO3/typo3/blob/main/typo3/sysext/linkvalidator/Classes/Linktype/InternalLinktype.php#L95 there's already some 't3://' parsing involved, unsure if there a specific key needs to be set.

Ultimately, in https://github.com/TYPO3/typo3/blob/main/typo3/sysext/linkvalidator/Classes/LinkAnalyzer.php#L211 there's:

                if (!empty($entryValue['pageAndAnchor'] ?? '')) {
                    // Page with anchor, e.g. 18#1580
                    $url = $entryValue['pageAndAnchor'];
                } else {
                    $url = $entryValue['substr']['tokenValue'];
                }

where the proper URL would need to be referenced. As noted this "pageAndAnchor" would be a key that could be utilized. Otherwize the `$url` variable here could also be set to something from `$entryValue` but I don't really get where all this is coming from and to.

Jsut leaving this here in case it finds someone with expertise or more willpower than me.

Actions #4

Updated by Sybille Peters about 12 hours ago

To summarize: The problem only occurs if a link to a content element is used in header_link, not in bodytext?

I can reproduce this.

I think I should be able to provide a patch. Here is some more information:

About debugging: you ideally debug LinkAnalyzer::analyzeRecord when checking links, e.g. by putting a breakpoint there with condition, e.g. $table == 'tt_content' or putting the breakpoint on line 337.

I also checked the database first because I wanted to make sure the problem is in the checking:

select url,table_name,field from tx_linkvalidator_link where record_pid=663;
+-------------+------------+-------------+
| url         | table_name | field       |
+-------------+------------+-------------+
| 101999      | tt_content | header_link |
| 664#c101999 | tt_content | bodytext    |
+-------------+------------+-------------+

This confirms: It is already wrong when written to the DB, the URL should be the same.

The problem seems to be that for bodytext TypolinkTagSoftReferenceParser is used and for header_link TypolinkSoftReferenceParser. Both (correctly) return the same result, but in line 337 of LinkAnalyzer there is an if switch:

if ($softReferenceParser->getParserKey() === 'typolink_tag') {
  $this->analyzeTypoLinks($parserResult, $results, $htmlParser, $record, $field, $table);
} else {
  $this->analyzeLinks($parserResult, $results, $record, $field, $table);
}

So for header_link analyzeLinks is called and for bodytext analyzeTypoLinks - even though it is also a typolink. Where in the one case pageAndAnchor is written and in the other not.

And you can see in line 214 what happens:

if (!empty($entryValue['pageAndAnchor'] ?? '')) {
  // Page with anchor, e.g. 18#1580
  $url = $entryValue['pageAndAnchor'];
} else {
  $url = $entryValue['substr']['tokenValue'];
}

I did not thoroughly debug that part, but pretty confident that is the problem.

This code has not been changed much for years. I will also check other versions 11 and 12 to check if new problem or already existed.

Actions #5

Updated by Sybille Peters about 12 hours ago

How to solve this:

First off:

In an ideal world, I would prefer to get an intuitively understandable object from the Softreference parser and maybe we would not need analyzeTypoLinks / analyzeLinks.
Maybe this is already possible or a wip?
So, that would be my preference and it is difficult to maintain this code because - to be honest - I am not sure I fully understood the format of the resulting array.

Until then:

1. First of all, this should have been noticed sooner: we need more Tests
2. possibly it might be a good idea to unify analyzeTypoLinks and analyzeLinks
3. possibly analyzeTypoLinks should be called for typolink as well, not just typolink_tag

Actions #6

Updated by Sybille Peters about 7 hours ago

I could also reproduce this in v11 and v12. A bit surprised, it was not noticed sooner.

Actions

Also available in: Atom PDF