To summarize: The problem only occurs if a link to a content element is used in header_link, not in bodytext?
I can reproduce this.
I think I should be able to provide a patch. Here is some more information:
About debugging: ideally, debug LinkAnalyzer::analyzeRecord while the links are being checked, e.g. by setting a conditional breakpoint there (condition: $table == 'tt_content') or by setting the breakpoint on line 337.
I also checked the database first because I wanted to make sure the problem is in the checking:
select url,table_name,field from tx_linkvalidator_link where record_pid=663;
+-------------+------------+-------------+
| url | table_name | field |
+-------------+------------+-------------+
| 101999 | tt_content | header_link |
| 664#c101999 | tt_content | bodytext |
+-------------+------------+-------------+
This confirms that the value is already wrong when it is written to the DB; the url should be the same for both fields.
The problem seems to be that TypolinkTagSoftReferenceParser is used for bodytext and TypolinkSoftReferenceParser for header_link. Both (correctly) return the same result, but in line 337 of LinkAnalyzer there is an if/else branch on the parser key:
if ($softReferenceParser->getParserKey() === 'typolink_tag') {
    $this->analyzeTypoLinks($parserResult, $results, $htmlParser, $record, $field, $table);
} else {
    $this->analyzeLinks($parserResult, $results, $record, $field, $table);
}
So analyzeLinks is called for header_link and analyzeTypoLinks for bodytext, even though header_link also contains a typolink. In one case pageAndAnchor is written, in the other it is not.
And you can see in line 214 what happens:
if (!empty($entryValue['pageAndAnchor'] ?? '')) {
    // Page with anchor, e.g. 18#1580
    $url = $entryValue['pageAndAnchor'];
} else {
    $url = $entryValue['substr']['tokenValue'];
}
I did not debug that part thoroughly, but I am pretty confident that this is the problem.
This code has not changed much in years. I will also check versions 11 and 12 to see whether this is a new problem or whether it already existed.
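To illustrate how the branch from line 214 could produce the two different values seen in the table above, here is a minimal standalone sketch. It is not the actual LinkAnalyzer code: the function name resolveUrl and the shape of the two entry arrays are assumptions made for illustration, and I have not verified what the real entries look like (in particular the tokenValue of the bodytext entry is a guess).

```php
<?php
declare(strict_types=1);

// Standalone illustration only, NOT the real LinkAnalyzer code.
// resolveUrl() is a hypothetical helper that mirrors the if/else quoted from line 214.
function resolveUrl(array $entryValue): string
{
    if (!empty($entryValue['pageAndAnchor'] ?? '')) {
        // Page with anchor, e.g. 18#1580
        return $entryValue['pageAndAnchor'];
    }
    // No pageAndAnchor available: only the bare token value is used
    return $entryValue['substr']['tokenValue'];
}

// Assumed entry for bodytext: pageAndAnchor has been filled in
$bodytextEntry = [
    'pageAndAnchor' => '664#c101999',
    'substr' => ['tokenValue' => '101999'], // assumed value
];

// Assumed entry for header_link: pageAndAnchor is missing
$headerLinkEntry = [
    'substr' => ['tokenValue' => '101999'],
];

echo resolveUrl($bodytextEntry), PHP_EOL;
echo resolveUrl($headerLinkEntry), PHP_EOL;
```

With these assumed inputs the first call yields the page with anchor and the second only the content element uid, which matches the two url values in the database.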