Bug #105753
Closed
Linkvalidator does not parse domains with Umlauts correctly
100%
Description
Steps to reproduce:
- Create "Link to External Url" page
- Insert Url with Umlaut ("https://gebäudehülle.swiss")
- Run Linkvalidator Task for the created page
It will find the link "https://geb" and report it as broken.
The reason is that the regular expression in UrlSoftReferenceParser does not match umlauts.
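The truncation at "https://geb" is what an ASCII-only character class would produce: in UTF-8, "ä" is a multi-byte sequence that falls outside a class such as [A-Za-z0-9], so the match ends just before it. A minimal sketch of that effect follows. Both patterns are simplified, hypothetical illustrations and are not the actual expression used in UrlSoftReferenceParser.

```php
<?php
// Hypothetical, simplified patterns for illustration only; the real
// expression in UrlSoftReferenceParser is not reproduced here.
$asciiOnly    = '~https?://[A-Za-z0-9./-]+~';
// With the /u modifier, \p{L} and \p{N} match any Unicode letter or digit.
$unicodeAware = '~https?://[\p{L}\p{N}./-]+~u';

$url = 'https://gebäudehülle.swiss';

preg_match($asciiOnly, $url, $asciiMatch);
preg_match($unicodeAware, $url, $unicodeMatch);

echo $asciiMatch[0], PHP_EOL;   // https://geb
echo $unicodeMatch[0], PHP_EOL; // https://gebäudehülle.swiss
```

Widening the character class is only one possible direction; the comments in this issue discuss converting the host to its ASCII form instead.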
Updated by Sybille Peters 14 days ago
- File linkvalidator_result.png added
- Status changed from New to Accepted
I can reproduce this in v14 (main) and v12 (latest 12.4 branch) using page type "Link to External URL" with the link in pages.url (as described).
I cannot reproduce it with content element CType="textmedia" and links in tt_content.bodytext (RTE). There, the links are parsed correctly.
Updated by Sybille Peters 14 days ago
The following works correctly in RTE:
select bodytext from tt_content where pid=913;
<p>mit Umlauten <a href="https://xn--gebudehlle-s5a60a.swiss/">link (umlauts)</a></p>
<p><a href="https://xn--gebudehlle-s5a60a.swiss/sdfsddfs">broken link (umlauts)</a></p>
<p>Encoded: <a href="https://gebäudehülle.swiss">link (encoded)</a></p>
<p><a href="https://gebäudehülle.swiss/sdfsddfs">broken link (encoded)</a></p>
Updated by Sybille Peters 14 days ago · Edited
For pages.url, it is ok if the already encoded value is used.
It is not ok if the URL with umlauts is used.
As a workaround you can use the encoded URL (the URL is already converted for me, when I copy it from the browser address bar, for example).
As mentioned above, the problem is in UrlSoftReferenceParser. Possible solutions:
- value is already converted when saved in pages.url (and other fields)
- linkvalidator does not use softref parser to "parse" fields where the content can be used as is (e.g. for pages.url which is of type "input" with "softref" = "url")
- linkvalidator converts the URL before passing it to the softref parser (not possible for fields where parsing should be performed FIRST)
- the softref parser UrlSoftReferenceParser converts the URL first. We already have code for this in ExternalLinktype::preprocessUrl :
    protected function preprocessUrl(string $url): string
    {
        $url = html_entity_decode($url);
        $parts = parse_url($url);
        if ($parts['host'] ?? false) {
            try {
                $newDomain = (string)idn_to_ascii($parts['host']);
                if (strcmp($parts['host'], $newDomain) !== 0) {
                    $parts['host'] = $newDomain;
                    $url = HttpUtility::buildUrl($parts);
                }
            } catch (\Exception | \Throwable $e) {
                // ignore error and proceed with link checking
            }
        }
        return $url;
    }
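For trying the conversion outside of TYPO3, the same idea can be sketched as a standalone script. This is an illustration under assumptions, not the patch that was merged: toAsciiUrl() and buildUrlFromParts() are hypothetical helper names (the latter stands in for HttpUtility::buildUrl), and idn_to_ascii() requires the PHP intl extension.

```php
<?php
/**
 * Hypothetical stand-in for HttpUtility::buildUrl(): reassembles the
 * array returned by parse_url() into a URL string.
 */
function buildUrlFromParts(array $parts): string
{
    $url = '';
    if (isset($parts['scheme'])) {
        $url .= $parts['scheme'] . '://';
    }
    if (isset($parts['user'])) {
        $url .= $parts['user'];
        if (isset($parts['pass'])) {
            $url .= ':' . $parts['pass'];
        }
        $url .= '@';
    }
    $url .= $parts['host'] ?? '';
    if (isset($parts['port'])) {
        $url .= ':' . $parts['port'];
    }
    $url .= $parts['path'] ?? '';
    if (isset($parts['query'])) {
        $url .= '?' . $parts['query'];
    }
    if (isset($parts['fragment'])) {
        $url .= '#' . $parts['fragment'];
    }
    return $url;
}

/**
 * Hypothetical helper: converts an internationalized host to its ASCII
 * (Punycode) form and leaves everything else untouched.
 */
function toAsciiUrl(string $url): string
{
    $parts = parse_url($url);
    if (!is_array($parts) || empty($parts['host']) || !function_exists('idn_to_ascii')) {
        return $url; // nothing to convert, or the intl extension is missing
    }
    $asciiHost = idn_to_ascii($parts['host'], IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
    if ($asciiHost === false || $asciiHost === $parts['host']) {
        return $url; // conversion failed, or the host was already ASCII
    }
    $parts['host'] = $asciiHost;
    return buildUrlFromParts($parts);
}

echo toAsciiUrl('https://gebäudehülle.swiss/sdfsddfs'), PHP_EOL;
```

With the intl extension loaded, the output should be the encoded form that appears in the RTE example above (https://xn--gebudehlle-s5a60a.swiss/sdfsddfs). Without intl, the sketch returns the URL unchanged rather than failing.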
Updated by Sybille Peters 14 days ago
Also, there is no problem with parsing sys_file_reference.link because it uses the TypolinkSoftReferenceParser.
I am wondering what is the point of having UrlSoftReferenceParser because TypolinkSoftReferenceParser can also parse urls.
Also, pages.url could use type=link with allowedTypes='url' (though that would not solve the problem for other fields that may be affected).
pages.url:
- type=input
- softref=url
sys_file_reference.link:
- type=link
- softref=typolink
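Written as TCA column configuration, the comparison above could look roughly like the following sketch. It is reduced to the options named in this issue, all other TCA options are omitted, and the variable names are hypothetical. Whether a type=link field needs an explicit softref entry is not covered here.

```php
<?php
// Current setup of pages.url as listed above (reduced to two options).
$pagesUrlCurrent = [
    'config' => [
        'type' => 'input',
        'softref' => 'url',
    ],
];

// Setup of sys_file_reference.link as listed above.
$fileReferenceLink = [
    'config' => [
        'type' => 'link',
        'softref' => 'typolink',
    ],
];

// Hypothetical alternative for pages.url, as suggested in the comment:
// a link field restricted to external URLs.
$pagesUrlAsLink = [
    'config' => [
        'type' => 'link',
        'allowedTypes' => ['url'],
    ],
];
```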
Updated by Gerrit Code Review 12 days ago
- Status changed from Accepted to Under Review
Patch set 1 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/87609
Updated by Gerrit Code Review 11 days ago
Patch set 2 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/87609
Updated by Gerrit Code Review 11 days ago
Patch set 3 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/87609
Updated by Gerrit Code Review 11 days ago
Patch set 1 for branch 13.4 of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/87618
Updated by Gerrit Code Review 11 days ago
Patch set 1 for branch 12.4 of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/87619
Updated by Sybille Peters 11 days ago
- Status changed from Under Review to Resolved
- % Done changed from 0 to 100
Applied in changeset 22265db3b933a7d2e40625a1f65e0eec73cbc2dd.