Bug #89682
Linkvalidator: external URLs containing `&amp;` or whitespace at the end not working (Closed)
Description
When inserting external links via CKEditor, ampersands (`&`) are converted to `&amp;` in the source code. That is fine for use in the frontend, but the link validator seems to have problems with this: it tries to verify the link containing the HTML-entity version of the ampersand, which does not work.
Links ending with whitespace (e.g. `<a href="https://www.typo3.org/ ">link</a>`) also work in the frontend but not with the link validator.
The attached patch file should take care of both issues.
Updated by Jonas Eberle over 5 years ago
While `&amp;` (HTML) should be converted to `&` for the URL, whitespace as in your example is not supposed to be in frontend URLs at all. If that works, it is pure luck: the spaces would be encoded to %20 and become part of the path.
So I think `html_entity_decode()` should be applied between fetching the URL from the HTML and any further processing.
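A minimal sketch of the suggested decoding step, assuming the href value has already been extracted from the stored HTML. The variable names and the sample URL are illustrative only; this is not the code from the changeset referenced at the end of this issue.

```php
<?php
// Hypothetical href value as stored in the RTE field: the ampersand is HTML-encoded.
$href = 'https://example.org/index.php?id=1&amp;L=2';

// Decode HTML entities before the URL is handed to the link check.
// ENT_QUOTES also decodes single and double quote entities;
// ENT_HTML5 uses the HTML5 entity table.
$url = html_entity_decode($href, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $url, PHP_EOL; // https://example.org/index.php?id=1&L=2
```

The decoded string is what a browser would actually request, so it is the form the link validator should check.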
Updated by Sybille Peters over 5 years ago
- Related to Bug #89488: HTML special characters fool linkvalidator added
Updated by Sybille Peters over 5 years ago
Thanks for pointing out the issue!
I think html_entity_decode() looks like a good solution.
The problem is that TYPO3 gives us (as the result of the linkref functions) the URL as it should be used in the BE form fields, so it is encoded with `&amp;`.
Is a URL that ends with (unencoded) whitespace valid? I think not. I agree with Jonas here. Actually, I think these things should be validated much earlier, preferably in the link wizard. Adding a trim() now would only mask the problem that this is an invalid URL which should have been rejected or sanitized earlier.
Whitespace should be encoded in the URL, as stated above. Unencoded whitespace already causes problems today, independent of linkvalidator, if you enter such URLs in the link wizard, but that is something that should probably be handled in the link wizard itself.
Examples:
- this is not a valid URL: "https://example.org/path with spaces?id=1&id2=2"
- nor is this: "https://example.org/ "
- this is: "https://example.org/path%20with%20spaces?id=1&id2=2"
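One way to arrive at the valid form in the last example is to percent-encode each path segment. The helper `encodePath()` below is hypothetical (not a TYPO3 API) and assumes the input path is raw, i.e. not already percent-encoded.

```php
<?php
// Hypothetical helper (not part of TYPO3): percent-encode each path segment
// so that spaces become %20 instead of being left raw in the URL.
function encodePath(string $path): string
{
    // Split on "/" so the separators themselves are not encoded.
    $segments = explode('/', $path);
    return implode('/', array_map('rawurlencode', $segments));
}

$path = encodePath('/path with spaces');
$url  = 'https://example.org' . $path . '?id=1&id2=2';

echo $url, PHP_EOL; // https://example.org/path%20with%20spaces?id=1&id2=2
```

rawurlencode() follows RFC 3986 and turns a space into %20, whereas urlencode() would produce `+`, which is only meaningful in query strings. Running this on an already encoded path would double-encode the `%`, so it only fits raw user input.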
You can use PHP's filter_var(), e.g.:
if (filter_var($url, FILTER_VALIDATE_URL) === false) {
    print("not a valid URL");
}
See https://stackoverflow.com/questions/2058578/best-way-to-check-if-a-url-is-valid
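The two suggestions in this thread can be combined into a single pre-check: decode HTML entities first, then validate the syntax with filter_var(). The function name `precheckExternalUrl()` is hypothetical, and this sketch is not the fix that was committed. Trailing whitespace is deliberately not trimmed, so such URLs are reported instead of silently repaired.

```php
<?php
/**
 * Hypothetical pre-check (name is illustrative, not TYPO3 API):
 * decode HTML entities, then validate syntax with filter_var().
 * Returns the decoded URL, or null if it is not a syntactically valid URL.
 */
function precheckExternalUrl(string $href): ?string
{
    $url = html_entity_decode($href, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // filter_var() returns the URL string on success and false on failure.
    return filter_var($url, FILTER_VALIDATE_URL) === false ? null : $url;
}

$examples = [
    'https://example.org/path with spaces?id=1&id2=2',
    'https://example.org/ ',
    'https://example.org/path%20with%20spaces?id=1&amp;id2=2',
];

foreach ($examples as $href) {
    $result = precheckExternalUrl($href);
    // var_export() quotes the input so trailing whitespace stays visible.
    echo var_export($href, true), ' => ', $result ?? 'invalid', PHP_EOL;
}
```

Note that the PHP manual states FILTER_VALIDATE_URL only accepts ASCII URLs, so internationalized domain names would be reported as invalid by this check.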
Updated by Gerrit Code Review over 5 years ago
- Status changed from New to Under Review
Patch set 1 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/62634
Updated by Gerrit Code Review over 5 years ago
Patch set 1 for branch 9.5 of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/62645
Updated by Sybille Peters over 5 years ago
- Status changed from Under Review to Resolved
- % Done changed from 0 to 100
Applied in changeset 44df5456ecee3e2afae620b0721db68d02b2e341.