Bug #105943

open

Single quote encoded as &amp;apos; in rte for link attributes

Added by Pierrick Caillon about 1 month ago. Updated 27 days ago.

Status: New
Priority: Must have
Assignee: -
Category: Link Handling & Redirect Handling
Target version: -
Start date: 2025-01-15
Due date:
% Done: 0%
Estimated time:
TYPO3 Version: 12
PHP Version: 8.2
Tags:
Complexity: medium
Is Regression:
Sprint Focus:

Description

Context: Richtext editor for bodytext. Rendering with Fluid Styled Content.

Input HTML: <p>This is an external <a href="https://somesite.tld" title="I'm the link">link</a></p>
Typed in the RTE, with the link created as an external URL.

Result in database: <p>This is an external <a href="https://somesite.tld" title="I&amp;apos;m the link">link</a></p>

Expected result (one of):
  • <p>This is an external <a href="https://somesite.tld" title="I&#039;m the link">link</a></p>
  • <p>This is an external <a href="https://somesite.tld" title="I'm the link">link</a></p>
  • <p>This is an external <a href="https://somesite.tld" title="I&apos;m the link">link</a></p>

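I observed that TYPO3\CMS\Core\Html\HtmlParser::get_tag_attributes uses htmlspecialchars_decode without flags. Since PHP 8.1 the default flags already include ENT_QUOTES, but combined with ENT_HTML401, which does not know the &apos; entity; using ENT_QUOTES | ENT_HTML5 would decode the single quote entity (&apos;).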
I also observed that TYPO3\CMS\Core\Utility\GeneralUtility::implodeAttributes uses htmlspecialchars without flags.
I also observed that TYPO3\HtmlSanitizer\Serializer\Rules::enc uses htmlspecialchars with the flags ENT_HTML5 and ENT_QUOTES, which results in &apos; for single quotes. It is used by TYPO3\CMS\Core\Html\RteHtmlParser::htmlSanitize, which is called from TYPO3\CMS\Core\DataHandling\DataHandler.
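The flag behaviour described above can be illustrated with a small Python model. This is not PHP and not TYPO3 code: php_htmlspecialchars and php_htmlspecialchars_decode are hypothetical stand-ins that only cover the entities relevant to this report. html5=False is assumed to mirror the PHP 8.1+ defaults (ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401) used when no flags are passed, and html5=True mirrors ENT_QUOTES | ENT_HTML5 as used by Rules::enc.

```python
import re

# Hypothetical model, not the real PHP functions.
# html5=False: PHP 8.1+ defaults (ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401).
# html5=True:  ENT_QUOTES | ENT_HTML5 (as in TYPO3\HtmlSanitizer\Serializer\Rules::enc).


def php_htmlspecialchars(value: str, html5: bool = False) -> str:
    """Encode special characters; existing entities are double-encoded."""
    single_quote = "&apos;" if html5 else "&#039;"
    return (
        value.replace("&", "&amp;")  # first, so the entities added below stay intact
        .replace('"', "&quot;")
        .replace("'", single_quote)
        .replace("<", "&lt;")
        .replace(">", "&gt;")
    )


def php_htmlspecialchars_decode(value: str, html5: bool = False) -> str:
    """Decode in a single pass; &apos; is only part of the HTML5 entity table."""
    entities = {"&amp;": "&", "&quot;": '"', "&#039;": "'", "&lt;": "<", "&gt;": ">"}
    if html5:
        entities["&apos;"] = "'"
    pattern = re.compile("|".join(re.escape(name) for name in entities))
    return pattern.sub(lambda match: entities[match.group(0)], value)


print(php_htmlspecialchars("I'm"))                          # I&#039;m
print(php_htmlspecialchars("I'm", html5=True))              # I&apos;m
print(php_htmlspecialchars_decode("I&apos;m"))              # I&apos;m (left as is)
print(php_htmlspecialchars_decode("I&apos;m", html5=True))  # I'm
```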

Thus, when DataHandler passes the initial value to TYPO3\CMS\Core\Html\RteHtmlParser::transformTextForPersistence, it is transformed to <p>This is an external <a href="https://somesite.tld" title="I&#039;m the link">link</a></p> before the call to htmlSanitize and to <p>This is an external <a href="https://somesite.tld" title="I&apos;m the link">link</a></p> after it, and is persisted as is.
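The two persistence steps can be sketched with the same kind of hypothetical Python model. before_sanitize stands in for the implodeAttributes step (htmlspecialchars without flags, PHP 8.1+ defaults) and after_sanitize for the sanitizer, assuming its HTML parser decodes the attribute value (modelled here with Python's html.unescape) before Rules::enc re-encodes it with ENT_QUOTES | ENT_HTML5. Neither function is a TYPO3 API.

```python
import html

# Hypothetical model of the title attribute on its way to the database.
# Function names are illustrative stand-ins, not TYPO3 APIs.


def php_htmlspecialchars(value: str, html5: bool = False) -> str:
    """htmlspecialchars() model: PHP 8.1+ defaults, or ENT_QUOTES | ENT_HTML5."""
    single_quote = "&apos;" if html5 else "&#039;"
    return (
        value.replace("&", "&amp;")
        .replace('"', "&quot;")
        .replace("'", single_quote)
        .replace("<", "&lt;")
        .replace(">", "&gt;")
    )


def before_sanitize(title: str) -> str:
    # implodeAttributes step: htmlspecialchars() without flags
    return php_htmlspecialchars(title)


def after_sanitize(encoded_title: str) -> str:
    # Sanitizer step: the parser decodes the attribute value (approximated by
    # html.unescape), then Rules::enc re-encodes it with ENT_HTML5.
    return php_htmlspecialchars(html.unescape(encoded_title), html5=True)


step1 = before_sanitize("I'm the link")
step2 = after_sanitize(step1)
print(step1)  # I&#039;m the link
print(step2)  # I&apos;m the link
```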

When the value is displayed in the frontend, it is passed to parseFunc, which eventually calls TYPO3\CMS\Frontend\ContentObject\ContentObjectRenderer::parseFuncInternal. This uses TYPO3\CMS\Core\Utility\GeneralUtility::get_tag_attributes, which does the same as TYPO3\CMS\Core\Html\HtmlParser::get_tag_attributes but without the metadata. Thus &apos; is not decoded, and it is then incorrectly encoded as &amp;apos; by GeneralUtility::implodeAttributes through the TypoLink behaviour in TYPO3\CMS\Frontend\Typolink\LinkFactory::addAdditionalAnchorTagAttributes, which reads the parameters that parseFuncInternal stores in TYPO3\CMS\Frontend\ContentObject\ContentObjectRenderer::parameters.
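A minimal Python model of the frontend path, under the same assumptions (PHP 8.1+ default flags, hypothetical helper names, only the relevant entities modelled). decode stands in for get_tag_attributes and encode for implodeAttributes; html.unescape approximates the text a browser shows as the title tooltip.

```python
import html
import re

# Hypothetical model, not TYPO3 or PHP code: PHP 8.1+ default flags only
# (ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401).
ENTITIES_HTML401 = {"&amp;": "&", "&quot;": '"', "&#039;": "'", "&lt;": "<", "&gt;": ">"}
ENTITY_PATTERN = re.compile("|".join(re.escape(name) for name in ENTITIES_HTML401))


def decode(value: str) -> str:
    """htmlspecialchars_decode() model: &apos; is unknown and left as is."""
    return ENTITY_PATTERN.sub(lambda match: ENTITIES_HTML401[match.group(0)], value)


def encode(value: str) -> str:
    """htmlspecialchars() model: existing entities are double-encoded."""
    return (
        value.replace("&", "&amp;")
        .replace('"', "&quot;")
        .replace("'", "&#039;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
    )


stored = "I&apos;m the link"     # value persisted after htmlSanitize
decoded = decode(stored)          # get_tag_attributes step
rendered = encode(decoded)        # implodeAttributes step
shown = html.unescape(rendered)   # text a browser would show in the tooltip

print(decoded)   # I&apos;m the link
print(rendered)  # I&amp;apos;m the link
print(shown)     # I&apos;m the link
```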

When the value is displayed in the backend editor, it is passed to TYPO3\CMS\Core\Html\RteHtmlParser::transformTextForRichTextEditor, which uses TYPO3\CMS\Core\Html\HtmlParser::get_tag_attributes and TYPO3\CMS\Core\Utility\GeneralUtility::implodeAttributes. Neither understands &apos;, so it is encoded as &amp;apos;.
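Modelling one backend load as one decode (get_tag_attributes) followed by one encode (implodeAttributes), again as a hypothetical Python sketch with PHP 8.1+ default flags, suggests that the value is corrupted once and then stays at &amp;apos; on later passes instead of growing, while a value stored as &#039; survives unchanged. load_pass is an illustrative name, not a TYPO3 method.

```python
import re

# Hypothetical model: PHP 8.1+ defaults (ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401).
ENTITIES_HTML401 = {"&amp;": "&", "&quot;": '"', "&#039;": "'", "&lt;": "<", "&gt;": ">"}
ENTITY_PATTERN = re.compile("|".join(re.escape(name) for name in ENTITIES_HTML401))


def decode(value: str) -> str:
    """htmlspecialchars_decode() model, single pass; &apos; is not decoded."""
    return ENTITY_PATTERN.sub(lambda match: ENTITIES_HTML401[match.group(0)], value)


def encode(value: str) -> str:
    """htmlspecialchars() model with double encoding of existing entities."""
    return (
        value.replace("&", "&amp;")
        .replace('"', "&quot;")
        .replace("'", "&#039;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
    )


def load_pass(value: str) -> str:
    # One get_tag_attributes + implodeAttributes round trip.
    return encode(decode(value))


first = load_pass("I&apos;m the link")
second = load_pass(first)
print(first)                           # I&amp;apos;m the link
print(second)                          # I&amp;apos;m the link
print(load_pass("I&#039;m the link"))  # I&#039;m the link
```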

I would recommend changing both get_tag_attributes methods so that they decode single quotes (&apos;).
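A sketch of the recommended change in the same hypothetical Python model: the decode step additionally knows &apos;, as it would with ENT_QUOTES | ENT_HTML5, while the encode step keeps the PHP 8.1+ defaults. decode_attribute_value and render are illustrative names, not the actual TYPO3 methods. Under these assumptions &apos; is rendered as &#039;, and because decoding is a single pass, a value already stored as &amp;apos; is not silently turned into a quote.

```python
import re

# Hypothetical model of the proposed fix; not TYPO3 code.
# Decoding also knows &apos; (as with ENT_QUOTES | ENT_HTML5).
ENTITIES = {
    "&amp;": "&", "&quot;": '"', "&#039;": "'", "&apos;": "'", "&lt;": "<", "&gt;": ">",
}
ENTITY_PATTERN = re.compile("|".join(re.escape(name) for name in ENTITIES))


def decode_attribute_value(value: str) -> str:
    """Single-pass decode, so &amp;apos; becomes &apos;, not a quote."""
    return ENTITY_PATTERN.sub(lambda match: ENTITIES[match.group(0)], value)


def encode(value: str) -> str:
    """htmlspecialchars() model with PHP 8.1+ defaults (ENT_HTML401)."""
    return (
        value.replace("&", "&amp;")
        .replace('"', "&quot;")
        .replace("'", "&#039;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
    )


def render(stored: str) -> str:
    # get_tag_attributes (fixed) followed by implodeAttributes.
    return encode(decode_attribute_value(stored))


print(render("I&apos;m the link"))      # I&#039;m the link
print(render("I&#039;m the link"))      # I&#039;m the link
print(render("I&amp;apos;m the link"))  # I&amp;apos;m the link
```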

I have checked the main branch on GitHub for changes in these methods and saw none.

Actions #1

Updated by Garvin Hicking about 1 month ago

  • Category set to Link Handling & Redirect Handling

(Thanks for this detailed report! Will try to see if this is still reproducible in main, as some things in the sanitizing have changed which may affect this. Of course this is something with security relevance so we'll need to add tests for this scenario)

Actions #2

Updated by Pierrick Caillon about 1 month ago

  • Subject changed from Quote encoded as &amp;apos; in rte for link attributes to Single quote encoded as &amp;apos; in rte for link attributes
  • Description updated (diff)

Changed "quote" to "single quote" for correct understanding.

Actions #3

Updated by Oliver Hader 27 days ago

TYPO3\CMS\Core\Html\RteHtmlParser::htmlSanitize is only called if the feature flag security.backend.htmlSanitizeRte is enabled (it is still disabled by default, to avoid invalid HTML being sanitized/destroyed in the database).
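The effect of the feature flag can be sketched in the same hypothetical Python model. The boolean html_sanitize_rte stands in for security.backend.htmlSanitizeRte (TYPO3 reads the real value from its configuration); persist and render_frontend are illustrative names, and html.unescape approximates the sanitizer's parsing step. Under these assumptions the broken frontend output only appears when the flag is enabled.

```python
import html
import re

# Hypothetical model, not TYPO3 code. Decoding uses PHP 8.1+ defaults (ENT_HTML401).
ENTITIES_HTML401 = {"&amp;": "&", "&quot;": '"', "&#039;": "'", "&lt;": "<", "&gt;": ">"}
ENTITY_PATTERN = re.compile("|".join(re.escape(name) for name in ENTITIES_HTML401))


def decode(value: str) -> str:
    """htmlspecialchars_decode() model; &apos; is left as is."""
    return ENTITY_PATTERN.sub(lambda match: ENTITIES_HTML401[match.group(0)], value)


def encode(value: str, html5: bool = False) -> str:
    """htmlspecialchars() model: defaults, or ENT_QUOTES | ENT_HTML5."""
    single_quote = "&apos;" if html5 else "&#039;"
    return (
        value.replace("&", "&amp;")
        .replace('"', "&quot;")
        .replace("'", single_quote)
        .replace("<", "&lt;")
        .replace(">", "&gt;")
    )


def persist(title: str, html_sanitize_rte: bool) -> str:
    value = encode(title)  # implodeAttributes step
    if html_sanitize_rte:  # stand-in for the feature flag
        value = encode(html.unescape(value), html5=True)  # sanitizer step
    return value


def render_frontend(stored: str) -> str:
    return encode(decode(stored))  # get_tag_attributes + implodeAttributes


for flag in (False, True):
    stored = persist("I'm the link", flag)
    print(flag, stored, render_frontend(stored))
# False I&#039;m the link I&#039;m the link
# True I&apos;m the link I&amp;apos;m the link
```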

Actions #4

Updated by Oliver Hader 27 days ago

  • Complexity changed from easy to medium

Actions #5

Updated by Oliver Hader 27 days ago

I changed the complexity from "easy" to "medium", since changing the encoding/decoding behavior of HTML strings bears a high risk of introducing new regressions, or even new security vulnerabilities.
