Project

General

Profile

Actions

Bug #34774

closed

RemoveXSS doesn't work correctly on 16-bit HTML-entities

Added by Jan Bartels about 12 years ago. Updated almost 8 years ago.

Status:
Rejected
Priority:
Should have
Assignee:
-
Category:
Miscellaneous
Target version:
-
Start date:
2012-03-12
Due date:
% Done:

0%

Estimated time:
TYPO3 Version:
4.5
PHP Version:
5.3
Tags:
Complexity:
easy
Is Regression:
No
Sprint Focus:

Description

With the 16-bit utf-8 encoding of entities like [“”…–→]

(codes “ ” – … →)
and probably tons of other entities, the output of t3lib_div::removeXSS() becomes garbled: R20;, R21;, R11;, R30;, U94; and so on. I get those entities from importing RSS-feeds e. g. of Wordpress.

A possible solution would be to replace the hex-values with their according HTML-entities before testing and replacing the numeric values in RemoveXSS::process(). There may be better and faster solutions.

final class RemoveXSS {
...
public static function process($val, $replaceString = '<x>') {
...
// remove all non-printable characters. CR(0a) and LF(0b) and TAB(9) are allowed
// this prevents some character re-spacing such as <java\0script>
// note that you have to handle splits with \n, \r, and \t later since they *are* allowed in some inputs
$val = preg_replace('/([\x00-\x08][\x0b-\x0c][\x0e-\x19])/', '', $val);

//------patch begin-------
        $search = array();       $replace = array();
        $search[] = '/&#8211;/'; $replace[] = '&ndash;';
        $search[] = '/&#8220;/'; $replace[] = '&ldquo;';
        $search[] = '/&#8221;/'; $replace[] = '&rdquo;';
        $search[] = '/&#8230;/'; $replace[] = '&hellip;';
        $search[] = '/&#8594;/'; $replace[] = '&rarr;';
//              ... plus several other entities 
        $val = preg_replace( $search, $replace, $val );
//------patch end-------

        // straight replacements, the user should never need these since they're normal characters
        // this prevents like <IMG SRC=&#X40&#X61&#X76&#X61&#X73&#X63&#X72&#X69&#X70&#X74&#X3A&#X61&#X6C&#X65&#X72&#X74&#X28&#X27&#X58&#X53&#X53&#X27&#X29>

I'm not sure if 16-bit values have to be replaced in RemoveXSS::process() at all. If they don't have to be, they should be skipped completely when analyzing the numeric entities. In this case the questionmark in the search expressions can be deleted (...7c|7d|7e);?/ie';) so that only values less equal to 0x7E are matched:

        $searchHexEncodings = '/&#[xX]0{0,8}(21|22|23|...|7d|7e);/ie';
        $searchUnicodeEncodings = '/&#0{0,8}(33|34|35|...|125|126);/ie';

This suggestion should be checked by some experts on XSS.

Actions #1

Updated by Jigal van Hemert about 12 years ago

  • Category set to Miscellaneous
  • Status changed from New to Accepted
  • Complexity set to easy

Because the trailing semi-colon has to be optional, the first part of the unicode entity is seen as lower ascii character. I think it's enough to add an assertion to make sure no valid (hexa)decimal characters follow the entity number.

Actions #2

Updated by Mathias Schreiber over 9 years ago

  • Target version set to 7.2 (Frontend)
  • Is Regression set to No
Actions #3

Updated by Benni Mack almost 9 years ago

  • Target version changed from 7.2 (Frontend) to 7.4 (Backend)
Actions #4

Updated by Susanne Moog over 8 years ago

  • Target version changed from 7.4 (Backend) to 7.5
Actions #5

Updated by Benni Mack over 8 years ago

  • Target version deleted (7.5)
Actions #6

Updated by Helmut Hummel almost 8 years ago

  • Status changed from Accepted to Rejected

We deprecated this functionality in master and recommend to not use it at all.
We better not touch this.

Actions

Also available in: Atom PDF