Bug #34774
closedRemoveXSS doesn't work correctly on 16-bit HTML-entities
0%
Description
With the 16-bit utf-8 encoding of entities like [“”…–→]
(codes “ ” – … →)and probably tons of other entities, the output of t3lib_div::removeXSS() becomes garbled: R20;, R21;, R11;, R30;, U94; and so on. I get those entities from importing RSS-feeds e. g. of Wordpress.
A possible solution would be to replace the hex-values with their according HTML-entities before testing and replacing the numeric values in RemoveXSS::process(). There may be better and faster solutions.
final class RemoveXSS { ... public static function process($val, $replaceString = '<x>') { ... // remove all non-printable characters. CR(0a) and LF(0b) and TAB(9) are allowed // this prevents some character re-spacing such as <java\0script> // note that you have to handle splits with \n, \r, and \t later since they *are* allowed in some inputs $val = preg_replace('/([\x00-\x08][\x0b-\x0c][\x0e-\x19])/', '', $val); //------patch begin------- $search = array(); $replace = array(); $search[] = '/–/'; $replace[] = '–'; $search[] = '/“/'; $replace[] = '“'; $search[] = '/”/'; $replace[] = '”'; $search[] = '/…/'; $replace[] = '…'; $search[] = '/→/'; $replace[] = '→'; // ... plus several other entities $val = preg_replace( $search, $replace, $val ); //------patch end------- // straight replacements, the user should never need these since they're normal characters // this prevents like <IMG SRC=@avascript:alert('XSS')>
I'm not sure if 16-bit values have to be replaced in RemoveXSS::process() at all. If they don't have to be, they should be skipped completely when analyzing the numeric entities. In this case the questionmark in the search expressions can be deleted (...7c|7d|7e);?/ie';) so that only values less equal to 0x7E are matched:
$searchHexEncodings = '/&#[xX]0{0,8}(21|22|23|...|7d|7e);/ie'; $searchUnicodeEncodings = '/�{0,8}(33|34|35|...|125|126);/ie';
This suggestion should be checked by some experts on XSS.
Updated by Jigal van Hemert over 12 years ago
- Category set to Miscellaneous
- Status changed from New to Accepted
- Complexity set to easy
Because the trailing semi-colon has to be optional, the first part of the unicode entity is seen as lower ascii character. I think it's enough to add an assertion to make sure no valid (hexa)decimal characters follow the entity number.
Updated by Mathias Schreiber almost 10 years ago
- Target version set to 7.2 (Frontend)
- Is Regression set to No
Updated by Benni Mack over 9 years ago
- Target version changed from 7.2 (Frontend) to 7.4 (Backend)
Updated by Susanne Moog over 9 years ago
- Target version changed from 7.4 (Backend) to 7.5
Updated by Helmut Hummel over 8 years ago
- Status changed from Accepted to Rejected
We deprecated this functionality in master and recommend to not use it at all.
We better not touch this.