Actions
Bug #34774
closedRemoveXSS doesn't work correctly on 16-bit HTML-entities
Status:
Rejected
Priority:
Should have
Assignee:
-
Category:
Miscellaneous
Target version:
-
Start date:
2012-03-12
Due date:
% Done:
0%
Estimated time:
TYPO3 Version:
4.5
PHP Version:
5.3
Tags:
Complexity:
easy
Is Regression:
No
Sprint Focus:
Description
With the 16-bit utf-8 encoding of entities like [“”…–→]
(codes “ ” – … →)and probably tons of other entities, the output of t3lib_div::removeXSS() becomes garbled: R20;, R21;, R11;, R30;, U94; and so on. I get those entities from importing RSS-feeds e. g. of Wordpress.
A possible solution would be to replace the hex-values with their according HTML-entities before testing and replacing the numeric values in RemoveXSS::process(). There may be better and faster solutions.
final class RemoveXSS { ... public static function process($val, $replaceString = '<x>') { ... // remove all non-printable characters. CR(0a) and LF(0b) and TAB(9) are allowed // this prevents some character re-spacing such as <java\0script> // note that you have to handle splits with \n, \r, and \t later since they *are* allowed in some inputs $val = preg_replace('/([\x00-\x08][\x0b-\x0c][\x0e-\x19])/', '', $val); //------patch begin------- $search = array(); $replace = array(); $search[] = '/–/'; $replace[] = '–'; $search[] = '/“/'; $replace[] = '“'; $search[] = '/”/'; $replace[] = '”'; $search[] = '/…/'; $replace[] = '…'; $search[] = '/→/'; $replace[] = '→'; // ... plus several other entities $val = preg_replace( $search, $replace, $val ); //------patch end------- // straight replacements, the user should never need these since they're normal characters // this prevents like <IMG SRC=@avascript:alert('XSS')>
I'm not sure if 16-bit values have to be replaced in RemoveXSS::process() at all. If they don't have to be, they should be skipped completely when analyzing the numeric entities. In this case the questionmark in the search expressions can be deleted (...7c|7d|7e);?/ie';) so that only values less equal to 0x7E are matched:
$searchHexEncodings = '/&#[xX]0{0,8}(21|22|23|...|7d|7e);/ie'; $searchUnicodeEncodings = '/�{0,8}(33|34|35|...|125|126);/ie';
This suggestion should be checked by some experts on XSS.
Actions