Bug #19110

t3lib_div::removeXSS translates normal text too

Added by David Slayback over 12 years ago. Updated over 10 years ago.

Should have
Target version:
Start date:
Due date:
% Done:


Estimated time:
TYPO3 Version:
PHP Version:
Is Regression:
Sprint Focus:


If a user types in something in a reply form and the t3lib_div::removeXSS function is called on it, it inserts <x> into normal text too. This make it unusable. It should only deal with keywords in HTML tags, not normal text.

Here is an example:
the title of the scripture is linked to a metaphysical layer of baseless logic

Which translates to:
the ti<x>tle of the sc<x>ripture is li<x>nked to a me<x>taphysical la<x>yer of ba<x>seless logic

(issue imported from #M8978)


removeXSS.diff (1.04 KB) removeXSS.diff Administrator Admin, 2008-09-29 13:26
removeXSS-1.diff (11.4 KB) removeXSS-1.diff Administrator Admin, 2008-10-30 19:44

Related issues

Related to TYPO3 Core - Feature #19600: Improvement of removeXSSClosedOliver Hader2008-11-12

Has duplicate TYPO3 Core - Bug #19234: removeXSS needs improvementClosedMichael Stucki2008-08-20


Updated by Paolo over 12 years ago

I can confirm this is affecting th_mailformplus extension, too


Updated by Cyrill Helg over 12 years ago

Maybe this function could be used:

 * Wrapper for the RemoveXSS function.
 * Removes potential XSS code from an input string.
 * Using an external class by Travis Puderbaugh &lt;&gt;
 * @param    string        Input string
 * @return    string        Input string with potential XSS code removed
function RemoveXSS($val) {
// remove all non-printable characters. CR(0a) and LF(0b) and TAB(9) are allowed
// this prevents some character re-spacing such as &lt;java\0script&gt;
// note that you have to handle splits with \n, \r, and \t later since they *are
allowed in some inputs
$val = preg_replace('/([\x00-\x08][\x0b-\x0c][\x0e-\x20])/', '', $val);
// straight replacements, the user should never need these since they're normal characters
// this prevents like &lt;IMG SRC=&#X40&#X61&#X76&#X61&#X73&#X63&#X72&#X69&#X70&#X74&#X3A&#X61&#X6C&#X65&#X72&#X74&#X28&#X27&#X58&#X53&#X53&#X27&#X29&gt;
$search = 'abcdefghijklmnopqrstuvwxyz';
$search.= '1234567890!@#$%^&*()';
$search.= '~`";:?+/={}[]-_|\'\\';
for ($i = 0; $i < strlen($search); $i++) {
// ;? matches the ;, which is optional
// 0{0,7} matches any padded zeros, which are optional and go up to 8 chars
// &#x0040  search for the hex values
$val = preg_replace('/(&#[x|X]0{0,8}'.dechex(ord($search[$i])).';?)/i', $search[$i], $val); // with a ;
// &#00064
0{0,7} matches '0' zero to seven times
$val = preg_replace('/(&#0{0,8}'.ord($search[$i]).';?)/', $search[$i], $val); // with a ;
// now the only remaining whitespace attacks are \t, \n, and \r
$ra1 = array('javascript', 'vbscript', 'expression', 'applet', 'meta', 'xml', 'blink', 'link', 'style', 'script', 'embed', 'object', 'iframe', 'frame', 'frameset', 'ilayer', 'layer', 'bgsound', 'title', 'base');
$ra2 = array('onabort', 'onactivate', 'onafterprint', 'onafterupdate', 'onbeforeactivate', 'onbeforecopy', 'onbeforecut', 'onbeforedeactivate', 'onbeforeeditfocus', 'onbeforepaste', 'onbeforeprint', 'onbeforeunload', 'onbeforeupdate', 'onblur', 'onbounce', 'oncellchange', 'onchange', 'onclick', 'oncontextmenu', 'oncontrolselect', 'oncopy', 'oncut', 'ondataavailable', 'ondatasetchanged', 'ondatasetcomplete', 'ondblclick', 'ondeactivate', 'ondrag', 'ondragend', 'ondragenter', 'ondragleave', 'ondragover', 'ondragstart', 'ondrop', 'onerror', 'onerrorupdate', 'onfilterchange', 'onfinish', 'onfocus', 'onfocusin', 'onfocusout', 'onhelp', 'onkeydown', 'onkeypress', 'onkeyup', 'onlayoutcomplete', 'onload', 'onlosecapture', 'onmousedown', 'onmouseenter', 'onmouseleave', 'onmousemove', 'onmouseout', 'onmouseover', 'onmouseup', 'onmousewheel', 'onmove', 'onmoveend', 'onmovestart', 'onpaste', 'onpropertychange', 'onreadystatechange', 'onreset', 'onresize', 'onresizeend', 'onresizestart', 'onrowenter', 'onrowexit', 'onrowsdelete', 'onrowsinserted', 'onscroll', 'onselect', 'onselectionchange', 'onselectstart', 'onstart', 'onstop', 'onsubmit', 'onunload');
$ra = array_merge($ra1, $ra2);
$found = true; // keep replacing as long as the previous round replaced something
while ($found == true) {
$val_before = $val;
for ($i = 0; $i < sizeof($ra); $i++) {
$pattern = '/';
for ($j = 0; $j < strlen($ra[$i]); $j++) {
if ($j > 0) {
$pattern .= '(';
$pattern .= '(&#[x|X]0{0,8}([9][a][b]);?)?';
$pattern .= '|(&#0{0,8}([9][10][13]);?)?';
$pattern .= ')?';
$pattern .= $ra[$i][$j];
$pattern .= '/i';
$replacement = substr($ra[$i], 0, 2).'&lt;x&gt;'.substr($ra[$i], 2); // add in <> to nerf the tag
$val = preg_replace($pattern, $replacement, $val); // filter out the hex tags
if ($val_before == $val) {
// no replacements were made, so exit the loop
$found = false;
return $val;

Updated by David Slayback over 12 years ago


This is very close to the exact function already used in the Core in t3lib_div::removeXSS(). It does handle the XSS issues well, but it does not consider whether the tags are in text or in tags <>. I have searched for updates on this function and have not found any. I wrote Travis (creator of this function) to see if he knows of any updates.



Updated by Cyrill Helg over 12 years ago

great, I'm interested in a better solution aswell.


Updated by Christopher Schnell over 12 years ago

This is related to 0007033 but since removeXSS is now in the core, it is good to have it here. Please note also the bug mentioned above. The problem lies in the tags-array. The function does not check, if it is really a tag like <base=...> but only if some text including base does appear.

This means, we never can write something with the swiss city "Basel" when removeXSS is called, because it would be displayed as Ba<x>sel.

Unfortuantely, I am not good with regular expressions, so I myself cannot provide a patch for this issue. Anyone with good RegEx skills?


Updated by Joachim Rinck over 12 years ago

you could do something like this:

66 $pattern = '/<(.*)';
67 for ($j = 0; $j < strlen($ra[$i]); $j++) {
68 if ($j > 0) {
69 $pattern .= '(';
70 $pattern .= '(&#[x|X]0{0,8}([9][a][b]);?)?';
71 $pattern .= '|(&#0{0,8}([9][10][13]);?)?';
72 $pattern .= ')?';
73 }
74 $pattern .= $ra[$i][$j];
75 }
76 $pattern .= '(.*)>/i';

that would for example match "<>" but not "". still, i do think there should be better solutions for the same problem. i particularily don't like the loop in a loop in a loop approach to something that seems to be job for one good regex.

could anyone post infos or a good link to see what exactly should be matched and what not?


Updated by Jigal van Hemert about 12 years ago

is a pretty good list of what is filtered by removeXSS()

Joachim's suggestion will not work, because * is greedy, so it would take the opening "<" of any previous (valid) tag (and the closing ">" of any following tag) as a match.
Using something like <([^>])*? to find the closest opening of a tag and to make sure you're "inside" a tag will not work because browsers accept ">" inside an attribute value (see section "XSS using HTML quote encapsulation:" in the url I mentioned above)

The main problem with removeXSS() is that it assumes that the output will be used inside normal HTML content. The <x> it inserts is ignored by all browsers AFAIK and will not be visible, but it will still break the XSS attempt.

If the output is used in RSS, a textarea, etc. or if it's processed by htmlspecialchars() the <x> will be visible.

Yesterday I submitted a new version of removeXSS() to the TYPO3 dev list.
- a lot faster
- a few extra XSS attacks filtered
- less false positives through stricter filtering (filters are adjusted according to the type of the keyword)
- a few minor bugs in regular expressions fixed


Updated by Steffen Kamper about 12 years ago

Attached is a diff with new version of removeXSS which should solve all 3 issues


Updated by David Slayback about 12 years ago

I attachad a corrected diff (removeXSS-1.diff) with a fix. Commas were being removed because of:

$val = preg_replace('/([\x00-\x08,\x0b-\x0c,\x0e-\x19])/', '', $val);

It was changed to:

$val = preg_replace('/([\x00-\x08][\x0b-\x0c][\x0e-\x19])/', '', $val);


Updated by Oliver Hader about 12 years ago

Committed to SVN Trunk (rev. 4457)

Also available in: Atom PDF