Bug #19110
closedt3lib_div::removeXSS translates normal text too
Added by David Slayback over 16 years ago. Updated over 14 years ago.
0%
Description
If a user types in something in a reply form and the t3lib_div::removeXSS function is called on it, it inserts <x> into normal text too. This make it unusable. It should only deal with keywords in HTML tags, not normal text.
Here is an example:
the title of the scripture is linked to a metaphysical layer of baseless logic
Which translates to:
the ti<x>tle of the sc<x>ripture is li<x>nked to a me<x>taphysical la<x>yer of ba<x>seless logic
(issue imported from #M8978)
Files
removeXSS.diff (1.04 KB) removeXSS.diff | Administrator Admin, 2008-09-29 13:26 | ||
removeXSS-1.diff (11.4 KB) removeXSS-1.diff | Administrator Admin, 2008-10-30 19:44 |
Updated by Paolo over 16 years ago
I can confirm this is affecting th_mailformplus extension, too
Updated by Cyrill Helg over 16 years ago
Maybe this function could be used:
/**
* Wrapper for the RemoveXSS function.
* Removes potential XSS code from an input string.
*
* Using an external class by Travis Puderbaugh <kallahar@quickwired.com>
*
* @param string Input string
* @return string Input string with potential XSS code removed
/
function RemoveXSS($val) {
// remove all non-printable characters. CR(0a) and LF(0b) and TAB(9) are allowed
// this prevents some character re-spacing such as <java\0script>
// note that you have to handle splits with \n, \r, and \t later since they *are allowed in some inputs
$val = preg_replace('/([\x00-\x08][\x0b-\x0c][\x0e-\x20])/', '', $val);
// straight replacements, the user should never need these since they're normal characters
// this prevents like <IMG SRC=@avascript:alert('XSS')>
$search = 'abcdefghijklmnopqrstuvwxyz';
$search.= 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
$search.= '1234567890!@#$%^&*()';
$search.= '~`";:?+/={}[]-_|\'\\';
for ($i = 0; $i < strlen($search); $i++) {
// ;? matches the ;, which is optional
// 0{0,7} matches any padded zeros, which are optional and go up to 8 chars
// @ search for the hex values
$val = preg_replace('/(&#[x|X]0{0,8}'.dechex(ord($search[$i])).';?)/i', $search[$i], $val); // with a ;
// @
0{0,7} matches '0' zero to seven times
$val = preg_replace('/(�{0,8}'.ord($search[$i]).';?)/', $search[$i], $val); // with a ;
}
// now the only remaining whitespace attacks are \t, \n, and \r
$ra1 = array('javascript', 'vbscript', 'expression', 'applet', 'meta', 'xml', 'blink', 'link', 'style', 'script', 'embed', 'object', 'iframe', 'frame', 'frameset', 'ilayer', 'layer', 'bgsound', 'title', 'base');
$ra2 = array('onabort', 'onactivate', 'onafterprint', 'onafterupdate', 'onbeforeactivate', 'onbeforecopy', 'onbeforecut', 'onbeforedeactivate', 'onbeforeeditfocus', 'onbeforepaste', 'onbeforeprint', 'onbeforeunload', 'onbeforeupdate', 'onblur', 'onbounce', 'oncellchange', 'onchange', 'onclick', 'oncontextmenu', 'oncontrolselect', 'oncopy', 'oncut', 'ondataavailable', 'ondatasetchanged', 'ondatasetcomplete', 'ondblclick', 'ondeactivate', 'ondrag', 'ondragend', 'ondragenter', 'ondragleave', 'ondragover', 'ondragstart', 'ondrop', 'onerror', 'onerrorupdate', 'onfilterchange', 'onfinish', 'onfocus', 'onfocusin', 'onfocusout', 'onhelp', 'onkeydown', 'onkeypress', 'onkeyup', 'onlayoutcomplete', 'onload', 'onlosecapture', 'onmousedown', 'onmouseenter', 'onmouseleave', 'onmousemove', 'onmouseout', 'onmouseover', 'onmouseup', 'onmousewheel', 'onmove', 'onmoveend', 'onmovestart', 'onpaste', 'onpropertychange', 'onreadystatechange', 'onreset', 'onresize', 'onresizeend', 'onresizestart', 'onrowenter', 'onrowexit', 'onrowsdelete', 'onrowsinserted', 'onscroll', 'onselect', 'onselectionchange', 'onselectstart', 'onstart', 'onstop', 'onsubmit', 'onunload');
$ra = array_merge($ra1, $ra2);
$found = true; // keep replacing as long as the previous round replaced something
while ($found == true) {
$val_before = $val;
for ($i = 0; $i < sizeof($ra); $i++) {
$pattern = '/';
for ($j = 0; $j < strlen($ra[$i]); $j++) {
if ($j > 0) {
$pattern .= '(';
$pattern .= '(&#[x|X]0{0,8}([9][a][b]);?)?';
$pattern .= '|(�{0,8}([9][10][13]);?)?';
$pattern .= ')?';
}
$pattern .= $ra[$i][$j];
}
$pattern .= '/i';
$replacement = substr($ra[$i], 0, 2).'<x>'.substr($ra[$i], 2); // add in <> to nerf the tag
$val = preg_replace($pattern, $replacement, $val); // filter out the hex tags
if ($val_before == $val) {
// no replacements were made, so exit the loop
$found = false;
}
}
}
return $val;
}
Updated by David Slayback over 16 years ago
Cyrill,
This is very close to the exact function already used in the Core in t3lib_div::removeXSS(). It does handle the XSS issues well, but it does not consider whether the tags are in text or in tags <>. I have searched for updates on this function and have not found any. I wrote Travis (creator of this function) to see if he knows of any updates.
-Dave
Updated by Cyrill Helg over 16 years ago
great, I'm interested in a better solution aswell.
Updated by Christopher Schnell over 16 years ago
This is related to 0007033 but since removeXSS is now in the core, it is good to have it here. Please note also the bug mentioned above. The problem lies in the tags-array. The function does not check, if it is really a tag like <base=...> but only if some text including base does appear.
This means, we never can write something with the swiss city "Basel" when removeXSS is called, because it would be displayed as Ba<x>sel.
Unfortuantely, I am not good with regular expressions, so I myself cannot provide a patch for this issue. Anyone with good RegEx skills?
Updated by Joachim Rinck over 16 years ago
you could do something like this:
66 $pattern = '/<(.*)';
67 for ($j = 0; $j < strlen($ra[$i]); $j++) {
68 if ($j > 0) {
69 $pattern .= '(';
70 $pattern .= '(&#[x|X]0{0,8}([9][a][b]);?)?';
71 $pattern .= '|(�{0,8}([9]1013);?)?';
72 $pattern .= ')?';
73 }
74 $pattern .= $ra[$i][$j];
75 }
76 $pattern .= '(.*)>/i';
that would for example match "<info@basel.ch>" but not "info@basel.ch". still, i do think there should be better solutions for the same problem. i particularily don't like the loop in a loop in a loop approach to something that seems to be job for one good regex.
could anyone post infos or a good link to see what exactly should be matched and what not?
Updated by Jigal van Hemert about 16 years ago
http://ha.ckers.org/xss.html
is a pretty good list of what is filtered by removeXSS()
Joachim's suggestion will not work, because * is greedy, so it would take the opening "<" of any previous (valid) tag (and the closing ">" of any following tag) as a match.
Using something like <([^>])*? to find the closest opening of a tag and to make sure you're "inside" a tag will not work because browsers accept ">" inside an attribute value (see section "XSS using HTML quote encapsulation:" in the url I mentioned above)
The main problem with removeXSS() is that it assumes that the output will be used inside normal HTML content. The <x> it inserts is ignored by all browsers AFAIK and will not be visible, but it will still break the XSS attempt.
If the output is used in RSS, a textarea, etc. or if it's processed by htmlspecialchars() the <x> will be visible.
Yesterday I submitted a new version of removeXSS() to the TYPO3 dev list.
Improvements:
- a lot faster
- a few extra XSS attacks filtered
- less false positives through stricter filtering (filters are adjusted according to the type of the keyword)
- a few minor bugs in regular expressions fixed
Updated by Steffen Kamper about 16 years ago
Attached is a diff with new version of removeXSS which should solve all 3 issues
Updated by David Slayback about 16 years ago
I attachad a corrected diff (removeXSS-1.diff) with a fix. Commas were being removed because of:
$val = preg_replace('/([\x00-\x08,\x0b-\x0c,\x0e-\x19])/', '', $val);
It was changed to:
$val = preg_replace('/([\x00-\x08][\x0b-\x0c][\x0e-\x19])/', '', $val);