Filtering Out Cross-Site Scripting Attacks in Domino
The other day I saw a signature in a forum that said something along the lines of:
Just because you're not paranoid it doesn't mean somebody isn't out to get you.
No matter how desirable a target your site is, or what the outcome of hacking it would be, somebody will probably have a go at doing so. Even if there's nothing in it for them financially, it can often be the challenge. Sometimes it might simply be to inform you of your site's shortcomings. Whatever the intention, and no matter how important you think it is for your site or your customer's application to be completely secure, it's well worth getting in there first and saving face and sleepless nights.
One major area of concern is Cross-Site Scripting (XSS) — users injecting nasty bits of executable code into the content of a site.
XSS is of particular concern for any site where users actually create the content, as is often the case with Domino applications, and where there's little or no approval process. If you have a document with a rich text field on it (or any other fields, for that matter) then you need the form's WebQuerySave (WQS) agent to make sure the content is safe to display to other, unsuspecting users.
Stripping all HTML out of plain text fields is easy enough - a simple @ReplaceSubstring in the Input Translation does the trick. It gets trickier when you want to allow HTML in a rich text field (through the use of WYSIWYG editors) but want to filter out unwanted HTML.
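For the plain-text case, that substitution amounts to the following. It's sketched in Java rather than formula language purely so there's something runnable to poke at; the class and method names are mine.

```java
// Minimal sketch of the same substitution a WQS agent's
// @ReplaceSubstring would do for a plain text field.
public class HtmlEscape {
    // Replace the characters that let submitted text be interpreted
    // as markup. Ampersand must be handled first, or the entities we
    // produce for < and > would themselves get double-escaped.
    public static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        System.out.println(escape("<script>alert('x')</script>"));
    }
}
```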
So, I'm looking to see how people do this. What's the best way to strip all the nastiness out of a large rich text field and what exactly do we need to strip? In short — how do we avoid XSS issues with user-driven Domino content? Is a simple LotusScript Replace() call enough? If so what are the arrays of strings to look for?
Whatever I learn here I'll put into a downloadable NSF for all to use. I'll also put it online so that the wannabe hackers amongst us can take their best shot at it and we can make sure we've got something truly secure.
Jake,
Interesting initiative. My approach is to "disable, unless". I essentially block all HTML tags except a few that I specify explicitly, e.g. p, u, li, b, i. This seems obvious, but I have seen designs that do it the other way around: filtering out everything potentially dangerous, which leads to large exclusion lists.
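The "disable, unless" idea could be sketched like this in a Java agent. This is an illustrative whitelist filter of my own, not Ferdy's actual code, and a real filter would also have to police the attributes of the tags it keeps (see the comments further down about event attributes).

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Strip every tag whose name is not on an explicit whitelist.
// The text content of a removed tag is left behind as plain text,
// which is harmless once the tags around it are gone.
public class TagWhitelist {
    static final Set<String> ALLOWED = Set.of("p", "u", "li", "b", "i");
    static final Pattern TAG = Pattern.compile("</?([a-zA-Z][a-zA-Z0-9]*)[^>]*>");

    public static String filter(String html) {
        Matcher m = TAG.matcher(html);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // Keep the tag only if its name is whitelisted.
            String rep = ALLOWED.contains(m.group(1).toLowerCase()) ? m.group() : "";
            m.appendReplacement(out, Matcher.quoteReplacement(rep));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```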
Furthermore, it is of course important to disable script tags, along with every other syntax that allows the insertion of JavaScript, Flash, and the like.
A final thing which comes to mind, and one I have not checked yet, is whether other representations of a tag are filtered correctly. For example, the script tag can be written as <script>, but also using escape characters, Unicode characters, or combinations of the two. Will your filter script block all the variations?
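One way to defuse those alternative spellings is to decode numeric character references before running any filter, so every encoding of a character collapses to the same thing. A minimal Java sketch of my own (named entities like &lt; would need a lookup table on top of this):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// An attacker can smuggle '<' past a naive filter as &#60;, &#x3c;,
// &#0000060 and so on. Decoding numeric character references first
// means the filter only ever has to look for the literal character.
public class RefDecoder {
    // Matches decimal (&#60;) and hex (&#x3c;) references, with
    // optional leading zeros and an optional trailing semicolon,
    // case-insensitively.
    static final Pattern NUM_REF =
        Pattern.compile("&#(?:x0*([0-9a-f]+)|0*([0-9]+));?", Pattern.CASE_INSENSITIVE);

    public static String decode(String s) {
        Matcher m = NUM_REF.matcher(s);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            int cp = m.group(1) != null
                ? Integer.parseInt(m.group(1), 16)   // hex reference
                : Integer.parseInt(m.group(2));      // decimal reference
            m.appendReplacement(out, Matcher.quoteReplacement(new String(Character.toChars(cp))));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```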
Oh, and do run the filter on all sorts of input, not just fields. Do not forget your querystrings for example.
Not a perfect-fit example, but hopefully some useful pointers?
PS: Both PHP and .NET have ready-made, built-in protection methods for this kind of thing. Not to dismiss Domino, just to provide information :)
That's what I was thinking, Ferdy (also wishing Domino had this built in!), but I'm stuck knowing where to start, as I don't know the best way to apply the rules. Can LotusScript's Replace do it, or do we need the more powerful string manipulation methods of a Java agent instead, I wonder...
Remember that you need to strip JavaScript from an element's event attributes too, e.g. the onmouseover attribute of a p tag.
Also, some browsers allow an img tag's src attribute to contain JavaScript.
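Both of those holes can be patched with something like the sketch below. The regexes are mine and purely illustrative; as this thread keeps showing, pattern matching is fragile and a real parser is a safer foundation.

```java
import java.util.regex.Pattern;

// Even whitelisted tags can carry script: <p onmouseover="..."> or
// <img src="javascript:...">. Strip event-handler attributes and
// neuter javascript: URLs in src/href attributes.
public class AttrScrub {
    // Any attribute whose name starts with "on" (onmouseover, onclick, ...).
    static final Pattern ON_ATTR = Pattern.compile(
        "\\son\\w+\\s*=\\s*(\"[^\"]*\"|'[^']*'|[^\\s>]+)", Pattern.CASE_INSENSITIVE);
    // src or href attributes whose value starts with "javascript:".
    static final Pattern JS_URL = Pattern.compile(
        "(src|href)\\s*=\\s*([\"']?)\\s*javascript:[^\"'>]*\\2", Pattern.CASE_INSENSITIVE);

    public static String scrub(String html) {
        html = ON_ATTR.matcher(html).replaceAll("");
        html = JS_URL.matcher(html).replaceAll("$1=\"#\"");
        return html;
    }
}
```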
First thing would be to convert all "&" to "&amp;", "<" to "&lt;", ">" to "&gt;", and so on.
Have a document which holds the list of all allowable tags. Come back and make another replace for the allowable tags. Only, instead of looking for the tags themselves, look for the escaped sequence ("&lt;p&gt;" for the paragraph tag) and convert it back.
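That escape-everything-then-restore approach, sketched in Java rather than LotusScript (the class name and tag list are mine). A nice side effect of restoring only bare, attribute-free tags is that anything carrying attributes, such as an event handler, stays safely escaped:

```java
// Escape everything, then un-escape only the tags on the allowed
// list. Because the source is fully escaped first, anything not
// explicitly restored remains inert text.
public class EscapeThenRestore {
    static final String[] ALLOWED = {"p", "b", "i", "u", "li"};

    public static String filter(String s) {
        s = s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        for (String t : ALLOWED) {
            s = s.replace("&lt;" + t + "&gt;", "<" + t + ">");    // opening tag
            s = s.replace("&lt;/" + t + "&gt;", "</" + t + ">");  // closing tag
        }
        return s;
    }
}
```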
This should probably cover all malicious scripts.
I don't see why LotusScript can't handle this.
Remember, attackers can also insert malicious script using hex encodings. The Wikipedia entry on this topic is pretty comprehensive.
{Link}
To do the filtering in LotusScript is definitely possible, but the approach would need to be layered.
My only issue like this was that automated porn spammers kept making forum posts with links to their sites in them. I solved that problem by adding a "quarantine" field to the forum form, and having a WQS agent set the field to "True" if the string "http:" was found. Then I changed the view selection properties to exclude any documents with that field set to true. Finally, I set up a new view that ONLY showed documents flagged as having links, and set up a way for reviewers to clear the quarantine field if the document was legitimate. So documents without links showed up right away, legitimate documents showed up after a reviewer could look them over, and spam got flushed without the public seeing it at all.
I did something similar for documents with script tags in them, but in practice, no one has ever attempted to include script in a forum post.
@Jake
If I were you, I'd contact Steve Castledine or Declan Lynch to see how they are handling it in their blogging templates.
@Sean Peters,
I had a similar problem with the discussion forum that was put on a couple of websites - the owners insisted on not having people log in. I did warn them about spam. Before you know it, the spammers found the forums and spammed them all day long...
So, being a sneaky bloke, I added some spam detection code ...
if content contains bb-code, spam_value = 99
if content has one word, spam_value = 98
if telephone has a value, but no numbers, spam_value = 97
Surprisingly, the above three rules have eliminated all spam...
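Those three rules as a single illustrative function (Java only because it's easy to run; the bb-code markers and score values are assumptions, not the original code):

```java
// The three spam heuristics above in one function. Returns the
// spam_value, or 0 for "looks legitimate".
public class SpamScore {
    public static int score(String content, String telephone) {
        // Rule 1: content contains bb-code (markers are illustrative).
        if (content.contains("[url") || content.contains("[b]")) return 99;
        // Rule 2: content is a single word.
        if (content.trim().split("\\s+").length == 1) return 98;
        // Rule 3: telephone field has a value but no digits.
        if (!telephone.isEmpty() && !telephone.matches(".*[0-9].*")) return 97;
        return 0;
    }
}
```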
When it comes to small fields, @ReplaceSubstring is just fine. If you use a WYSIWYG editor on the web, any decent implementation should let you determine what HTML you allow. For example, TinyMCE has an init parameter called "invalid_elements" in which you can define a list of HTML tags to disallow -- there is also a "valid_elements" parameter for whitelisting tags instead.
With regard to using vanilla Domino, what we really need is a decent regex implementation in LotusScript or formula language. Failing that, @ReplaceSubstring doesn't do a bad job of scaling, and people like Mr. Robichaux have already done some exhaustive tests of the Java equivalents (if you don't want to use regex).
Yeah, I might take a look at the blog templates and see how they do it, although I'd be surprised if the comments area on them is a rich text field. And there's no need to do it on the blog documents themselves, as you wouldn't expect the blog owner to hack their own site.
Tufty. The editors do allow you to strip out nasty HTML, but any junior hacker would know how to turn this off, if they didn't just disable the editor in the first place. The filtering has to happen server-side.
Jake
I would take a two-level approach:
- Run what comes back through jTidy (wrap it with LS2J). Result: parsable HTML.
- Use an XSLT transformation to wipe out what you don't want. Got some code somewhere...
Ping me if you want it.
:-) stw
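For what it's worth, step two might look something like this in a Java agent. This is my sketch, not Stephan's actual code; it assumes jTidy has already produced well-formed XHTML, and the stylesheet is an identity transform with an empty template for script elements.

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

public class XsltScrub {
    // Identity transform plus an empty template for <script>, so
    // script elements (and everything inside them) are dropped while
    // all other markup is copied through untouched.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
        "<xsl:output method='xml' omit-xml-declaration='yes'/>" +
        "<xsl:template match='@*|node()'>" +
        "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>" +
        "</xsl:template>" +
        "<xsl:template match='script'/>" +
        "</xsl:stylesheet>";

    public static String scrub(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Extending the blacklist is just a matter of adding element names to the empty template's match expression, e.g. match='script|iframe|object'.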
In addition to Stephan's suggestion, you could also use the LotusScript NotesSAXParser class to process all of the tags and attribute values, rather than XSLT. It might scale better than @ReplaceSubstring for larger XML documents.
I've posted an LSS file containing a NotesSAXParser class on my website (link above) for those interested.
Re server-side filtering, no argument from me, but I always do both sides. To address Jake's other concern around JS, we don't allow content to be submitted if it isn't coming via the editor. This check isn't bulletproof, but it discourages the less persistent kiddies, and the server validation / translation sorts out the rest.
(We don't deal with submissions so big that simple formula text manipulation can't deal with them).
Download the Domino Wiki template from OpenNTF and see how it's handled there.
As mentioned in a previous comment here, it strips everything and then checks for allowed tags to re-add.
Nice and simple.
It looks like the Wiki template uses a normal text field rather than rich text. Maybe there's something in there that will help, but I'm guessing the method of applying the rules will be different.
Jake
Some tips here:
{Link}
There are lots of ways to encode a < character (see below), which means there is a similar number of ways to encode _every_ character. Also, some browsers accept keywords like "javascript" even when they contain other characters, such as a line feed. As you noted in another thread, // will be interpreted as {Link} in some circumstances. So pattern matching can be very, very difficult.
At the same time I don't want to limit the vast majority of our users who just want to provide interesting looking material, including embedding movies & flash on occasion.
Something I've been doing recently is aggressive removal of suspicious characters from short text fields and forcing documents with suspicious textarea fields to remain in draft mode until they're no longer suspicious - obviously that only works if the application has a draft mode for submissions. The checking is done in formula language in field translation events.
There is one thing that's very important to watch out for but I don't really want to explain that in detail here!
Codes for < - not sure what this will look like when submitted...
"<": "<": "%3C": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "<": "x3c": "x3C": "u003c": "u003C"
Ha - thought so. (replace @ with &)
"@lt": "@LT": "%3C": "@LT;": "@#60": "@#060": "@#0060": "@#00060": "@#000060": "@#0000060": "@#60;": "@#060;": "@#0060;": "@#00060;": "@#000060;": "@#0000060;": "@#x3c": "@#x03c": "@#x003c": "@#x0003c": "@#x00003c": "@#x000003c": "@#x3c;": "@#x03c;": "@#x003c;": "@#x0003c;": "@#x00003c;": "@#x000003c;": "@#X3c": "@#X03c": "@#X003c": "@#X0003c": "@#X00003c": "@#X000003c": "@#X3c;": "@#X03c;": "@#X003c;": "@#X0003c;": "@#X00003c;": "@#X000003c;": "@#x3C": "@#x03C": "@#x003C": "@#x0003C": "@#x00003C": "@#x000003C": "@#x3C;": "@#x03C;": "@#x003C;": "@#x0003C;": "@#x00003C;": "@#x000003C;": "@#X3C": "@#X03C": "@#X003C": "@#X0003C": "@#X00003C": "@#X000003C": "@#X3C;": "@#X03C;": "@#X003C;": "@#X0003C;": "@#X00003C;": "@#X000003C;": "x3c": "x3C": "u003c": "u003C"
Domino Web Access has a built-in HTML parser that I think is used to sanitise HTML emails before they are displayed in DWA. It would be fantastic to be able to use that in web applications too - anyone know how?
Other than using GetUnformattedText rather than GetItemValue in your WQS agent, what is the difference between text and rich text here?
I knew DWA did something like this, as I'd found an article on a security advisor website talking about a vulnerability in DWA and the subsequent fix.
I don't know much about DWA though (is that what used to be iNotes?). Is it hackable - can we get to the code?
Jake
The DWA code is hackable up to a point. I have modified the DWA template in the past and had some success.
Good luck hacking!