Add base64 decoding for some rules #369
Comments
This might be tricky depending on what these rules look like. The problem is that there is no way to treat a transformation as independent, short of breaking it out into another rule. By default, base64Decode is touchy. If it is at the beginning and the input isn't base64-encoded, all the other transformations will be applied to garbage. If it is at the end and the input is base64-encoded, something like t:lowercase will corrupt the encoded data before it is decoded (base64 is case-sensitive), completely breaking the decoding.
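To illustrate the ordering problem, here is a minimal Python sketch using only the standard library (Python is just a stand-in here, not how ModSecurity implements its transformations). Decoding first and lowercasing afterwards keeps the payload intact, while lowercasing the base64 text first changes the decoded bytes.

```python
import base64

payload = "<script>alert(1)</script>"
encoded = base64.b64encode(payload.encode()).decode()

# Decode first, then lowercase: the decoded payload is still readable.
decode_then_lower = base64.b64decode(encoded).decode().lower()

# Lowercase first, then decode: the base64 alphabet is case-sensitive,
# so e.g. 'P' and 'p' map to different 6-bit values and the bytes change.
lower_then_decode = base64.b64decode(encoded.lower())

print(decode_then_lower)                      # <script>alert(1)</script>
print(lower_then_decode == payload.encode())  # False
```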
Ack! Creating many rules with base64-encoded variants is nontrivial, so I'm scheduling this for later.
Besides base64 encoding, there may be applications that decode input by other methods before they execute it, which opens up a wide field of combinations. I agree with @lifeform that we should probably postpone this and try to get a clear picture of which encodings affect security, and in which combinations. I have also been thinking about a higher-PL rule that detects double encodings, etc. Close this issue?
@dune73 Let's keep the issue on the table, just for a later release. Hopefully log analysis will give us a better picture of how encoding is used nowadays. Besides using the transformation, we might use other methods to catch base64 too. For instance, if we look for a short string, we can also try to match on its base64-encoded byte strings. If the transformation turns out to be expensive from a performance perspective, we might also coalesce various regexps into one big 'base64 rule': do the transform once, then check for several exploits with an assembled regexp. This, however, would complicate rule maintenance a bit (violation of DRY).
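Matching a short string inside base64 text is complicated by alignment: the encoded form of a substring depends on its byte offset modulo 3, so there are three possible fragments. A minimal Python sketch of how those fragments could be computed (the helper name `base64_variants` is hypothetical):

```python
import base64

def base64_variants(needle: bytes) -> list[str]:
    """Base64 fragments that encode `needle` when it starts at byte offset
    0, 1 or 2 (mod 3) of a larger encoded value. Characters that also depend
    on neighbouring bytes are trimmed off both ends."""
    variants = []
    for offset in range(3):
        encoded = base64.b64encode(b"\x00" * offset + needle).decode()
        first = -(-8 * offset // 6)               # ceil: first char built only from needle bits
        last = (8 * (offset + len(needle))) // 6  # floor: end of chars built only from needle bits
        variants.append(encoded[first:last])
    return variants

print(base64_variants(b"../"))  # ['Li4v', '4uL', 'uLi']
```

The fragments shrink quickly: for a one-byte needle some of them are a single character or even empty, so this trick only pays off for needles of several bytes.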
I introduced two new labels to tag issues like this. I think this is a necessity. Feel free to edit the labels as you see fit (not sure about the colours).
@dune73 you're a perfect human being, that is exactly what I was thinking of doing :)
Thanks. ;)
Saw these ones today:
This string is base64-encoded
In fact it's detected by a lot of rules, but not by 933150. Namely: 931100, 932110 and 933160 at PL1, and another 4 at PL4. It really forces us to use multiMatch, which is a resource hog. Thought: at PL2, update the actions of all PL1 rules to add multiMatch. This would keep PL1 performant, give us the desired coverage at PL2, and still avoid strict siblings of all PL1 rules (which would be a pain to write and maintain).
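To make the cost of multiMatch concrete, here is a rough Python simulation, assuming the behaviour described in the ModSecurity reference: the operator runs on the raw value and again after every transformation that changed it. `b64decode_or_same` is a hypothetical stand-in for t:base64Decode, not its real implementation.

```python
import base64
import binascii
import re
from typing import Callable, Sequence

def b64decode_or_same(value: str) -> str:
    """Hypothetical stand-in for t:base64Decode: strict decode, else the input unchanged."""
    try:
        return base64.b64decode(value, validate=True).decode("utf-8", errors="replace")
    except (binascii.Error, ValueError):
        return value

def multi_match(value: str, transforms: Sequence[Callable[[str], str]],
                pattern: re.Pattern) -> tuple[bool, int]:
    """Run `pattern` on the raw value and again after every transformation
    that changed it. Returns (matched, number_of_regex_evaluations)."""
    evaluations = 1
    if pattern.search(value):
        return True, evaluations
    for transform in transforms:
        new_value = transform(value)
        if new_value != value:
            value = new_value
            evaluations += 1
            if pattern.search(value):
                return True, evaluations
    return False, evaluations

PASSWD = re.compile(r"etc/passwd")
pipeline = [b64decode_or_same, str.lower]
encoded = base64.b64encode(b"../../etc/passwd").decode()
print(multi_match(encoded, pipeline, PASSWD))             # (True, 2)
print(multi_match("../../etc/passwd", pipeline, PASSWD))  # (True, 1)
```

Each extra evaluation is another regex run per variable, which is where the resource cost comes from.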
@dune73 Yeah, true that more rules catch it (by accident, though; why 931100???). Applications that actually decode base64 will usually be PHP/Java/.NET apps, so my idea is that we won't need to add this transformation + multiMatch to many rules, just to the rules that specifically pertain to those platforms, plus some generic ones like LFI/RFI. My gut feeling right now is that adding this transformation to most rules would not be helpful. But I will keep looking for base64 in my logs! Maybe we should also try to define coding standards for when to apply which transformations, e.g. I don't fully understand when
+1 on the coding standard! Otherwise, let's let this sink in for some time, gather information and experience, and we'll find common ground.
Found some more injection attempts via base64 encoding against a WordPress plugin. The plugin just attempts to base64-decode the POST data and unserializes whatever is in it.
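For a platform-specific case like this, a targeted check may be enough. The sketch below (Python, hypothetical helper and a rough regex, not a CRS rule) strictly decodes a value and looks for the shape of a serialized PHP object such as `O:8:"stdClass":0:{}`.

```python
import base64
import binascii
import re

# Rough shape of a serialized PHP object, e.g. O:8:"stdClass":0:{}
PHP_SERIALIZED_OBJECT = re.compile(r'O:\d+:"[A-Za-z_\\][A-Za-z0-9_\\]*":\d+:\{')

def decodes_to_php_object(value: str) -> bool:
    """True if `value` is strict base64 whose decoded bytes contain a serialized PHP object."""
    try:
        decoded = base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        return False
    # latin-1 maps every byte to a character, so this decode never fails.
    return PHP_SERIALIZED_OBJECT.search(decoded.decode("latin-1")) is not None

attack = base64.b64encode(b'O:8:"stdClass":0:{}').decode()
print(decodes_to_php_object(attack))   # True
print(decodes_to_php_object("hello"))  # False
```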
Found in logs:
Is it possible to detect base64-encoded payloads with the following process? First, use a regex to detect base64 data. If it matches, decode it with the transformations t:urlDecode,t:base64Decode and chain to a second rule that matches the decoded payload against a regex and blocks it. Is that possible? Give us your ideas, folks :)
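The proposed chain can be sketched in Python as follows. `urllib.parse.unquote` is only a rough stand-in for t:urlDecode (the two may differ, for example in how '+' is handled), and both regexes are illustrative assumptions rather than CRS patterns.

```python
import base64
import binascii
import re
from urllib.parse import quote, unquote

# Rule 1 (assumption): the whole value has the shape of padded standard base64.
BASE64_SHAPE = re.compile(r"^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$")
# Chained rule (illustrative only): a tiny SQLi signature.
SQLI = re.compile(r"(?i)\bunion\b.+\bselect\b")

def chained_check(raw_arg: str) -> bool:
    value = unquote(raw_arg)                  # rough stand-in for t:urlDecode (%3D -> '=', %2B -> '+')
    if not value or not BASE64_SHAPE.match(value):
        return False                          # rule 1 does not match, the chain stops here
    try:
        decoded = base64.b64decode(value, validate=True)  # stand-in for t:base64Decode
    except (binascii.Error, ValueError):
        return False
    return SQLI.search(decoded.decode("utf-8", errors="replace")) is not None

encoded = base64.b64encode(b"1 UNION SELECT password FROM users").decode()
print(chained_check(quote(encoded, safe="")))  # True
print(chained_check("test"))                   # False: looks like base64, decodes to junk
```

Note that such a chain only covers values that look like base64; plain payloads still have to be caught by the regular rules.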
As far as I'm aware, it isn't possible to detect base64-encoded data using ModSec (you could perhaps detect it statistically). Instead, we might consider adding the base64-encoded variants of things we check for, like ../ :)
So we can't detect any base64-encoded payload attacks with ModSecurity? How can we prevent them?
Well, the thing is that the application logic is written to know that something is base64-encoded. In that sense it is very binary: something is b64 or not, and that shouldn't change (often). As a result, b64 is easy to deal with per application. The problem is generalizing this. @lifeforms this might be a good early blog post, e.g. a Lua-based b64 detection script.
@csanders-git hey, it worked as we discussed previously.
It's working :) I created a regex to match any base64-encoded value in user input. If an input value is base64-encoded, a chained rule checks whether it contains an injection payload using a regex pattern, and then processing continues with the next rules. Check my GitHub repository, where I show a demo with an explanation of how to detect and block base64-encoded injection payloads.
It looks like you did just as I suggested, which is adding base64-encoded variants of specific strings. This doesn't actually solve the problem generically. The same could be accomplished by just assuming everything was base64-encoded and decoding it with t:base64Decode. But it looks good.
This brings me back to one of my favorite ModSec stunts that I have not managed to accomplish. I would like to take all arguments, apply a transformation, and then check for each argument whether the transformation was applied and produced useful output. With this we could easily check whether a payload was encoded with base64 or some other encoding technique.
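One way to define "the transformation was applied and produced useful output" is sketched below in Python. The criteria (strict base64, valid UTF-8, mostly printable) and both thresholds are assumptions for illustration, not anything ModSecurity provides.

```python
import base64
from typing import Optional

MIN_LENGTH = 8       # hypothetical: skip short words such as 'test'
MIN_PRINTABLE = 0.9  # hypothetical: share of printable characters required

def decoded_if_useful(value: str) -> Optional[str]:
    """Return the decoded text if the transformation 'worked', else None.
    'Worked' here means: strict base64, valid UTF-8, mostly printable."""
    if len(value) < MIN_LENGTH:
        return None
    try:
        text = base64.b64decode(value, validate=True).decode("utf-8")
    except ValueError:  # binascii.Error and UnicodeDecodeError are both ValueErrors
        return None
    if not text:
        return None
    printable = sum(ch.isprintable() or ch in "\r\n\t" for ch in text)
    return text if printable / len(text) >= MIN_PRINTABLE else None

print(decoded_if_useful(base64.b64encode(b"<script>alert(1)</script>").decode()))
# <script>alert(1)</script>
print(decoded_if_useful("password"))  # None: valid base64 shape, but not valid UTF-8
```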
@csanders-git hey, you didn't get what I am saying? If we treat every user input as base64-encoded and decode it with t:base64Decode, it will not work out. For example It only works when we first check that the user input actually is base64 :) For example Hope you get it now.
As I said previously, this is not possible to do generically. You can do things to determine that it's NOT base64, for instance the presence of a space or another invalid character. You can try to detect the equals sign (padding), but this will only work in two-thirds of cases. Otherwise you're left with something like ^[a-zA-Z0-9+/]+={0,2}$, which matches 'hello' and will create massive false positives. The other options are to do as @dune73 described, where we base64-decode and add some logic to determine whether it was successful, or to add individual base64 samples and variants to rules. This is also a slippery slope, since there are infinitely many ways of encoding, but I think b64 is one worth considering.
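Both points can be checked quickly in Python: the naive pattern accepts ordinary words, and '=' padding only appears when the input length is not a multiple of 3, i.e. for two of the three possible remainders. The helper names are hypothetical.

```python
import base64
import re

NAIVE_BASE64 = re.compile(r"^[a-zA-Z0-9+/]+={0,2}$")  # the pattern quoted above

def looks_like_base64(value: str) -> bool:
    return NAIVE_BASE64.match(value) is not None

def encoding_is_padded(length: int) -> bool:
    """Whether the base64 encoding of a `length`-byte input ends in '='."""
    return base64.b64encode(b"x" * length).endswith(b"=")

print([w for w in ("hello", "test", "password", "hello world") if looks_like_base64(w)])
# ['hello', 'test', 'password']
print([encoding_is_padded(n) for n in range(1, 7)])
# [True, True, False, True, True, False]
```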
@csanders-git I am just working on securing vulnerable applications like OWASP Mutillidae II, Xtreme Vulnerable Web Application (XVWA) and SQLi Labs to improve my web application defenses. I came across the base64 payloads while testing SQLi Labs, so I am working on improving my defenses against all encoding techniques. As far as I have tested, the regex I use to identify base64 works well. Could you check whether the regex is right or wrong and share your view after testing it with base64-encoded payloads? :)
@umarfarook882 I like what you're doing, and I think it's something that will teach you a lot; your feedback is good. Notice, however, that this regex is too broad: it triggers on the word test.
@dune73 if the point of the rule is that the underlying logic will detect the attack after a base64Decode, then booting up the entire regex engine is excessive: it is just a small check that will almost always come back true. If the approach is meant to be more performant, it should instead always match, decode, and then check for maliciousness in a chained rule.
Oh, I see. Smart. But how do you carry the variable between the first and the 2nd rule in the chain? MATCHED_VAR will be gone the moment you hit the 2nd rule, won't it? |
@dune73 I imagine it would have about the same efficiency as capture (that is, using setvar).
setvar brings the alleged performance bottlenecks of creating too many variables that zimmerle claims. I am really not sure you can construct this the way you want, with the variables carrying the right decoding ending up in the right position for the 2nd rule. But I am very interested to see an example.
@dune73 you are correct, you must use a variable, but at least you don't have to boot up the regex engine. I think the alternative is to start building base64 versions of strings into existing rules, but this is a DEEP rabbit hole.
@dune73 Yes, MATCHED_VAR will be gone for the next rule. But I use the base64 detection regex as the base of every rule to detect base64, and then move on in a chain to check the respective rule's payload regex. :)
I think I have an idea: How about we run basic decodings like base64Decode early in phase 2, and if we have a result worth checking (base64Decode successful), then we create a variable like Still, we would benefit enormously from a rule that would be able to check if
Yeah, this isn't a bad idea, probably the most efficient use... Now the question becomes: what is the best way to limit false positives?
@dune73 nice idea, let me try it and check whether it works without any false positives :)
Good plan! |
@dune73 Your idea is working well, but I have made a small modification to it. I am currently checking for any kind of false positive :) Let me explain the issue. In our case: So I modified the rule into a chain. I carried each action, i.e. the base64 regex match and the base64 decoding, into each rule of the chain. So my final idea:
I use TX:/args_encoded.*/ in all rules where we need to check encoded payloads :) Thank you for your idea, @dune73. Why can't we use this method as a separate rule in OWASP-CRS and allow users to modify it, i.e. add TX:/args_encoded.*/ to the rules, depending on their application, to detect encoded payloads?
Hey @umarfarook882. I am not sure I really understand you, but here is what I got: you apply your regex to a payload in order to find out whether the payload is base64-encoded. If the regex matches, it is very likely base64-encoded. You then decode it and write the result into a TX variable, ready to be used by subsequent rules. (I appreciate your passion in this. However, please try to describe what you are trying to do in very simple steps. It is hard to follow you from your writing. This is normal for new people on a project, so no worries. But please help us understand you.)
Match the base64 payload. Decode the base64 payload and store it in a variable.
Now I can use this variable in any rule where I need to check for malicious payloads.
Hope you got it now @dune73
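The decode-once, check-later flow described above can be modelled in Python with a plain dict standing in for the TX collection. The key naming `args_encoded_<argname>` and both helper functions are hypothetical; only the TX:/args_encoded.*/ selector comes from the discussion.

```python
import base64
import re

def populate_encoded_vars(args: dict[str, str], tx: dict[str, str]) -> None:
    """Early pass: decode each argument that is strict base64 and valid UTF-8,
    storing it under the hypothetical key 'args_encoded_<argname>'."""
    for name, value in args.items():
        try:
            decoded = base64.b64decode(value, validate=True).decode("utf-8")
        except ValueError:
            continue  # not base64 (or not text): nothing stored
        if decoded:
            tx[f"args_encoded_{name}"] = decoded

# Roughly what the TX:/args_encoded.*/ selector picks out of the collection.
ENCODED_SELECTOR = re.compile(r"args_encoded.*")

def matching_vars(tx: dict[str, str], pattern: re.Pattern) -> list[str]:
    """Later rule: apply `pattern` only to the decoded variables."""
    return [k for k, v in tx.items() if ENCODED_SELECTOR.search(k) and pattern.search(v)]

tx = {"anomaly_score": "0"}
args = {"q": base64.b64encode(b"../../etc/passwd").decode(), "name": "joe"}
populate_encoded_vars(args, tx)
print(matching_vars(tx, re.compile(r"etc/passwd")))  # ['args_encoded_q']
```

The decoding cost is paid once in the early pass; every later rule that lists the selector only runs its own pattern.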
On a sort-of (un)related subject, shouldn't we consider using base64DecodeExt for some of those rules?
Perhaps we should make better use of it, since Python, which is becoming more and more popular, discards invalid characters in a b64-encoded value. This also means that at least the early regex @umarfarook882 was using can't be used with Python (which doesn't seem to have a way to turn this off) or with PHP if the strict argument isn't provided. Perhaps we should add a note to the reference manual.
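For reference, the lenient behaviour is easy to reproduce with Python's standard `base64.b64decode`: by default (`validate=False`) characters outside the alphabet are discarded before decoding, while passing `validate=True` makes it raise `binascii.Error` instead. PHP's `base64_decode` behaves similarly unless its `$strict` argument is true, and base64DecodeExt is documented in the ModSecurity reference as a forgiving variant that ignores invalid characters. A short demonstration:

```python
import base64
import binascii

encoded = base64.b64encode(b"../../etc/passwd").decode()
# Insert characters that are not in the base64 alphabet.
tampered = encoded[:4] + " " + encoded[4:8] + "!" + encoded[8:]

# Default (validate=False): non-alphabet characters are discarded before decoding.
print(base64.b64decode(tampered))  # b'../../etc/passwd'

# validate=True: the same input is rejected.
try:
    base64.b64decode(tampered, validate=True)
    strict_accepted = True
except binascii.Error:
    strict_accepted = False
print(strict_accepted)  # False
```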
Any progress here?!?
Not from my side. I have not looked into this for months, and I doubt this will change. If somebody comes up with a clean and well-documented implementation, then I am open to reviewing it. If we stick to the proposal I laid out above, then the first PR should only fill the said variable and then add it to one existing rule or so (how about 942100?). I would then certainly review it.
This is a long discussion that I just found. Base64 content often causes false positives, and it may be encoded multiple times, including multiple layers of base64. Also, a streq match against a base64 string is not a very good idea: base64 is a padded encoding, so adding spaces to the payload changes the encoded string, and the transformations needed to prevent such bypasses cannot be applied to the hidden plaintext. This can turn into a game of Russian dolls, and spending too many resources is also undesirable, so a limit should be set, which could be tuned depending on the PL. Why not use a staggered approach? Something like the following:
PL1
PL2
PL3
PL4
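The staggered idea of peeling more nested layers at higher paranoia levels could look roughly like this in Python. The PL-to-depth mapping below is a hypothetical example, not the per-PL rules from this comment.

```python
import base64

# Hypothetical mapping: how many nested base64 layers to peel at each paranoia level.
MAX_LAYERS_BY_PL = {1: 1, 2: 2, 3: 3, 4: 4}

def peel_layers(value: str, paranoia_level: int) -> list[str]:
    """Return the raw value followed by each decoded layer. Stops at the
    PL limit or at the first layer that is not strict base64 / UTF-8 text."""
    layers = [value]
    for _ in range(MAX_LAYERS_BY_PL[paranoia_level]):
        try:
            decoded = base64.b64decode(value, validate=True).decode("utf-8")
        except ValueError:
            break
        if not decoded:
            break
        value = decoded
        layers.append(value)
    return layers

nested = "../../etc/passwd"
for _ in range(3):  # three layers of base64
    nested = base64.b64encode(nested.encode()).decode()
print(peel_layers(nested, 1)[-1] == "../../etc/passwd")  # False: only one layer peeled
print(peel_layers(nested, 3)[-1] == "../../etc/passwd")  # True
```

Capping the depth per PL bounds the work per variable, which addresses the resource concern above.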
Anyway, this is cool stuff: several pentesting tools have plugins that do the encoding automatically, and the WAF would be blind to them in most cases. Obviously this has to be perfected; I wrote it in 5 minutes, so expect errors, unnecessary captures and repetitive transforms. I copied the regex for base64, but it needs to be improved: it may miss some valid characters as per RFC 4648 sections 4 and 5. Also, the base64 alphabet should stick to either section 4 or section 5 and not mix them.
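The two RFC 4648 alphabets differ only in two characters: section 4 (standard) uses '+' and '/', section 5 (URL and filename safe) uses '-' and '_'. A small Python sketch (hypothetical helper) that classifies a value and flags one that mixes both:

```python
import base64
import re

STANDARD = re.compile(r"^[A-Za-z0-9+/]+={0,2}$")  # RFC 4648 section 4 alphabet
URL_SAFE = re.compile(r"^[A-Za-z0-9_-]+={0,2}$")  # RFC 4648 section 5 alphabet

def alphabet_of(value: str) -> str:
    std = STANDARD.match(value) is not None
    url = URL_SAFE.match(value) is not None
    if std and url:
        return "either"    # no '+', '/', '-' or '_' present
    if std:
        return "standard"
    if url:
        return "url-safe"
    return "neither"       # mixes both alphabets or contains other characters

print(alphabet_of(base64.b64encode(b"\xfb\xff").decode()))          # standard ('+/8=')
print(alphabet_of(base64.urlsafe_b64encode(b"\xfb\xff").decode()))  # url-safe ('-_8=')
print(alphabet_of("+_8="))                                           # neither
```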
Sorry, I deleted my previous comment on this; I missed @spartantri's comment (it seems a cool approach, using t:base64DecodeExt in order to handle the missing of I'm trying to test all the proposed rules in my test environment.
This issue has been open 120 days with no activity. Remove the stale label or comment, or this will be closed in 14 days |
During the monthly CRS community chat, we decided to close this in favor of the bigger solution that does decoding through a variety of transformations. This is aimed at CRS 3.3. Meeting minutes: #1671 (comment)
Some vulnerabilities are being exploited using base64-encoded payloads, as suggested in #353.
Consider whether we can use transformations with multiMatch in the existing rules, or whether we should create sibling rules.
There may be performance problems associated with this transformation, so carefully test the performance impact. If it is significant, we are better off adding these as siblings, for example at paranoia level 2.