This is a POC of a web fuzzer. Maybe it leans a bit more towards a tool for getting good coverage on a website, but it can easily be turned into a good fuzzing tool. This was supposed to be a blog post sharing some ideas on fuzzing the web, but I felt it needed some source code to go with it. I thought about this a lot a few years back. Note: this is a POC, and some functionality is limited so that script kiddies don't go around attacking websites; if you are familiar with web hacking you should be able to put the pieces together. So no, this is not a shiny product. This is me unleashing some ideas and thoughts on fuzzing the web, pay me and maybe I will make a product :=).
I'm available on Twitter to discuss ideas.
- coverage tool
- the key part of web fuzzing. Every easily exposed url/access point on the web will get fuzzed hard, so you want to find the more hidden parts of a website.
- fuzzing logic
- generating the malformed data, usually with test cases produced by a genetic algorithm. This is harder to do for the web, since the web is more of a black box than a binary. However, payloads for the web are usually less complex than those for a binary: you can often tell quickly whether an entry point is vulnerable by sending a simple payload, something like a quotation mark, and viewing the response (see the probe sketch after this list). Some bugs are more hidden.
- instrumentation tool
- helps you reproduce or analyze a bug. The nice thing about the web is that you have to do a lot less setup. You don't have to align memory to be able to trigger a bug, and there is no need for heap feng shui unless you attack the technology underneath a website.
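
To make the fuzzing-logic point concrete, here is a minimal sketch of the kind of cheap probe described above: send a quotation mark into a parameter and look at the response. It is TypeScript for Node 18+ (global fetch); the target url, parameter name, and error strings are illustrative assumptions, not part of this repo.

```ts
// Minimal probe sketch. A 500 or a database error string in the body hints
// that the entry point deserves real fuzzing; everything here is a placeholder.
const ERROR_HINTS = ["SQL syntax", "ODBC", "Warning:", "Unclosed quotation mark"];

async function probeParam(baseUrl: string, param: string): Promise<boolean> {
  const url = new URL(baseUrl);
  url.searchParams.set(param, `'"`); // simple quote probe
  const res = await fetch(url.toString());
  const body = await res.text();
  return res.status >= 500 || ERROR_HINTS.some(h => body.includes(h));
}

probeParam("https://example.test/search", "q").then(hit =>
  console.log(hit ? "entry point looks interesting" : "no obvious reaction"));
```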
Yes.
The big difference between fuzzing the web and fuzzing a local binary is that when you fuzz a binary you usually have all the information you need. You can view the entire memory state of the system; it's not a black box. Sure, your ability to get information out of the binary can be somewhat limited if it is obfuscated, but my point is that on the web you usually have to guess how something works on the backend. Sometimes you are lucky and can get the backend to leak information about what went wrong, but usually your best hope is that the entire site is javascript so you can audit it easily. You also have the challenge that you never know when you are done looking at a website. If you are fuzzing a binary you can look at code coverage, see which addresses haven't been hit yet and then craft new test cases to hit those lines. On the web some urls might never get visited because they are so well hidden that even search engines can't find them. There is also the problem of knowing when a website has been updated: binaries are easy to diff, but the overhead of checking whether a single page on a website has changed is much larger.
So as said in the TLDR, this tool is aimed more at getting good coverage on a website to make fuzzing and auditing easier. The classic approach to writing a tool that finds bugs on the web is to write a basic scraper: just get all the links off a website, clean the list up and boom, you have a corpus to explore from. You will definitely get some coverage out of that, but many pages today are dynamic, so you can lose out on a lot of coverage. That's why many instead use a proxy, connect the browser to it, and let the browser report every url it visits. Now you get a lot more coverage. My approach is a combination of ideas.
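
For reference, the classic scraper baseline looks roughly like this. A minimal TypeScript sketch (Node 18+); the regex-based link extraction is an illustrative shortcut, not code from this repo, and it only sees links in the static HTML, which is exactly the coverage gap described above.

```ts
// Classic scraper baseline: fetch a page, pull out href attributes,
// resolve them and keep only same-origin urls to build a clean corpus.
async function scrapeLinks(pageUrl: string): Promise<string[]> {
  const html = await (await fetch(pageUrl)).text();
  const hrefs = [...html.matchAll(/href\s*=\s*["']([^"']+)["']/gi)].map(m => m[1]);
  const origin = new URL(pageUrl).origin;
  const urls = hrefs
    .map(h => { try { return new URL(h, pageUrl).toString(); } catch { return null; } })
    .filter((u): u is string => u !== null && u.startsWith(origin));
  return [...new Set(urls)];
}
```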
I have a scraper that runs in the background, and I have written a Chrome extension that reports dynamic urls to the scraper backend. The idea is that there is a core backend with a scraper that receives all the urls from Chrome, and maybe even from phone apps: give the backend support for mitmproxy and you get mobile coverage as well. The scraper then tries to find new urls that it has not scanned before, based on the data it has received.
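
A minimal sketch of that reporting endpoint could look like the following (TypeScript with node:http). The /report route, port, and JSON shape are assumptions for illustration, not the actual API of this POC.

```ts
// Sketch of the "browser reports urls, backend queues the unseen ones" idea.
import { createServer } from "node:http";

const seen = new Set<string>();   // urls already scanned or queued
const queue: string[] = [];       // urls waiting for the scraper

createServer((req, res) => {
  if (req.method === "POST" && req.url === "/report") {
    let body = "";
    req.on("data", chunk => (body += chunk));
    req.on("end", () => {
      const { url } = JSON.parse(body) as { url: string };
      if (!seen.has(url)) {       // only queue urls the scraper has not touched yet
        seen.add(url);
        queue.push(url);
      }
      res.end("ok");
    });
  } else {
    res.statusCode = 404;
    res.end();
  }
}).listen(8000);
```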
The hard thing about fuzzing the web is that you never know what step to take next. In a binary you can look for functions like strcpy and try to fuzz them for memory corruption bugs. You can't do that on the web; you have to guess. You can definitely make educated guesses, but you never know whether a page was interesting before you have actually visited it. So I have two functions: one that scores a url before it is scanned and one that scores it after. This should both help the scraper select the next url to scan and rate the importance of the url when you look at the results.
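
As an example of what a pre-scan score might look at, here is a hedged sketch; the keyword list and weights are made up for illustration, the actual metrics functions are the ones commented in the repo's code.

```ts
// Toy pre-scan heuristic: reward query parameters, "interesting" path words,
// and deeper paths, since those tend to be less exposed entry points.
const INTERESTING = ["admin", "login", "upload", "api", "debug", "backup"];

function preScanScore(url: string): number {
  const u = new URL(url);
  let score = 0;
  score += u.searchParams.toString() ? 2 : 0;   // query parameters mean user input
  score += INTERESTING.filter(w => u.pathname.toLowerCase().includes(w)).length * 3;
  score += u.pathname.split("/").length;        // deeper paths are often less exposed
  return score;
}
```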
The big difference between my approach and every other approach I have seen is that Chrome will try to explore the website on its own. Currently it clicks buttons, but you can easily expand this function. Chrome will try to explore different states of the website; if a new state opens it will explore it, as long as the new state is on the same website. By doing this the Chrome extension is able to uncover parts of the website that most people might have ignored or never seen. Multiple times I have found urls this way that could not have been found any other way.
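
The core of that exploration loop could be sketched as a content script like the one below (TypeScript against the DOM). The reporting endpoint is a placeholder and the real extension is more careful about navigation and new states, so treat this as an illustration of the idea rather than the extension's actual code.

```ts
// Content-script sketch: click buttons, stay on the same site, report new urls.
const visited = new Set<string>([location.href]);

function reportIfNew(url: string): void {
  if (!url.startsWith(location.origin) || visited.has(url)) return; // same site only
  visited.add(url);
  void fetch("http://localhost:8000/report", {
    method: "POST",
    body: JSON.stringify({ url }),
  });
}

// Click every button, then report any url the page navigated to or exposed in new links.
// (A click that triggers a full navigation would end this script; the real extension
// has to handle that case.)
for (const button of Array.from(document.querySelectorAll("button"))) {
  button.click();
  reportIfNew(location.href);
  document.querySelectorAll("a[href]").forEach(a => reportIfNew((a as HTMLAnchorElement).href));
}
```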
You also want the scraper to be able to log in to the website. When you log in through Chrome, the extension sends the data (basically a login profile) to the backend, which saves the login structure so it can replay it. This by itself will defeat most classic scrapers.
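
A sketch of what replaying such a profile from the backend might look like (TypeScript, Node 18+); the profile shape is an assumption, the real extension decides what it actually sends over.

```ts
// Replay a captured login profile and keep the session cookie for authenticated scraping.
interface LoginProfile {
  action: string;                   // form action url captured in the browser
  fields: Record<string, string>;   // form field names and values
}

async function replayLogin(profile: LoginProfile): Promise<string | null> {
  const res = await fetch(profile.action, {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: new URLSearchParams(profile.fields).toString(),
    redirect: "manual",             // keep the Set-Cookie response instead of following it
  });
  return res.headers.get("set-cookie");
}
```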
Burp Suite might be nice on the feature side, but I would not say that about the design. My idea is that you usually want a clean interface that easily gives you more info as you need it, so I have taken a search-engine-like approach. You send in a query for what you are looking for, and the response comes back ranked by the score function. Then you can easily click on a url; if you want to make this tool better you should make it possible to click a button to view more info about that url. Sometimes you want to query something that you scanned way back, and this approach helps you with just that.
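
In code, that lookup is essentially filter-plus-rank over stored results. A minimal sketch; the types and field names are illustrative assumptions.

```ts
// Search-engine style lookup: match the query, then sort by the post-scan score.
interface ScanResult { url: string; score: number; scannedAt: number; }

function search(results: ScanResult[], query: string): ScanResult[] {
  return results
    .filter(r => r.url.includes(query))   // simple substring match stands in for real search
    .sort((a, b) => b.score - a.score);   // highest-rated urls first, even very old scans
}
```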
- read the code comments; I have commented on some flaws and things to fix if you want to make this into a full fuzzer (like the metrics functions).
- send urls from the backend to Chrome and have Chrome scan them automatically with the explorer.
- store more data. The POC was meant to be simple so I did not do this. However, you should really save all the data off a website. This makes it easy to look up things you might have missed in the future without having to revisit the website. Maybe a new bug in the web architecture is found, something like Heartbleed; it would be nice to just query the database for something that would indicate the bug and instantly get a list of websites to check (see the sketch after this list).
- make that data more accessible in the interface.
- the original idea was to be clever with the use of Merkle trees: store the tree and easily look up whether a url has been scanned. I think there are better ways to do this.
- encrypt the login sent from chrome.
- remove unused Chrome permissions; I added a bunch of permissions so I didn't have to deal with Chrome complaining.
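
To illustrate the "store more data, query later" item above: if responses are archived, a Heartbleed-style event only needs a query instead of a re-scan. A hedged TypeScript sketch, where the record shape and the indicator check are assumptions for illustration.

```ts
// Archive-and-query sketch: once responses are stored, a new bug class
// becomes a database query rather than a revisit of every website.
interface StoredResponse {
  url: string;
  fetchedAt: number;
  headers: Record<string, string>;
  body: string;
}

function findCandidates(archive: StoredResponse[], indicator: RegExp): string[] {
  return archive
    .filter(r => indicator.test(r.body) || Object.values(r.headers).some(v => indicator.test(v)))
    .map(r => r.url);
}

// Example: hunt for servers that advertise a vulnerable component version.
// findCandidates(archive, /OpenSSL\/1\.0\.1[a-f]?/);
```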