Commit

fix: apply suggestions from code review
Co-authored-by: Max Leske <250711+theseion@users.noreply.github.com>
fzipi and theseion committed Sep 25, 2024
1 parent 32d8472 commit f3e4ed9
Showing 2 changed files with 17 additions and 14 deletions.
27 changes: 15 additions & 12 deletions README.md
@@ -429,25 +429,28 @@ This feature is still experimental and may change in the future.

### What is the idea behind quantitative tests?

Quantitative tests allow you to run tests using payloads to quantify the amount of false positives you might get when running in production.
We use a well-known corpora of text to generate payloads that are sent to the WAF. The WAF should not block these payloads, as they are not malicious.
Quantitative testing mode provides a means to quantify the number of false positives to be expected in production for a given rule.
We use well-known corpora of texts to generate plausible, non-malicious payloads. Whenever such a payload is blocked by the WAF, the detection is considered to be a false positive.

Anyone can create their own corpora of text and use it to test their WAF. The corpora of text is a list of strings that are sent to the WAF to check if it blocks them.
Anyone can create their own corpora of texts and use them to test their WAF. Each corpus essentially consists of a list of strings, which may be sent to the WAF, depending on the configuration of the run.

The result of this test is a percentage of false positives. The lower the percentage, the better the WAF is at not blocking benign payloads.
The result of a test run is a percentage of false positives. The lower the percentage, the better the WAF is at not blocking benign payloads for a given rule. However, since we use generic corpora in our tests, the strings in those corpora will not necessarily be representative of the domain of a specific site. This means that a rule with a low false positive rate can still produce many false positives in specific contexts, e.g., when a website contains programming language code.
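
As a minimal sketch of how such a percentage can be derived, the snippet below divides the number of blocked benign payloads by the number of payloads sent. The function name `falsePositiveRate` and the sample numbers are assumptions made for illustration; they are not taken from the go-ftw code base.

```go
package main

import "fmt"

// falsePositiveRate returns the percentage of benign payloads that were blocked.
// Hypothetical helper for illustration only, not part of go-ftw.
// It returns 0 when no payloads were sent, to avoid dividing by zero.
func falsePositiveRate(blocked, total int) float64 {
	if total <= 0 {
		return 0
	}
	return 100 * float64(blocked) / float64(total)
}

func main() {
	// Assumed run: 10000 benign payloads sent, 5 of them blocked.
	fmt.Printf("%.2f%%\n", falsePositiveRate(5, 10000)) // prints "0.05%"
}
```

Because every payload is assumed to be benign, each block counts as a false positive, so no separate ground-truth labels are needed.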

### What is a corpus? Why do I need one?

A corpus is a collection of text that is used to generate payloads.
The text can be anything, from news articles to books. The idea is to have a large collection of text that can be used to generate payloads.
A corpus is a collection of texts that is used to generate payloads.
The texts can contain anything, from news articles to books. The idea is to have a large collection of texts that can be used to generate payloads. Well-known corpora usually have a domain or context, e.g., news headlines, or English books of the 18th century.

The default corpus is the [Leipzig Corpora Collection](https://wortschatz.uni-leipzig.de/en/download/), which is a collection of text from the web.
The default corpus is the [Leipzig Corpora Collection](https://wortschatz.uni-leipzig.de/en/download/), which is a collection of texts from the web.

### How to create a corpus?

You can create your own corpus by collecting text from the web or using text from books, articles, etc.
Or even use it with your own website! What you will need to do is to implement the interface `corpus.Corpus`, the `corpus.File`,
and for iterating over the corpus, the `corpus.Iterator` and `corpus.Payload` interfaces.
You can create your own corpus by collecting texts from the web, or from books, articles, etc.
You could even use the contents of your own website as a corpus! What you will need to do is to implement the following interfaces:
- `corpus.Corpus`
- `corpus.File`
- `corpus.Iterator`
- `corpus.Payload`

You can see an example of how to implement the `corpus.Corpus` interface in the `corpus/leipzig` package.
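
To give a rough idea of the iterator and payload pattern, here is a self-contained sketch of an in-memory corpus. The `Payload` and `Iterator` interfaces below, along with their method names, are assumptions declared locally for illustration; they are not the actual definitions from the `corpus` package, so check the package source for the real method sets before implementing them.

```go
package main

import "fmt"

// Payload is a hypothetical stand-in for a single corpus entry.
type Payload interface {
	Content() string
}

// Iterator is a hypothetical stand-in for walking over a corpus.
type Iterator interface {
	HasNext() bool
	Next() Payload
}

// linePayload wraps one line of text as a Payload.
type linePayload string

func (p linePayload) Content() string { return string(p) }

// sliceIterator iterates over lines held in memory.
type sliceIterator struct {
	lines []string
	pos   int
}

func (it *sliceIterator) HasNext() bool { return it.pos < len(it.lines) }

// Next returns the current line and advances; callers must check HasNext first.
func (it *sliceIterator) Next() Payload {
	p := linePayload(it.lines[it.pos])
	it.pos++
	return p
}

// collect drains an iterator and returns the payload contents in order.
func collect(it Iterator) []string {
	var out []string
	for it.HasNext() {
		out = append(out, it.Next().Content())
	}
	return out
}

func main() {
	it := &sliceIterator{lines: []string{"Hello world.", "Prices rose on Thursday."}}
	for _, s := range collect(it) {
		fmt.Println(s)
	}
}
```

A file-backed corpus would follow the same shape, with the iterator reading lines from disk instead of from a slice.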

@@ -540,7 +543,7 @@ Running quantitative tests
12:32PM DBG **> rule 932235 => Matched Data: () last found within ARGS:payload: Consolidated Edison () last issued its earnings results on Thursday, November 3rd.
```

The default language for the corpus is english, but you can change it to german using the `-L` flag.
The default language for the corpus is English, but you can change it to German using the `-L` flag.
```bash
❯ ./go-ftw quantitative -d ../coreruleset -s 10K -L deu
Running quantitative tests
@@ -555,7 +558,7 @@ False positives per rule:
Rule 932290: 5 false positives
```

Results can be shown in json format also, to be processed by other tools.
Results can also be shown in JSON format, to be processed by other tools.
```bash
❯ ./go-ftw quantitative -d ../coreruleset -s 10K -o json
4 changes: 2 additions & 2 deletions internal/quantitative/local_engine.go
@@ -61,7 +61,7 @@ SecAction \
type LocalEngine interface {
// Create creates a new engine to test payloads
Create(prefix string, paranoia int) LocalEngine
// CrsCall benchmarks the CRS WAF using a POST request with the payload
// CrsCall benchmarks the CRS WAF using a GET request with the payload
CrsCall(payload string) (int, map[int]string)
}

@@ -88,7 +88,7 @@ func (e *localEngine) CrsCall(payload string) (int, map[int]string) {
if e.waf == nil {
log.Fatal().Msg("local engine not initialized")
}
// we use the payload in the URI to rules in phase:1 can catch it
// we use the payload in the URI so rules in phase 1 can catch it
uri := fmt.Sprintf("/get?payload=%s", url.QueryEscape(payload))

tx := e.waf.NewTransaction()