What pages can explain the log format ? #301

Open
pmquang opened this issue Apr 18, 2018 · 6 comments

Comments

@pmquang

pmquang commented Apr 18, 2018

Hi,

I'm using lua-resty-waf and have some logs like:

{"match":14,"msg":"SQL probing attempt","id":41036}

but I don't know what the 'match' value means. Could someone help explain it? Is there a wiki page about the log format?

Regards

@ScottieIOT

Hello. I am just another guy using this wonderful WAF project, but I will do my best to give my two cents. The log format is not easily configurable (I modified it for my own needs), but you can enable additional portions of the request to be included in the logs. Take a look at these options:

event_log_request_arguments
event_log_request_body
event_log_request_headers

If you enable these and compare the resulting logs with the pattern in the 41000 ruleset JSON file (using a regex tester such as regex101 or a similar site), you should be able to determine what in the request is causing the match. For 41036, the pattern from the rule file (in the rules folder) is as follows:
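A minimal Python sketch of that comparison, assuming the rule entries have already been loaded from the rule file into a list of dicts with "id" and "pattern" keys (as in the rule excerpt later in this thread). The helpers find_rule and first_match are hypothetical. Python's re module only approximates PCRE: variable-width lookbehind such as the (?<=or|xor|div|...) group in the 41036 pattern is rejected by Python, so a PCRE-flavoured tester like regex101 is the safer choice for that rule.

```python
import json
import re
from typing import Iterable, Optional


def find_rule(rules: Iterable[dict], rule_id: int) -> Optional[dict]:
    """Return the first rule whose "id" equals rule_id, or None (hypothetical helper)."""
    return next((rule for rule in rules if rule.get("id") == rule_id), None)


def first_match(rule: dict, value: str) -> Optional[str]:
    """Return the text the rule's pattern matches in value, or None.

    re.ASCII keeps word-character classes ASCII-only, which is PCRE's default.
    """
    match = re.search(rule["pattern"], value, re.ASCII)
    return match.group(0) if match else None


# json.loads strips the JSON-level escaping, so the doubled backslashes in the
# file become single regex escapes: [^\w\r\n]{4,}
rules = json.loads(r'[{"id": 40002, "pattern": "[^\\w\\r\\n]{4,}"}]')
rule = find_rule(rules, 40002)
print(first_match(rule, "../../../etc/passwd"))  # ../../../
```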

"pattern" : "(?:(?:[\"'´’‘]\\s*?\\*.+(?:x?or|div|like|between|and|id)\\W*?[\\\"'´’‘]\d)|(?:\^[\"'´’‘])|(?:^[\\w\\s\\\"'´’‘-]+(?<=and\s)(?<=or|xor|div|like|between|and\s)(?<=xor\s)(?<=nand\s)(?<=not\s)(?<=\|\|)(?<=\&\&)\w+\()|(?:[\"'´’‘][\\s\\d]*?[^\\w\\s]+\\W*?\\d\\W*?.*?[\\\"'´’‘\d])|(?:[\"'´’‘]\\s*?[^\\w\\s?]+\\s*?[^\\w\\s]+\\s*?[\\\"'´’‘])|(?:[\"'´’‘]\\s*?[^\\w\\s]+\\s*?[\\W\\d].*?(?:#|--))|(?:[\\\"'´’‘].?\\s*?\d)|(?:[\"'´’‘]\\s*?(x?or|div|like|between|and)\\s[^\\d]+[\\w-]+.*?\\d)|(?:[()\\*<>%+-][\\w-]+[^\\w\\s]+[\\\"'´’‘][^,]))"

@pmquang
Author

pmquang commented May 16, 2018

Hi @ScottieIOT ,

Thanks for your response. But the point is that I don't understand the 'match' value. Why is it 14 in my example?

@p0pr0ck5
Owner

Hi,

Apologies for the delayed response.

The match field reflects the value returned by the rule operator. Can I ask you to enable debug logs for your case so we can see more detail? Just to make sure we're all on the same page :)

@pmquang
Author

pmquang commented May 22, 2018

hi @p0pr0ck5 ,

What does "the value returned by the rule operator" mean? It's running in production, so I cannot change the configuration arbitrarily.

@ScottieIOT

Hello pmquang, sorry I was unable to properly answer your question. I am also curious to know how the value of "match" is determined.

To work this out, I started triggering violations and comparing the generated logs with the rule JSON data, but it's not clear exactly what "match" is or why it's present in the logs at all.

Here is an example request:
curl -vvv "http://www.examplesite.com/bla?union=select&regexp=../../../etc/passwd"

This request produced the following log entries:
[{"match":1,"id":40002},{"match":9,"id":40016},{"match":3,"id":40022},{"match":1,"id":41004},{"match":16,"logdata":16,"id":99001}]

(Please excuse the missing rule descriptions; I have modified the project to exclude that information to conserve log bandwidth and log cluster storage space.)
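Since each of these log lines is plain JSON, a short Python sketch can index the entries by rule ID before comparing them with the rule files. It assumes a log line is a JSON array of objects carrying "id" and "match" keys, as in the trimmed output above; the function name match_by_rule is hypothetical.

```python
import json
from typing import Dict


def match_by_rule(log_line: str) -> Dict[int, int]:
    """Map each rule ID in an event log line to its "match" value.

    If a rule ID appears more than once, the last entry wins.
    """
    return {entry["id"]: entry["match"] for entry in json.loads(log_line)}


line = ('[{"match":1,"id":40002},{"match":9,"id":40016},{"match":3,"id":40022},'
        '{"match":1,"id":41004},{"match":16,"logdata":16,"id":99001}]')
for rule_id, match in sorted(match_by_rule(line).items()):
    print(rule_id, match)
```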

Here is the first rule, which has a "match" value of "1" in the logs:

{
  "actions" : {
    "disrupt" : "SCORE",
    "nondisrupt" : [
      {
        "action" : "setvar",
        "data" : {
          "col" : "TX",
          "inc" : 1,
          "key" : "anomaly_score",
          "value" : 4
        }
      }
    ]
  },
  "id" : 40002,
  "msg" : "Repetitive non-word characters anomaly detected",
  "operator" : "REFIND",
  "opts" : {
    "transform" : "uri_decode"
  },
  "pattern" : "[^\\w\\r\\n]{4,}",
  "vars" : [
    {
      "parse" : [
        "all",
        1
      ],
      "type" : "REQUEST_ARGS"
    }
  ]
}

You can see that there is an "inc" value of 1 listed. I am not yet certain what "inc" means in the rules.
There is also an anomaly score value of "4", so we can be certain that the value in the logs for rule 40002, "1", is NOT the current score itself; it may instead indicate which part of the regex pattern was matched. Here is the pattern after removing the extra escaping:
[^\w\r\n]{4,}

This means that any run of four or more consecutive characters that are not word characters, carriage returns, or newlines will trigger the rule.
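The "four or more consecutive" reading can be checked with a short Python sketch. Python's re stands in for PCRE here (with re.ASCII, so word characters are ASCII-only as in PCRE's default), and the sample strings are made up purely for illustration.

```python
import re

# Rule 40002's pattern with the JSON escaping removed.
RULE_40002 = re.compile(r"[^\w\r\n]{4,}", re.ASCII)


def triggers(value: str) -> bool:
    """True if value contains a run of 4+ consecutive non-word, non-CR/LF characters."""
    return RULE_40002.search(value) is not None


samples = [
    "a---b",                # three in a row: no match
    "a----b",               # four in a row: match
    "a-b-c-d-e-f",          # many non-word characters, never consecutive: no match
    "x\r\n\r\ny",           # CR and LF are excluded from the class: no match
    "../../../etc/passwd",  # "../../../" is nine in a row: match
]
for s in samples:
    print(f"{s!r}: {triggers(s)}")
```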

In this case, since the pattern contains no alternation ("|", i.e. no OR), my guess is that the only possible "match" value for rule 40002 is "1".

However, making that determination for a more complicated regex, such as rule 41036 with its many non-capturing "(?:...)" groups and alternations, might be more difficult. It would be nice if the function returned the characters that matched instead of a numeric value, but as long as it is understood what's happening, you could easily build a WAF log translator that tells you what was matched based on the number.
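A minimal sketch of such a translator in Python, assuming (as the project owner explains later in this thread) that "match" is the 1-based start offset returned by ngx.re.find, and that the inspected value is also logged, e.g. via event_log_request_arguments. The value must be the one the rule actually saw after its transforms (e.g. uri_decode). The helper explain_match is hypothetical, and Python's re only approximates PCRE.

```python
import re
from typing import Optional


def explain_match(pattern: str, value: str, match_index: int) -> Optional[str]:
    """Recover the matched text from a logged "match" value (hypothetical helper).

    match_index is assumed to be the 1-based start offset reported by
    ngx.re.find. Re-running the pattern anchored at that offset yields the
    same leftmost match that a search would have found there.
    """
    regex = re.compile(pattern, re.ASCII)
    found = regex.match(value, match_index - 1)  # convert to a 0-based offset
    return found.group(0) if found else None


# Rule 40002 against the "regexp" argument from the curl example above.
print(explain_match(r"[^\w\r\n]{4,}", "../../../etc/passwd", 1))  # ../../../
```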

I will check back on this thread for an explanation as to what the "match" field exactly is.

@p0pr0ck5, thanks so much for your time and effort.

@p0pr0ck5
Owner

Hi @pmquang @ScottieIOT,

The match value in the logs is the value returned by the rule operator. In the cases noted above, the operator is REFIND, which is an alias for ngx.re.find. This function returns the index of the match rather than the matched substring, so we don't need to create a new string object.
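For readers more used to other regex APIs, here is a rough Python analogue of that behaviour. ngx.re.find returns the 1-based, inclusive start and end positions of the match (or nil when nothing matches) instead of the matched substring; the sketch mirrors that with re.search. The function name re_find is hypothetical, and treating the logged "match" as the start position is an assumption, consistent with the value of 1 logged for rule 40002 in the example earlier in the thread.

```python
import re
from typing import Optional, Tuple


def re_find(subject: str, pattern: str) -> Optional[Tuple[int, int]]:
    """Rough analogue of ngx.re.find: 1-based inclusive (from, to), or None."""
    m = re.search(pattern, subject, re.ASCII)
    if m is None:
        return None
    # Python offsets are 0-based and end-exclusive; convert to Lua conventions.
    return m.start() + 1, m.end()


print(re_find("../../../etc/passwd", r"[^\w\r\n]{4,}"))  # (1, 9)
print(re_find("select", r"[^\w\r\n]{4,}"))               # None
```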

This is admittedly not a particularly useful value in this case, but in reality the operator return value isn't even a concept in the SecRules DSL: either a rule matches or it doesn't, and specific data about what part of the variable matched the rule pattern is typically found in the TX collection. IMO, this is a bit unwieldy, and it's one of my least favorite aspects of ModSecurity's SecRules. And of course, when the original rules in question were written, we didn't support parsing out things like %{TX.0} in rule messages, so this value isn't available.

In reality, this aspect of this project needs some significant effort to rewrite and improve, but I don't quite have the time for that at this point. I'd be happy to hear suggestions/requests about useful data formats that folks are using in the interim.
