
Static lists for bad bots scoring #4

@krizhanovsky

Description


Scoring

Implement scoring as a multiplier applied in queries. E.g., if a rule defines a score X (say X = 1.7), then for a detector fetching the top N IPs with the highest RPS, the RPS values of the IPs satisfying the rule must be multiplied by X. Consider we have IPs and RPS values as:

1.1.1.1 100
2.2.2.2 200
3.3.3.3 1000
4.4.4.4 700

If 1.1.1.1 satisfies a rule with X = 10, its scored RPS becomes 100 × 10 = 1000, so 1.1.1.1 ends up in the top 2.
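
A minimal in-Python sketch of these scoring semantics; the `rps` mapping, the `matched` set, and the multiplier are just the example values above, not real detector output:

```python
# Minimal sketch of score-multiplied top-N selection (example data only).
rps = {"1.1.1.1": 100, "2.2.2.2": 200, "3.3.3.3": 1000, "4.4.4.4": 700}
matched = {"1.1.1.1"}  # IPs satisfying the rule
X = 10                 # the rule's score multiplier

scored = {ip: v * X if ip in matched else v for ip, v in rps.items()}
top2 = sorted(scored, key=scored.get, reverse=True)[:2]
print(top2)  # ['1.1.1.1', '3.3.3.3'] -- 1.1.1.1 enters the top 2
```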

This can be quite compute-intensive, so the scoring values should probably be stored in a temporary ClickHouse table so that ClickHouse itself runs the query. If we just fetch all the accumulated client statistics into Python and iterate over them, we may overuse CPU and memory, so this needs a careful implementation.
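
A hedged sketch of the temporary-table approach, assuming the clickhouse-driver package and hypothetical names (an `access_log` table with an `ip` column, and `ip_scores` for the temporary table); the real schema may differ:

```python
# Sketch only: table and column names are assumptions, not the real schema.
from clickhouse_driver import Client

client = Client("localhost")

# Push the per-IP multipliers derived from the rules into a temporary
# table (Memory engine, lives only for this session).
client.execute("CREATE TEMPORARY TABLE ip_scores (ip String, score Float64)")
client.execute("INSERT INTO ip_scores (ip, score) VALUES",
               [("1.1.1.1", 10.0)])

# Let ClickHouse do the work: aggregate per-IP request counts (a stand-in
# for RPS over the accounting window), multiply by the rule score, and
# take the top N. With the default join_use_nulls = 0, a non-matched
# LEFT JOIN fills score with 0, hence the if() that defaults it to 1.
top_n = client.execute("""
    SELECT ip, rps * if(score = 0, 1, score) AS scored_rps
    FROM (SELECT ip, count() AS rps FROM access_log GROUP BY ip) AS l
    LEFT JOIN ip_scores AS s USING ip
    ORDER BY scored_rps DESC
    LIMIT 10
""")
```

This keeps the heavy per-IP aggregation inside ClickHouse; only the rule multipliers and the final top-N rows cross into Python.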

Rules for scoring

Implement static lists loaded as files (a loading sketch follows the list):

  • bad User-Agents
  • bad referers
  • bad IPs
  • ignoring robots.txt (the path to robots.txt must be configurable)
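
A minimal loading sketch, assuming one entry per line in plain-text files; the file names, the `StaticRule` shape, and the default score are assumptions, and the robots.txt check is stateful (it depends on request history) so it is only noted, not implemented:

```python
# Sketch under assumed file names and a per-rule score; not a fixed format.
from dataclasses import dataclass

@dataclass
class StaticRule:
    name: str
    values: set
    score: float = 1.7  # multiplier contributed by a match

def load_list(path: str) -> set:
    """One entry per line; blank lines and '#' comments are skipped."""
    with open(path) as f:
        return {ln.strip() for ln in f
                if ln.strip() and not ln.lstrip().startswith("#")}

rules = [
    StaticRule("bad_user_agents", load_list("bad_user_agents.txt")),
    StaticRule("bad_referers", load_list("bad_referers.txt")),
    StaticRule("bad_ips", load_list("bad_ips.txt")),
]

def score_for(ip: str, user_agent: str, referer: str) -> float:
    """Multiply the scores of every matching rule into one multiplier.

    Ignoring robots.txt would add one more factor here: clients that
    never requested the configured robots.txt path get an extra score.
    """
    fields = {"bad_user_agents": user_agent,
              "bad_referers": referer,
              "bad_ips": ip}
    score = 1.0
    for rule in rules:
        if fields[rule.name] in rule.values:
            score *= rule.score
    return score
```

The combined multiplier per IP is what would be inserted into the temporary ClickHouse table described above.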
