Nomad

A work-in-progress experimental web crawler to visualise & map the connections between domains.

Design

Most crawlers are built to search websites in depth, finding everything indexable; Nomad is specifically optimised for breadth, requesting only the root page (/) of each domain that it finds.

WARNING: I have made some basic attempts to avoid spamming websites and tripping spam filters, but there is still a chance that websites will flag your connection and block you (probably via an IP ban) for using this.
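For illustration, a breadth-first loop in this style might look like the sketch below. This is a minimal stand-in, not the repo's code: all names are invented, and it omits the politeness measures, concurrency, and graph providers the real crawler involves.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strings"

	"golang.org/x/net/html"
)

// extractHosts pulls distinct hostnames out of <a href> attributes.
func extractHosts(body string) []string {
	var hosts []string
	seen := map[string]bool{}
	tok := html.NewTokenizer(strings.NewReader(body))
	for {
		tt := tok.Next()
		if tt == html.ErrorToken {
			return hosts
		}
		if tt != html.StartTagToken {
			continue
		}
		t := tok.Token()
		if t.Data != "a" {
			continue
		}
		for _, a := range t.Attr {
			if a.Key != "href" {
				continue
			}
			if u, err := url.Parse(a.Val); err == nil && u.Hostname() != "" && !seen[u.Hostname()] {
				seen[u.Hostname()] = true
				hosts = append(hosts, u.Hostname())
			}
		}
	}
}

// crawl walks the frontier breadth-first, requesting only the root page
// of each hostname it discovers and recording hostname -> hostname edges.
func crawl(start string, maxHosts int) map[string][]string {
	edges := make(map[string][]string)
	seen := map[string]bool{}
	frontier := []string{start}

	for len(frontier) > 0 && len(seen) < maxHosts {
		host := frontier[0]
		frontier = frontier[1:]
		if seen[host] {
			continue
		}
		seen[host] = true

		resp, err := http.Get("https://" + host + "/") // root page only, never deeper
		if err != nil {
			continue
		}
		body, _ := io.ReadAll(io.LimitReader(resp.Body, 1<<20))
		resp.Body.Close()

		for _, link := range extractHosts(string(body)) {
			if link != host {
				edges[host] = append(edges[host], link)
				frontier = append(frontier, link) // enqueue for breadth, don't recurse
			}
		}
	}
	return edges
}

func main() {
	edges := crawl("www.france.fr", 25)
	fmt.Printf("crawled edges from %d hostnames\n", len(edges))
}
```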

Modes

Web Server

  • See graphology-frontend's readme
  • Runs a web server, serving a React frontend & WebSocket based API
  • Configured, started, and stopped via API
  • Feeds data back over the WebSocket to display crawl data in real time (a rough sketch of this push pattern follows this list)
  • Demo video
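For a sense of how that push pattern works, here is a minimal sketch. It assumes the gorilla/websocket package; the event shape and handler names are invented, not the repo's actual API.

```go
package main

import (
	"log"
	"net/http"

	"github.com/gorilla/websocket"
)

// CrawlEvent is a hypothetical message shape: one edge found by the crawler.
type CrawlEvent struct {
	From string `json:"from"`
	To   string `json:"to"`
}

var upgrader = websocket.Upgrader{}

// serveCrawl upgrades the HTTP connection to a WebSocket and streams crawl
// events to the frontend as they arrive on the events channel.
func serveCrawl(events <-chan CrawlEvent) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		conn, err := upgrader.Upgrade(w, r, nil)
		if err != nil {
			log.Println("upgrade:", err)
			return
		}
		defer conn.Close()
		for ev := range events {
			if err := conn.WriteJSON(ev); err != nil {
				return // client went away
			}
		}
	}
}

func main() {
	events := make(chan CrawlEvent)
	// The real crawler would feed discovered edges into events.
	http.HandleFunc("/ws", serveCrawl(events))
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```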

CLI

  • See "CLI Usage" below
  • Configured and started via the command line
  • Outputs to a file; for some providers this is an HTML page that can display or replay the data from the crawl

CLI Usage

Currently you need to edit the vars in ./cmd/nomad/main.go to configure a crawl.

Then you can run it with $ go run ./cmd/nomad
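As a purely illustrative example, such a configuration block might look like the following. Every name and value here is hypothetical; the real vars are whatever ./cmd/nomad/main.go currently defines.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical configuration vars — the real names live in ./cmd/nomad/main.go.
var (
	initialURL    = "https://www.france.fr/" // seed URL for the crawl
	maxHostnames  = 500                      // stop after this many domains are found
	requestDelay  = 500 * time.Millisecond   // politeness delay between requests
	graphProvider = "graphology"             // which graph output provider to use
)

func main() {
	fmt.Printf("crawling from %s (max %d hosts, %s delay, %s output)\n",
		initialURL, maxHostnames, requestDelay, graphProvider)
}
```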

Depending on the graph provider you choose there are different ways to view the output:

(The demo images below all use https://www.france.fr/ as their initial URL (no particular reason). They show different results because each run of the code can produce different output depending on configuration, response speed of URLs, runtime, etc.)

Graphology

Implemented by graphology.Graphology, this outputs a graphology.json file which you can then load into a Graphology-based UI (supported by graphology-frontend).
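For reference, graphology's JSON serialisation looks roughly like this (an illustrative fragment with invented hostnames; the exact attributes depend on the crawl):

```json
{
  "attributes": {},
  "nodes": [
    { "key": "www.france.fr" },
    { "key": "example.com" }
  ],
  "edges": [
    { "source": "www.france.fr", "target": "example.com" }
  ]
}
```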

This has great performance and works well to visualise the whole network:

demo image of graphology visualisation

ECharts

Implemented by graphs.ECharts, this outputs a graph.html file which can be opened in a browser to view the collected data.

I've found the performance OK with lots of data, but it tends to get cluttered and the labels overwrite everything - it could probably be improved with better configuration.

demo image of echarts visualisation

Vis.js

Implemented by vis.Vis, this outputs a vis.html file which can be opened in a browser. It will "replay" the crawl, animating and expanding the graph in the order that hostnames were found.

demo image of vis.js visualisation

This can become laggy with lots of data; here's the final result of about 470 hostnames and 1k connections between them, which is about the limit for my PC:

demo image of large vis.js visualisation

Graphviz

Implemented by graphs.HostnameGraph, this produces an out.json file which can then be converted to an out.dot file using $ python json2dot.py.

The out.dot file can be copied into a Graphviz visualiser such as https://dreampuf.github.io/GraphvizOnline/.
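DOT is Graphviz's plain-text graph language, so the converted file will contain something along these lines (hostnames invented for illustration):

```dot
digraph hostnames {
    "www.france.fr" -> "example.com";
    "www.france.fr" -> "cdn.example.net";
    "example.com"   -> "cdn.example.net";
}
```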

I implemented this because it was relatively easy, but it doesn't seem like a great way to view the data:

demo image of graphviz visualisation

Future Work?

crawler

  • A new mode, unsure how useful / interesting this would be
  • Runs until frontier is empty (which could take a very long time)
  • Feeds data into a Postgres DB for later analysis / visualisation (a rough storage sketch follows this list)
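If this mode materialises, storage could be as simple as an edge table. A minimal sketch, assuming the lib/pq driver and an invented schema:

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // Postgres driver
)

// A hypothetical minimal schema: one row per discovered hostname-to-hostname link.
const schema = `
CREATE TABLE IF NOT EXISTS edges (
    src      TEXT NOT NULL,
    dst      TEXT NOT NULL,
    found_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE (src, dst)
);`

func main() {
	db, err := sql.Open("postgres", "postgres://localhost/nomad?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if _, err := db.Exec(schema); err != nil {
		log.Fatal(err)
	}

	// The crawler would run an insert like this for each edge it finds.
	_, err = db.Exec(
		`INSERT INTO edges (src, dst) VALUES ($1, $2) ON CONFLICT DO NOTHING`,
		"www.france.fr", "example.com",
	)
	if err != nil {
		log.Fatal(err)
	}
}
```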