Handmade-Search-Engine/handmade-indexer

This repository is separated into two services.

spider.go is a web crawler. It takes URLs from a queue, fetches each page's contents, and extracts links to other pages from them. As it explores the website, it records every page it visits so the indexer can process them.

This service is written in Go because I like Go, and it is generally more efficient than Python.
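The queue-driven crawl described above is essentially a breadth-first search over links. The following is a minimal sketch of that loop, not the code in spider.go: it assumes a hypothetical `fetchLinks` function that stands in for the HTTP fetch and link extraction, and a hypothetical `maxPages` cap, so the traversal logic can run against a fake site.

```go
package main

import "fmt"

// crawl performs a breadth-first crawl starting at start.
// fetchLinks is a hypothetical stand-in for downloading a page and
// extracting its links; a real spider would do this over HTTP.
// maxPages caps how many pages are visited.
func crawl(start string, fetchLinks func(url string) []string, maxPages int) []string {
	visited := map[string]bool{start: true} // URLs already queued, so none is fetched twice
	queue := []string{start}                // FIFO queue of URLs waiting to be fetched
	var order []string                      // pages in the order they were visited

	for len(queue) > 0 && len(order) < maxPages {
		url := queue[0]
		queue = queue[1:]
		order = append(order, url)

		// Enqueue every newly discovered link.
		for _, link := range fetchLinks(url) {
			if !visited[link] {
				visited[link] = true
				queue = append(queue, link)
			}
		}
	}
	return order
}

func main() {
	// A fake link graph standing in for a real website.
	site := map[string][]string{
		"/":       {"/about", "/blog"},
		"/about":  {"/"},
		"/blog":   {"/blog/1", "/about"},
		"/blog/1": {"/"},
	}
	fetch := func(url string) []string { return site[url] }
	fmt.Println(crawl("/", fetch, 10)) // [/ /about /blog /blog/1]
}
```

Marking a URL as visited when it is enqueued, rather than when it is fetched, keeps duplicates out of the queue when several pages link to the same target.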

robots.go is a robots.txt parser.
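A robots.txt file groups `Disallow` rules under `User-agent` lines, and a crawler skips any path that starts with a disallowed prefix. The sketch below is a simplified illustration, not the contents of robots.go: the names `parseDisallows` and `allowed` are hypothetical, and it only honours the wildcard (`*`) group, ignoring `Allow`, path wildcards, and `Crawl-delay`.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseDisallows returns the Disallow path prefixes that apply to the
// wildcard user agent ("*"). Simplified: Allow lines, wildcards inside
// paths, Crawl-delay, and agent-specific groups are ignored.
func parseDisallows(robotsTxt string) []string {
	var rules []string
	inWildcardGroup := false // true while reading rules for "User-agent: *"
	lastWasAgent := false    // consecutive User-agent lines share one group
	scanner := bufio.NewScanner(strings.NewReader(robotsTxt))
	for scanner.Scan() {
		line := scanner.Text()
		// Strip comments and surrounding whitespace.
		if i := strings.Index(line, "#"); i >= 0 {
			line = line[:i]
		}
		key, value, ok := strings.Cut(strings.TrimSpace(line), ":")
		if !ok {
			continue // blank or malformed line
		}
		key = strings.ToLower(strings.TrimSpace(key))
		value = strings.TrimSpace(value)

		if key == "user-agent" {
			if !lastWasAgent {
				inWildcardGroup = false // a new group starts here
			}
			if value == "*" {
				inWildcardGroup = true
			}
			lastWasAgent = true
			continue
		}
		lastWasAgent = false
		// An empty Disallow value means "allow everything", so skip it.
		if key == "disallow" && inWildcardGroup && value != "" {
			rules = append(rules, value)
		}
	}
	return rules
}

// allowed reports whether path matches none of the Disallow prefixes.
func allowed(disallows []string, path string) bool {
	for _, prefix := range disallows {
		if strings.HasPrefix(path, prefix) {
			return false
		}
	}
	return true
}

func main() {
	robots := `User-agent: *
Disallow: /private/
Disallow: /tmp # scratch space

User-agent: BadBot
Disallow: /`
	rules := parseDisallows(robots)
	fmt.Println(allowed(rules, "/blog/post"))     // true
	fmt.Println(allowed(rules, "/private/notes")) // false
	fmt.Println(allowed(rules, "/tmpfile"))       // false
}
```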

indexer.py reads the contents of the pages found by the spider and indexes them by their keywords.

This service is written in Python because I like Python, and NLTK is an excellent natural language processing library which doesn't have a Go counterpart (as far as I'm aware).
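Indexing pages by keyword usually means building an inverted index: a map from each keyword to the pages that contain it. The sketch below illustrates that data structure in Go for consistency with the other examples; it is not a port of indexer.py. Where the real indexer uses NLTK, this sketch substitutes simple lowercase splitting and a tiny hypothetical stop-word list, with no stemming. The names `tokenize` and `buildIndex` are illustrative.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// stopWords is a tiny hypothetical stop-word list (NLTK provides a fuller one).
var stopWords = map[string]bool{"the": true, "a": true, "and": true, "is": true, "of": true}

// tokenize lowercases text, splits it on anything that is not a letter
// or digit, and drops stop words.
func tokenize(text string) []string {
	fields := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	var words []string
	for _, w := range fields {
		if !stopWords[w] {
			words = append(words, w)
		}
	}
	return words
}

// buildIndex maps each keyword to the sorted list of URLs containing it.
func buildIndex(pages map[string]string) map[string][]string {
	sets := map[string]map[string]bool{} // keyword -> set of URLs
	for url, text := range pages {
		for _, w := range tokenize(text) {
			if sets[w] == nil {
				sets[w] = map[string]bool{}
			}
			sets[w][url] = true
		}
	}
	index := map[string][]string{}
	for w, urls := range sets {
		for u := range urls {
			index[w] = append(index[w], u)
		}
		sort.Strings(index[w]) // map iteration order is random, so sort for stable output
	}
	return index
}

func main() {
	pages := map[string]string{
		"/":      "The handmade search engine",
		"/about": "About the search engine and its spider",
	}
	index := buildIndex(pages)
	fmt.Println(index["search"]) // [/ /about]
	fmt.Println(index["spider"]) // [/about]
}
```

A search for a keyword then becomes a single map lookup, and a multi-word query can intersect the URL lists of each word.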

Repository files:

.gitignore
README.md
go.mod
go.sum
indexer.py
pseudo.txt
requirements.txt
robots.go
spider.go
