Skip to content

Handmade-Search-Engine/handmade-indexer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

41 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

This repo is seperated into two different services.

spider.go is a web crawler which fetches the contents of pages based on a queue, and finds other links from those pages. It explores the website and keeps track of all the pages it visits for the indexer.

This service is written in Go because I like Go, and it's generally more efficient than python.

robots.go is a robots.txt parser.

indexer.py is an indexer which reads the contents of the pages found by the spider, and indexes them based on their keywords.

This service is written in Python because I like python, and NLTK is an amazing library which doesn't have a Go counterpart (as far as I'm aware).

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published