Web-Search-Engine

A web search engine developed in Java, with a web crawler written in Python 3.
It is a simple search engine that ranks pages by the frequency of the search keywords in the text files.

Project Components:
--> Imported Packages : Text Processing, Sorting
--> Python Web crawler: web-crawler.py
--> Text Files: websites.txt, stop-words.txt
--> Folders: hashmap_data, urls
--> Java File: URLtoText.java - Code to parse the page at each URL into a text file.
--> Java File: SearchEngine.java - Driver code along with functions.

Concepts Used:

Sorting (Merge Sort)
Ternary Search Trie
Hash Maps
Text Processing (JSoup, String Functions)
Memory Management (Caching)
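As an illustration of the Ternary Search Trie concept, here is a minimal sketch of a TST that counts how often each word is inserted. The class name WordCountTst and its methods add and count are hypothetical and are not taken from SearchEngine.java; the project's own TST may be organized differently.

```java
// Minimal ternary search trie that counts how often each word is inserted.
// All names here are illustrative assumptions, not the project's own code.
class WordCountTst {
    private static final class Node {
        final char ch;
        Node left, mid, right;
        int count; // number of inserted words that end at this node

        Node(char ch) {
            this.ch = ch;
        }
    }

    private Node root;

    // Insert a word, incrementing its count if it is already present.
    void add(String word) {
        if (word == null || word.isEmpty()) {
            return;
        }
        root = add(root, word, 0);
    }

    private Node add(Node node, String word, int i) {
        char c = word.charAt(i);
        if (node == null) {
            node = new Node(c);
        }
        if (c < node.ch) {
            node.left = add(node.left, word, i);
        } else if (c > node.ch) {
            node.right = add(node.right, word, i);
        } else if (i < word.length() - 1) {
            node.mid = add(node.mid, word, i + 1);
        } else {
            node.count++;
        }
        return node;
    }

    // Return how many times the word was inserted (0 if it is absent).
    int count(String word) {
        if (word == null || word.isEmpty()) {
            return 0;
        }
        Node node = root;
        int i = 0;
        while (node != null) {
            char c = word.charAt(i);
            if (c < node.ch) {
                node = node.left;
            } else if (c > node.ch) {
                node = node.right;
            } else if (i < word.length() - 1) {
                node = node.mid;
                i++;
            } else {
                return node.count;
            }
        }
        return 0;
    }
}
```

After adding the words "java", "search" and "java", count("java") returns 2, while count("jav") returns 0 because a prefix is not counted as a word.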

Flow of Execution of the Search Engine:

A Python web crawler crawls the web and recursively retrieves around 1500 URLs.
The page at each URL is parsed into a text file using JSoup.
Stop words are removed from the search string given by the user.
The search string is split into tokens using the Java StringTokenizer.
All URLs are indexed in a Hash Map.
A TST is generated for each text file, and the frequency of each keyword is extracted from it.
To implement page ranking, the frequency of these words is stored in a Hash Map along with the URL index.
The page ranking Hash Map is sorted in decreasing order of keyword frequency.
The page ranking Hash Map is stored in memory as a cache, which drastically improves search time.
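The query-side steps above (stop word removal, tokenizing, frequency counting and sorting) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code in SearchEngine.java: the class RankingSketch and its methods are hypothetical, pages are passed in as an in-memory map from URL to text rather than read from the text files, a plain token scan stands in for the TST, and the library sort stands in for the project's merge sort. Caching is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.StringTokenizer;

// Hypothetical sketch of the ranking flow; names are assumptions.
class RankingSketch {
    // Split the query into lower-case tokens and drop stop words.
    static List<String> keywords(String query, Set<String> stopWords) {
        List<String> result = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(query.toLowerCase());
        while (tokens.hasMoreTokens()) {
            String token = tokens.nextToken();
            if (!stopWords.contains(token)) {
                result.add(token);
            }
        }
        return result;
    }

    // Count how many tokens of one page's text match any keyword.
    static int frequency(String pageText, List<String> keywords) {
        Set<String> wanted = new HashSet<>(keywords);
        int hits = 0;
        StringTokenizer tokens = new StringTokenizer(pageText.toLowerCase());
        while (tokens.hasMoreTokens()) {
            if (wanted.contains(tokens.nextToken())) {
                hits++;
            }
        }
        return hits;
    }

    // Return URLs in decreasing order of keyword frequency.
    // Pages with no hits are omitted; the order of ties is unspecified.
    static List<String> rank(Map<String, String> pages, List<String> keywords) {
        Map<String, Integer> scores = new HashMap<>();
        for (Map.Entry<String, String> page : pages.entrySet()) {
            int f = frequency(page.getValue(), keywords);
            if (f > 0) {
                scores.put(page.getKey(), f);
            }
        }
        List<String> urls = new ArrayList<>(scores.keySet());
        urls.sort((a, b) -> Integer.compare(scores.get(b), scores.get(a)));
        return urls;
    }
}
```

With the stop words "the" and "of", the query "The history of Java" reduces to the keywords "history" and "java", and a page containing those words three times ranks ahead of a page containing them once.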

Screenshots:

--> Driver Java file

--> Cache file generated
