Crawler

This is the answer to HW3 of Modern Information Retrieval, Fall 2022.

scraper.py implements a scraper that collects news articles from the Hamshahri website within a specified interval. The results are stored in dataset.csv.
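
As a rough illustration of the scrape-then-save flow, the sketch below parses a small HTML snippet with the standard library and writes the extracted rows as CSV. It is not the code in scraper.py: the `HeadlineParser` class, the `rows_to_csv` helper, the `url` and `title` columns, and the sample HTML are all assumptions made for this example, and the real page structure of the Hamshahri site is not reflected here.

```python
import csv
import io
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collects (url, title) for every <a href=...> tag.

    Hypothetical stand-in for site-specific parsing logic.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._href = None   # href of the <a> tag currently being read
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            self.rows.append({"url": self._href, "title": title})
            self._href = None


def rows_to_csv(rows, out):
    """Write the collected rows to a file-like object as CSV."""
    writer = csv.DictWriter(out, fieldnames=["url", "title"])
    writer.writeheader()
    writer.writerows(rows)


# Made-up HTML standing in for a downloaded listing page.
html = (
    '<ul><li><a href="/news/1">First headline</a></li>'
    '<li><a href="/news/2">Second headline</a></li></ul>'
)
parser = HeadlineParser()
parser.feed(html)

buffer = io.StringIO()
rows_to_csv(parser.rows, buffer)
```

In a real run the HTML would come from an HTTP request for each page in the interval, and `buffer` would be replaced by an open file such as dataset.csv.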

Indexing.ipynb is a Jupyter notebook that stores the data from dataset.csv in an Elasticsearch index.
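
A minimal sketch of how CSV rows could be loaded into Elasticsearch, assuming the official `elasticsearch` Python client and its `helpers.bulk` function. The index name `hamshahri-news`, the server URL, the helper names, and the sample columns are assumptions for illustration; the notebook may do this differently.

```python
import csv
import io
from typing import Iterable, Iterator


def csv_to_actions(rows: Iterable[dict], index_name: str) -> Iterator[dict]:
    """Turn CSV rows into action dicts in the shape used by helpers.bulk."""
    for doc_id, row in enumerate(rows):
        yield {"_index": index_name, "_id": doc_id, "_source": dict(row)}


def index_csv(path: str, index_name: str = "hamshahri-news",
              url: str = "http://localhost:9200") -> int:
    """Bulk-index every row of a CSV file; returns the number of indexed docs."""
    # Imported lazily so csv_to_actions works without the client installed.
    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch(url)
    with open(path, newline="", encoding="utf-8") as f:
        success, _ = helpers.bulk(es, csv_to_actions(csv.DictReader(f), index_name))
    return success


# In-memory example; the column names are made up for illustration.
sample = io.StringIO("title,body\nfirst,hello world\nsecond,more text\n")
actions = list(csv_to_actions(csv.DictReader(sample), "hamshahri-news"))
```

With a running Elasticsearch instance, `index_csv("dataset.csv")` would send the whole file in bulk requests rather than one request per document.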

Query.ipynb is a Jupyter notebook in which I implemented several retrieval methods, including boolean, tf-idf, and fastText, along with storing the data in Elasticsearch.
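
To make two of these methods concrete, here is a small self-contained sketch of boolean (AND) retrieval and tf-idf ranking over an inverted index. The toy documents, whitespace tokenizer, and function names are assumptions for this example, and the tf-idf weighting shown (raw term frequency times `log(N / df)`) is one common variant, not necessarily the one used in the notebook.

```python
import math
from collections import defaultdict

# Toy corpus standing in for the rows of dataset.csv.
docs = {
    0: "tehran traffic news today",
    1: "football news from tehran",
    2: "weather report for the weekend",
}


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification)."""
    return text.lower().split()


# Inverted index: term -> set of ids of documents containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in tokenize(text):
        index[term].add(doc_id)


def boolean_and(query):
    """Return ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


def tfidf_rank(query):
    """Rank documents by the summed tf * idf of the query terms."""
    n_docs = len(docs)
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, set())
        if not postings:
            continue  # term not in the corpus
        idf = math.log(n_docs / len(postings))
        for doc_id in postings:
            tf = tokenize(docs[doc_id]).count(term)
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


and_hits = boolean_and("tehran news")
ranked = tfidf_rank("football tehran")
```

Here `and_hits` contains documents 0 and 1, and `ranked` places document 1 first because it matches the rarer term "football" in addition to "tehran".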

MuhammadKhosravi/Crawler

Folders and files

Latest commit

History

Repository files navigation

Crawler

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages