Web scraper for aolsoru.com. The site publishes old exam questions publicly, but every question is an image, so you cannot search for a specific question. This scraper crawls every exam in a targeted lecture, downloads the question images with their answers, saves them to a SQLite database, and converts the images to text with OCR. You can then search the questions by entering text on a Flask website.
This project was made for fun.
pip install -r .\requirements.txt
Crawl exams and download questions with answers
python collect_data.py <lecture_name> <url>
Example:
python collect_data.py "felsefe1" "https://aolsoru.com/121-kodu-felsefe-1-dersi-sinav-sorulari"
Convert images to text
python image_to_string.py
Run the Flask app
python flask_search_question.py
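The search behind the Flask page can be sketched as a substring query over the OCR text stored in SQLite. This is only an illustration, not the project's actual code: the `questions` table, its `question_text` and `answer` columns, the `search_questions` helper, and the sample rows are all assumptions made for the example.

```python
import sqlite3


def search_questions(conn, text):
    """Return (id, question_text, answer) rows whose OCR text contains `text`."""
    # Escape LIKE wildcards so the user's input is matched literally.
    escaped = (
        text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    cur = conn.execute(
        "SELECT id, question_text, answer FROM questions "
        "WHERE question_text LIKE ? ESCAPE '\\' ORDER BY id",
        (f"%{escaped}%",),
    )
    return cur.fetchall()


# Hypothetical schema and made-up sample rows, held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE questions ("
    "id INTEGER PRIMARY KEY, question_text TEXT, answer TEXT)"
)
conn.executemany(
    "INSERT INTO questions (question_text, answer) VALUES (?, ?)",
    [
        ("Which philosopher taught Plato?", "A"),
        ("What does epistemology study?", "C"),
    ],
)

results = search_questions(conn, "Plato")
print(results)
```

A parameterized query keeps user input out of the SQL string, which matters once the search text comes from a web form.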
The scraper crawls every exam listed at the lecture URL.
Every exam has 20 questions.
Answers are stored in the data-value attribute.
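Reading the answers from data-value could look roughly like the sketch below, which uses only the standard library. The sample markup is invented for illustration, since the real page structure is not described here, and `AnswerParser` and `extract_answers` are hypothetical names.

```python
from html.parser import HTMLParser


class AnswerParser(HTMLParser):
    """Collect the value of every data-value attribute, in document order."""

    def __init__(self):
        super().__init__()
        self.answers = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        for name, value in attrs:
            if name == "data-value" and value is not None:
                self.answers.append(value)


def extract_answers(html):
    parser = AnswerParser()
    parser.feed(html)
    parser.close()
    return parser.answers


# Invented markup: the real site may attach data-value to a different element.
sample_html = """
<div class="question"><img src="q1.png"><button data-value="A">Answer</button></div>
<div class="question"><img src="q2.png"><button data-value="C">Answer</button></div>
"""

answers = extract_answers(sample_html)
print(answers)
```

If the page uses data-value for anything other than answers, the parser would also need to check the tag name or a class before collecting the value.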
Sharing questions without permission may cause legal problems.