This repository has been archived by the owner on Sep 9, 2022. It is now read-only.

rodolfoghi/olx-phone-loader

OLX Phone Loader

A web scraper to get data from OLX ads.

How to use

  1. Clone or download this repository.
  2. Install the dependencies with pip install -r requirements.txt.
  3. Install Tesseract, following its installation instructions.
  4. Set the value of the variable ocr.pytesseract.tesseract_cmd in converter.py to the path of your Tesseract executable.
  5. Run python app.py in your preferred terminal.
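
For step 4, a minimal sketch of what the setting might look like. It assumes converter.py imports pytesseract under the alias ocr, which the variable name suggests but this README does not show. The PATH lookup, the fallback path, and the helper names resolve_tesseract_cmd and configure_ocr are illustrative assumptions, not part of the project.

```python
import shutil

# Hypothetical fallback location; change it to wherever Tesseract is installed.
FALLBACK_TESSERACT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def resolve_tesseract_cmd(fallback=FALLBACK_TESSERACT):
    """Return the tesseract executable found on PATH, or the fallback path."""
    return shutil.which("tesseract") or fallback


def configure_ocr():
    """Point pytesseract at the Tesseract executable and return the module."""
    # Imported here so the path lookup above works even without pytesseract.
    import pytesseract as ocr

    ocr.pytesseract.tesseract_cmd = resolve_tesseract_cmd()
    return ocr
```

If Tesseract is already on your PATH, pytesseract usually finds it without this setting, and hardcoding the path is mainly needed on Windows.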

How it works

  1. Requests the listing URL with urllib.request to get the list of ads.
  2. Parses the HTML response with BeautifulSoup.
  3. Makes a new request for each ad.
  4. Searches the response for the phone number. The phone number is a GIF file. :(
  5. Saves the GIF file in the images folder.
  6. Converts the GIF to PNG and saves it.
  7. Reads the phone text from the image using pytesseract.
  8. Lastly, saves the data to a CSV file using Python's csv library.
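
The steps above can be sketched roughly as follows. This is not the code from app.py or converter.py. The function names, the CSS selector, the CSV column names, and the use of Pillow for the GIF to PNG conversion are all assumptions made for illustration. Third-party imports sit inside the functions that need them, so the standard-library parts can be used on their own.

```python
import csv
import urllib.request
from pathlib import Path


def fetch(url):
    """Download a URL and return the raw response body as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def ad_links(listing_html, selector="a[href]"):
    """Collect link targets from the listing page.

    The selector is a placeholder; the real OLX markup is not described here.
    """
    from bs4 import BeautifulSoup  # third-party: beautifulsoup4

    soup = BeautifulSoup(listing_html, "html.parser")
    return [tag["href"] for tag in soup.select(selector)]


def save_gif(gif_bytes, name, folder="images"):
    """Write the phone GIF into the images folder and return its path."""
    target = Path(folder)
    target.mkdir(parents=True, exist_ok=True)
    gif_path = target / f"{name}.gif"
    gif_path.write_bytes(gif_bytes)
    return gif_path


def gif_to_png(gif_path):
    """Convert the GIF to a PNG next to the original and return the PNG path."""
    from PIL import Image  # third-party: Pillow (assumed converter)

    png_path = Path(gif_path).with_suffix(".png")
    with Image.open(gif_path) as image:
        image.convert("RGB").save(png_path, format="PNG")
    return png_path


def read_phone(png_path):
    """Run OCR on the PNG and return the recognized text, stripped."""
    from PIL import Image
    import pytesseract  # third-party; requires the Tesseract binary

    with Image.open(png_path) as image:
        return pytesseract.image_to_string(image).strip()


def write_csv(rows, out, fieldnames=("url", "phone")):
    """Write one CSV row per ad to an open text file object."""
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
```

To write the results to disk, open the output with open("ads.csv", "w", newline="") and pass the file object to write_csv; the file name here is only an example.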
