- Web scraping and data processing.
- Visualization of basic statistics under stated assumptions.
The data was scraped from Avito.ru and enriched using the Yandex Geocoder API and OpenStreetMap.
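The enrichment step boils down to turning each ad's address into coordinates. A minimal sketch of parsing a Yandex Geocoder-style JSON response is below; the field names follow the public Geocoder API format, but the exact structure and the function name are assumptions, not code from this project:

```python
# Hypothetical sketch: pull (lat, lon) out of a Yandex Geocoder-style
# response dict. Check the field names against the official API docs.
def extract_coords(response: dict):
    try:
        pos = (response["response"]["GeoObjectCollection"]
                       ["featureMember"][0]["GeoObject"]["Point"]["pos"])
    except (KeyError, IndexError):
        return None  # address could not be geocoded
    lon, lat = map(float, pos.split())  # the API returns "lon lat"
    return lat, lon

# toy response in the same shape as the real API payload
sample = {"response": {"GeoObjectCollection": {"featureMember": [
    {"GeoObject": {"Point": {"pos": "37.617635 55.755814"}}}]}}}
print(extract_coords(sample))  # (55.755814, 37.617635)
```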
The ETL pipeline is built with Airflow.
For our purposes Airflow is overkill, since the pipeline is purely sequential, but it is a convenient way to grow in complexity and add parallel processing later.
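A purely sequential pipeline of this kind can be sketched as a plain chain of functions, each stage consuming the previous stage's output; the stage names below are illustrative placeholders, not the actual task names in the project's DAG:

```python
# Illustrative sketch of the sequential ETL stages; the real project
# wires equivalent steps together as Airflow tasks in a DAG.
def scrape(pages):
    # placeholder for the Avito.ru scraping step
    return [{"ad_id": i, "address": f"addr {i}"} for i in range(pages)]

def enrich(ads):
    # placeholder for Yandex Geocoder / OpenStreetMap enrichment
    for ad in ads:
        ad["coords"] = (0.0, 0.0)  # would come from the geocoder
    return ads

def save(ads):
    # placeholder for writing the final dataset; returns the row count
    return len(ads)

def run_pipeline(pages=3):
    # strictly sequential: scrape -> enrich -> save
    return save(enrich(scrape(pages)))

print(run_pipeline())  # 3
```

Airflow adds scheduling, retries, and a UI on top of exactly this ordering, which is why it pays off as the pipeline grows.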
The data-processing DAG is located here and looks like this:
Airflow runs via the docker-compose setup from github.com/puckel, slightly modified to install the necessary dependencies from requirements.txt.
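If memory serves, the puckel image's entrypoint pip-installs /requirements.txt at container start when that file is present, so mounting it is usually enough; the fragment below is an assumed sketch of such a docker-compose change, not this project's actual file:

```yaml
# assumed fragment of docker-compose.yml for puckel/docker-airflow;
# the entrypoint installs /requirements.txt on startup if it exists
services:
  webserver:
    volumes:
      - ./requirements.txt:/requirements.txt
      - ./dags:/usr/local/airflow/dags
```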
The Jupyter notebook with the data analysis, and the 2019 analysis with some machine learning, is here.
The final data is here.