Skip to content
This repository has been archived by the owner on Apr 16, 2024. It is now read-only.

Just a simple crawler to get a full index of destatis (Federal Statistical Office of Germany) statistics to use for further processing

Notifications You must be signed in to change notification settings

qwertzlbert/destatis_crawler

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

destatis_crawler

Just a simple crawler to get a full index of destatis (Federal Statistical Office of Germany) statistics to use for further processing

This is just a basic crawler to get a full list of all statistics of the Destatis Genesis database. You can use the output for further processing.

##Requirements To get this work you just need to install the scrapy framework via pip (pip install scrapy), your favorite package manager or directly via the Scrapy GitHub repository

##How to To run the crawler you just need to enter the command:

scrapy crawl destatis

To get a csv output enter the command:

scrapy crawl destatis -o output.csv

For json output:

scrapy crawl destatis -o output.json

##Handler With the scrapy_handler class inside scrapy_handler.py you can call the scrapy spider and receive a dict of the links (either the full list of all stats or the direct download link) which you can use for further processing, without the need to read the output from another file.

call get_list() to receive a list of all stats and download_csv(url) with the URL of a statistic overview page as argument to receive the direct download link.

About

Just a simple crawler to get a full index of destatis (Federal Statistical Office of Germany) statistics to use for further processing

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages