A set of tools for crawling, storing and analyzing data from www.e-seia.cl, a chilean government site.
A python interpreter and a unix like system with wget.
To download all pages you can use:
get_seia_project_list.py <pages>
Where pages is the number of pages that you must to lookup in the search results manually. For large numbers of pages this can take a while.
Downloaded pages will be stored in seia_project_list/ folder.
After download the list you can generate a YAML version with:
ruby project_list_parser.rb > project_list.yaml
-
Add range of time in the get_seia_project_list.py command argmuments.
-
Get the number of pages automatically for a time range.
-
Parse list pages to get the porjects id.
-
Download the project data.
-
Store the project data.
-
Write some queries.
This is free software under GPL v3.