A project at CTC11
This project's purpose is building a case for the release of Open Data in Aberdeen. Additionally it explores the possibilities to automate requests to get the data, if we can't get a portal provided by the Aberdeen City Council. See here for the sorry saga.
This part covers the automation of requests to WhatDoTheyKnow a site which makes easier, and allows tracking of, the submission of FOI requests.
We will use the what do they know API to create requests on the fly: https://www.whatdotheyknow.com/help/api
And there as some helpful hints (on which we based this) here
I have adapted these instructions to those below.
But none of them have used selenium to automate the process which was my aim.
- It’s quite simple to create yourself. Just build the URL up in steps:
The base URL: https://www.whatdotheyknow.com/
-
Begin by telling the site that this is a new request: https://www.whatdotheyknow.com/new/
-
Add a forward slash (/) and then the body you want the request to be sent to (exactly as it is written in the url of the body’s page of the website): https://www.whatdotheyknow.com/new/aberdeen_city_council
-
Add a question mark: This tells the website that we are going to introduce a ‘parameter string’. Now our URL looks like this: https://www.whatdotheyknow.com/new/aberdeen_city_council?
-
Input a title: we need to indicate that the next part should go into the ‘title’ field, like this: https://www.whatdotheyknow.com/new/aberdeen_city_council?title= and then indicate what the title should be, thus: https://www.whatdotheyknow.com/new/aberdeen_city_council?title=EV%20Charging%20Points
Notice that if there is a space between words, it should be shown as %20. To make the process of encoding the URLs easier, you can use an encoder tool like this one: https://meyerweb.com/eric/tools/dencoder/ In my code I have used the Python library to encode the URL string ( from urllib.parse import urlencode )
- Input the body of the request, again using ‘%20’ between each word. This is where your URL can become very long! We use the parameter "default_letter"
The salutation (Dear…) and signoff (Yours…) are automatically wrapped around this by the site, so there’s no need to include them:
- Then add the text of the request. In our example, "geo-located list of charging points for electrical vehicles along with a note of the frequency of the update of this data set (e.g. Monthly, Quarterly, Annually)" becomes this when encoded:
a%20geo-located%20list%20of%20charging%20points%20for%20electrical%20vehicles%2C%20along%20with%20a%20note%20of%20the%20frequency%20of%20the%20update%20of%20this%20data%20set%20(e.g.%20Monthly%2C%20Quarterly%2C%20Annually)
-
So, stringing these all together we get: https://www.whatdotheyknow.com/new/aberdeen_city_council?title=EV%20Charging%20Points&default_letter=a%20geo-located%20list%20of%20charging%20points%20for%20electrical%20vehicles%2C%20along%20with%20a%20note%20of%20the%20frequency%20of%20the%20update%20of%20this%20data%20set%20(e.g.%20Monthly%2C%20Quarterly%2C%20Annually)
-
Finally we can use tags. This allows you to add a space-separated list of tags, so for example you can identify any requests made through your campaign or which refer to the same topic.
So, we can add "&tags=ODIAberden%20CTC11" which will tag the request ODIAberdeen and CTC11.
- The final URL we end up with is:
The selenium library allows automation of web functions. You can choose which browser to emulate. In this case I used Chrome.
The code loads the site WhetDoTheyKnow.com and logs in. It uses credentials which I have stored in secrets.py (not published here). The latter contains two lines only: userid = "xxxxxx" userpwd = "yyyyy" which correspond to your account.
I've set up three functions:
- log_in()
- make_reqest()
- Log_off () - not currently implemented
- a function to read in requests from a CSV file
- a loop mechanism to submit each request, calling the make_request() function - and up to a ceiling of, say, 5 per day
- complete the log_off() function
- update the CSV to mark which request have been submitted
- some sort of chron job to run it
Adapt this to handle request for the same subject to multiple authorities (e.g. to all Scottish Local Authoritie)