This is a web scraping and data analytics project that analyzes passenger reviews of Kenya Airways (KQ). I have written an article about it. Here is a snippet of the dashboard created in Power BI:
- extract_data.py: A Python script for obtaining the required data.
- kenya airways analytics.pbix: A Microsoft Power BI file containing the report and data models used to make the dashboard visualization.
- kenya-airways-analytics.ipynb: A Jupyter Notebook used for general data inspection.
- requirements.txt: A list of the packages/libraries used for this project.
- Python version 3.10.4
- A virtual environment (optional but advisable)
- Relevant packages
Here is the workflow of the project
For project reproducibility, you can run the following commands:
- Clone this repository to your preferred folder
git clone https://github.com/Demiga-g/kenya-airways-analytics.git
- (Optional) Set up and activate a virtual environment
python -m venv kq_analytics
source kq_analytics/bin/activate
- Install the dependencies
pip install -r requirements.txt
- Run the web scraping script. The script is configured to let you scrape reviews for any airline of your choice. In this case, we will use kenya-airways.
mkdir -p data
python3 extract_data.py \
--input_airline=kenya-airways \
--input_page_size=200 \
--input_sleep_time=3 \
--output_data=data/kenya-airways.csv
Here is what these commands do:
- Create the data directory, if it does not already exist, to hold the scraped data.
- Run the Python script with the airline name as it appears on the Skytrax website. Most of the time the name is hyphenated.
- Retrieve 200 reviews per page, then wait three seconds before moving to the next page.
- Store the data in the data folder as a .csv file. Remember to append .csv to the end of the file name.
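For readers curious how flags like these are typically wired up, here is a minimal sketch of a command-line parser that accepts the same four options. It is an assumption about the interface, not the actual contents of extract_data.py; the defaults, types, and help strings are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags used in the command above.

    Hypothetical sketch: the real extract_data.py may define these differently.
    """
    parser = argparse.ArgumentParser(description="Scrape airline reviews to a CSV file.")
    parser.add_argument("--input_airline", required=True,
                        help="Airline name as it appears on the Skytrax site, e.g. kenya-airways")
    parser.add_argument("--input_page_size", type=int, default=200,
                        help="Number of reviews to request per page (assumed default)")
    parser.add_argument("--input_sleep_time", type=float, default=3.0,
                        help="Seconds to wait between page requests (assumed default)")
    parser.add_argument("--output_data", required=True,
                        help="Path of the output .csv file")
    return parser


# Parse the same arguments used in the command above.
args = build_parser().parse_args([
    "--input_airline=kenya-airways",
    "--input_page_size=200",
    "--input_sleep_time=3",
    "--output_data=data/kenya-airways.csv",
])
print(args.input_airline, args.input_page_size, args.input_sleep_time, args.output_data)
```

Declaring the numeric flags with explicit types means a typo such as a non-numeric page size is rejected before any request is made.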
Note: You may have to change the user agent accordingly.
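One common way to change the user agent is to send a custom User-Agent header with each request. The sketch below uses only the standard library and builds the request without sending it. The URL and the user agent string are placeholders, and the actual script may use a different HTTP library.

```python
from urllib.request import Request

# Placeholder values: substitute the page you are scraping and
# your own browser's user agent string.
EXAMPLE_URL = "https://example.com/reviews"
EXAMPLE_USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"


def build_request(url: str, user_agent: str) -> Request:
    """Return a request object that carries a custom User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


request = build_request(EXAMPLE_URL, EXAMPLE_USER_AGENT)
# urllib stores header names in capitalized form, hence "User-agent".
print(request.get_header("User-agent"))
```

To actually fetch the page, the request object can be passed to urllib.request.urlopen.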
- (Optional) Visualization in Microsoft Power BI
For this step, you can use your preferred visualization tool. However, should you decide to go with Microsoft Power BI, note that there are some data cleaning steps you will have to take. These may include, but are not limited to:
- Splitting the route column to get the destination and origin columns
- Correctly naming the destination and origin countries (checking for spelling errors and abbreviated cities)
- Renaming the columns
- Replacing error values with null
- Creating calculated measures where necessary.
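If you would rather do part of this cleaning in Python before loading the data into your visualization tool, the route split could be sketched as follows. It assumes route values look like "Nairobi to London via Amsterdam", which is an assumption about the scraped data rather than something confirmed here, and split_route is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Assumed format: "<origin> to <destination>", optionally followed by " via <stopover>".
ROUTE_PATTERN = re.compile(
    r"^\s*(?P<origin>.+?)\s+to\s+(?P<destination>.+?)(?:\s+via\s+.+)?\s*$",
    re.IGNORECASE,
)


def split_route(route: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
    """Split a route string into (origin, destination).

    Returns (None, None) when the value is missing or does not match
    the assumed format, so bad rows end up as nulls rather than errors.
    """
    if not route:
        return None, None
    match = ROUTE_PATTERN.match(route)
    if match is None:
        return None, None
    return match.group("origin"), match.group("destination")


print(split_route("Nairobi to London via Amsterdam"))
print(split_route("Nairobi to Dar es Salaam"))
print(split_route("unknown"))
```

Returning nulls for unparseable routes mirrors the "replace error values with null" step, and the spelling and abbreviation checks would still need to be done afterwards.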