Tasks for a technical interview
Extract and save data from the Tadawul Exchange website's monthly reports, with a specific focus on 'Trading by Nationality'.
- Using `wget`:

```bash
wget --user-agent="Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:132.0) Gecko/20100101 Firefox/132.0" "https://www.saudiexchange.sa/wps/wcm/connect/c4f67241-1068-48ba-b5c6-6276a4ca77ae/Monthly+Trading+and+Ownership+By+Nationality+Report+31-10-2023.pdf?MOD=AJPERES&CACHEID=ROOTWORKSPACE-c4f67241-1068-48ba-b5c6-6276a4ca77ae-oP52nbM"
```
Here is an example of getting the `2023_10__REPORT.pdf` report when filtering by Nationality. Note that for this website we need to add the user-agent to the request.
Now we just need to put this command into a bash script and call it from a Python script with the desired monthly date. But first, to get the date, we need to parse the page content to collect all the available dates.
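As a minimal sketch of calling the `wget` command from Python (the function names and the `-O` output path are illustrative, not from the project code):

```python
import subprocess

# Firefox user-agent string the Saudi Exchange site expects
USER_AGENT = ("Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:132.0) "
              "Gecko/20100101 Firefox/132.0")

def build_wget_cmd(url, out_path):
    """Build the wget command line with the mandatory user-agent header."""
    return ["wget", f"--user-agent={USER_AGENT}", "-O", out_path, url]

def download_report(url, out_path):
    """Invoke wget from Python; raises CalledProcessError on failure."""
    subprocess.run(build_wget_cmd(url, out_path), check=True)
```

Keeping the command as a list (rather than a shell string) avoids quoting problems with the long query-string URLs.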
- The idea is to get only the lines containing `Monthly Trading and Ownership By Nationality Report`.
- And here is the page after outputting it to a tmp file:

```bash
python3 main.py > temp
```

Or we can just take the `href` directly from here.
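A hedged sketch of that filtering step, assuming the `temp` file holds raw HTML with `href` attributes (the function name is illustrative):

```python
import re

REPORT_NAME = "Monthly Trading and Ownership By Nationality Report"

def extract_report_links(page_text):
    """Return the href of every line that mentions the nationality report."""
    links = []
    for line in page_text.splitlines():
        if REPORT_NAME in line:
            m = re.search(r'href="([^"]+)"', line)
            if m:
                links.append(m.group(1))
    return links
```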
After parsing the `temp` file with the `news_reports_urls_etl.py` script, two directories are produced: one for weekly reports and another for monthly reports, both filtered by nationality.
Each file contains the date along with the URL for every available PDF within that category.
Those CSV files will later be used with input from the user: which year? which type? ....
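As a minimal sketch of that lookup, assuming each saved CSV holds `date,url` rows (the exact layout and function name are assumptions for illustration):

```python
import csv
import io

def url_for_date(csv_text, wanted_date):
    """Given CSV text with date,url rows, return the URL for wanted_date,
    or None if that date is not listed."""
    reader = csv.reader(io.StringIO(csv_text))
    for date, url in reader:
        if date == wanted_date:
            return url
    return None
```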
The `pdf_handler.py` script, with the help of the `parse_pdf_page` function in the `utils.py` file, does the following:
- Takes the desired report: its date, year, and type
- Gets its URL from the previously saved CSV files and downloads it if we didn't download it before
- Parses the table content of pages 5 and 6 of the PDF report and converts it to a CSV file
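The actual extraction relies on `pdfplumber`; as a hedged sketch, the cell clean-up step that turns a pdfplumber-style table (a list of row lists, where cells may be `None` or contain newlines) into writable CSV rows might look like this (function names are illustrative):

```python
import csv

def table_to_csv_rows(table):
    """Normalize a pdfplumber-style table into clean string rows:
    None cells become empty strings, embedded newlines become spaces."""
    cleaned = []
    for row in table:
        cleaned.append([(cell or "").replace("\n", " ").strip() for cell in row])
    return cleaned

def write_csv(rows, path):
    """Write the cleaned rows to a CSV file."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
```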
A look into the row-line output from the `pdfplumber` library for page 5.
A look into the row-line output from the `pdfplumber` library for page 6.
```bash
$ ls -R final_reports_csvs/
final_reports_csvs/:
31-07-2023-Monthly-page-5.csv 31-07-2023-Monthly-page-6.csv
```
Here is the final output CSV file for page 5 of the report.
Here is the final output CSV file for page 6 of the report.
- Run `web_scraping.py` to create the `temp` file of all PDF URLs:

```bash
python3 web_scraping.py > temp
```
- Then run the `news_reports_urls_etl.py` script to create the `reports_urls_csvs` directory covering all years and types:

```bash
python3 news_reports_urls_etl.py
```
- Then run `pdf_handler.py` to get input from the user (which year and which type), download the PDF file, parse its content, convert it to a CSV file, and save it:

```bash
python3 pdf_handler.py
```
Demo recording: `t1.webm`
Build a web-based dashboard to display and analyze data from an Excel file containing financial metrics for various tickers.
- I started by inspecting the two dataframes with Jupyter notebooks; you can see this in `notebook.ipynb`, along with useful insights about the data and wrangling.
- Made a line graph showing `FO%` changes for `ADNOCGAS UH` in Ticker, `Energy` in Sector, and `UAE` in Country.
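A minimal sketch of selecting the data behind that line graph with `pandas`; the column names (`Date`, `FO%`, `Ticker`, `Sector`, `Country`) and the function name are assumptions for illustration:

```python
import pandas as pd

def fo_series(df, ticker, sector, country):
    """Return the FO% values over time for one ticker/sector/country
    combination, sorted by date for plotting."""
    mask = (
        (df["Ticker"] == ticker)
        & (df["Sector"] == sector)
        & (df["Country"] == country)
    )
    return df.loc[mask, ["Date", "FO%"]].sort_values("Date")
```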
- Then I started developing a backend API with `flask` and `pandas` in Python to filter and retrieve the data for the line graph plot in the frontend; all data-related work is informed by the exploration in `notebook.ipynb`.
- Here is an example of an endpoint, `http://localhost:8000/line_graph_selections`, which returns all unique values of the `Countries, Exchanges, Sectors, Tickers` columns so that we can choose from them in the frontend.
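A hedged sketch of the payload that endpoint would build; the real route would wrap this in Flask's `jsonify`, and the column names are taken from the description above (the function name is illustrative):

```python
import pandas as pd

def line_graph_selections(df):
    """Collect the sorted unique values of each filter column,
    as the /line_graph_selections endpoint would return them."""
    cols = ["Countries", "Exchanges", "Sectors", "Tickers"]
    return {c: sorted(df[c].dropna().unique().tolist()) for c in cols}
```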
- Then I started to develop the frontend with `react`; the page features a brief summary of the `Countries, Exchanges, Sectors, Tickers` data, a select drop-down menu to choose the desired filtering, and then the line plot.
- We can also filter by date, and the resulting filtered data is shown in a nicely formatted table for reference.
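The date filtering can be sketched with `pandas` as follows; the `Date` column name and the function name are assumptions for illustration:

```python
import pandas as pd

def filter_by_date(df, start, end, date_col="Date"):
    """Keep only rows whose date falls within [start, end], inclusive.
    Dates are parsed with pandas, so ISO strings work directly."""
    dates = pd.to_datetime(df[date_col])
    mask = dates.between(pd.to_datetime(start), pd.to_datetime(end))
    return df.loc[mask]
```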
- Run `main.py` in the `Foreign-Ownership-Analysis` directory for the backend; it will run on port 8000:

```bash
python3 main.py
```
- Then start the frontend dev environment by running the following command in the `web-dashboard` directory (after installing React, npm, and its modules); it will start on port 5000:

```bash
npm run start
```