Table of Contents
This is a solution to the assignment "Spreadsheet Analysis" as outlined in the course "Introduction to Python & Apps" by Code First Girls Ltd. It entails basic analysis of fictional sales data using both Excel and Python. See Features for an overview of the methods and libraries/modules used.
Python | PyCharm | pandas | Matplotlib | Seaborn | Microsoft Excel 365 |
---|---|---|---|---|---|
- Use different libraries/modules for simple analysis of data from an Excel workbook:
- pandas
- openpyxl
- seaborn
- matplotlib
- Find the total sum of values and the minimum and maximum values
- Use pct_change for working out periodic changes as a percentage
- Create a simple lineplot for showing periodic changes as a percentage
- Import an Excel file and then write results to a new Excel file
- Assign data to different spreadsheets upon writing to the new Excel file
-
Install Python and/or Install PyCharm by Jet Brains (you can opt for the free and open-source community edition). Make sure you are downloading a Python 3 version.
-
Get a subscription for Microsoft 365 and download the latest verstion of MS Excel.
-
See Pycharm Guide for creating a new Py project.
-
Download the file sales.csv via the repository's main page.
-
PIP package
-
See if PIP is already installed on your system. Write in either Windows OS shell or macOS terminal:
pip help
pip3 help
python3 -m pip help
-
If it is not installed, write in shell or terminal:
python3 get-pip.py
N.B. Make sure you have added pip.exe to the Environment Variables (Path).
-
pandas library; openpyxl module; seaborn library; matplotlib
There are 2 ways, EITHER:
-
Write in PyCharm IDE install [library name]. E.g.,
install pandas
-
OR Write in shell or terminal pip install [library name]. For macOS terminal, if that instruction does not work, you can write pip3 install [library name]. E.g.,
pip install pandas
-
If you want to have a go yourself, I have provided the guide below. You will create 3 separate Py files that reflect each stage of the process.
- N.B. From this point on, Python will be abbreviated to Py.
PY FILE 1
- Create a new .py file in your project folder in PyCharm (if you have not already done so). Name the file.
- Import openpyxl and math.
- N.B. It is good practice to import your libraries/modules at the top; however, I have imported pandas as pd later in the code so that it is shown within its immediate context.
- Import sales.csv file (sales.csv); otherwise, create your own spreadsheet with similar quantitative data.
- Create a function named read_data using def to retrieve the sales and months from the spreadsheet.
- Use the def method to create a function that makes a Py dictionary from the data for monthly sales (from sales.csv).
- Collect all the sales from each month into a single list using the list() function.
- Get the sales total (across all months) using the sum() function.
- Find the largest and smallest values in the sales data using the min() and max() functions.
- Find the months with the lowest/highest number of sales by using the dictionary get() method and writing the keys as max_value and min_value.
- Find the average value of sales using the lens() function (returns the number of values for sales). Then continue the formula by writing total / number_of_sales (total divided by the number of values returned by the lens function).
- Using math.ceil() method (from the math module), round the number of average sales that was calculated in the previous step. You do not have to do this, but it ensures better readability of the table that we will be creating later.
- Import pandas as pd to read sales.csv and create a dataframe. Use pct_change() method to retrieve sales data and calculate the monthly change in sales from one month to the next as a percentage. Multiply pct_change by 100 (* 100) to complete the formula. NaN will appear in the table for Jan if you do not use the fillna() function; this is because there is no value for the previous month (in the sales data) with which to calculate the change in sales for Jan. Fill none with the value 0 to make it clear that there is no monthly-change data for this month.
- Create a pandas dataframe for the output. Then use ExcelWriter and df.to_excel to write the results (df) to a new workbook and spreadsheet. Write the parameter index=False to remove the surplus column.
- This step is optional: you can load the existing workbook and change its formatting so that the size of the cells fit the values better. I have used from openpyxl import load_workbook to load the workbook and from openpyxl.utils import get_column_letter for this task.
- Save and close the workbook.
- Print the results with the string format() method so that they appear in the console (you will need these values for writing to the new Excel sheet; see below). /n in the print function adds a line break. Then write run() to complete def run().
PY FILE 2
- Import pandas as pd.
- Read the Excel file (created via the previous steps) into a pandas dataframe object with the pd.read_excel function.
- With your results from PY FILE 1, add the new data as a dictionary of lists and create a dataframe for it with the pd.DataFrame function.
- Write the dataframe to a new Excel spreadsheet using pd.Excelwriter. Name your sheet in the parameters and write index=false to remove the generated extra column.
PY FILE 3
- Import: seaborn as sns; pandas as pd; matplotlib as plt.
- Specify columns from which to retrieve the data. Read this data into a pandas dataframe via the pd.read_excel function.
- Print this data.
- Create a simple lineplot with the sns.lineplot() function.
- Show this lineplot by writing plt.show(). You can then save this lineplot to a local folder.
See the open issues for a full list of proposed features and known issues.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks!
- Fork the Project
- Create your Feature Branch (
git checkout -b feature/AmazingFeature
) - Commit your Changes (
git commit -m 'Add some AmazingFeature'
) - Push to the Branch (
git push origin feature/AmazingFeature
) - Open a Pull Request
Distributed under the MIT Licence. See "LICENSE.txt" for more information.
My profile - isabelsw
Project link - https://github.com/isabelsw/cfgdataanalysis
- The project brief was created by Code First Girls Ltd. A special thanks goes to my instructors Vanny and Andrew on the "Introduction to Python & Apps" course.
- README template was created by Othneil Drew.
- GIF courtesy of GIPHY.
- PyCharm logo: Copyright © 2000-2024 JetBrains s.r.o. JetBrains and the JetBrains logo are registered trademarks of JetBrains s.r.o.
- Python logo: Copyright © 2001-2024 Python Software Foundation.
- pandas logo: Copyright © 2008 AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development Team.
- Seaport logo: Copyright © Matthias Bussonnier and Seaport.
- matplotlib logo: Copyright © 2002–2012 John Hunter, Darren Dale, Eric Firing, Michael Droettboom and the Matplotlib development team; 2012–2024 The Matplotlib development team.