In this lesson, we'll review all the guidelines and specifications for the final project for Module 2.
- Understand all required aspects of the Final Project for Module 2
- Understand all required deliverables
- Understand what constitutes a successful project
Another module down--you're halfway there!
For the culmination of Module 2, you just need to complete the final project!
For this project, you'll be working with the Northwind database--a free, open-source dataset created by Microsoft containing data from a fictional company. You probably remember the Northwind database from our section on Advanced SQL. Here's the schema for the Northwind database:
The goal of this project is to test your ability to gather information from a real-world database and use your knowledge of statistical analysis and hypothesis testing to generate analytical insights that can be of value to the company.
To do this, you'll query the database for the data you need and then run a statistical analysis on it. As part of that analysis, you'll need to perform a hypothesis test (or perhaps several) to answer the following question:
Do discounts have a statistically significant effect on the number of products customers order? If so, at what level(s) of discount?
In addition to answering this question with a hypothesis test, you will also need to come up with at least 3 other hypotheses to test on your own. These can be anything that you think could be important information for the company.
For each hypothesis, be sure to specify both the null hypothesis and the alternative hypothesis for your question. You should also specify whether it is a one-tailed or a two-tailed test.
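As a rough illustration of stating hypotheses and getting a p-value, here is a minimal sketch of a one-tailed permutation test on the difference in mean quantity. It is not the required method for this project; a permutation test is used here only because it needs nothing beyond the Python standard library, and a t-test or ANOVA may suit your data better. The function name `permutation_p_value`, the significance level, and the sample quantities are all assumptions made up for the example.

```python
import random
from statistics import mean


def permutation_p_value(treated, control, n_resamples=10_000, seed=0):
    """One-tailed permutation test for mean(treated) > mean(control).

    H0: discounts do not change the mean quantity ordered.
    H1: discounted order lines have a higher mean quantity.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = mean(treated) - mean(control)
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        # Shuffling breaks any link between group label and quantity,
        # which is exactly what H0 claims.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_treated]) - mean(pooled[n_treated:])
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_resamples + 1)


# Made-up quantities for illustration only, not real Northwind rows.
discounted = [30, 25, 40, 35, 28, 32]
full_price = [12, 20, 15, 18, 10, 22]

p_value = permutation_p_value(discounted, full_price)
alpha = 0.05  # assumed significance level; choose and justify your own
print(f"p = {p_value:.4f}; reject H0 at alpha={alpha}: {p_value < alpha}")
```

Because the alternative hypothesis here is directional ("higher"), the test is one-tailed. For a two-tailed version, you would compare the absolute values of the differences instead.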
To complete this project, you will need to turn in the following 4 deliverables:
- A Jupyter Notebook containing any code you've written for this project. This work will need to be pushed to your GitHub repository in order to submit your project.
- A Blog Post explaining your process, methodology, and findings.
- An "Executive Summary" Keynote/PowerPoint/Google Slides presentation (delivered as a PDF export) that explains the hypothesis tests you ran, your findings, and their relevance to company stakeholders. Make sure to also add and commit this PDF of your non-technical presentation to your repository with a file name of presentation.pdf.
- A Video Walkthrough of your “Executive Summary” presentation. Some common video recording tools are Zoom, QuickTime, and Nimbus. After you record your presentation, publish it on a service like YouTube or Google Drive. You will need a link to the video to submit your project.
For this project, your Jupyter Notebook should meet the following specifications:
Organization/Code Cleanliness
- The notebook should be well organized and easy to follow, with code commented where appropriate.
* Level Up: The notebook contains well-formatted, professional-looking markdown cells explaining any substantial code. All functions have docstrings that act as professional-quality documentation.
* The notebook is written for a technical audience and gives readers a way to both understand your approach and reproduce your results. The target audience for this deliverable is other data scientists looking to validate your findings.
* Any SQL code written to source data should also be included.
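One way to keep the SQL visible in the notebook is to store each query in a named string and run it from Python. The sketch below assumes a table called `OrderDetail` with `Quantity` and `Discount` columns; those names are hypothetical and should be checked against the schema of your own copy of the database. It runs against a small in-memory table with toy rows so that it works without the real database file.

```python
import sqlite3

# Hypothetical table and column names; check them against the schema
# in your own copy of the Northwind database before reusing the query.
QUERY = """
SELECT Discount,
       COUNT(*)      AS n_lines,
       AVG(Quantity) AS mean_quantity
FROM OrderDetail
GROUP BY Discount
ORDER BY Discount;
"""

# In-memory stand-in so the sketch runs without the real database file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OrderDetail (Quantity INTEGER, Discount REAL)")
conn.executemany(
    "INSERT INTO OrderDetail (Quantity, Discount) VALUES (?, ?)",
    [(10, 0.0), (20, 0.0), (30, 0.05), (40, 0.05), (50, 0.10)],  # toy rows
)

rows = conn.execute(QUERY).fetchall()
conn.close()

for discount, n_lines, mean_quantity in rows:
    print(f"discount={discount:.2f}  n={n_lines}  mean quantity={mean_quantity:.1f}")
```

Grouping by discount level like this gives a quick summary to look at before testing each level separately. For your project, you would replace the in-memory connection with a connection to your Northwind database file.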
Findings
- Your notebook should clearly show how you arrived at your results for each hypothesis test, including how you calculated your p-values.
* You should also include any other statistics that you find relevant to your analysis, such as effect size.
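As an example of reporting effect size alongside a p-value, here is a minimal sketch of Cohen's d computed with the pooled sample standard deviation. The function name `cohens_d` and the input numbers are assumptions for illustration; other effect size measures may fit your tests better.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Weight each group's sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Toy numbers for illustration only, not Northwind data.
d = cohens_d([2, 4, 6], [1, 3, 5])
print(f"Cohen's d = {d:.2f}")
```

A p-value tells you whether a difference is likely to be real; the effect size tells you how large it is, which is often what stakeholders care about most.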
Your blog post should cover everything from how you identified which tables contained the information you needed, to how you retrieved it using SQL (and any challenges you ran into while doing so), to your methodology and results for your hypothesis tests.
NOTE: This blog post is your way of showcasing the work you've done on this project--chances are it will soon be read by a recruiter or hiring manager! Take the time to craft your story well, and explain your process and findings in a way that shows both your technical expertise and your ability to communicate your results!
Your presentation should:
- Contain 5 to 10 professional-quality slides detailing:
* A high-level overview of your methodology
* The results of your hypothesis tests
* Any real-world recommendations you would like to make based on your findings (ask yourself--why should the executive team care about what you found? How can your findings help the company?)
- Take no more than 5 minutes to present
- Avoid technical jargon and explain results in a clear, actionable way for non-technical audiences
In order to submit your project for review, include the following links to your work in the corresponding fields on the right-hand side of Learn.
- GitHub Repo: Now that you’ve completed your project in a Jupyter Notebook, push your work to GitHub and paste that link to the right. (If you need help doing so, review the resources here.) Reminder: Make sure to also add and commit a PDF of your non-technical presentation to the repository with a file name of presentation.pdf.
- Blog Post: Include a link to your blog post.
- Recorded Walkthrough: Include a link to your video walkthrough.
Hit "I'm done" to wrap it up. You will then receive an email so you can schedule your review with your instructor.