DeleLinus committed Nov 25, 2021
1 parent 01ba703 commit 6f94f29
Showing 19 changed files with 35,361 additions and 0 deletions.
10 changes: 10 additions & 0 deletions .gitignore
@@ -0,0 +1,10 @@
# ignore rubrics
Project_Rubric_Requirements.pdf

# ignore all word documents
*.docx

# ignore
TASK_Description.pdf
tweet_json.txt
Data
2,076 changes: 2,076 additions & 0 deletions Data/image-predictions-3.tsv

Large diffs are not rendered by default.

2,330 changes: 2,330 additions & 0 deletions Data/tweet_json.txt

Large diffs are not rendered by default.

Binary file added Graph_Images/Pearson_correlation.png
Binary file added Graph_Images/best_golden_retriever.jpg
Binary file added Graph_Images/graph_dog_breeds.png
Binary file added Graph_Images/graph_dog_stages.png
Binary file added Graph_Images/graph_sources_of_tweets.png
2,076 changes: 2,076 additions & 0 deletions Graph_Images/image_predictions.tsv

Large diffs are not rendered by default.

Binary file added Graph_Images/plot_retweet_favorite_counts.png
52 changes: 52 additions & 0 deletions README.md
@@ -0,0 +1,52 @@

# WeRateDogs Wrangle and Analyze Data

The dataset I wrangled (and analysed and visualized) is the tweet archive of Twitter user @dog_rates, also known as WeRateDogs. WeRateDogs is a Twitter account that rates people's dogs with a humorous comment about the dog. These ratings almost always have a denominator of 10. The numerators, though? Almost always greater than 10. 11/10, 12/10, 13/10, etc. Why? Because "they're good dogs Brent." WeRateDogs has over 4 million followers and has received international media coverage.

WeRateDogs downloaded their Twitter archive and sent it to Udacity via email exclusively for use in this project. This archive contains basic tweet data (tweet ID, timestamp, text, etc.) for all 5000+ of their tweets as they stood on August 1, 2017.

## Acknowledgements

- [Udacity](https://udacity.com)
- [@dog_rates](https://twitter.com/dog_rates)



## Tech Stack

**Python:** pandas, NumPy, Seaborn, Matplotlib, Requests, Tweepy
## Documentation

**Project Details:**\
My tasks in this project are as follows:

* Data wrangling, which consists of:
  * Gathering data (the downloadable file is in the Resources tab in the leftmost panel of the classroom and is linked under Gathering Data below)
  * Assessing data
  * Cleaning data
* Storing, analyzing, and visualizing the wrangled data
* Reporting on 1) my data wrangling efforts and 2) my data analyses and visualizations

**Gathering Data for this Project:**\
I gathered each of the three pieces of data as described below in a Jupyter Notebook titled wrangle_act.ipynb:
* The WeRateDogs Twitter archive. This is a file on hand. I downloaded this file manually by clicking the following link: twitter_archive_enhanced.csv
* The tweet image predictions, i.e., what breed of dog (or other object, animal, etc.) is present in each tweet according to a neural network. This file (image_predictions.tsv) is hosted on Udacity's servers and was downloaded programmatically using the Requests library and the following URL: https://d17h27t6h515a5.cloudfront.net/topher/2017/August/599fd2ad_image-predictions/image-predictions.tsv
* Each tweet's retweet count and favorite ("like") count at minimum, and some additional data I found interesting. Using the tweet IDs in the WeRateDogs Twitter archive, I queried the Twitter API for each tweet's JSON data using Python's Tweepy library and stored each tweet's entire set of JSON data in a file called tweet_json.txt.
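
As a rough illustration of the last gathering step, the sketch below reads the stored JSON back and keeps only the tweet ID, retweet count, and favorite count. It assumes that tweet_json.txt holds one JSON object per line and that each object carries the `id`, `retweet_count`, and `favorite_count` fields of a Twitter API tweet object. The function name `read_tweet_counts` is hypothetical and is not taken from wrangle_act.ipynb.

```python
import json


def read_tweet_counts(path):
    """Return one dict per tweet with its ID, retweet count, and favorite count.

    Assumption: the file stores one JSON object per line.
    """
    rows = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines between records
            tweet = json.loads(line)
            rows.append(
                {
                    "tweet_id": tweet["id"],
                    "retweet_count": tweet["retweet_count"],
                    "favorite_count": tweet["favorite_count"],
                }
            )
    return rows
```

Passing the result to `pandas.DataFrame(read_tweet_counts("tweet_json.txt"))` would give a table that can be joined to the archive on the tweet ID.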


## Methodology
After gathering each of the above pieces of data, I assessed them visually and programmatically for quality and tidiness issues. I detected and documented at least eight (8) quality issues and two (2) tidiness issues in the wrangle_act.ipynb Jupyter Notebook.\
**Cleaning Data for this Project:**\
I cleaned each of the issues I documented while assessing. I performed this cleaning in wrangle_act.ipynb as well. The result was a high-quality, tidy master pandas DataFrame, which I saved as twitter_archive_master.csv.\
**Storing, Analyzing, and Visualizing Data for this Project:**\
I analyzed and visualized the wrangled data in the wrangle_act.ipynb Jupyter Notebook. At least five (5) insights and six (6) visualizations were produced.
## Analysis Report
I created a written report called `wrangle_report.pdf` that briefly described my wrangling efforts.\
I also created a written report called `act_report.pdf` that communicated the insights and displayed the visualizations produced from the wrangled data.
## Screenshots
![Screenshot (90)](https://user-images.githubusercontent.com/58152694/143286139-3426687d-0983-40f5-ba0e-59364cd909bb.png)



## Feedback

If you have any feedback, please reach out to me at ayangidel@hotmail.com

Binary file added act_report.docx
Binary file not shown.
Binary file added act_report.pdf
Binary file not shown.
2,518 changes: 2,518 additions & 0 deletions twitter-archive-enhanced-2.csv

Large diffs are not rendered by default.

2,066 changes: 2,066 additions & 0 deletions twitter_archive_master.csv

Large diffs are not rendered by default.
