Date: 19/08/2020
In this project, I gather data from several sources about the Rotten Tomatoes top 100 movies of all time, using pandas, BeautifulSoup, and other Python libraries.
First, I have a zip archive containing 100 HTML files, one per movie. I use BeautifulSoup to extract information such as the movie title, audience score, and number of audience ratings from each file. The extracted information is then converted to a data frame.
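The extraction step can be sketched as follows. This is only an illustration: the sample HTML, the class names `audience-score` and `audience-info`, and the helper `parse_movie` are assumptions made for this example, and the real saved pages may use different markup.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical markup: the real saved pages may use different tags and classes.
SAMPLE_HTML = """
<html>
  <head><title>Example Movie (1999) - Rotten Tomatoes</title></head>
  <body>
    <div class="audience-score"><span>85%</span></div>
    <div class="audience-info">User Ratings: 1,234</div>
  </body>
</html>
"""

TITLE_SUFFIX = " - Rotten Tomatoes"


def parse_movie(html):
    """Extract title, audience score, and rating count from one page."""
    soup = BeautifulSoup(html, "html.parser")

    # The page title is assumed to end with the site name, which is stripped.
    title = soup.find("title").get_text(strip=True)
    if title.endswith(TITLE_SUFFIX):
        title = title[: -len(TITLE_SUFFIX)]

    score_text = (
        soup.find("div", class_="audience-score").find("span").get_text(strip=True)
    )
    ratings_text = soup.find("div", class_="audience-info").get_text(strip=True)

    return {
        "title": title,
        "audience_score": int(score_text.rstrip("%")),
        # Keep the part after the label and drop thousands separators.
        "number_of_audience_ratings": int(
            ratings_text.split(":")[-1].strip().replace(",", "")
        ),
    }


# With the real data, this list would hold one dict per extracted HTML file.
rows = [parse_movie(SAMPLE_HTML)]
df = pd.DataFrame(rows)
```

Collecting one dictionary per file and building the data frame once at the end is usually simpler than appending rows to a data frame inside the loop.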
Second, I was given 100 URLs pointing to Ebert's reviews (movie critiques). Using the requests library, I download the reviews and save them as text files, one review per file. I then build a new data frame that contains the movies and their reviews.
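A minimal sketch of the download step, assuming each URL ends in a usable file name. The helpers `download_review` and `load_reviews`, and the `file` and `review_text` column names, are hypothetical names chosen for this example. The optional `session` argument lets any object with a compatible `get` method be substituted, which makes the function easy to try without network access.

```python
import glob
import os

import pandas as pd
import requests


def download_review(url, folder, session=None):
    """Download one review and save it under `folder`; return the file path."""
    http = session or requests
    response = http.get(url, timeout=30)
    response.raise_for_status()  # stop early on HTTP errors

    os.makedirs(folder, exist_ok=True)
    # Assumption: the last URL segment is a sensible file name.
    filename = url.rstrip("/").split("/")[-1] or "review.txt"
    path = os.path.join(folder, filename)
    with open(path, "wb") as f:
        f.write(response.content)
    return path


def load_reviews(folder):
    """Read every .txt file in `folder` into a data frame, one row per review."""
    rows = []
    for path in sorted(glob.glob(os.path.join(folder, "*.txt"))):
        with open(path, encoding="utf-8") as f:
            rows.append({"file": os.path.basename(path), "review_text": f.read()})
    return pd.DataFrame(rows)
```

With a list of review URLs in a variable such as `review_urls` (hypothetical), the loop would be `for url in review_urls: download_review(url, "reviews")`, followed by `load_reviews("reviews")`.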
Third, I use the wptools library to look up each movie's poster and download it to a folder. The result is a data frame that contains the movie name and poster URL.
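The poster lookup might look like the sketch below. It assumes that the page data returned by wptools has an `image` entry holding a list of dictionaries with a `url` key, and that the first such entry is the poster, which will not be true for every page. The helper names `first_image_url`, `fetch_poster_url`, and `build_poster_table` are hypothetical. wptools is imported inside the function because it needs network access.

```python
import pandas as pd


def first_image_url(images):
    """Return the URL of the first image entry that has one, else None."""
    for image in images or []:
        url = image.get("url")
        if url:
            return url
    return None


def fetch_poster_url(title):
    """Look up a Wikipedia page by title and return its first image URL."""
    import wptools  # imported lazily: requires the wptools package and network access

    page = wptools.page(title, silent=True).get()
    return first_image_url(page.data.get("image"))


def build_poster_table(titles, lookup=fetch_poster_url):
    """Build a data frame of movie titles and poster URLs."""
    rows = []
    for title in titles:
        try:
            url = lookup(title)
        except Exception as error:  # one failed lookup should not stop the loop
            print(f"{title}: {error}")
            url = None
        rows.append({"title": title, "poster_url": url})
    return pd.DataFrame(rows)
```

Movies whose lookup fails keep a row with an empty `poster_url`, so they can be retried or filled in by hand later.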
Finally, I give an example of saving one of the data frames to a SQL database and/or a CSV file.
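One way to do this is sketched below, using SQLite through Python's built-in `sqlite3` module as a stand-in for whichever SQL database is actually used. The table name `movies` and the sample rows are made up for the example.

```python
import sqlite3

import pandas as pd

# Made-up sample rows standing in for one of the project's data frames.
df = pd.DataFrame(
    {"title": ["Example Movie A", "Example Movie B"], "audience_score": [85, 90]}
)

# Save to SQL. ":memory:" keeps the example side-effect free;
# a file path such as "movies.db" would persist the database.
conn = sqlite3.connect(":memory:")
df.to_sql("movies", conn, index=False, if_exists="replace")

# Read the table back to confirm the round trip.
restored = pd.read_sql("SELECT * FROM movies", conn)
conn.close()

# Save to CSV. Without a path, to_csv returns the CSV text;
# df.to_csv("movies.csv", index=False) would write a file instead.
csv_text = df.to_csv(index=False)
```

Passing `index=False` in both calls keeps the data frame's row index out of the saved output.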
MohannadAlnahhas/Udacity_DataAnalyst_GatheringData