In this Webscraping Project Jupyter notebook, we scrape the Wikipedia pages of Disney movies to build a Disney Movies dataset. From each movie's infobox we collect fields such as `Title`, `Directed by`, `Produced by`, `Written by`, `Narrated by`, `Music by`, `Cinematography`, `Edited by`, `Production company`, `Distributed by`, `Release date`, `Running time`, `Country`, and `Language`. We also use the OMDb API to add `imdb`, `metascore`, and `rotten_tomatoes` ratings. The data is saved as JSON and CSV files, with intermediate results checkpointed using Python's Pickle library.
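Below is a minimal sketch of the infobox scraping used in Tasks 1 and 2 (listed next). It assumes the standard `infobox vevent` table markup on Wikipedia movie pages; the `get_info_box` helper and its parsing details are illustrative, not the notebook's exact code.

```python
import requests
from bs4 import BeautifulSoup

def get_info_box(url):
    """Scrape a Wikipedia movie infobox into a dict (illustrative sketch)."""
    response = requests.get(url)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    # Wikipedia movie infoboxes typically use the "infobox vevent" class
    info_box = soup.find("table", class_="infobox vevent")
    movie_info = {}
    for row in info_box.find_all("tr"):
        header = row.find("th")
        value = row.find("td")
        if header and value:
            movie_info[header.get_text(" ", strip=True)] = value.get_text(" ", strip=True)
    return movie_info

# Task 1 applies this to the Toy Story 3 page
toy_story_3 = get_info_box("https://en.wikipedia.org/wiki/Toy_Story_3")
print(toy_story_3.get("Directed by"))
```

Task 2 simply repeats the same call for every movie linked from Wikipedia's list of Disney films, collecting the results in a list of dictionaries.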
- Task 1: Scrape info box from Toy Story 3 Wiki page and save in python dictionary.
- Task 2: Scrape info box for all Disney movies and save in list of python dictionaries.
- Task 3: Clean the data! (see the cleaning sketch after this list)
- Strip out all references ([1], [2], etc)
- Split up long strings
- Convert 'Running time' field to integer
- Convert 'Budget' and 'Box office' fields to floats
- Convert dates to datetime objects
- Save data using Pickle
- Task 4: Attach IMDb, Rotten Tomatoes, and Metascore ratings to the dataset using the OMDb API (see the OMDb sketch after this list).
- Task 5: Save final dataset as JSON and CSV files.
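A minimal sketch of the Task 3 cleaning steps is shown below, assuming a `movie_info_list` of infobox dictionaries produced in Task 2 (a one-movie example stands in for it here). The helper names, regular expressions, and file names are illustrative assumptions, and the "split up long strings" step is omitted for brevity.

```python
import re
import pickle
from datetime import datetime

# movie_info_list stands in for the Task 2 output (list of infobox dicts);
# the values below are just a one-movie example.
movie_info_list = [{
    "Title": "Toy Story 3",
    "Running time": "103 minutes[2]",
    "Budget": "$200 million[3]",
    "Box office": "$1.067 billion[4]",
    "Release date": "June 18, 2010[1]",
}]

def strip_references(text):
    """Remove citation markers such as [1], [2] from a scraped value."""
    return re.sub(r"\[\d+\]", "", text).strip()

def minutes_to_integer(running_time):
    """Convert a 'Running time' string like '103 minutes' to an int."""
    match = re.search(r"\d+", running_time or "")
    return int(match.group()) if match else None

def money_to_float(money):
    """Convert a 'Budget' or 'Box office' string like '$200 million' to a float."""
    match = re.search(r"[\d.]+", (money or "").replace(",", ""))
    if not match:
        return None
    value = float(match.group())
    if "million" in money.lower():
        value *= 1_000_000
    elif "billion" in money.lower():
        value *= 1_000_000_000
    return value

def date_to_datetime(date_str):
    """Parse a release date such as 'June 18, 2010' into a datetime object."""
    try:
        return datetime.strptime(strip_references(date_str), "%B %d, %Y")
    except (ValueError, TypeError):
        return None

for movie in movie_info_list:
    # Strip [n] reference markers from every string field
    for key, value in movie.items():
        if isinstance(value, str):
            movie[key] = strip_references(value)
    # Add converted numeric and datetime versions of selected fields
    movie["Running time (int)"] = minutes_to_integer(movie.get("Running time", ""))
    movie["Budget (float)"] = money_to_float(movie.get("Budget", ""))
    movie["Box office (float)"] = money_to_float(movie.get("Box office", ""))
    movie["Release date (datetime)"] = date_to_datetime(movie.get("Release date", ""))

# Checkpoint the cleaned data with Pickle so later tasks can skip re-scraping
with open("disney_movie_data_cleaned.pickle", "wb") as f:
    pickle.dump(movie_info_list, f)
```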
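For Tasks 4 and 5, the sketch below reloads the pickled data, queries the OMDb API, and writes the final JSON and CSV files. `OMDB_API_KEY` and the file names are placeholders, and the parsed fields follow OMDb's documented response keys (`imdbRating`, `Metascore`, and the `Ratings` list).

```python
import json
import os
import pickle

import pandas as pd
import requests

OMDB_API_KEY = os.environ["OMDB_API_KEY"]  # placeholder: supply your own OMDb key

def get_omdb_info(title):
    """Query the OMDb API by title and return its JSON response."""
    params = {"apikey": OMDB_API_KEY, "t": title}
    return requests.get("https://www.omdbapi.com/", params=params).json()

def get_rotten_tomatoes_score(omdb_info):
    """Pull the Rotten Tomatoes value from OMDb's 'Ratings' list, if present."""
    for rating in omdb_info.get("Ratings", []):
        if rating.get("Source") == "Rotten Tomatoes":
            return rating.get("Value")
    return None

# Reload the cleaned data checkpointed in the previous sketch
with open("disney_movie_data_cleaned.pickle", "rb") as f:
    movie_info_list = pickle.load(f)

# Task 4: attach imdb, metascore, and rotten_tomatoes to each movie
for movie in movie_info_list:
    omdb_info = get_omdb_info(movie.get("Title", ""))
    movie["imdb"] = omdb_info.get("imdbRating")
    movie["metascore"] = omdb_info.get("Metascore")
    movie["rotten_tomatoes"] = get_rotten_tomatoes_score(omdb_info)

# Task 5: save the final dataset as JSON and CSV
with open("disney_movie_data_final.json", "w", encoding="utf-8") as f:
    # default=str serializes the datetime objects added during cleaning
    json.dump(movie_info_list, f, ensure_ascii=False, indent=2, default=str)

pd.DataFrame(movie_info_list).to_csv("disney_movie_data_final.csv", index=False)
```

Because OMDb matches on title, a few movies may need manual title fixes or the year parameter (`y`) to disambiguate remakes.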
Tools and libraries used:
- Jupyter Notebook
- Beautiful Soup
- Requests
- Pickle
- Pandas
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project (click on `Fork` in the top-right corner)
- Create your Feature Branch (`git checkout -b feature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature`)
- Open a Pull Request
Author: Sinjoy Saha