This repo highlights the following three data skills:
- Web scraping "Mars" websites
- Storing data with MongoDB
- Building a web application with Flask
Four different websites were scraped using the open-source tool Splinter to automate browser actions...
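from splinter import Browser
from webdriver_manager.chrome import ChromeDriverManager
# Download a matching ChromeDriver if needed and get its path
executable_path = {'executable_path': ChromeDriverManager().install()}
# headless=False opens a visible Chrome window while scraping
browser = Browser('chrome', **executable_path, headless=False)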
and the Python package Beautiful Soup to parse the HTML...
from bs4 import BeautifulSoup as bs
url = "https://mars.nasa.gov/news/"
browser.visit(url)
html = browser.html
news_site = bs(html, 'html.parser')
and extract the relevant data, in this case the latest Mars news headline and its first paragraph.
result = news_site.find('div', class_='list_text')
news_title = result.find('a').text
news_para = result.find('div', class_='article_teaser_body').text
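Beautiful Soup's find() returns None when no matching element exists, for example if the page has not finished loading or its markup changes, and calling .text on None raises an AttributeError. The following is a minimal sketch of a guarded version of the same extraction. The helper name extract_latest_news and the sample markup are hypothetical, written only to match the selectors used above.

```python
from bs4 import BeautifulSoup


def extract_latest_news(html):
    """Return (title, teaser) for the first news item, or (None, None) if absent."""
    soup = BeautifulSoup(html, "html.parser")
    item = soup.find("div", class_="list_text")
    if item is None:
        return None, None
    link = item.find("a")
    teaser = item.find("div", class_="article_teaser_body")
    title = link.get_text(strip=True) if link else None
    para = teaser.get_text(strip=True) if teaser else None
    return title, para


# Hypothetical markup shaped like the selectors above, for illustration only
sample_html = """
<div class="list_text">
  <div class="content_title"><a href="/news/1">Sample headline</a></div>
  <div class="article_teaser_body">Sample teaser text.</div>
</div>
"""
title, para = extract_latest_news(sample_html)
```

In the scraper itself, the call would presumably be extract_latest_news(browser.html), with the caller deciding what to store when either value is None.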
The scraping logic is wrapped in a function, scrape(), in the scrape_mars.py file and is called from the app.py file through the Flask route "/scrape".
In the app.py file, I connect to a local MongoDB database...
from flask import Flask
from flask_pymongo import PyMongo
app = Flask(__name__)
mongo = PyMongo(app, uri="mongodb://localhost:27017/mars_app")
then, inside a route, store the scraped Mars data in the database, replacing any existing record.
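@app.route("/scrape")
def scrape():
    mars_data = scrape_mars.scrape()
    # replace_one() replaces the stored document, or inserts it if none exists
    mongo.db.collection.replace_one({}, mars_data, upsert=True)
    # A Flask view must return a response; send the browser back to the home page
    return redirect("/")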
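Collection.update() was deprecated in PyMongo 3 and removed in PyMongo 4. Called with a plain document and upsert=True, it replaced the matched document or inserted a new one, which is what replace_one() does. Below is a minimal sketch of that step as a helper. The names save_mars_data and FakeCollection are hypothetical, and the fake collection exists only so the snippet runs without a MongoDB server.

```python
def save_mars_data(collection, mars_data):
    """Replace the single stored document with mars_data, inserting it if none exists."""
    # An empty filter {} matches the first document in the collection, if any
    return collection.replace_one({}, mars_data, upsert=True)


class FakeCollection:
    """Hypothetical in-memory stand-in that mimics replace_one for one document."""

    def __init__(self):
        self.docs = []

    def replace_one(self, query, replacement, upsert=False):
        if self.docs:
            # A document already exists: overwrite it
            self.docs[0] = dict(replacement)
        elif upsert:
            # Nothing stored yet: insert the replacement
            self.docs.append(dict(replacement))


fake = FakeCollection()
save_mars_data(fake, {"news_title": "first"})
save_mars_data(fake, {"news_title": "second"})
```

After both calls the fake holds a single document, the second one. With a real database, the call in the route would presumably be save_mars_data(mongo.db.collection, mars_data). If only some fields should change, update_one() with a {"$set": ...} document is the usual alternative.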
Finally, Flask renders the HTML page that displays the data.
from flask import Flask, render_template, redirect
@app.route("/")
def home():
    data = mongo.db.collection.find_one()
    return render_template("index.html", mars_data=data)
I had trouble displaying the DataFrame of Mars facts. After some tinkering, I realized a potential cause was the class attribute of the table.
<table border="1" class="dataframe table">
The easiest fix I could think of was editing the table's HTML directly in index.html.
<table border="1" class="table">
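An alternative to hand-editing index.html is to rewrite the class attribute before the table HTML is stored. This is only a sketch, and it assumes the table is produced as a string during scraping (for instance by pandas DataFrame.to_html, which is not shown here). The helper name drop_dataframe_class and the sample row are hypothetical.

```python
def drop_dataframe_class(table_html):
    """Return table_html with the 'dataframe' class removed from the table tag."""
    return table_html.replace('class="dataframe table"', 'class="table"')


# Hypothetical generated markup, for illustration only
generated = '<table border="1" class="dataframe table"><tr><td>example</td></tr></table>'
cleaned = drop_dataframe_class(generated)
```

The cleaned string could then be stored with the rest of the scraped data, so the template would not need manual edits after each scrape.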
LinkedIn | https://www.linkedin.com/in/niko-elvambuena/
Email | niko.elvambuena95@gmail.com