Skip to content

Latest commit

 

History

History
117 lines (76 loc) · 4.02 KB

bash_preview.md

File metadata and controls

117 lines (76 loc) · 4.02 KB

Bash preview

The desire to automate repetitive tasks is a key motivator for the adoption of code in newsrooms.

Below is a short walk-through intended to demonstrate the power of automation, and often how little code is required to execute useful tasks.

As with most tasks involving code, there are many potential ways to solve a given problem. Here, we introduce the Bash shell and a few basic Unix tools by way of dipping our toes into the world of automation.

We'll skip the backstory on these tools for now, but never fear, we'll learn more about these tools early in the quarter. Just follow along for now.

Open a terminal shell (on Mac, search for Terminal).

On the shell, navigate to your Desktop.

cd ~/Desktop

Download the Stanford events home page using either curl or wget (Mac and Linux machines typically include at least one of these tools by default).

curl -o events.stanford.html https://events.stanford.edu/

# OR if curl is not available on your machine, try wget:

wget -O events.stanford.html https://events.stanford.edu/

You've just downloaded the HTML for the Stanford events home page.

You can view the contents of this file using a code editor, a web browser, or even common Unix tools such as cat or less.

Later this quarter we'll learn more about dissecting and extracting data from web pages.

Let's say we wanted to extract the titles of all recent events from this page and place them in a text file for easier review.

You can use the Unix grep utility to find lines in a file that contain certain bits of text (or even text patterns!).

In this case, let's search for all h3 tags, which appear to be used to enclose, or "mark up", event titles in the HTML.

grep h3 events.stanford.html

The above should have printed out the event titles wrapped in <h3>...</h3> HTML tags to your shell.

If that worked, great! Now let's redirect that output to a text file.

grep h3 events.stanford.html > events.stanford.txt

You can open up this file using a code editor or view its contents using cat or less on the command line.

Hit q to exit out of less

less events.stanford.txt

Finally, let's say you wanted to update your text file every day at a certain time. You could place the commands we've used so far into a shell script, and then run that script at a set time each day using a Unix utility known as cron.

Create a text file called events_updater.sh with the following:

cd ~/Desktop
touch events_updater.sh

Add the following code to the file:

cd ~/Desktop
curl -o events.stanford.html https://events.stanford.edu/
grep h3 events.stanford.html > events.stanford.txt

Next, you can use the built-in cron utility on Linux or Mac to schedule the script.

On Macs, you may need to grant cron permissions to access your file system.

Open the crontab file from the command line using crontab -e. This will open the so-called crontab file in the shell's default text editor (typically Pico or nano by default).

Copy the following into the crontab file. Then save and quit.

* * * * * /bin/sh $HOME/Desktop/events_updater.sh > /tmp/events.stanford.log 2>&1

The above schedules the script to run every minute.

After the script runs, you should see the events.stanford.html and events.stanford.txt files on your Desktop. If something went wrong, you can check /tmp/events.stanford.log for logs about potential errors (we configured our cronjob command to pipe errors and other script outputs to this temporary log file).

That's just a small taste of how we can begin automating useful but repetitive tasks. Of course, there's much room for improvement of this script, depending on your needs.

What are some improvements you could imagine making?

  • Get the dates of events
  • Generate a properly structured CSV for use in Excel
  • Scrape older events
  • Send an email alert when new events crop up