Skip to content

2b. SQL Joins

Lionel Mann edited this page Apr 12, 2018 · 1 revision

Database Normalization

When creating a database, it is important to think about how data will be stored. This is known as normalization.

There are essentially three ideas that are aimed at database normalization:

  1. Are the tables storing logical groupings of the data?
  2. Can I make changes in a single location, rather than in many tables for the same information?
  3. Can I access and manipulate data quickly and efficiently?

JOINS

JOINs are useful for allowing us to pull data from multiple tables.

Parts of a JOIN

In order to write a JOIN, we need a SELECT and FROM clause just like any other query. You will notice, we have introduced two new parts to our regular queries: JOIN and ON.

  • JOIN introduces the second table from which you would like to pull data
  • ON tells you how you would like to merge the tables in the FROM and JOIN statements together.

JOINs are like a second FROM clause. It identifies the table where the data that we want to join lives. Finally, we need to specify the relationship between the two tables. This specifies a logical statement to combine the table in from the join table. These are referred to as INNER JOINS. To learn about rows that DO NOT appear in both tables we need to use LEFT JOIN and RIGHT JOIN.

SELECT orders.*
FROM orders
JOIN accounts
ON orders.account_id = accounts.id;

Let's break this down

We are able to pull data from two tables:

  1. orders
  2. accounts

Above, we are only pulling data from the orders table since in the SELECT statement we only reference columns from the orders table.

The ON statement holds the two columns that get linked across the two tables.

If we wanted to only pull individual elements from either the orders or accounts table, we can do this by using the exact same information in the FROM and ON statements. However, in your SELECT statement, you will need to know how to specify tables and columns in the SELECT statement:

  1. The table name is always before the period, i.e tablename.
  2. The column you want from that table is always after the period, i.e tablename.column.

For example, if we want to pull only the account name and the dates, but none of the other columns, we can do this with the following query:

SELECT accounts.name, orders.occurred_at
FROM orders
JOIN accounts
ON orders.account_id = accounts.id;

This query only pulls two columns, not all the information in these two tables.

Alternatively, the below query pulls all the columns from both the accounts and orders table.

SELECT *
FROM orders
JOIN accounts
ON orders.account_id = accounts.id;

And the first query pulls all the information from only the orders table:

SELECT orders.*
FROM orders
JOIN accounts
ON orders.account_id = accounts.id;

Joining tables allows you access to each of the tables in the SELECT statement through the table name, and the columns will always follow a . after the table name.

Examples

Pull all the data from the accounts table, and all the data from the orders table.

SELECT orders.*, accounts.*
FROM accounts
JOIN orders
ON accounts.id = orders.account_id;

Pull standard_qty, gloss_qty, and poster_qty from the orders table, and the website and the primary_poc from the accounts table.

SELECT orders.standard_qty, orders.gloss_qty, orders.poster_qty, accounts.website, accounts.primary_poc
FROM accounts
JOIN orders
ON orders.account_id = accounts.id;

If we wanted to join all three of these tables, we could use the same logic. The code below pulls all of the data from all of the joined tables.

SELECT *
FROM web_events
JOIN accounts
ON web_events.account_id = accounts.id
JOIN orders
ON accounts.id = orders.account_id

JOIN holds a table, and ON is a link for our PK to equal the FK.

To pull specific columns, the SELECT statement will need to specify the table that you are wishing to pull the column from, as well as the column name. We could pull only three columns in the above by changing the select statement to the below, but maintain the rest of the JOIN information:

SELECT web_events.channel, accounts.name, orders.total

Provide a table for all web_events associated with account name of Walmart. There should be three columns. Be sure to include the primary_poc, time of the event, and the channel for each event. Additionally, you might choose to add a fourth column to assure only Walmart events were chosen.

With Alias

SELECT a.primary_poc, w.occurred_at, w.channel, a.name
FROM web_events w
JOIN accounts a
ON w.account_id = a.id
WHERE a.name = 'Walmart';

Without Alias

SELECT accounts.primary_poc, web_events.occurred_at, web_events.channel, accounts.name
FROM web_events
JOIN accounts
ON accounts.id = web_events.account_id
WHERE accounts.name = 'Walmart'

Provide a table that provides the region for each sales_rep along with their associated accounts. Your final table should include three columns: the region name, the sales rep name, and the account name. Sort the accounts alphabetically (A-Z) by account name.

NOTE: columns need unique names hence r.name region, s.name rep, a.name account

SELECT r.name region, s.name rep, a.name account
FROM sales_reps s
JOIN region r
ON s.region_id = r.id
JOIN accounts a
ON a.sales_rep_id = s.id
ORDER BY a.name;

Provide the name for each region for every order, as well as the account name and the unit price they paid (total_amt_usd/total) for the order. Your final table should have 3 columns: region name, account name, and unit price. A few accounts have 0 for total, so I divided by (total + 0.01) to assure not dividing by zero.

SELECT r.name region, a.name account, o.total_amt_usd/(o.total + 0.01) unit_price
FROM region r
JOIN sales_reps s
ON s.region_id = r.id
JOIN accounts a
ON a.sales_rep_id = s.id
JOIN orders o
ON o.account_id = a.id;

LEFT JOINS and RIGHT JOINS

If we want to include data that is not included in both tables, but only one of the tables that we are using in our JOIN statement, there are 3 types of JOINS we might use:

  1. LEFT JOIN (This is usually used most often)
  2. RIGHT JOIN
  3. FULL OUTER JOIN

When we write our JOIN statements, the table listed in the FROM clause is the left table, while the table in the JOIN in considered the right table.

SELECT
FROM left table
LEFT JOIN right table